Add docs for formatters

Chris Howell 2020-09-01 15:23:48 -07:00
parent 36f29c3058
commit d3cd5d1254
1 changed files with 312 additions and 247 deletions


@ -158,6 +158,71 @@ transcript = transcript_list.find_manually_created_transcript(['de', 'en'])
transcript = transcript_list.find_generated_transcript(['de', 'en'])
```
### Using Formatters
Formatters provide an additional layer of processing for the transcript you pass to them. The goal is to convert the transcript from its Python data type into a consistent string in a given format, such as plain text (`.txt`) or a format with a defined specification like JSON (`.json`), WebVTT (`.vtt`), or comma-separated values (`.csv`).
The `formatters` submodule provides a few basic formatters that wrap your transcript data, for cases where you want to output a specific format and then write it to a file, for example to back up or store the transcript and run another script against it later.
The following formatter subclasses are provided:
- JSONFormatter
- TextFormatter
- WebVTTFormatter (a basic implementation)
Here is how to import from the `formatters` module.
```python
# the base class to inherit from when creating your own formatter.
from youtube_transcript_api.formatters import Formatter
# some provided subclasses, each outputs a different string format.
from youtube_transcript_api.formatters import JSONFormatter
from youtube_transcript_api.formatters import TextFormatter
from youtube_transcript_api.formatters import WebVTTFormatter
```
### Provided Formatter Example
Let's say we want to retrieve a transcript and write it to a JSON file in the same format the API returned it in. That would look something like this:
```python
# your_custom_script.py
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import JSONFormatter
# Must be a single transcript. Replace the placeholder with a real video id.
video_id = 'your_video_id'
transcript = YouTubeTranscriptApi.get_transcript(video_id)
# .format() turns the transcript into a JSON string.
json_formatted = JSONFormatter(transcript).format()
# Now we can write it out to a file.
with open('your_filename.json', 'w', encoding='utf-8') as json_file:
    json_file.write(json_formatted)
# You should now have a new JSON file that you can easily read back into Python.
```
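As a minimal sketch of reading such a file back, the standard library's `json.load()` restores the list of entries. The sample entries below are made up for illustration and assume the transcript is a list of dictionaries with `text`, `start`, and `duration` keys; the file is written to a temporary directory so the snippet runs without network access.

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for a fetched transcript.
sample_transcript = [
    {'text': 'Hey there', 'start': 7.58, 'duration': 6.13},
    {'text': 'how are you', 'start': 14.08, 'duration': 7.58},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'your_filename.json')

    # Write a JSON string to disk, as the example above does with json_formatted.
    with open(path, 'w', encoding='utf-8') as json_file:
        json_file.write(json.dumps(sample_transcript))

    # Read the file back into Python data structures.
    with open(path, 'r', encoding='utf-8') as json_file:
        restored = json.load(json_file)
```

The restored value is an ordinary list of dictionaries, so a later script can work with it directly.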
**Passing extra keyword arguments**
Since `JSONFormatter` leverages `json.dumps()`, you can also forward keyword arguments through `.format()`. For example, passing `indent=2` makes the file output easier to read.
```python
json_formatted = JSONFormatter(transcript).format(indent=2)
```
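To see what `indent=2` changes, the sketch below calls `json.dumps()` directly on a made-up one-entry transcript (an assumption for illustration). Per the text above, `JSONFormatter` forwards its keyword arguments to this same function.

```python
import json

# Hypothetical one-entry transcript used only for illustration.
sample_transcript = [{'text': 'Hey there', 'start': 7.58, 'duration': 6.13}]

compact = json.dumps(sample_transcript)
pretty = json.dumps(sample_transcript, indent=2)

print(compact)  # everything on a single line
print(pretty)   # one key per line, nested levels indented by two spaces
```

Both strings decode to the same data; only the whitespace differs.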
### Custom Formatter Example
You can implement your own formatter class. Inherit from the `Formatter` base class and implement the `format(self, **kwargs)` method, which should ultimately return a string when called on your formatter instance.
```python
from youtube_transcript_api.formatters import Formatter


class MyCustomFormatter(Formatter):
    def format(self, **kwargs):
        # Do your custom work in here, but return a string.
        return 'your processed output data as a string.'
```
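As a more concrete sketch, the hypothetical `CSVFormatter` below turns a transcript into comma-separated text. To keep the snippet runnable without the library installed, it defines a small stand-in `Formatter` base class that stores the transcript passed to the constructor, mirroring how `JSONFormatter(transcript).format()` is called above. The stand-in, the `_transcript` attribute name, and the sample entries (dictionaries with `text`, `start`, and `duration` keys) are assumptions, not the library's actual implementation.

```python
import csv
import io


class Formatter:
    """Stand-in for youtube_transcript_api.formatters.Formatter (assumed shape)."""

    def __init__(self, transcript):
        self._transcript = transcript

    def format(self, **kwargs):
        raise NotImplementedError('Subclasses must implement format().')


class CSVFormatter(Formatter):
    """Hypothetical formatter that renders a transcript as CSV text."""

    def format(self, **kwargs):
        buffer = io.StringIO()
        # Extra keyword arguments (e.g. delimiter=';') are passed to csv.writer.
        kwargs.setdefault('lineterminator', '\n')
        writer = csv.writer(buffer, **kwargs)
        writer.writerow(['start', 'duration', 'text'])
        for entry in self._transcript:
            writer.writerow([entry['start'], entry['duration'], entry['text']])
        return buffer.getvalue()


# Made-up sample entries for illustration.
sample_transcript = [
    {'text': 'Hey there', 'start': 7.58, 'duration': 6.13},
    {'text': 'how are you, today?', 'start': 14.08, 'duration': 7.58},
]
csv_formatted = CSVFormatter(sample_transcript).format()
```

In a real script you would drop the stand-in class and import `Formatter` from `youtube_transcript_api.formatters` instead. Using `csv.writer` rather than joining strings by hand means text containing commas or quotes is escaped correctly.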
## CLI
Execute the CLI script using the video ids as parameters and the results will be printed out to the command line: