Add docs for formatters

2020-09-01 15:23:48 -07:00 · 2020-09-01 15:23:48 -07:00 · d3cd5d1254
parent 36f29c3058
commit d3cd5d1254
1 changed files with 312 additions and 247 deletions
--- a/README.md
+++ b/README.md
@@ -158,6 +158,71 @@ transcript = transcript_list.find_manually_created_transcript(['de', 'en'])
 transcript = transcript_list.find_generated_transcript(['de', 'en'])
 ```

+### Using Formatters
+Formatters provide an additional layer of processing for the transcript you pass to them. The goal is to convert the transcript from its Python data type into a consistent string in a given format, such as plain text (`.txt`) or a format with a defined specification like JSON (`.json`), WebVTT (`.vtt`), or comma-separated values (`.csv`).
+
+The `formatters` submodule provides a few basic formatters that wrap your transcript data for cases where you want to output a specific format and then write it to a file, for example to back up or store the transcript and run another script against it later.
+
+We provide a few formatter subclasses to use:
+
+- JSONFormatter
+- TextFormatter
+- WebVTTFormatter (a basic implementation)
+
+Here is how to import from the `formatters` module.
+
+```python
+# the base class to inherit from when creating your own formatter.
+from youtube_transcript_api.formatters import Formatter
+
+# some provided subclasses, each outputs a different string format.
+from youtube_transcript_api.formatters import JSONFormatter
+from youtube_transcript_api.formatters import TextFormatter
+from youtube_transcript_api.formatters import WebVTTFormatter
+```
+
+### Provided Formatter Example
+Let's say we want to retrieve a transcript and write it to a JSON file in the same format the API returned it in. That would look something like this:
+
+```python
+# your_custom_script.py
+
+from youtube_transcript_api import YouTubeTranscriptApi
+from youtube_transcript_api.formatters import JSONFormatter
+
+# Placeholder: replace with the ID of the video you want.
+video_id = 'your_video_id'
+
+# Must be a single transcript.
+transcript = YouTubeTranscriptApi.get_transcript(video_id)
+
+# .format() turns the transcript into a JSON string.
+json_formatted = JSONFormatter(transcript).format()
+
+# Now we can write it out to a file.
+with open('your_filename.json', 'w', encoding='utf-8') as json_file:
+    json_file.write(json_formatted)
+
+# You should now have a new JSON file that you can easily read back into Python.
+```
+
+**Passing extra keyword arguments**
+
+Since `JSONFormatter` leverages `json.dumps()`, you can also forward keyword arguments to `.format()`. For example, passing `indent=2` makes the file output prettier.
+
+```python
+json_formatted = JSONFormatter(transcript).format(indent=2)
+```
+
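The keyword forwarding described above can be sketched with the standard library alone. The `SketchJSONFormatter` class below is a hypothetical stand-in, not the library's `JSONFormatter`: it only assumes that the formatter keeps the transcript it was given and hands `**kwargs` to `json.dumps()`. The sample transcript is made up and uses the `text`/`start`/`duration` keys of a single transcript.

```python
import json


class SketchJSONFormatter:
    """Hypothetical stand-in for JSONFormatter (an assumption, not the library's code)."""

    def __init__(self, transcript):
        self._transcript = transcript

    def format(self, **kwargs):
        # Every keyword argument is forwarded unchanged to json.dumps().
        return json.dumps(self._transcript, **kwargs)


# Made-up sample data in the shape of a single transcript.
transcript = [
    {'text': 'Hey there', 'start': 7.58, 'duration': 6.13},
    {'text': 'how are you', 'start': 14.08, 'duration': 7.58},
]

# Without arguments the output is a single compact line.
compact = SketchJSONFormatter(transcript).format()

# indent and sort_keys are ordinary json.dumps() options.
pretty = SketchJSONFormatter(transcript).format(indent=2, sort_keys=True)

# Either string decodes back to the same Python data.
restored = json.loads(pretty)
```

Any other option accepted by `json.dumps()`, such as `ensure_ascii=False`, would pass through the same way in this sketch.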
+### Custom Formatter Example
+You can implement your own formatter class. Just inherit from the `Formatter` base class and implement the `format(self, **kwargs)` method, which should ultimately return a string when called on your formatter instance.
+
+```python
+from youtube_transcript_api.formatters import Formatter
+
+
+class MyCustomFormatter(Formatter):
+    def format(self, **kwargs):
+        # Do your custom work in here, but return a string.
+        return 'your processed output data as a string.'
+```
+
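As a fuller illustration of a custom formatter, here is a minimal sketch that renders a transcript as CSV text. To keep it runnable without the package installed, it defines its own `Formatter` stand-in; that class, the `_transcript` attribute name, and the `CSVFormatter` name are all assumptions made for this example, not part of the library. With the package installed, you would inherit from `youtube_transcript_api.formatters.Formatter` instead.

```python
import csv
import io


class Formatter:
    """Hypothetical stand-in for the library's base class (an assumption).

    It only stores the transcript so the sketch runs without the package.
    """

    def __init__(self, transcript):
        self._transcript = transcript

    def format(self, **kwargs):
        raise NotImplementedError('Subclasses must implement format().')


class CSVFormatter(Formatter):
    """Illustrative formatter that renders a transcript as CSV text."""

    def format(self, **kwargs):
        buffer = io.StringIO()
        # Keyword arguments (e.g. delimiter=';') are forwarded to csv.DictWriter.
        writer = csv.DictWriter(
            buffer, fieldnames=['start', 'duration', 'text'], **kwargs
        )
        writer.writeheader()
        writer.writerows(self._transcript)
        return buffer.getvalue()


# Made-up sample data in the shape of a single transcript.
transcript = [
    {'text': 'Hey there', 'start': 7.58, 'duration': 6.13},
    {'text': 'how are you', 'start': 14.08, 'duration': 7.58},
]

csv_formatted = CSVFormatter(transcript).format()
semicolon_formatted = CSVFormatter(transcript).format(delimiter=';')
```

The method builds the output in an in-memory buffer and returns it, so the caller decides whether to print the string or write it to a file, which mirrors the `JSONFormatter` usage shown earlier.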
 ## CLI  
  
 Execute the CLI script using the video ids as parameters and the results will be printed out to the command line: