Add docs for formatters
Commit d3cd5d1254 (parent 36f29c3058), README.md: 65 additions

transcript = transcript_list.find_generated_transcript(['de', 'en'])
```

### Using Formatters
Formatters are an additional layer of processing applied to the transcript you pass them. Their goal is to convert the transcript from its Python data type into a consistent string in a given "format", such as basic text (`.txt`) or a format with a defined specification, such as JSON (`.json`), WebVTT (`.vtt`), or comma-separated values (`.csv`).

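To make the idea concrete, here is a library-free sketch of two such conversions. It assumes the transcript is a list of dicts with `text`, `start` and `duration` keys (the shape `get_transcript()` returns); the sample data is made up, and this only illustrates the concept rather than how the provided formatters are implemented.

```python
import json

# Hypothetical sample data in the shape returned by get_transcript():
# a list of dicts with "text", "start" and "duration" (in seconds).
transcript = [
    {"text": "Hey there", "start": 0.0, "duration": 1.54},
    {"text": "how are you", "start": 1.54, "duration": 4.16},
]

# Plain text: one caption per line, timing information dropped.
as_text = "\n".join(line["text"] for line in transcript)

# JSON: the same list of dicts serialized to a single string.
as_json = json.dumps(transcript)
```

Both results are plain strings, which is what makes them easy to write to a file.
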
The `formatters` submodule provides a few basic formatters that wrap around your transcript data. They are useful when you want to output a specific format and then write it to a file, for example to back it up or to run another script against it later.

We provide a few formatter subclasses:

- `JSONFormatter`
- `TextFormatter`
- `WebVTTFormatter` (a basic implementation)

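To give a sense of what a "basic" WebVTT conversion involves, here is a hypothetical standalone sketch; it is not the code of `WebVTTFormatter`. It assumes each transcript entry is a dict with `text`, `start` and `duration` (seconds), and that a cue ends at `start + duration`. The helper names `format_timestamp` and `to_webvtt` are made up for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(transcript) -> str:
    """Hypothetical helper: build a minimal WebVTT document from transcript dicts."""
    cues = []
    for line in transcript:
        start = line["start"]
        end = start + line["duration"]  # assumption: cue end = start + duration
        cues.append(
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n{line['text']}"
        )
    # A WebVTT file starts with the "WEBVTT" header; cues are separated by blank lines.
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


transcript = [
    {"text": "Hey there", "start": 0.0, "duration": 1.54},
    {"text": "how are you", "start": 1.54, "duration": 4.16},
]
vtt = to_webvtt(transcript)
```

A full implementation would also handle cue settings, styling and escaping of special characters, which is presumably why the provided formatter is described as basic.
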
Here is how to import from the `formatters` module.

```python
# The base class to inherit from when creating your own formatter.
from youtube_transcript_api.formatters import Formatter

# Some provided subclasses; each outputs a different string format.
from youtube_transcript_api.formatters import JSONFormatter
from youtube_transcript_api.formatters import TextFormatter
from youtube_transcript_api.formatters import WebVTTFormatter
```

### Provided Formatter Example
Let's say we want to retrieve a transcript and write it to a JSON file in the same format the API returned it in. That would look something like this:

```python
# your_custom_script.py

from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import JSONFormatter

video_id = 'your_video_id'  # placeholder: replace with a real YouTube video ID

# Must be a single transcript.
transcript = YouTubeTranscriptApi.get_transcript(video_id)

# .format() turns the transcript into a JSON string.
json_formatted = JSONFormatter(transcript).format()

# Now we can write it out to a file.
with open('your_filename.json', 'w', encoding='utf-8') as json_file:
    json_file.write(json_formatted)

# You should now have a new JSON file that you can easily read back into Python.
```

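Reading the file back is a job for the standard `json` module. The sketch below shows the round trip in a self-contained way: it uses `json.dumps()` in place of `JSONFormatter` so it runs without the library or network access, and the sample transcript and temporary path are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for a real transcript.
transcript = [
    {"text": "Hey there", "start": 0.0, "duration": 1.54},
    {"text": "how are you", "start": 1.54, "duration": 4.16},
]

path = os.path.join(tempfile.mkdtemp(), "your_filename.json")

# Write the JSON string out (json.dumps stands in for JSONFormatter here).
with open(path, "w", encoding="utf-8") as json_file:
    json_file.write(json.dumps(transcript))

# Read it back: json.load returns the same list of dicts.
with open(path, encoding="utf-8") as json_file:
    loaded = json.load(json_file)
```
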
**Passing extra keyword arguments**

Since `JSONFormatter` leverages `json.dumps()`, you can also forward keyword arguments to `.format()`. For example, forwarding the `indent=2` keyword argument makes your file output prettier:

```python
json_formatted = JSONFormatter(transcript).format(indent=2)
```

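Because the arguments end up in `json.dumps()`, its standard-library documentation describes what they do. This sketch calls `json.dumps()` directly to show the effect of `indent=2`, plus `ensure_ascii=False`, which keeps non-ASCII characters (for example in German transcripts) readable instead of escaping them. It assumes `JSONFormatter` forwards these arguments unchanged; the sample transcript is made up.

```python
import json

# Hypothetical one-line transcript containing non-ASCII characters.
transcript = [{"text": "Grüße", "start": 0.0, "duration": 1.54}]

# Default: everything on one line, non-ASCII characters escaped.
compact = json.dumps(transcript)
print(compact)  # [{"text": "Gr\u00fc\u00dfe", "start": 0.0, "duration": 1.54}]

# indent=2: one key per line, nested two spaces per level.
pretty = json.dumps(transcript, indent=2)

# ensure_ascii=False: characters like "ü" are written as-is.
readable = json.dumps(transcript, indent=2, ensure_ascii=False)
```

When writing `readable` to a file, keep `encoding='utf-8'` in `open()`, as in the example above, so those characters are stored correctly.
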
### Custom Formatter Example
You can implement your own formatter class. Just inherit from the `Formatter` base class and implement the `format(self, **kwargs)` method, which should return a string when called on your formatter instance.

```python
from youtube_transcript_api.formatters import Formatter


class MyCustomFormatter(Formatter):
    def format(self, **kwargs):
        # Do your custom work in here, but return a string.
        return 'your processed output data as a string.'
```

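As a fuller illustration, here is a sketch of a hypothetical `CSVFormatter` covering the `.csv` case mentioned earlier. So that it runs without the library, it defines a stand-in `Formatter` base class; that stand-in assumes the real base class keeps the transcript passed to its constructor, and the attribute name `_transcript` is a guess. In a real script, import `Formatter` from `youtube_transcript_api.formatters` instead and check how the base class exposes the transcript.

```python
import csv
import io


class Formatter:
    """Hypothetical stand-in for youtube_transcript_api.formatters.Formatter.

    Assumes the real base class stores the transcript passed to its
    constructor; the attribute name `_transcript` is a guess.
    """

    def __init__(self, transcript):
        self._transcript = transcript

    def format(self, **kwargs):
        raise NotImplementedError


class CSVFormatter(Formatter):
    """Hypothetical formatter: CSV with a text,start,duration header row."""

    def format(self, **kwargs):
        # Use "\n" line endings unless the caller asks for something else.
        kwargs.setdefault("lineterminator", "\n")
        buffer = io.StringIO()
        # Extra keyword arguments (e.g. delimiter=";") are forwarded to csv.DictWriter.
        writer = csv.DictWriter(
            buffer, fieldnames=["text", "start", "duration"], **kwargs
        )
        writer.writeheader()
        writer.writerows(self._transcript)
        return buffer.getvalue()


# Usage with made-up sample data.
transcript = [
    {"text": "Hey there", "start": 0.0, "duration": 1.54},
    {"text": "how are you, today?", "start": 1.54, "duration": 4.16},
]
csv_formatted = CSVFormatter(transcript).format()
```

Delegating to the `csv` module rather than joining strings by hand means captions that contain commas or quotes are quoted correctly, and forwarding `**kwargs` mirrors how `JSONFormatter` forwards its arguments to `json.dumps()`.
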
## CLI

Execute the CLI script using the video IDs as parameters, and the results will be printed to the command line:
|  |  | |||
		Loading…
	
		Reference in New Issue