Add docs for formatters
		
							parent
							
								
									36f29c3058
								
							
						
					
					
						commit
						d3cd5d1254
					
				
							
								
								
									
										65
									
								
								README.md
								
								
								
								
							
							
						
						
									
								
								
								
								
							|  | @ -158,6 +158,71 @@ transcript = transcript_list.find_manually_created_transcript(['de', 'en']) | ||||||
| transcript = transcript_list.find_generated_transcript(['de', 'en']) | transcript = transcript_list.find_generated_transcript(['de', 'en']) | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
|  | ### Using Formatters | ||||||
|  | Formatters add an extra layer of processing to the transcript you pass them. The goal is to convert the transcript from its Python data type into a consistent string in a given "format", such as plain text (`.txt`) or a format with a defined specification such as JSON (`.json`), WebVTT (`.vtt`), or comma-separated values (`.csv`). | ||||||
|  | 
 | ||||||
|  | The `formatters` submodule provides a few basic formatters that wrap your transcript data, for cases where you want to produce a specific format and then write it to a file, for example to back it up and run another script against it later. | ||||||
|  | 
 | ||||||
|  | We provide a few subclasses of formatters to use: | ||||||
|  | 
 | ||||||
|  | - JSONFormatter | ||||||
|  | - TextFormatter | ||||||
|  | - WebVTTFormatter (a basic implementation) | ||||||
|  | 
 | ||||||
|  | Here is how to import from the `formatters` module. | ||||||
|  | 
 | ||||||
|  | ```python | ||||||
|  | # the base class to inherit from when creating your own formatter. | ||||||
|  | from youtube_transcript_api.formatters import Formatter | ||||||
|  | 
 | ||||||
|  | # some provided subclasses, each outputs a different string format. | ||||||
|  | from youtube_transcript_api.formatters import JSONFormatter | ||||||
|  | from youtube_transcript_api.formatters import TextFormatter | ||||||
|  | from youtube_transcript_api.formatters import WebVTTFormatter | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | ### Provided Formatter Example | ||||||
|  | Let's say we want to retrieve a transcript and write it to a JSON file in the same format the API returned it in. That would look something like this: | ||||||
|  | 
 | ||||||
|  | ```python | ||||||
|  | # your_custom_script.py | ||||||
|  | 
 | ||||||
|  | from youtube_transcript_api import YouTubeTranscriptApi | ||||||
|  | from youtube_transcript_api.formatters import JSONFormatter | ||||||
|  | 
 | ||||||
|  | # Must be a single transcript; video_id is a placeholder for the ID of your video. | ||||||
|  | transcript = YouTubeTranscriptApi.get_transcript(video_id) | ||||||
|  | 
 | ||||||
|  | # .format() turns the transcript into a JSON string. | ||||||
|  | json_formatted = JSONFormatter(transcript).format() | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | # Now we can write it out to a file. | ||||||
|  | with open('your_filename.json', 'w', encoding='utf-8') as json_file: | ||||||
|  |     json_file.write(json_formatted) | ||||||
|  | 
 | ||||||
|  | # You should now have a new JSON file that you can easily read back into Python. | ||||||
|  | ``` | ||||||
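Reading the file back can be sketched with the standard `json` module. This is a minimal sketch, not the library's own code: the sample transcript is hand-written (a list of dicts with `text`, `start`, and `duration` keys is an assumption about the data shape), and `json.dumps()` stands in for `JSONFormatter` so the snippet runs without network access.

```python
import json
import os
import tempfile

# Hypothetical sample data; assumed to resemble what get_transcript() returns.
sample_transcript = [
    {"text": "Hey there", "start": 7.58, "duration": 6.13},
    {"text": "how are you", "start": 14.08, "duration": 7.58},
]

# Stand-in for JSONFormatter(transcript).format(), which leverages json.dumps().
json_formatted = json.dumps(sample_transcript)

# Write the JSON string to a file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "your_filename.json")
with open(path, "w", encoding="utf-8") as json_file:
    json_file.write(json_formatted)

# Read the file back; json.load() returns plain Python lists and dicts.
with open(path, encoding="utf-8") as json_file:
    restored = json.load(json_file)

print(restored[0]["text"])
```

Because the file holds plain JSON, any later script can load it this way without importing the transcript library.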
|  | 
 | ||||||
|  | **Passing extra keyword arguments** | ||||||
|  | 
 | ||||||
|  | Since `JSONFormatter` leverages `json.dumps()`, you can also forward keyword arguments into `.format()`. For example, pass `indent=2` to make the file output prettier. | ||||||
|  | 
 | ||||||
|  | ```python | ||||||
|  | json_formatted = JSONFormatter(transcript).format(indent=2) | ||||||
|  | ``` | ||||||
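To see what `indent=2` changes, the sketch below calls `json.dumps()` directly on a hand-written sample entry (the sample data is an assumption, not output from the API). It only illustrates the effect of the keyword argument on the resulting string.

```python
import json

# Hypothetical sample data with the keys a transcript entry is assumed to have.
sample = [{"text": "Hey there", "start": 7.58, "duration": 6.13}]

# Without indent, everything is emitted on a single line.
compact = json.dumps(sample)

# With indent=2, each nesting level is placed on its own indented line.
pretty = json.dumps(sample, indent=2)

print(compact)
print(pretty)
```

Both strings decode to the same data; only the whitespace differs.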
|  | 
 | ||||||
|  | ### Custom Formatter Example | ||||||
|  | You can implement your own formatter class. Inherit from the `Formatter` base class and implement the `format(self, **kwargs)` method, which should ultimately return a string when called on your formatter instance. | ||||||
|  | 
 | ||||||
|  | ```python | ||||||
|  | from youtube_transcript_api.formatters import Formatter | ||||||
|  | 
 | ||||||
|  | class MyCustomFormatter(Formatter): | ||||||
|  |     def format(self, **kwargs): | ||||||
|  |         # Do your custom work in here, but return a string. | ||||||
|  |         return 'your processed output data as a string.' | ||||||
|  | ``` | ||||||
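A more concrete custom formatter might look like the following sketch, which produces comma-separated output. To keep it self-contained, it defines a hypothetical stand-in `Formatter` base class instead of importing the real one; the `_transcript` attribute name, the `CSVFormatter` class, and the sample data are all assumptions made for illustration.

```python
import csv
import io


class Formatter:
    """Hypothetical stand-in for youtube_transcript_api.formatters.Formatter."""

    def __init__(self, transcript):
        # Assumed attribute name; the real base class may store this differently.
        self._transcript = transcript

    def format(self, **kwargs):
        raise NotImplementedError("Subclasses must implement format().")


class CSVFormatter(Formatter):
    """Illustrative formatter that renders a transcript as CSV text."""

    def format(self, **kwargs):
        buffer = io.StringIO()
        # Extra keyword arguments are forwarded to csv.DictWriter.
        writer = csv.DictWriter(
            buffer, fieldnames=["start", "duration", "text"], **kwargs
        )
        writer.writeheader()
        writer.writerows(self._transcript)
        return buffer.getvalue()


# Hypothetical sample data; assumed to resemble what get_transcript() returns.
sample_transcript = [
    {"text": "Hey there", "start": 7.58, "duration": 6.13},
    {"text": "how are you", "start": 14.08, "duration": 7.58},
]

csv_formatted = CSVFormatter(sample_transcript).format(lineterminator="\n")
print(csv_formatted)
```

Forwarding `**kwargs` to the underlying writer mirrors how `JSONFormatter` is described as forwarding keyword arguments to `json.dumps()`.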
|  | 
 | ||||||
| ## CLI   | ## CLI   | ||||||
|    |    | ||||||
| Execute the CLI script using the video ids as parameters and the results will be printed out to the command line:   | Execute the CLI script using the video ids as parameters and the results will be printed out to the command line:   | ||||||
|  |  | ||||||