updated README
This commit is contained in:
parent 936ef3c1d0
commit aa34a2ceb3

133 README.md
							|  | @ -1,4 +1,5 @@ | |||
| # YouTube Transcript/Subtitle API (including automatically generated subtitles) | ||||
| 
 | ||||
| # YouTube Transcript/Subtitle API (including automatically generated subtitles and subtitle translations)   | ||||
|    | ||||
| [](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url)   | ||||
| [](https://travis-ci.org/jdepoix/youtube-transcript-api)   | ||||
|  | @ -7,7 +8,7 @@ | |||
| [](https://pypi.org/project/youtube-transcript-api/)   | ||||
| [](https://pypi.org/project/youtube-transcript-api/)   | ||||
|    | ||||
| This is an python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require a headless browser, like other selenium based solutions do! | ||||
| This is a Python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles, and does not require a headless browser, unlike other Selenium-based solutions!   | ||||
|    | ||||
| ## Install   | ||||
|    | ||||
|  | @ -23,13 +24,11 @@ If you want to use it from source, you'll have to install the dependencies manua | |||
| pip install -r requirements.txt   | ||||
| ```   | ||||
| 
 | ||||
| ## How to use it | ||||
| You can either integrate this module [into an existing application](#api), or just use it via a [CLI](#cli). | ||||
|    | ||||
| You could either integrate this module into an existing application, or just use it via an CLI | ||||
| ## API | ||||
|    | ||||
| ### In code | ||||
| 
 | ||||
| To get a transcript for a given video you can do: | ||||
| The easiest way to get a transcript for a given video is to execute:   | ||||
|    | ||||
| ```python   | ||||
| from youtube_transcript_api import YouTubeTranscriptApi   | ||||
|  | @ -61,9 +60,9 @@ You can also add the `languages` param if you want to make sure the transcripts | |||
| YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])   | ||||
| ```   | ||||
|    | ||||
| It's a list of language codes in a descending priority. In this example it will first try to fetch the german transcript (`'de'`) and then fetch the english transcript (`'en'`) if it fails to do so. As I can't provide a complete list of all working language codes with full certainty, you may have to play around with the language codes a bit, to find the one which is working for you! | ||||
| It's a list of language codes in descending priority. In this example it will first try to fetch the German transcript (`'de'`) and fall back to the English transcript (`'en'`) if that fails. If you want to find out which languages are available first, [have a look at `list_transcripts()`](#list-available-transcripts). | ||||
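
The priority rule described above can be sketched without touching the network. This is only an illustration of the fallback order: `pick_language` and the `available` list are hypothetical names and are not part of `youtube_transcript_api`.

```python
def pick_language(available, preferred):
    """Return the first code in `preferred` that also appears in `available`.

    Hypothetical helper for illustration; not part of youtube_transcript_api.
    """
    available_codes = set(available)
    for code in preferred:  # walk the list in descending priority
        if code in available_codes:
            return code
    return None  # none of the preferred languages is available


# 'de' is missing here, so the lookup falls back to 'en'
print(pick_language(['en', 'fr'], ['de', 'en']))
```

Putting the most desirable language first and a widely available one last keeps the fallback predictable.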
|    | ||||
| To get transcripts for a list fo video ids you can call: | ||||
| To get transcripts for a list of video ids you can call:   | ||||
|    | ||||
| ```python   | ||||
| YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])   | ||||
|  | @ -71,7 +70,100 @@ YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en']) | |||
|    | ||||
| `languages` also is optional here.   | ||||
| 
 | ||||
| ### CLI | ||||
| ### List available transcripts | ||||
| 
 | ||||
| If you want to list all transcripts which are available for a given video you can call: | ||||
| 
 | ||||
| ```python | ||||
| transcript_list = YouTubeTranscriptApi.list_transcripts(video_id) | ||||
| ``` | ||||
| 
 | ||||
| This will return a `TranscriptList` object which is iterable and provides methods to filter the list of transcripts for specific languages and types, like: | ||||
| 
 | ||||
| ```python | ||||
| transcript = transcript_list.find_transcript(['de', 'en'])   | ||||
| ``` | ||||
| 
 | ||||
| By default this module always picks manually created transcripts over automatically generated ones when both exist for the requested language. The `TranscriptList` allows you to bypass this default behaviour by searching for specific transcript types: | ||||
| 
 | ||||
| ```python | ||||
| # filter for manually created transcripts   | ||||
| transcript = transcript_list.find_manually_created_transcript(['de', 'en'])   | ||||
|    | ||||
| # or automatically generated ones   | ||||
| transcript = transcript_list.find_generated_transcript(['de', 'en']) | ||||
| ``` | ||||
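
The default preference can be pictured with a small in-memory sketch. It assumes that language priority is checked first and that the manual/generated preference only decides between transcripts of the same language; the text does not spell this out, so treat it as an assumption. `pick_transcript` and the two dictionaries are hypothetical and do not mirror the library's internals.

```python
def pick_transcript(manual, generated, language_codes):
    """Illustrative lookup over two dicts mapping language code -> transcript.

    Hypothetical stand-in for the behaviour described for TranscriptList.
    """
    for code in language_codes:              # languages in descending priority
        for source in (manual, generated):   # manual wins within one language
            if code in source:
                return source[code]
    return None  # nothing matched any requested language


manual = {'en': 'English (manual)'}
generated = {'en': 'English (generated)', 'de': 'German (generated)'}

print(pick_transcript(manual, generated, ['en']))
print(pick_transcript(manual, generated, ['de', 'en']))
```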
| 
 | ||||
| The methods `find_transcript`, `find_manually_created_transcript` and `find_generated_transcript` return `Transcript` objects. They contain metadata regarding the transcript: | ||||
| 
 | ||||
| ```python | ||||
| print( | ||||
| 	transcript.video_id,  | ||||
| 	transcript.language,  | ||||
| 	transcript.language_code, | ||||
| 	# whether it has been manually created or generated by YouTube  | ||||
| 	transcript.is_generated, | ||||
| 	# whether this transcript can be translated or not | ||||
| 	transcript.is_translatable, | ||||
| 	# a list of languages the transcript can be translated to  | ||||
| 	transcript.translation_languages,  | ||||
| ) | ||||
| ``` | ||||
| 
 | ||||
| and provide the `fetch()` method, which allows you to fetch the actual transcript data: | ||||
| 
 | ||||
| ```python | ||||
| transcript.fetch() | ||||
| ``` | ||||
| 
 | ||||
| ### Translate transcript  | ||||
| 
 | ||||
| YouTube has a feature which allows you to automatically translate subtitles. This module also makes it possible to access this feature. To do so `Transcript` objects provide a `translate()` method, which returns a new translated `Transcript` object: | ||||
| 
 | ||||
| ```python | ||||
| transcript = transcript_list.find_transcript(['en'])  | ||||
| translated_transcript = transcript.translate('de') | ||||
| print(translated_transcript.fetch()) | ||||
| ``` | ||||
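
Since not every transcript can be translated, it may be worth checking the metadata before calling `translate()`. The sketch below relies only on the `is_translatable`, `translation_languages` and `translate()` members mentioned above. It additionally assumes that each entry of `translation_languages` is a dict with a `'language_code'` key, which is not confirmed by this text. `translate_if_possible` is a hypothetical helper, not part of the module.

```python
def translate_if_possible(transcript, target_code):
    """Return a translated transcript, or None if the target is unsupported.

    Assumption: each entry in `transcript.translation_languages` is a dict
    with a 'language_code' key. This helper is hypothetical.
    """
    if not transcript.is_translatable:
        return None
    supported = {
        entry['language_code'] for entry in transcript.translation_languages
    }
    if target_code not in supported:
        return None
    return transcript.translate(target_code)
```

With a real `Transcript` object this could be used as `translate_if_possible(transcript_list.find_transcript(['en']), 'de')`.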
| 
 | ||||
| ### By example | ||||
| ```python | ||||
| # retrieve the available transcripts   | ||||
| transcript_list = YouTubeTranscriptApi.list_transcripts('video_id')   | ||||
|    | ||||
| # iterate over all available transcripts   | ||||
| for transcript in transcript_list:   | ||||
| 
 | ||||
| 	# the Transcript object provides metadata properties  | ||||
| 	print( | ||||
| 		transcript.video_id,  | ||||
| 		transcript.language,  | ||||
| 		transcript.language_code, | ||||
| 		# whether it has been manually created or generated by YouTube  | ||||
| 		transcript.is_generated, | ||||
| 		# whether this transcript can be translated or not | ||||
| 		transcript.is_translatable, | ||||
| 		# a list of languages the transcript can be translated to  | ||||
| 		transcript.translation_languages,  | ||||
| 	) | ||||
| 	   | ||||
| 	# fetch the actual transcript data  | ||||
| 	print(transcript.fetch())   | ||||
| 	 | ||||
| 	# translating the transcript will return another transcript object | ||||
| 	print(transcript.translate('en').fetch())   | ||||
| 	 | ||||
| # you can also directly filter for the language you are looking for, using the transcript list | ||||
| transcript = transcript_list.find_transcript(['de', 'en'])   | ||||
|    | ||||
| # or just filter for manually created transcripts   | ||||
| transcript = transcript_list.find_manually_created_transcript(['de', 'en'])   | ||||
|    | ||||
| # or automatically generated ones   | ||||
| transcript = transcript_list.find_generated_transcript(['de', 'en']) | ||||
| ``` | ||||
|    | ||||
| ## CLI   | ||||
|    | ||||
| Execute the CLI script using the video ids as parameters and the results will be printed out to the command line:   | ||||
|    | ||||
|  | @ -85,13 +177,32 @@ The CLI also gives you the option to provide a list of preferred languages: | |||
| youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en   | ||||
| ``` | ||||
| 
 | ||||
| You can also specify if you want to exclude automatically generated or manually created subtitles: | ||||
| 
 | ||||
| ```   | ||||
| youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-generated | ||||
| youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-manually-created | ||||
| ``` | ||||
|    | ||||
| If you would prefer to write it into a file or pipe it into another application, you can also output the results as json using the following line:   | ||||
|    | ||||
| ```   | ||||
| youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --json > transcripts.json   | ||||
| ```   | ||||
| 
 | ||||
| ### Proxy | ||||
| Translating transcripts using the CLI is also possible: | ||||
| 
 | ||||
| ```   | ||||
| youtube_transcript_api <first_video_id> <second_video_id> ... --languages en --translate de | ||||
| ```   | ||||
| 
 | ||||
| If you are not sure which languages are available for a given video you can call: | ||||
| 
 | ||||
| ```   | ||||
| youtube_transcript_api --list-transcripts <first_video_id> | ||||
| ```   | ||||
|    | ||||
| ## Proxy   | ||||
|    | ||||
| You can specify an HTTPS/HTTP proxy, which will be used during the requests to YouTube:   | ||||
|    | ||||
|  |  | |||