updated README

parent 936ef3c1d0
commit aa34a2ceb3

README.md (287)

@@ -1,121 +1,232 @@

# YouTube Transcript/Subtitle API (including automatically generated subtitles and subtitle translations)

[](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url)
[](https://travis-ci.org/jdepoix/youtube-transcript-api)
[](https://coveralls.io/github/jdepoix/youtube-transcript-api?branch=master)
[](http://opensource.org/licenses/MIT)
[](https://pypi.org/project/youtube-transcript-api/)
[](https://pypi.org/project/youtube-transcript-api/)

This is a Python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles, and does not require a headless browser like other Selenium-based solutions do!

## Install

It is recommended to [install this module by using pip](https://pypi.org/project/youtube-transcript-api/):

```
pip install youtube_transcript_api
```

If you want to use it from source, you'll have to install the dependencies manually:

```
pip install -r requirements.txt
```

You can either integrate this module [into an existing application](#api), or just use it via a [CLI](#cli).

## API

The easiest way to get a transcript for a given video is to execute:

```python
from youtube_transcript_api import YouTubeTranscriptApi

YouTubeTranscriptApi.get_transcript(video_id)
```

This will return a list of dictionaries looking somewhat like this:

```python
[
    {
        'text': 'Hey there',
        'start': 7.58,
        'duration': 6.13
    },
    {
        'text': 'how are you',
        'start': 14.08,
        'duration': 7.58
    },
    # ...
]
```
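
Because every entry carries a `start` offset and a `duration` (both in seconds) next to its `text`, the returned list can be processed directly with plain Python. The following is a minimal sketch that assumes only the dictionary shape shown above; the helpers `segment_end` and `to_plain_text` are hypothetical and not part of this module:

```python
from typing import Any, Dict, List

def segment_end(segment: Dict[str, Any]) -> float:
    """Return the time (in seconds) at which a segment stops being displayed."""
    return segment['start'] + segment['duration']

def to_plain_text(transcript: List[Dict[str, Any]]) -> str:
    """Join the text of all segments into a single space-separated string."""
    return ' '.join(segment['text'] for segment in transcript)

# sample data copied from the example output above
transcript = [
    {'text': 'Hey there', 'start': 7.58, 'duration': 6.13},
    {'text': 'how are you', 'start': 14.08, 'duration': 7.58},
]

print(to_plain_text(transcript))
print(round(segment_end(transcript[0]), 2))
```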

You can also add the `languages` param if you want to make sure the transcripts are retrieved in your desired language (it defaults to English).

```python
YouTubeTranscriptApi.get_transcript(video_id, languages=['de', 'en'])
```

It's a list of language codes in descending priority. In this example, it will first try to fetch the German transcript (`'de'`) and then fetch the English transcript (`'en'`) if it fails to do so. If you want to find out which languages are available first, [have a look at `list_transcripts()`](#list-available-transcripts).
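
The descending-priority rule can be illustrated with a small, self-contained function. This is a hypothetical sketch of the selection logic described above, not this module's implementation; `pick_language` and the lists of available codes are made up for illustration:

```python
from typing import Iterable, Optional

def pick_language(available: Iterable[str], preferred: Iterable[str]) -> Optional[str]:
    """Return the first code in `preferred` (highest priority first) that is available."""
    available_set = set(available)
    for code in preferred:
        if code in available_set:
            return code
    # none of the requested languages is available
    return None

# hypothetical video that only offers English and French transcripts:
# 'de' is not available, so the next code in the priority list is used
print(pick_language(['en', 'fr'], ['de', 'en']))
```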

To get transcripts for a list of video ids you can call:

```python
YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])
```

`languages` is also optional here.

### List available transcripts

If you want to list all transcripts which are available for a given video, you can call:

```python
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
```

This will return a `TranscriptList` object, which is iterable and provides methods to filter the list of transcripts for specific languages and types, like:

```python
transcript = transcript_list.find_transcript(['de', 'en'])
```

By default, this module always picks manually created transcripts over automatically created ones if a transcript in the requested language is available both as a manually created and as a generated transcript. The `TranscriptList` allows you to bypass this default behaviour by searching for specific transcript types:

```python
# filter for manually created transcripts
transcript = transcript_list.find_manually_created_transcript(['de', 'en'])

# or automatically generated ones
transcript = transcript_list.find_generated_transcript(['de', 'en'])
```
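
To make the default preference concrete, here is a simplified, offline model of it. `TranscriptInfo` and `find_preferred` are hypothetical stand-ins written for this illustration, not this module's `TranscriptList`; the sketch assumes the language priority is applied first and that, within one language, a manually created transcript wins over a generated one:

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

@dataclass
class TranscriptInfo:
    # hypothetical stand-in for the metadata of one available transcript
    language_code: str
    is_generated: bool

def find_preferred(
    transcripts: List[TranscriptInfo],
    language_codes: Sequence[str],
    allow_manual: bool = True,
    allow_generated: bool = True,
) -> Optional[TranscriptInfo]:
    """Walk the language codes in priority order; for each code prefer a
    manually created transcript and fall back to a generated one."""
    for code in language_codes:
        candidates = [t for t in transcripts if t.language_code == code]
        if allow_manual:
            for t in candidates:
                if not t.is_generated:
                    return t
        if allow_generated:
            for t in candidates:
                if t.is_generated:
                    return t
    return None

available = [
    TranscriptInfo('en', is_generated=True),
    TranscriptInfo('en', is_generated=False),
    TranscriptInfo('de', is_generated=True),
]

# 'de' has the highest priority and only a generated one exists, so it is picked
print(find_preferred(available, ['de', 'en']))
# excluding generated transcripts falls through to the manually created 'en' one
print(find_preferred(available, ['de', 'en'], allow_generated=False))
```

The two flags mirror the idea behind `find_manually_created_transcript` and `find_generated_transcript`: each one simply removes one transcript type from the search.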

The methods `find_transcript`, `find_manually_created_transcript` and `find_generated_transcript` return `Transcript` objects. They contain metadata regarding the transcript:

```python
print(
    transcript.video_id,
    transcript.language,
    transcript.language_code,
    # whether it has been manually created or generated by YouTube
    transcript.is_generated,
    # whether this transcript can be translated or not
    transcript.is_translatable,
    # a list of languages the transcript can be translated to
    transcript.translation_languages,
)
```

They also provide a `fetch()` method, which allows you to fetch the actual transcript data:

```python
transcript.fetch()
```

### Translate transcript

YouTube has a feature which allows you to automatically translate subtitles. This module also makes it possible to access this feature. To do so, `Transcript` objects provide a `translate()` method, which returns a new translated `Transcript` object:

```python
transcript = transcript_list.find_transcript(['en'])
translated_transcript = transcript.translate('de')
print(translated_transcript.fetch())
```
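
Since `translate()` only makes sense for transcripts that YouTube can translate, it can help to check the metadata first. A minimal sketch, assuming `translation_languages` is a list of dicts with a `'language_code'` key (an assumption about its shape, so check it against your installed version). `can_translate_to` is a hypothetical helper, and the `SimpleNamespace` objects merely stand in for real `Transcript` objects so the snippet runs offline:

```python
from types import SimpleNamespace

def can_translate_to(transcript, language_code: str) -> bool:
    """Return True if `transcript` reports that it can be translated to `language_code`.

    Assumes `transcript.translation_languages` is a list of dicts with a
    'language_code' key (an assumption, not confirmed by this README).
    """
    if not transcript.is_translatable:
        return False
    return any(
        lang['language_code'] == language_code
        for lang in transcript.translation_languages
    )

# offline stand-ins for Transcript objects
translatable = SimpleNamespace(
    is_translatable=True,
    translation_languages=[{'language': 'German', 'language_code': 'de'}],
)
not_translatable = SimpleNamespace(is_translatable=False, translation_languages=[])

print(can_translate_to(translatable, 'de'))
print(can_translate_to(not_translatable, 'de'))
```

With a real transcript, you would then only call `transcript.translate('de')` when the check passes.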

### By example

```python
from youtube_transcript_api import YouTubeTranscriptApi

# retrieve the available transcripts
transcript_list = YouTubeTranscriptApi.list_transcripts('video_id')

# iterate over all available transcripts
for transcript in transcript_list:

    # the Transcript object provides metadata properties
    print(
        transcript.video_id,
        transcript.language,
        transcript.language_code,
        # whether it has been manually created or generated by YouTube
        transcript.is_generated,
        # whether this transcript can be translated or not
        transcript.is_translatable,
        # a list of languages the transcript can be translated to
        transcript.translation_languages,
    )

    # fetch the actual transcript data
    print(transcript.fetch())

    # translating the transcript will return another transcript object
    # (only attempted for transcripts that can be translated)
    if transcript.is_translatable:
        print(transcript.translate('en').fetch())

# you can also directly filter for the language you are looking for, using the transcript list
transcript = transcript_list.find_transcript(['de', 'en'])

# or just filter for manually created transcripts
transcript = transcript_list.find_manually_created_transcript(['de', 'en'])

# or automatically generated ones
transcript = transcript_list.find_generated_transcript(['de', 'en'])
```

## CLI

Execute the CLI script using the video ids as parameters and the results will be printed out to the command line:

```
youtube_transcript_api <first_video_id> <second_video_id> ...
```

The CLI also gives you the option to provide a list of preferred languages:

```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en
```

You can also specify if you want to exclude automatically generated or manually created subtitles:

```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-generated
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-manually-created
```

If you would prefer to write it into a file or pipe it into another application, you can also output the results as JSON using the following line:

```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --json > transcripts.json
```

Translating transcripts using the CLI is also possible:

```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages en --translate de
```

If you are not sure which languages are available for a given video, you can call:

```
youtube_transcript_api --list-transcripts <first_video_id>
```

## Proxy

You can specify an HTTPS/HTTP proxy, which will be used during the requests to YouTube:

```python
from youtube_transcript_api import YouTubeTranscriptApi

YouTubeTranscriptApi.get_transcript(video_id, proxies={"http": "http://user:pass@domain:port", "https": "https://user:pass@domain:port"})
```

As the `proxies` dict is passed on to the `requests.get(...)` call, it follows the [format used by the requests library](http://docs.python-requests.org/en/master/user/advanced/#proxies).

Using the CLI:

```
youtube_transcript_api <first_video_id> <second_video_id> --http-proxy http://user:pass@domain:port --https-proxy https://user:pass@domain:port
```

## Warning

This code uses an undocumented part of the YouTube API, which is called by the YouTube web client. So there is no guarantee that it won't stop working tomorrow if they change how things work. I will, however, do my best to make things work again as soon as possible if that happens. So if it stops working, let me know!

## Donation

If this project makes you happy by reducing your development time, you can make me happy by treating me to a cup of coffee :)

[](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url)