Merge pull request #42 from jdepoix/feature/translating-transcripts
Feature/translating transcripts
This commit is contained in:
		
						commit
						68951600d9
					
				
							
								
								
									
										287
									
								
								README.md
								
								
								
								
							
							
						
						
									
								
								
								
								
							|  | @ -1,121 +1,232 @@ | |||
| # YouTube Transcript/Subtitle API (including automatically generated subtitles) | ||||
| 
 | ||||
| [](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url) | ||||
| [](https://travis-ci.org/jdepoix/youtube-transcript-api) | ||||
| [](https://coveralls.io/github/jdepoix/youtube-transcript-api?branch=master) | ||||
| [](http://opensource.org/licenses/MIT) | ||||
| [](https://pypi.org/project/youtube-transcript-api/) | ||||
| [](https://pypi.org/project/youtube-transcript-api/) | ||||
| # YouTube Transcript/Subtitle API (including automatically generated subtitles and subtitle translations)   | ||||
|    | ||||
| [](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url)   | ||||
| [](https://travis-ci.org/jdepoix/youtube-transcript-api)   | ||||
| [](https://coveralls.io/github/jdepoix/youtube-transcript-api?branch=master)   | ||||
| [](http://opensource.org/licenses/MIT)   | ||||
| [](https://pypi.org/project/youtube-transcript-api/)   | ||||
| [](https://pypi.org/project/youtube-transcript-api/)   | ||||
|    | ||||
| This is a Python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles, and does not require a headless browser, as other Selenium-based solutions do!   | ||||
|    | ||||
| ## Install   | ||||
|    | ||||
| It is recommended to [install this module by using pip](https://pypi.org/project/youtube-transcript-api/):   | ||||
|    | ||||
| ```   | ||||
| pip install youtube_transcript_api   | ||||
| ```   | ||||
|    | ||||
| If you want to use it from source, you'll have to install the dependencies manually:   | ||||
|    | ||||
| ```   | ||||
| pip install -r requirements.txt   | ||||
| ```   | ||||
| 
 | ||||
| This is an python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require a headless browser, like other selenium based solutions do! | ||||
| You can either integrate this module [into an existing application](#api), or just use it via a [CLI](#cli). | ||||
|    | ||||
| ## API | ||||
|    | ||||
| The easiest way to get a transcript for a given video is to execute:   | ||||
|    | ||||
| ```python   | ||||
| from youtube_transcript_api import YouTubeTranscriptApi   | ||||
|    | ||||
| YouTubeTranscriptApi.get_transcript(video_id)   | ||||
| ```   | ||||
|    | ||||
| This will return a list of dictionaries looking somewhat like this:   | ||||
|    | ||||
| ```python   | ||||
| [   | ||||
|     {   | ||||
|         'text': 'Hey there',   | ||||
|         'start': 7.58,   | ||||
|         'duration': 6.13   | ||||
|   },   | ||||
|     {   | ||||
|         'text': 'how are you',   | ||||
|         'start': 14.08,   | ||||
|         'duration': 7.58   | ||||
|   },   | ||||
|     # ...   | ||||
| ]   | ||||
| ```   | ||||
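As a small illustration of working with the list of dictionaries shown above, the following sketch renders each entry as a timestamped line. The helper names `format_timestamp` and `format_transcript` are hypothetical and not part of this module; the sample data simply mirrors the example output.

```python
def format_timestamp(seconds):
    """Render a second offset as MM:SS (or H:MM:SS for long videos)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return '{}:{:02d}:{:02d}'.format(hours, minutes, secs)
    return '{:02d}:{:02d}'.format(minutes, secs)


def format_transcript(entries):
    """Turn the list of dictionaries into one '[MM:SS] text' line per entry."""
    return '\n'.join(
        '[{}] {}'.format(format_timestamp(entry['start']), entry['text'])
        for entry in entries
    )


# sample data in the same shape as the example output above
sample = [
    {'text': 'Hey there', 'start': 7.58, 'duration': 6.13},
    {'text': 'how are you', 'start': 14.08, 'duration': 7.58},
]
print(format_transcript(sample))
```

In real use, `sample` would be replaced by the return value of `YouTubeTranscriptApi.get_transcript(video_id)`.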
|    | ||||
| You can also add the `languages` param if you want to make sure the transcripts are retrieved in your desired language (it defaults to English).   | ||||
|    | ||||
| ```python   | ||||
| YouTubeTranscriptApi.get_transcript(video_id, languages=['de', 'en'])   | ||||
| ```   | ||||
|    | ||||
| It's a list of language codes in descending priority. In this example it will first try to fetch the German transcript (`'de'`) and fall back to the English transcript (`'en'`) if that fails. If you want to find out which languages are available first, [have a look at `list_transcripts()`](#list-available-transcripts). | ||||
|    | ||||
| To get transcripts for a list of video ids you can call:   | ||||
|    | ||||
| ```python   | ||||
| YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])   | ||||
| ```   | ||||
|    | ||||
| `languages` is optional here as well.   | ||||
| 
 | ||||
| ## Install | ||||
| ### List available transcripts | ||||
| 
 | ||||
| It is recommended to [install this module by using pip](https://pypi.org/project/youtube-transcript-api/): | ||||
| 
 | ||||
| ``` | ||||
| pip install youtube_transcript_api | ||||
| ``` | ||||
| 
 | ||||
| If you want to use it from source, you'll have to install the dependencies manually: | ||||
| 
 | ||||
| ``` | ||||
| pip install -r requirements.txt | ||||
| ``` | ||||
| 
 | ||||
| ## How to use it | ||||
| 
 | ||||
| You could either integrate this module into an existing application, or just use it via an CLI | ||||
| 
 | ||||
| ### In code | ||||
| 
 | ||||
| To get a transcript for a given video you can do: | ||||
| If you want to list all transcripts which are available for a given video you can call: | ||||
| 
 | ||||
| ```python | ||||
| from youtube_transcript_api import YouTubeTranscriptApi | ||||
| 
 | ||||
| YouTubeTranscriptApi.get_transcript(video_id) | ||||
| transcript_list = YouTubeTranscriptApi.list_transcripts(video_id) | ||||
| ``` | ||||
| 
 | ||||
| This will return a list of dictionaries looking somewhat like this: | ||||
| This will return a `TranscriptList` object, which is iterable and provides methods to filter the list of transcripts for specific languages and types, like: | ||||
| 
 | ||||
| ```python | ||||
| [ | ||||
|     { | ||||
|         'text': 'Hey there', | ||||
|         'start': 7.58, | ||||
|         'duration': 6.13 | ||||
|     }, | ||||
|     { | ||||
|         'text': 'how are you', | ||||
|         'start': 14.08, | ||||
|         'duration': 7.58 | ||||
|     }, | ||||
|     # ... | ||||
| ] | ||||
| transcript = transcript_list.find_transcript(['de', 'en'])   | ||||
| ``` | ||||
| 
 | ||||
| You can also add the `languages` param if you want to make sure the transcripts are retrieved in your desired language (it defaults to english). | ||||
| By default, this module prefers manually created transcripts over automatically generated ones when a transcript in the requested language is available in both forms. The `TranscriptList` allows you to bypass this default behaviour by searching for specific transcript types: | ||||
| 
 | ||||
| ```python | ||||
| YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en']) | ||||
| # filter for manually created transcripts   | ||||
| transcript = transcript_list.find_manually_created_transcript(['de', 'en'])   | ||||
|    | ||||
| # or automatically generated ones   | ||||
| transcript = transcript_list.find_generated_transcript(['de', 'en']) | ||||
| ``` | ||||
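The default preference described above can be pictured with plain dictionaries. This is only a sketch of the documented behaviour, not the module's actual implementation: `pick_transcript` is a hypothetical helper, and the two dictionaries stand in for the manually created and generated transcripts, keyed by language code.

```python
def pick_transcript(manually_created, generated, language_codes):
    """Return the first match, walking language codes in descending priority.

    For each language code, a manually created transcript wins over a
    generated one. Returns None if nothing matches.
    """
    for language_code in language_codes:
        for transcripts in (manually_created, generated):
            if language_code in transcripts:
                return transcripts[language_code]
    return None


# placeholder values; in the real module these would be Transcript objects
manually_created = {'en': 'manual-en'}
generated = {'de': 'auto-de', 'en': 'auto-en'}

print(pick_transcript(manually_created, generated, ['de', 'en']))
print(pick_transcript(manually_created, generated, ['en']))
```

Passing only one of the two dictionaries (and an empty one for the other) corresponds to what `find_manually_created_transcript` and `find_generated_transcript` are described as doing.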
| 
 | ||||
| It's a list of language codes in a descending priority. In this example it will first try to fetch the german transcript (`'de'`) and then fetch the english transcript (`'en'`) if it fails to do so. As I can't provide a complete list of all working language codes with full certainty, you may have to play around with the language codes a bit, to find the one which is working for you! | ||||
| 
 | ||||
| To get transcripts for a list fo video ids you can call: | ||||
| The methods `find_transcript`, `find_manually_created_transcript` and `find_generated_transcript` return `Transcript` objects. They contain metadata regarding the transcript: | ||||
| 
 | ||||
| ```python | ||||
| YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en']) | ||||
| print( | ||||
| 	transcript.video_id,  | ||||
| 	transcript.language,  | ||||
| 	transcript.language_code, | ||||
| 	# whether it has been manually created or generated by YouTube  | ||||
| 	transcript.is_generated, | ||||
| 	# whether this transcript can be translated or not | ||||
| 	transcript.is_translatable, | ||||
| 	# a list of languages the transcript can be translated to  | ||||
| 	transcript.translation_languages,  | ||||
| ) | ||||
| ``` | ||||
| 
 | ||||
| `languages` also is optional here. | ||||
| 
 | ||||
| ### CLI | ||||
| 
 | ||||
| Execute the CLI script using the video ids as parameters and the results will be printed out to the command line: | ||||
| 
 | ||||
| ``` | ||||
| youtube_transcript_api <first_video_id> <second_video_id> ... | ||||
| ``` | ||||
| 
 | ||||
| The CLI also gives you the option to provide a list of preferred languages: | ||||
| 
 | ||||
| ``` | ||||
| youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en | ||||
| ``` | ||||
| 
 | ||||
| If you would prefer to write it into a file or pipe it into another application, you can also output the results as json using the following line: | ||||
| 
 | ||||
| ``` | ||||
| youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --json > transcripts.json | ||||
| ``` | ||||
| 
 | ||||
| ### Proxy | ||||
| 
 | ||||
| You can specify a https/http proxy, which will be used during the requests to YouTube: | ||||
| `Transcript` objects also provide a method which allows you to fetch the actual transcript data: | ||||
| 
 | ||||
| ```python | ||||
| from youtube_transcript_api import YouTubeTranscriptApi | ||||
| 
 | ||||
| YouTubeTranscriptApi.get_transcript(video_id, proxies={"http": "http://user:pass@domain:port", "https": "https://user:pass@domain:port"}) | ||||
| transcript.fetch() | ||||
| ``` | ||||
| 
 | ||||
| As the `proxies` dict is passed on to the `requests.get(...)` call, it follows the [format used by the requests library](http://docs.python-requests.org/en/master/user/advanced/#proxies). | ||||
| ### Translate transcript  | ||||
| 
 | ||||
| Using the CLI: | ||||
| YouTube has a feature which allows you to automatically translate subtitles. This module also makes it possible to access this feature. To do so `Transcript` objects provide a `translate()` method, which returns a new translated `Transcript` object: | ||||
| 
 | ||||
| ``` | ||||
| youtube_transcript_api <first_video_id> <second_video_id> --http-proxy http://user:pass@domain:port --https-proxy https://user:pass@domain:port | ||||
| ```python | ||||
| transcript = transcript_list.find_transcript(['en'])  | ||||
| translated_transcript = transcript.translate('de') | ||||
| print(translated_transcript.fetch()) | ||||
| ``` | ||||
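Before calling `translate()`, it can be useful to check the metadata first. The sketch below is an assumption-laden illustration: `translate_if_possible` is a hypothetical helper, and it assumes each entry of `translation_languages` is a dictionary with a `'language_code'` key, which this README does not spell out. A stub object stands in for a real `Transcript` so the example runs without network access.

```python
from types import SimpleNamespace


def translate_if_possible(transcript, language_code):
    """Translate only if the metadata says the target language is offered.

    Returns the translated transcript, or None when translation is not
    possible. Assumes translation_languages holds dicts with a
    'language_code' key (an assumption, see the text above).
    """
    if not transcript.is_translatable:
        return None
    available = {
        language['language_code'] for language in transcript.translation_languages
    }
    if language_code not in available:
        return None
    return transcript.translate(language_code)


# stub standing in for a Transcript object, for illustration only
stub = SimpleNamespace(
    is_translatable=True,
    translation_languages=[{'language': 'German', 'language_code': 'de'}],
    translate=lambda code: 'translated to ' + code,
)

print(translate_if_possible(stub, 'de'))
print(translate_if_possible(stub, 'fr'))
```

The module also exports the `NotTranslatable` and `TranslationLanguageNotAvailable` exceptions, so catching those around a direct `translate()` call is an alternative to checking up front.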
| 
 | ||||
| ### By example | ||||
| ```python | ||||
| # retrieve the available transcripts   | ||||
| transcript_list = YouTubeTranscriptApi.list_transcripts('video_id')   | ||||
|    | ||||
| # iterate over all available transcripts | ||||
| for transcript in transcript_list: | ||||
| 
 | ||||
| ## Warning | ||||
|     # the Transcript object provides metadata properties | ||||
| 	print( | ||||
|         transcript.video_id, | ||||
|         transcript.language, | ||||
| 		transcript.language_code, | ||||
| 		# whether it has been manually created or generated by YouTube  | ||||
| 		transcript.is_generated, | ||||
| 		# whether this transcript can be translated or not | ||||
| 		transcript.is_translatable, | ||||
| 		# a list of languages the transcript can be translated to  | ||||
| 		transcript.translation_languages,  | ||||
| 	) | ||||
| 	   | ||||
| 	# fetch the actual transcript data  | ||||
| 	print(transcript.fetch())   | ||||
| 	 | ||||
| 	# translating the transcript will return another transcript object | ||||
| 	print(transcript.translate('en').fetch())   | ||||
| 	 | ||||
| # you can also directly filter for the language you are looking for, using the transcript list | ||||
| transcript = transcript_list.find_transcript(['de', 'en'])   | ||||
|    | ||||
| # or just filter for manually created transcripts   | ||||
| transcript = transcript_list.find_manually_created_transcript(['de', 'en'])   | ||||
|    | ||||
| # or automatically generated ones   | ||||
| transcript = transcript_list.find_generated_transcript(['de', 'en']) | ||||
| ``` | ||||
|    | ||||
| ## CLI   | ||||
|    | ||||
| Execute the CLI script using the video ids as parameters and the results will be printed out to the command line:   | ||||
|    | ||||
| ```   | ||||
| youtube_transcript_api <first_video_id> <second_video_id> ...   | ||||
| ```   | ||||
|    | ||||
| The CLI also gives you the option to provide a list of preferred languages:   | ||||
|    | ||||
| ```   | ||||
| youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en   | ||||
| ``` | ||||
| 
 | ||||
|  This code uses an undocumented part of the YouTube API, which is called by the YouTube web-client. So there is no guarantee that it won't stop working tomorrow, if they change how things work. I will however do my best to make things working again as soon as possible if that happens. So if it stops working, let me know! | ||||
| You can also specify if you want to exclude automatically generated or manually created subtitles: | ||||
| 
 | ||||
| ## Donation | ||||
| ```   | ||||
| youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-generated | ||||
| youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-manually-created | ||||
| ``` | ||||
|    | ||||
| If you would prefer to write it into a file or pipe it into another application, you can also output the results as json using the following line:   | ||||
|    | ||||
| ```   | ||||
| youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --json > transcripts.json   | ||||
| ```   | ||||
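The JSON written by the command above can be read back with the standard library. This is a minimal sketch, assuming the run produced no errors (error messages are printed ahead of the JSON) and that the top-level value is a list holding one transcript per video id; `summarize` is a hypothetical helper.

```python
import json


def summarize(json_text):
    """Return the snippet count and joined text for each transcript."""
    transcripts = json.loads(json_text)
    return [
        {
            'snippets': len(transcript),
            'text': ' '.join(snippet['text'] for snippet in transcript),
        }
        for transcript in transcripts
    ]


# sample content in the shape assumed above: one inner list per video
sample_json = json.dumps([
    [
        {'text': 'Hey there', 'start': 7.58, 'duration': 6.13},
        {'text': 'how are you', 'start': 14.08, 'duration': 7.58},
    ]
])
print(summarize(sample_json))
```

To process a real run, read `transcripts.json` from disk and pass its contents to `summarize`.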
| 
 | ||||
| If this project makes you happy by reducing your development time, you can make me happy by treating me to a cup of coffee :) | ||||
| Translating transcripts using the CLI is also possible: | ||||
| 
 | ||||
| [](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url) | ||||
| ```   | ||||
| youtube_transcript_api <first_video_id> <second_video_id> ... --languages en --translate de | ||||
| ```   | ||||
| 
 | ||||
| If you are not sure which languages are available for a given video, you can list all available transcripts: | ||||
| 
 | ||||
| ```   | ||||
| youtube_transcript_api --list-transcripts <first_video_id> | ||||
| ```   | ||||
|    | ||||
| ## Proxy   | ||||
|    | ||||
| You can specify an HTTPS/HTTP proxy, which will be used for the requests to YouTube:   | ||||
|    | ||||
| ```python   | ||||
| from youtube_transcript_api import YouTubeTranscriptApi   | ||||
|    | ||||
| YouTubeTranscriptApi.get_transcript(video_id, proxies={"http": "http://user:pass@domain:port", "https": "https://user:pass@domain:port"})   | ||||
| ```   | ||||
|    | ||||
| As the `proxies` dict is passed on to the `requests.get(...)` call, it follows the [format used by the requests library](http://docs.python-requests.org/en/master/user/advanced/#proxies).   | ||||
|    | ||||
| Using the CLI:   | ||||
|    | ||||
| ```   | ||||
| youtube_transcript_api <first_video_id> <second_video_id> --http-proxy http://user:pass@domain:port --https-proxy https://user:pass@domain:port   | ||||
| ```   | ||||
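When the proxy URLs come from configuration and either one may be missing, the `proxies` dict can be assembled conditionally. The helper below is a hypothetical sketch, not part of this module; it leaves out empty values and returns `None` when no proxy is configured, which matches the default of the `proxies` parameter.

```python
def build_proxies(http_proxy='', https_proxy=''):
    """Build a requests-style proxies dict, skipping empty values."""
    proxies = {}
    if http_proxy:
        proxies['http'] = http_proxy
    if https_proxy:
        proxies['https'] = https_proxy
    # None means "no proxy", the same as not passing the parameter at all
    return proxies or None


print(build_proxies(http_proxy='http://user:pass@domain:port'))
print(build_proxies())
```

The result can then be passed as `proxies=build_proxies(...)` to `YouTubeTranscriptApi.get_transcript`.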
|    | ||||
|    | ||||
| ## Warning   | ||||
|    | ||||
|  This code uses an undocumented part of the YouTube API, which is called by the YouTube web client. So there is no guarantee that it won't stop working tomorrow if they change how things work. I will, however, do my best to get things working again as soon as possible if that happens. So if it stops working, let me know!   | ||||
|    | ||||
| ## Donation   | ||||
|    | ||||
| If this project makes you happy by reducing your development time, you can make me happy by treating me to a cup of coffee :)   | ||||
|    | ||||
| [](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url) | ||||
|  | @ -1,3 +1,11 @@ | |||
| from ._api import YouTubeTranscriptApi | ||||
| from ._transcripts import TranscriptList, Transcript | ||||
| from ._errors import TranscriptsDisabled, NoTranscriptFound, CouldNotRetrieveTranscript, VideoUnavailable | ||||
| from ._errors import ( | ||||
|     TranscriptsDisabled, | ||||
|     NoTranscriptFound, | ||||
|     CouldNotRetrieveTranscript, | ||||
|     VideoUnavailable, | ||||
|     NotTranslatable, | ||||
|     TranslationLanguageNotAvailable, | ||||
|     NoTranscriptAvailable, | ||||
| ) | ||||
|  |  | |||
|  | @ -4,17 +4,68 @@ from ._transcripts import TranscriptListFetcher | |||
| 
 | ||||
| 
 | ||||
| class YouTubeTranscriptApi(): | ||||
|     @classmethod | ||||
|     def list_transcripts(cls, video_id, proxies=None): | ||||
|         """ | ||||
|         Retrieves the list of transcripts which are available for a given video. It returns a `TranscriptList` object | ||||
|         which is iterable and provides methods to filter the list of transcripts for specific languages. While iterating | ||||
|         over the `TranscriptList` the individual transcripts are represented by `Transcript` objects, which provide | ||||
|         metadata and can either be fetched by calling `transcript.fetch()` or translated by calling | ||||
|         `transcript.translate('en')`. Example:: | ||||
| 
 | ||||
|             # retrieve the available transcripts | ||||
|             transcript_list = YouTubeTranscriptApi.list_transcripts('video_id') | ||||
| 
 | ||||
|             # iterate over all available transcripts | ||||
|             for transcript in transcript_list: | ||||
|                 # the Transcript object provides metadata properties | ||||
|                 print( | ||||
|                     transcript.video_id, | ||||
|                     transcript.language, | ||||
|                     transcript.language_code, | ||||
|                     # whether it has been manually created or generated by YouTube | ||||
|                     transcript.is_generated, | ||||
|                     # a list of languages the transcript can be translated to | ||||
|                     transcript.translation_languages, | ||||
|                 ) | ||||
| 
 | ||||
|                 # fetch the actual transcript data | ||||
|                 print(transcript.fetch()) | ||||
| 
 | ||||
|                 # translating the transcript will return another transcript object | ||||
|                 print(transcript.translate('en').fetch()) | ||||
| 
 | ||||
|             # you can also directly filter for the language you are looking for, using the transcript list | ||||
|             transcript = transcript_list.find_transcript(['de', 'en']) | ||||
| 
 | ||||
|             # or just filter for manually created transcripts | ||||
|             transcript = transcript_list.find_manually_created_transcript(['de', 'en']) | ||||
| 
 | ||||
|             # or automatically generated ones | ||||
|             transcript = transcript_list.find_generated_transcript(['de', 'en']) | ||||
| 
 | ||||
|         :param video_id: the youtube video id | ||||
|         :type video_id: str | ||||
|         :param proxies: a dictionary mapping of http and https proxies to be used for the network requests | ||||
|         :type proxies: {'http': str, 'https': str} - http://docs.python-requests.org/en/master/user/advanced/#proxies | ||||
|         :return: the list of available transcripts | ||||
|         :rtype TranscriptList: | ||||
|         """ | ||||
|         with requests.Session() as http_client: | ||||
|             http_client.proxies = proxies if proxies else {} | ||||
|             return TranscriptListFetcher(http_client).fetch(video_id) | ||||
| 
 | ||||
|     @classmethod | ||||
|     def get_transcripts(cls, video_ids, languages=('en',), continue_after_error=False, proxies=None): | ||||
|         """ | ||||
|         Retrieves the transcripts for a list of videos. | ||||
| 
 | ||||
|         :param video_ids: a list of youtube video ids | ||||
|         :type video_ids: [str] | ||||
|         :type video_ids: list[str] | ||||
|         :param languages: A list of language codes in a descending priority. For example, if this is set to ['de', 'en'] | ||||
|         it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if it fails to | ||||
|         do so. | ||||
|         :type languages: [str] | ||||
|         :type languages: list[str] | ||||
|         :param continue_after_error: if this is set the execution won't be stopped, if an error occurs while retrieving | ||||
|         one of the video transcripts | ||||
|         :type continue_after_error: bool | ||||
|  | @ -22,7 +73,7 @@ class YouTubeTranscriptApi(): | |||
|         :type proxies: {'http': str, 'https': str} - http://docs.python-requests.org/en/master/user/advanced/#proxies | ||||
|         :return: a tuple containing a dictionary mapping video ids onto their corresponding transcripts, and a list of | ||||
|         video ids, which could not be retrieved | ||||
|         :rtype: ({str: [{'text': str, 'start': float, 'end': float}]}, [str]}) | ||||
|         :rtype ({str: [{'text': str, 'start': float, 'duration': float}]}, [str]): | ||||
|         """ | ||||
|         data = {} | ||||
|         unretrievable_videos = [] | ||||
|  | @ -41,19 +92,19 @@ class YouTubeTranscriptApi(): | |||
|     @classmethod | ||||
|     def get_transcript(cls, video_id, languages=('en',), proxies=None): | ||||
|         """ | ||||
|         Retrieves the transcript for a single video. | ||||
|         Retrieves the transcript for a single video. This is just a shortcut for calling:: | ||||
| 
 | ||||
|             YouTubeTranscriptApi.list_transcripts(video_id, proxies).find_transcript(languages).fetch() | ||||
| 
 | ||||
|         :param video_id: the youtube video id | ||||
|         :type video_id: str | ||||
|         :param languages: A list of language codes in a descending priority. For example, if this is set to ['de', 'en'] | ||||
|         it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if it fails to | ||||
|         do so. | ||||
|         :type languages: [str] | ||||
|         :type languages: list[str] | ||||
|         :param proxies: a dictionary mapping of http and https proxies to be used for the network requests | ||||
|         :type proxies: {'http': str, 'https': str} - http://docs.python-requests.org/en/master/user/advanced/#proxies | ||||
|         :return: a list of dictionaries containing the 'text', 'start' and 'duration' keys | ||||
|         :rtype: [{'text': str, 'start': float, 'end': float}] | ||||
|         :rtype [{'text': str, 'start': float, 'duration': float}]: | ||||
|         """ | ||||
|         with requests.Session() as http_client: | ||||
|             http_client.proxies = proxies if proxies else {} | ||||
|             return TranscriptListFetcher(http_client).fetch(video_id).find_transcript(languages).fetch() | ||||
|         return cls.list_transcripts(video_id, proxies).find_transcript(languages).fetch() | ||||
|  |  | |||
|  | @ -14,22 +14,45 @@ class YouTubeTranscriptCli(): | |||
|     def run(self): | ||||
|         parsed_args = self._parse_args() | ||||
| 
 | ||||
|         if parsed_args.exclude_manually_created and parsed_args.exclude_generated: | ||||
|             return '' | ||||
| 
 | ||||
|         proxies = None | ||||
|         if parsed_args.http_proxy != '' or parsed_args.https_proxy != '': | ||||
|             proxies = {"http": parsed_args.http_proxy, "https": parsed_args.https_proxy} | ||||
| 
 | ||||
|         transcripts, unretrievable_videos = YouTubeTranscriptApi.get_transcripts( | ||||
|             parsed_args.video_ids, | ||||
|             languages=parsed_args.languages, | ||||
|             continue_after_error=True, | ||||
|             proxies=proxies | ||||
|         ) | ||||
|         transcripts = [] | ||||
|         exceptions = [] | ||||
| 
 | ||||
|         for video_id in parsed_args.video_ids: | ||||
|             try: | ||||
|                 transcripts.append(self._fetch_transcript(parsed_args, proxies, video_id)) | ||||
|             except Exception as exception: | ||||
|                 exceptions.append(exception) | ||||
| 
 | ||||
|         return '\n\n'.join( | ||||
|             [str(YouTubeTranscriptApi.CouldNotRetrieveTranscript(video_id)) for video_id in unretrievable_videos] | ||||
|             [str(exception) for exception in exceptions] | ||||
|             + ([json.dumps(transcripts) if parsed_args.json else pprint.pformat(transcripts)] if transcripts else []) | ||||
|         ) | ||||
| 
 | ||||
|     def _fetch_transcript(self, parsed_args, proxies, video_id): | ||||
|         transcript_list = YouTubeTranscriptApi.list_transcripts(video_id, proxies=proxies) | ||||
| 
 | ||||
|         if parsed_args.list_transcripts: | ||||
|             return str(transcript_list) | ||||
| 
 | ||||
|         if parsed_args.exclude_manually_created: | ||||
|             transcript = transcript_list.find_generated_transcript(parsed_args.languages) | ||||
|         elif parsed_args.exclude_generated: | ||||
|             transcript = transcript_list.find_manually_created_transcript(parsed_args.languages) | ||||
|         else: | ||||
|             transcript = transcript_list.find_transcript(parsed_args.languages) | ||||
| 
 | ||||
|         if parsed_args.translate: | ||||
|             transcript = transcript.translate(parsed_args.translate) | ||||
| 
 | ||||
|         return transcript.fetch() | ||||
| 
 | ||||
|     def _parse_args(self): | ||||
|         parser = argparse.ArgumentParser( | ||||
|             description=( | ||||
|  | @ -38,6 +61,13 @@ class YouTubeTranscriptCli(): | |||
|                 'other selenium based solutions do!' | ||||
|             ) | ||||
|         ) | ||||
|         parser.add_argument( | ||||
|             '--list-transcripts', | ||||
|             action='store_const', | ||||
|             const=True, | ||||
|             default=False, | ||||
|             help='This will list the languages in which the given videos are available.', | ||||
|         ) | ||||
|         parser.add_argument('video_ids', nargs='+', type=str, help='List of YouTube video IDs.') | ||||
|         parser.add_argument( | ||||
|             '--languages', | ||||
|  | @ -46,11 +76,25 @@ class YouTubeTranscriptCli(): | |||
|             type=str, | ||||
|             help=( | ||||
|                 'A list of language codes in a descending priority. For example, if this is set to "de en" it will ' | ||||
|                 'first try to fetch the german transcript (de) and then fetch the english transcipt (en) if it fails ' | ||||
|                 'first try to fetch the german transcript (de) and then fetch the english transcript (en) if it fails ' | ||||
|                 'to do so. As I can\'t provide a complete list of all working language codes with full certainty, you ' | ||||
|                 'may have to play around with the language codes a bit, to find the one which is working for you!' | ||||
|             ), | ||||
|         ) | ||||
|         parser.add_argument( | ||||
|             '--exclude-generated', | ||||
|             action='store_const', | ||||
|             const=True, | ||||
|             default=False, | ||||
|             help='If this flag is set transcripts which have been generated by YouTube will not be retrieved.', | ||||
|         ) | ||||
|         parser.add_argument( | ||||
|             '--exclude-manually-created', | ||||
|             action='store_const', | ||||
|             const=True, | ||||
|             default=False, | ||||
|             help='If this flag is set transcripts which have been manually created will not be retrieved.', | ||||
|         ) | ||||
|         parser.add_argument( | ||||
|             '--json', | ||||
|             action='store_const', | ||||
|  | @ -59,13 +103,24 @@ class YouTubeTranscriptCli(): | |||
|             help='If this flag is set the output will be JSON formatted.', | ||||
|         ) | ||||
|         parser.add_argument( | ||||
|             '--http-proxy', dest='http_proxy', | ||||
|             default='', metavar='URL', | ||||
|             '--translate', | ||||
|             default='', | ||||
|             help=( | ||||
|                 'The language code for the language you want this transcript to be translated to. Use the ' | ||||
|                 '--list-transcripts feature to find out which languages are translatable and which translation ' | ||||
|                 'languages are available.' | ||||
|             ) | ||||
|         ) | ||||
|         parser.add_argument( | ||||
|             '--http-proxy', | ||||
|             default='', | ||||
|             metavar='URL', | ||||
|             help='Use the specified HTTP proxy.' | ||||
|         ) | ||||
|         parser.add_argument( | ||||
|             '--https-proxy', dest='https_proxy', | ||||
|             default='', metavar='URL', | ||||
|             '--https-proxy', | ||||
|             default='', | ||||
|             metavar='URL', | ||||
|             help='Use the specified HTTPS proxy.' | ||||
|         ) | ||||
| 
 | ||||
|  |  | |||
|  | @ -11,7 +11,7 @@ class CouldNotRetrieveTranscript(Exception): | |||
|     GITHUB_REFERRAL = ( | ||||
|         '\n\nIf you are sure that the described cause is not responsible for this error ' | ||||
|         'and that a transcript should be retrievable, please create an issue at ' | ||||
|         'https://github.com/jdepoix/youtube-transcript-api/issues.' | ||||
|         'https://github.com/jdepoix/youtube-transcript-api/issues. ' | ||||
|         'Please add which version of youtube_transcript_api you are using ' | ||||
|         'and provide the information needed to replicate the error. ' | ||||
|         'Also make sure that there are no open issues which already describe your problem!' | ||||
|  | @ -43,6 +43,18 @@ class TranscriptsDisabled(CouldNotRetrieveTranscript): | |||
|     CAUSE_MESSAGE = 'Subtitles are disabled for this video' | ||||
| 
 | ||||
| 
 | ||||
| class NoTranscriptAvailable(CouldNotRetrieveTranscript): | ||||
|     CAUSE_MESSAGE = 'No transcripts are available for this video' | ||||
| 
 | ||||
| 
 | ||||
| class NotTranslatable(CouldNotRetrieveTranscript): | ||||
|     CAUSE_MESSAGE = 'The requested language is not translatable' | ||||
| 
 | ||||
| 
 | ||||
| class TranslationLanguageNotAvailable(CouldNotRetrieveTranscript): | ||||
|     CAUSE_MESSAGE = 'The requested translation language is not available' | ||||
| 
 | ||||
| 
 | ||||
| class NoTranscriptFound(CouldNotRetrieveTranscript): | ||||
|     CAUSE_MESSAGE = ( | ||||
|         'No transcripts were found for any of the requested language codes: {requested_language_codes}\n\n' | ||||
|  |  | |||
|  | @ -12,7 +12,14 @@ from xml.etree import ElementTree | |||
| import re | ||||
| 
 | ||||
| from ._html_unescaping import unescape | ||||
| from ._errors import VideoUnavailable, NoTranscriptFound, TranscriptsDisabled | ||||
| from ._errors import ( | ||||
|     VideoUnavailable, | ||||
|     NoTranscriptFound, | ||||
|     TranscriptsDisabled, | ||||
|     NotTranslatable, | ||||
|     TranslationLanguageNotAvailable, | ||||
|     NoTranscriptAvailable, | ||||
| ) | ||||
| from ._settings import WATCH_URL | ||||
| 
 | ||||
| 
 | ||||
|  | @ -36,9 +43,14 @@ class TranscriptListFetcher(): | |||
| 
 | ||||
|             raise TranscriptsDisabled(video_id) | ||||
| 
 | ||||
|         return json.loads(splitted_html[1].split(',"videoDetails')[0].replace('\n', ''))[ | ||||
|             'playerCaptionsTracklistRenderer' | ||||
|         ] | ||||
|         captions_json = json.loads( | ||||
|             splitted_html[1].split(',"videoDetails')[0].replace('\n', '') | ||||
|         )['playerCaptionsTracklistRenderer'] | ||||
| 
 | ||||
|         if 'captionTracks' not in captions_json: | ||||
|             raise NoTranscriptAvailable(video_id) | ||||
| 
 | ||||
|         return captions_json | ||||
| 
 | ||||
|     def _fetch_html(self, video_id): | ||||
|         return self._http_client.get(WATCH_URL.format(video_id=video_id)).text.replace( | ||||
|  | @ -53,10 +65,7 @@ class TranscriptList(): | |||
|     This object represents a list of transcripts. It can be iterated over to list all transcripts which are available | ||||
|     for a given YouTube video. Also it provides functionality to search for a transcript in a given language. | ||||
|     """ | ||||
| 
 | ||||
|     # TODO implement iterator | ||||
| 
 | ||||
|     def __init__(self, video_id, manually_created_transcripts, generated_transcripts): | ||||
|     def __init__(self, video_id, manually_created_transcripts, generated_transcripts, translation_languages): | ||||
|         """ | ||||
|         The constructor is only for internal use. Use the static build method instead. | ||||
| 
 | ||||
|  | @ -66,10 +75,13 @@ class TranscriptList(): | |||
|         :type manually_created_transcripts: dict[str, Transcript] | ||||
|         :param generated_transcripts: dict mapping language codes to the generated transcripts | ||||
|         :type generated_transcripts: dict[str, Transcript] | ||||
|         :param translation_languages: list of languages which can be used for translatable languages | ||||
|         :type translation_languages: list[dict[str, str]] | ||||
|         """ | ||||
|         self.video_id = video_id | ||||
|         self._manually_created_transcripts = manually_created_transcripts | ||||
|         self._generated_transcripts = generated_transcripts | ||||
|         self._translation_languages = translation_languages | ||||
| 
 | ||||
|     @staticmethod | ||||
|     def build(http_client, video_id, captions_json): | ||||
|  | @ -83,7 +95,7 @@ class TranscriptList(): | |||
|         :param captions_json: the JSON parsed from the YouTube pages static HTML | ||||
|         :type captions_json: dict | ||||
|         :return: the created TranscriptList | ||||
|         :rtype TranscriptList | ||||
|         :rtype TranscriptList: | ||||
|         """ | ||||
|         translation_languages = [ | ||||
|             { | ||||
|  | @ -108,15 +120,19 @@ class TranscriptList(): | |||
|                 caption['name']['simpleText'], | ||||
|                 caption['languageCode'], | ||||
|                 caption.get('kind', '') == 'asr', | ||||
|                 translation_languages if caption['isTranslatable'] else [] | ||||
|                 translation_languages if caption.get('isTranslatable', False) else [] | ||||
|             ) | ||||
| 
 | ||||
|         return TranscriptList( | ||||
|             video_id, | ||||
|             manually_created_transcripts, | ||||
|             generated_transcripts, | ||||
|             translation_languages, | ||||
|         ) | ||||
| 
 | ||||
|     def __iter__(self): | ||||
|         return iter(list(self._manually_created_transcripts.values()) + list(self._generated_transcripts.values())) | ||||
| 
 | ||||
|     def find_transcript(self, language_codes): | ||||
|         """ | ||||
|         Finds a transcript for a given language code. Manually created transcripts are returned first and only if none | ||||
|  | @ -126,9 +142,9 @@ class TranscriptList(): | |||
|         :param language_codes: A list of language codes in a descending priority. For example, if this is set to | ||||
|         ['de', 'en'] it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if | ||||
|         it fails to do so. | ||||
|         :type languages: [str] | ||||
|         :type languages: list[str] | ||||
|         :return: the found Transcript | ||||
|         :rtype: Transcript | ||||
|         :rtype Transcript: | ||||
|         :raises: NoTranscriptFound | ||||
|         """ | ||||
|         return self._find_transcript(language_codes, [self._manually_created_transcripts, self._generated_transcripts]) | ||||
|  | @ -140,9 +156,9 @@ class TranscriptList(): | |||
|         :param language_codes: A list of language codes in a descending priority. For example, if this is set to | ||||
|         ['de', 'en'] it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if | ||||
|         it fails to do so. | ||||
|         :type languages: [str] | ||||
|         :type languages: list[str] | ||||
|         :return: the found Transcript | ||||
|         :rtype: Transcript | ||||
|         :rtype Transcript: | ||||
|         :raises: NoTranscriptFound | ||||
|         """ | ||||
|         return self._find_transcript(language_codes, [self._generated_transcripts,]) | ||||
|  | @ -154,9 +170,9 @@ class TranscriptList(): | |||
|         :param language_codes: A list of language codes in a descending priority. For example, if this is set to | ||||
|         ['de', 'en'] it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if | ||||
|         it fails to do so. | ||||
|         :type languages: [str] | ||||
|         :type languages: list[str] | ||||
|         :return: the found Transcript | ||||
|         :rtype: Transcript | ||||
|         :rtype Transcript: | ||||
|         :raises: NoTranscriptFound | ||||
|         """ | ||||
|         return self._find_transcript(language_codes, [self._manually_created_transcripts,]) | ||||
|  | @ -179,22 +195,28 @@ class TranscriptList(): | |||
|             '(MANUALLY CREATED)\n' | ||||
|             '{available_manually_created_transcript_languages}\n\n' | ||||
|             '(GENERATED)\n' | ||||
|             '{available_generated_transcripts}' | ||||
|             '{available_generated_transcripts}\n\n' | ||||
|             '(TRANSLATION LANGUAGES)\n' | ||||
|             '{available_translation_languages}' | ||||
|         ).format( | ||||
|             video_id=self.video_id, | ||||
|             available_manually_created_transcript_languages=self._get_language_description( | ||||
|                 self._manually_created_transcripts.values() | ||||
|                 str(transcript) for transcript in self._manually_created_transcripts.values() | ||||
|             ), | ||||
|             available_generated_transcripts=self._get_language_description( | ||||
|                 self._generated_transcripts.values() | ||||
|                 str(transcript) for transcript in self._generated_transcripts.values() | ||||
|             ), | ||||
|             available_translation_languages=self._get_language_description( | ||||
|                 '{language_code} ("{language}")'.format( | ||||
|                     language=translation_language['language'], | ||||
|                     language_code=translation_language['language_code'], | ||||
|                 ) for translation_language in self._translation_languages | ||||
|             ) | ||||
|         ) | ||||
| 
 | ||||
|     def _get_language_description(self, transcripts): | ||||
|         return '\n'.join( | ||||
|             ' - {transcript}'.format(transcript=str(transcript)) | ||||
|             for transcript in transcripts | ||||
|         ) if transcripts else 'None' | ||||
|     def _get_language_description(self, transcript_strings): | ||||
|         description = '\n'.join(' - {transcript}'.format(transcript=transcript) for transcript in transcript_strings) | ||||
|         return description if description else 'None' | ||||
| 
 | ||||
| 
 | ||||
| class Transcript(): | ||||
|  | @ -220,45 +242,49 @@ class Transcript(): | |||
|         self.language_code = language_code | ||||
|         self.is_generated = is_generated | ||||
|         self.translation_languages = translation_languages | ||||
|         self._translation_languages_dict = { | ||||
|             translation_language['language_code']: translation_language['language'] | ||||
|             for translation_language in translation_languages | ||||
|         } | ||||
| 
 | ||||
|     def fetch(self): | ||||
|         """ | ||||
|         Loads the actual transcript data. | ||||
| 
 | ||||
|         :return: a list of dictionaries containing the 'text', 'start' and 'duration' keys | ||||
|         :rtype: [{'text': str, 'start': float, 'end': float}] | ||||
|         :rtype [{'text': str, 'start': float, 'end': float}]: | ||||
|         """ | ||||
|         return _TranscriptParser().parse( | ||||
|             self._http_client.get(self._url).text | ||||
|         ) | ||||
| 
 | ||||
|     def __str__(self): | ||||
|         return '{language_code} ("{language}")'.format( | ||||
|         return '{language_code} ("{language}"){translation_description}'.format( | ||||
|             language=self.language, | ||||
|             language_code=self.language_code, | ||||
|             translation_description='[TRANSLATABLE]' if self.is_translatable else '' | ||||
|         ) | ||||
| 
 | ||||
| # TODO integrate translations in future release | ||||
| #     @property | ||||
| #     def is_translatable(self): | ||||
| #         return len(self.translation_languages) > 0 | ||||
| # | ||||
| # | ||||
| # class TranslatableTranscript(Transcript): | ||||
| #     def __init__(self, http_client, url, translation_languages): | ||||
| #         super(TranslatableTranscript, self).__init__(http_client, url) | ||||
| #         self._translation_languages = translation_languages | ||||
| #         self._translation_language_codes = {language['language_code'] for language in translation_languages} | ||||
| # | ||||
| # | ||||
| #     def translate(self, language_code): | ||||
| #         if language_code not in self._translation_language_codes: | ||||
| #             raise TranslatableTranscript.TranslationLanguageNotAvailable() | ||||
| # | ||||
| #         return Transcript( | ||||
| #             self._http_client, | ||||
| #             '{url}&tlang={language_code}'.format(url=self._url, language_code=language_code) | ||||
| #         ) | ||||
|     @property | ||||
|     def is_translatable(self): | ||||
|         return len(self.translation_languages) > 0 | ||||
| 
 | ||||
|     def translate(self, language_code): | ||||
|         if not self.is_translatable: | ||||
|             raise NotTranslatable(self.video_id) | ||||
| 
 | ||||
|         if language_code not in self._translation_languages_dict: | ||||
|             raise TranslationLanguageNotAvailable(self.video_id) | ||||
| 
 | ||||
|         return Transcript( | ||||
|             self._http_client, | ||||
|             self.video_id, | ||||
|             '{url}&tlang={language_code}'.format(url=self._url, language_code=language_code), | ||||
|             self._translation_languages_dict[language_code], | ||||
|             language_code, | ||||
|             True, | ||||
|             [], | ||||
|         ) | ||||
| 
 | ||||
| 
 | ||||
| class _TranscriptParser(): | ||||
|  | @ -269,7 +295,7 @@ class _TranscriptParser(): | |||
|             { | ||||
|                 'text': re.sub(self.HTML_TAG_REGEX, '', unescape(xml_element.text)), | ||||
|                 'start': float(xml_element.attrib['start']), | ||||
|                 'duration': float(xml_element.attrib['dur']), | ||||
|                 'duration': float(xml_element.attrib.get('dur', '0.0')), | ||||
|             } | ||||
|             for xml_element in ElementTree.fromstring(plain_data) | ||||
|             if xml_element.text is not None | ||||
|  |  | |||
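The last hunk above swaps `xml_element.attrib['dur']` for `xml_element.attrib.get('dur', '0.0')`, so a snippet without a `dur` attribute no longer raises a `KeyError`. The following is a minimal sketch of that behaviour, not the library's parser: `parse_transcript`, `SAMPLE_XML` and the `HTML_TAG_REGEX` pattern are assumptions made for illustration, and the HTML unescaping step from the diff is left out.

```python
import re
from xml.etree import ElementTree

# Assumed pattern for this sketch; the real HTML_TAG_REGEX is not shown in the diff.
HTML_TAG_REGEX = re.compile(r'<[^>]*>')


def parse_transcript(plain_data):
    """Hypothetical stand-in for the parse step touched by the hunk above."""
    return [
        {
            'text': re.sub(HTML_TAG_REGEX, '', xml_element.text),
            'start': float(xml_element.attrib['start']),
            # Fall back to 0.0 when the element carries no 'dur' attribute.
            'duration': float(xml_element.attrib.get('dur', '0.0')),
        }
        for xml_element in ElementTree.fromstring(plain_data)
        # Elements without any text are skipped, as in the diff.
        if xml_element.text is not None
    ]


# Made-up sample data: one complete snippet, one without 'dur', one without text.
SAMPLE_XML = (
    '<transcript>'
    '<text start="0.0" dur="1.54">Hey, this is just a test</text>'
    '<text start="1.54">no duration here</text>'
    '<text start="5.7" dur="3.239"></text>'
    '</transcript>'
)

snippets = parse_transcript(SAMPLE_XML)
```

With the old indexing, the second element in `SAMPLE_XML` would abort the whole parse; with the fallback it is kept and reported with a duration of `0.0`.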
										
											
File diff suppressed because one or more lines are too long
							|  | @ -5,7 +5,15 @@ import os | |||
| 
 | ||||
| import httpretty | ||||
| 
 | ||||
| from youtube_transcript_api import YouTubeTranscriptApi, VideoUnavailable, NoTranscriptFound, TranscriptsDisabled | ||||
| from youtube_transcript_api import ( | ||||
|     YouTubeTranscriptApi, | ||||
|     TranscriptsDisabled, | ||||
|     NoTranscriptFound, | ||||
|     VideoUnavailable, | ||||
|     NoTranscriptAvailable, | ||||
|     NotTranslatable, | ||||
|     TranslationLanguageNotAvailable, | ||||
| ) | ||||
| 
 | ||||
| 
 | ||||
| def load_asset(filename): | ||||
|  | @ -42,6 +50,51 @@ class TestYouTubeTranscriptApi(TestCase): | |||
|             ] | ||||
|         ) | ||||
| 
 | ||||
|     def test_list_transcripts(self): | ||||
|         transcript_list = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8') | ||||
| 
 | ||||
|         language_codes = {transcript.language_code for transcript in transcript_list} | ||||
| 
 | ||||
|         self.assertEqual(language_codes, {'zh', 'de', 'en', 'hi', 'ja', 'ko', 'es', 'cs', 'en'}) | ||||
| 
 | ||||
|     def test_list_transcripts__find_manually_created(self): | ||||
|         transcript_list = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8') | ||||
|         transcript = transcript_list.find_manually_created_transcript(['cs']) | ||||
| 
 | ||||
|         self.assertFalse(transcript.is_generated) | ||||
| 
 | ||||
| 
 | ||||
|     def test_list_transcripts__find_generated(self): | ||||
|         transcript_list = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8') | ||||
| 
 | ||||
|         with self.assertRaises(NoTranscriptFound): | ||||
|             transcript_list.find_generated_transcript(['cs']) | ||||
| 
 | ||||
|         transcript = transcript_list.find_generated_transcript(['en']) | ||||
| 
 | ||||
|         self.assertTrue(transcript.is_generated) | ||||
| 
 | ||||
|     def test_translate_transcript(self): | ||||
|         transcript = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8').find_transcript(['en']) | ||||
| 
 | ||||
|         translated_transcript = transcript.translate('af') | ||||
| 
 | ||||
|         self.assertEqual(translated_transcript.language_code, 'af') | ||||
|         self.assertIn('&tlang=af', translated_transcript._url) | ||||
| 
 | ||||
|     def test_translate_transcript__translation_language_not_available(self): | ||||
|         transcript = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8').find_transcript(['en']) | ||||
| 
 | ||||
|         with self.assertRaises(TranslationLanguageNotAvailable): | ||||
|             transcript.translate('xyz') | ||||
| 
 | ||||
|     def test_translate_transcript__not_translatable(self): | ||||
|         transcript = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8').find_transcript(['en']) | ||||
|         transcript.translation_languages = [] | ||||
| 
 | ||||
|         with self.assertRaises(NotTranslatable): | ||||
|             transcript.translate('af') | ||||
| 
 | ||||
|     def test_get_transcript__correct_language_is_used(self): | ||||
|         YouTubeTranscriptApi.get_transcript('GJLlxj_dtq8', ['de', 'en']) | ||||
|         query_string = httpretty.last_request().querystring | ||||
|  | @ -88,6 +141,16 @@ class TestYouTubeTranscriptApi(TestCase): | |||
|         with self.assertRaises(NoTranscriptFound): | ||||
|             YouTubeTranscriptApi.get_transcript('GJLlxj_dtq8', languages=['cz']) | ||||
| 
 | ||||
|     def test_get_transcript__exception_if_no_transcript_available(self): | ||||
|         httpretty.register_uri( | ||||
|             httpretty.GET, | ||||
|             'https://www.youtube.com/watch', | ||||
|             body=load_asset('youtube_no_transcript_available.html.static') | ||||
|         ) | ||||
| 
 | ||||
|         with self.assertRaises(NoTranscriptAvailable): | ||||
|             YouTubeTranscriptApi.get_transcript('MwBPvcYFY2E') | ||||
| 
 | ||||
|     def test_get_transcripts(self): | ||||
|         video_id_1 = 'video_id_1' | ||||
|         video_id_2 = 'video_id_2' | ||||
|  |  | |||
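The three translation tests above exercise one control flow: refuse when the transcript has no translation languages, refuse when the requested code is unknown, otherwise return a new transcript whose URL carries `&tlang=<code>`. Below is a network-free sketch of that flow. `SketchTranscript`, the example URL and the locally defined exception classes are stand-ins invented for this illustration; the real `Transcript` also takes an HTTP client and a video id, and its errors derive from `CouldNotRetrieveTranscript`.

```python
class NotTranslatable(Exception):
    """Stand-in for the library error of the same name."""


class TranslationLanguageNotAvailable(Exception):
    """Stand-in for the library error of the same name."""


class SketchTranscript:
    """Hypothetical, simplified version of the Transcript class in the diff."""

    def __init__(self, url, language, language_code, is_generated, translation_languages):
        self._url = url
        self.language = language
        self.language_code = language_code
        self.is_generated = is_generated
        self.translation_languages = translation_languages
        # Maps language codes to language names for quick lookup.
        self._translation_languages_dict = {
            translation_language['language_code']: translation_language['language']
            for translation_language in translation_languages
        }

    @property
    def is_translatable(self):
        # Computed from the list on every access, so emptying the list
        # (as one of the tests does) makes the transcript untranslatable.
        return len(self.translation_languages) > 0

    def translate(self, language_code):
        if not self.is_translatable:
            raise NotTranslatable()

        if language_code not in self._translation_languages_dict:
            raise TranslationLanguageNotAvailable()

        # The translated transcript is marked as generated and gets an
        # empty translation list, so it cannot be translated again.
        return SketchTranscript(
            '{url}&tlang={language_code}'.format(url=self._url, language_code=language_code),
            self._translation_languages_dict[language_code],
            language_code,
            True,
            [],
        )


# Placeholder URL, not a real endpoint.
source = SketchTranscript(
    'https://example.invalid/timedtext?v=abc&lang=en',
    'English',
    'en',
    False,
    [{'language': 'Afrikaans', 'language_code': 'af'}],
)
translated = source.translate('af')
```

Because the translated object is built with an empty translation list, a second `translate` call on it raises `NotTranslatable` in this sketch.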
|  | @ -3,10 +3,27 @@ from mock import MagicMock | |||
| 
 | ||||
| import json | ||||
| 
 | ||||
| from youtube_transcript_api._cli import YouTubeTranscriptCli, YouTubeTranscriptApi | ||||
| from youtube_transcript_api import YouTubeTranscriptApi, VideoUnavailable | ||||
| from youtube_transcript_api._cli import YouTubeTranscriptCli | ||||
| 
 | ||||
| 
 | ||||
| class TestYouTubeTranscriptCli(TestCase): | ||||
|     def setUp(self): | ||||
|         self.transcript_mock = MagicMock() | ||||
|         self.transcript_mock.fetch = MagicMock(return_value=[ | ||||
|             {'text': 'Hey, this is just a test', 'start': 0.0, 'duration': 1.54}, | ||||
|             {'text': 'this is not the original transcript', 'start': 1.54, 'duration': 4.16}, | ||||
|             {'text': 'just something shorter, I made up for testing', 'start': 5.7, 'duration': 3.239} | ||||
|         ]) | ||||
|         self.transcript_mock.translate = MagicMock(return_value=self.transcript_mock) | ||||
| 
 | ||||
|         self.transcript_list_mock = MagicMock() | ||||
|         self.transcript_list_mock.find_generated_transcript = MagicMock(return_value=self.transcript_mock) | ||||
|         self.transcript_list_mock.find_manually_created_transcript = MagicMock(return_value=self.transcript_mock) | ||||
|         self.transcript_list_mock.find_transcript = MagicMock(return_value=self.transcript_mock) | ||||
| 
 | ||||
|         YouTubeTranscriptApi.list_transcripts = MagicMock(return_value=self.transcript_list_mock) | ||||
| 
 | ||||
|     def test_argument_parsing(self): | ||||
|         parsed_args = YouTubeTranscriptCli('v1 v2 --json --languages de en'.split())._parse_args() | ||||
|         self.assertEqual(parsed_args.video_ids, ['v1', 'v2']) | ||||
|  | @ -106,32 +123,107 @@ class TestYouTubeTranscriptCli(TestCase): | |||
|         self.assertEqual(parsed_args.http_proxy, '') | ||||
|         self.assertEqual(parsed_args.https_proxy, '') | ||||
| 
 | ||||
|     def test_argument_parsing__list_transcripts(self): | ||||
|         parsed_args = YouTubeTranscriptCli('--list-transcripts v1 v2'.split())._parse_args() | ||||
|         self.assertEqual(parsed_args.video_ids, ['v1', 'v2']) | ||||
|         self.assertTrue(parsed_args.list_transcripts) | ||||
| 
 | ||||
|         parsed_args = YouTubeTranscriptCli('v1 v2 --list-transcripts'.split())._parse_args() | ||||
|         self.assertEqual(parsed_args.video_ids, ['v1', 'v2']) | ||||
|         self.assertTrue(parsed_args.list_transcripts) | ||||
| 
 | ||||
|     def test_argument_parsing__translate(self): | ||||
|         parsed_args = YouTubeTranscriptCli('v1 v2 --languages de en --translate cz'.split())._parse_args() | ||||
|         self.assertEqual(parsed_args.video_ids, ['v1', 'v2']) | ||||
|         self.assertEqual(parsed_args.json, False) | ||||
|         self.assertEqual(parsed_args.languages, ['de', 'en']) | ||||
|         self.assertEqual(parsed_args.translate, 'cz') | ||||
| 
 | ||||
|         parsed_args = YouTubeTranscriptCli('v1 v2 --translate cz --languages de en'.split())._parse_args() | ||||
|         self.assertEqual(parsed_args.video_ids, ['v1', 'v2']) | ||||
|         self.assertEqual(parsed_args.json, False) | ||||
|         self.assertEqual(parsed_args.languages, ['de', 'en']) | ||||
|         self.assertEqual(parsed_args.translate, 'cz') | ||||
| 
 | ||||
|     def test_argument_parsing__manually_or_generated(self): | ||||
|         parsed_args = YouTubeTranscriptCli('v1 v2 --exclude-manually-created'.split())._parse_args() | ||||
|         self.assertEqual(parsed_args.video_ids, ['v1', 'v2']) | ||||
|         self.assertTrue(parsed_args.exclude_manually_created) | ||||
|         self.assertFalse(parsed_args.exclude_generated) | ||||
| 
 | ||||
|         parsed_args = YouTubeTranscriptCli('v1 v2 --exclude-generated'.split())._parse_args() | ||||
|         self.assertEqual(parsed_args.video_ids, ['v1', 'v2']) | ||||
|         self.assertFalse(parsed_args.exclude_manually_created) | ||||
|         self.assertTrue(parsed_args.exclude_generated) | ||||
| 
 | ||||
|         parsed_args = YouTubeTranscriptCli('v1 v2 --exclude-manually-created --exclude-generated'.split())._parse_args() | ||||
|         self.assertEqual(parsed_args.video_ids, ['v1', 'v2']) | ||||
|         self.assertTrue(parsed_args.exclude_manually_created) | ||||
|         self.assertTrue(parsed_args.exclude_generated) | ||||
| 
 | ||||
|     def test_run(self): | ||||
|         YouTubeTranscriptApi.get_transcripts = MagicMock(return_value=([], [])) | ||||
|         YouTubeTranscriptCli('v1 v2 --languages de en'.split()).run() | ||||
| 
 | ||||
|         YouTubeTranscriptApi.get_transcripts.assert_called_once_with( | ||||
|             ['v1', 'v2'], | ||||
|             languages=['de', 'en'], | ||||
|             continue_after_error=True, | ||||
|             proxies=None | ||||
|         YouTubeTranscriptApi.list_transcripts.assert_any_call('v1', proxies=None) | ||||
|         YouTubeTranscriptApi.list_transcripts.assert_any_call('v2', proxies=None) | ||||
| 
 | ||||
|         self.transcript_list_mock.find_transcript.assert_any_call(['de', 'en']) | ||||
| 
 | ||||
|     def test_run__failing_transcripts(self): | ||||
|         YouTubeTranscriptApi.list_transcripts = MagicMock(side_effect=VideoUnavailable('video_id')) | ||||
| 
 | ||||
|         output = YouTubeTranscriptCli('v1 --languages de en'.split()).run() | ||||
| 
 | ||||
|         self.assertEqual(output, str(VideoUnavailable('video_id'))) | ||||
| 
 | ||||
|     def test_run__exclude_generated(self): | ||||
|         YouTubeTranscriptCli('v1 v2 --languages de en --exclude-generated'.split()).run() | ||||
| 
 | ||||
|         self.transcript_list_mock.find_manually_created_transcript.assert_any_call(['de', 'en']) | ||||
| 
 | ||||
|     def test_run__exclude_manually_created(self): | ||||
|         YouTubeTranscriptCli('v1 v2 --languages de en --exclude-manually-created'.split()).run() | ||||
| 
 | ||||
|         self.transcript_list_mock.find_generated_transcript.assert_any_call(['de', 'en']) | ||||
| 
 | ||||
|     def test_run__exclude_manually_created_and_generated(self): | ||||
|         self.assertEqual( | ||||
|             YouTubeTranscriptCli('v1 v2 --languages de en --exclude-manually-created --exclude-generated'.split()).run(), | ||||
|             '' | ||||
|         ) | ||||
| 
 | ||||
|     def test_run__translate(self): | ||||
|         YouTubeTranscriptCli('v1 v2 --languages de en --translate cz'.split()).run(), | ||||
| 
 | ||||
|         self.transcript_mock.translate.assert_any_call('cz') | ||||
| 
 | ||||
|     def test_run__list_transcripts(self): | ||||
|         YouTubeTranscriptCli('--list-transcripts v1 v2'.split()).run() | ||||
| 
 | ||||
|         YouTubeTranscriptApi.list_transcripts.assert_any_call('v1', proxies=None) | ||||
|         YouTubeTranscriptApi.list_transcripts.assert_any_call('v2', proxies=None) | ||||
| 
 | ||||
|     def test_run__json_output(self): | ||||
|         YouTubeTranscriptApi.get_transcripts = MagicMock(return_value=([{'boolean': True}], [])) | ||||
|         output = YouTubeTranscriptCli('v1 v2 --languages de en --json'.split()).run() | ||||
| 
 | ||||
|         # will fail if output is not valid json | ||||
|         json.loads(output) | ||||
| 
 | ||||
|     def test_run__proxies(self): | ||||
|         YouTubeTranscriptApi.get_transcripts = MagicMock(return_value=([], [])) | ||||
|         YouTubeTranscriptCli( | ||||
|             'v1 v2 --languages de en --http-proxy http://user:pass@domain:port --https-proxy https://user:pass@domain:port'.split()).run() | ||||
|             ( | ||||
|                 'v1 v2 --languages de en ' | ||||
|                 '--http-proxy http://user:pass@domain:port ' | ||||
|                 '--https-proxy https://user:pass@domain:port' | ||||
|             ).split() | ||||
|         ).run() | ||||
| 
 | ||||
|         YouTubeTranscriptApi.get_transcripts.assert_called_once_with( | ||||
|             ['v1', 'v2'], | ||||
|             languages=['de', 'en'], | ||||
|             continue_after_error=True, | ||||
|         YouTubeTranscriptApi.list_transcripts.assert_any_call( | ||||
|             'v1', | ||||
|             proxies={'http': 'http://user:pass@domain:port', 'https': 'https://user:pass@domain:port'} | ||||
|         ) | ||||
| 
 | ||||
|         YouTubeTranscriptApi.list_transcripts.assert_any_call( | ||||
|             'v2', | ||||
|             proxies={'http': 'http://user:pass@domain:port', 'https': 'https://user:pass@domain:port'} | ||||
|         ) | ||||
|  |  | |||
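The CLI tests above replace a single `assert_called_once_with` on `get_transcripts` with one `assert_any_call` per video id on `list_transcripts`, which suggests the CLI now looks up each video separately. The sketch below shows why the assertion style has to change in that case. It uses the standard library's `unittest.mock` rather than the `mock` package imported in the test file, and the bare `list_transcripts` mock is a placeholder, not the patched API method.

```python
from unittest.mock import MagicMock

# Placeholder mock standing in for YouTubeTranscriptApi.list_transcripts.
list_transcripts = MagicMock(return_value='transcript list')

# Simulate a caller that performs one lookup per video id.
for video_id in ['v1', 'v2']:
    list_transcripts(video_id, proxies=None)

# assert_any_call passes if at least one recorded call matches.
list_transcripts.assert_any_call('v1', proxies=None)
list_transcripts.assert_any_call('v2', proxies=None)

# assert_called_once_with requires exactly one call in total,
# so it fails once the mock has been called for two videos.
try:
    list_transcripts.assert_called_once_with('v1', proxies=None)
    once_check_passed = True
except AssertionError:
    once_check_passed = False
```

Note that `assert_any_call` does not check how many calls were made; a test that needs the exact count can additionally compare `call_count`.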