Merge pull request #42 from jdepoix/feature/translating-transcripts

Feature/translating transcripts
2019-12-30 23:49:20 +01:00 · 2019-12-30 23:49:20 +01:00 · 68951600d9
parent a1b1e001fe 889cedcbf0
commit 68951600d9
9 changed files with 1941 additions and 174 deletions
--- a/README.md
+++ b/README.md
@ -1,4 +1,5 @@
-# YouTube Transcript/Subtitle API (including automatically generated subtitles)
+
 # YouTube Transcript/Subtitle API (including automatically generated subtitles and subtitle translations)  
 [![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url)  
 [![Build Status](https://travis-ci.org/jdepoix/youtube-transcript-api.svg)](https://travis-ci.org/jdepoix/youtube-transcript-api)  
@ -7,7 +8,7 @@
 [![image](https://img.shields.io/pypi/v/youtube-transcript-api.svg)](https://pypi.org/project/youtube-transcript-api/)  
 [![image](https://img.shields.io/pypi/pyversions/youtube-transcript-api.svg)](https://pypi.org/project/youtube-transcript-api/)  
-This is an python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require a headless browser, like other selenium based solutions do!
+This is a Python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles and does not require a headless browser, like other Selenium-based solutions do!  
 ## Install  
@ -23,13 +24,11 @@ If you want to use it from source, you'll have to install the dependencies manua
 pip install -r requirements.txt  
 ```  
-## How to use it
+You can either integrate this module [into an existing application](#api), or just use it via a [CLI](#cli).
-You could either integrate this module into an existing application, or just use it via an CLI
+## API
-### In code
+The easiest way to get a transcript for a given video is to execute:  
-To get a transcript for a given video you can do:
 ```python  
 from youtube_transcript_api import YouTubeTranscriptApi  
@ -61,9 +60,9 @@ You can also add the `languages` param if you want to make sure the transcripts
 YouTubeTranscriptApi.get_transcript(video_id, languages=['de', 'en'])  
 ```  
-It's a list of language codes in a descending priority. In this example it will first try to fetch the german transcript (`'de'`) and then fetch the english transcript (`'en'`) if it fails to do so. As I can't provide a complete list of all working language codes with full certainty, you may have to play around with the language codes a bit, to find the one which is working for you!
+It's a list of language codes in descending order of priority. In this example it will first try to fetch the German transcript (`'de'`) and then fetch the English transcript (`'en'`) if it fails to do so. If you want to find out which languages are available first, [have a look at `list_transcripts()`](#list-available-transcripts).
-To get transcripts for a list fo video ids you can call:
+To get transcripts for a list of video ids you can call:  
 ```python  
 YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])  
@ -71,7 +70,100 @@ YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])
 `languages` is also optional here.  
-### CLI
+### List available transcripts
 If you want to list all transcripts which are available for a given video you can call:
 ```python
 transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
 ```
 This will return a `TranscriptList` object, which is iterable and provides methods to filter the list of transcripts for specific languages and types, like:
 ```python
 transcript = transcript_list.find_transcript(['de', 'en'])  
 ```
 By default, this module always prefers manually created transcripts over automatically generated ones when a transcript in the requested language is available in both forms. The `TranscriptList` allows you to bypass this default behaviour by searching for specific transcript types:
 ```python
 # filter for manually created transcripts  
 transcript = transcript_list.find_manually_created_transcript(['de', 'en'])  
 # or automatically generated ones  
 transcript = transcript_list.find_generated_transcript(['de', 'en'])
 ```
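The diff passes `[self._manually_created_transcripts, self._generated_transcripts]` to a private `_find_transcript` helper, but that helper's body is not shown. Below is a minimal, network-free sketch of one plausible reading of the lookup. It assumes that language codes are checked in priority order and that, for each code, the dictionaries are searched in the order given. `find_transcript` and `NoTranscriptFound` here are hypothetical stand-ins, and the "transcripts" are plain strings:

```python
class NoTranscriptFound(Exception):
    """Raised when none of the requested language codes is available."""


def find_transcript(language_codes, transcript_dicts):
    # Outer loop: language codes in descending priority.
    # Inner loop: transcript dicts in the given order, so for one language a
    # manually created transcript (first dict) beats a generated one (second dict).
    for language_code in language_codes:
        for transcript_dict in transcript_dicts:
            if language_code in transcript_dict:
                return transcript_dict[language_code]
    raise NoTranscriptFound(language_codes)


manually_created = {'cs': 'cs (manual)', 'en': 'en (manual)'}
generated = {'en': 'en (generated)', 'de': 'de (generated)'}

# like find_transcript(): manually created first, generated as fallback
print(find_transcript(['en'], [manually_created, generated]))  # en (manual)
# like find_generated_transcript(): only generated ones are searched
print(find_transcript(['en'], [generated]))  # en (generated)
# under this reading, a higher-priority language wins even if it is only generated
print(find_transcript(['de', 'en'], [manually_created, generated]))  # de (generated)
```

If the real helper checked transcript type before language priority, the last call would return `en (manual)` instead. Treat that case as an assumption to verify against the library.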
 The methods `find_transcript`, `find_manually_created_transcript` and `find_generated_transcript` return `Transcript` objects. They contain metadata regarding the transcript:
 ```python
 print(
 	transcript.video_id, 
 	transcript.language, 
 	transcript.language_code,
 	# whether it has been manually created or generated by YouTube 
 	transcript.is_generated,
 	# whether this transcript can be translated or not
 	transcript.is_translatable,
 	# a list of languages the transcript can be translated to 
 	transcript.translation_languages, 
 )
 ```
 They also provide the `fetch()` method, which allows you to fetch the actual transcript data:
 ```python
 transcript.fetch()
 ```
 ### Translate transcript 
 YouTube has a feature which allows you to automatically translate subtitles. This module makes it possible to access this feature as well. To do so, `Transcript` objects provide a `translate()` method, which returns a new, translated `Transcript` object:
 ```python
 transcript = transcript_list.find_transcript(['en']) 
 translated_transcript = transcript.translate('de')
 print(translated_transcript.fetch())
 ```
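To show what `translate()` does without touching the network, here is a simplified sketch modelled on the `Transcript.translate()` method further down in this diff. `TranscriptSketch` and the timedtext URL are hypothetical placeholders. The real class also carries an HTTP client and the video id:

```python
class NotTranslatable(Exception):
    """The transcript has no translation languages."""


class TranslationLanguageNotAvailable(Exception):
    """The requested language is not among the translation languages."""


class TranscriptSketch:
    """Hypothetical, network-free stand-in for the `Transcript` class in this diff."""

    def __init__(self, url, language, language_code, translation_languages):
        self.url = url
        self.language = language
        self.language_code = language_code
        self.translation_languages = translation_languages
        # maps language codes to language names, like `_translation_languages_dict`
        self._translation_languages_dict = {
            lang['language_code']: lang['language'] for lang in translation_languages
        }

    @property
    def is_translatable(self):
        return len(self.translation_languages) > 0

    def translate(self, language_code):
        if not self.is_translatable:
            raise NotTranslatable(language_code)
        if language_code not in self._translation_languages_dict:
            raise TranslationLanguageNotAvailable(language_code)
        # Same URL plus a `tlang` query parameter; the result offers no further translations.
        return TranscriptSketch(
            '{url}&tlang={code}'.format(url=self.url, code=language_code),
            self._translation_languages_dict[language_code],
            language_code,
            [],
        )


original = TranscriptSketch(
    'https://www.youtube.com/api/timedtext?v=VIDEO_ID&lang=en',  # placeholder URL
    'English',
    'en',
    [{'language': 'German', 'language_code': 'de'}],
)
translated = original.translate('de')
print(translated.url)              # ...&lang=en&tlang=de
print(translated.language)         # German
print(translated.is_translatable)  # False
```

Because a translated transcript gets an empty translation language list, calling `translate()` on it again raises `NotTranslatable`.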
 ### By example
 ```python
 from youtube_transcript_api import YouTubeTranscriptApi
 
 # retrieve the available transcripts
 transcript_list = YouTubeTranscriptApi.list_transcripts('video_id')
 # iterate over all available transcripts
 for transcript in transcript_list:
     # the Transcript object provides metadata properties
     print(
         transcript.video_id,
         transcript.language,
         transcript.language_code,
         # whether it has been manually created or generated by YouTube
         transcript.is_generated,
         # whether this transcript can be translated or not
         transcript.is_translatable,
         # a list of languages the transcript can be translated to
         transcript.translation_languages,
     )
     # fetch the actual transcript data
     print(transcript.fetch())
     # translating the transcript will return another transcript object;
     # non-translatable transcripts would raise NotTranslatable, so skip them
     if transcript.is_translatable:
         print(transcript.translate('en').fetch())
 # you can also directly filter for the language you are looking for, using the transcript list
 transcript = transcript_list.find_transcript(['de', 'en'])
 # or just filter for manually created transcripts
 transcript = transcript_list.find_manually_created_transcript(['de', 'en'])
 # or automatically generated ones
 transcript = transcript_list.find_generated_transcript(['de', 'en'])
 ```
 ## CLI  
 Execute the CLI script using the video ids as parameters and the results will be printed out to the command line:  
@ -85,13 +177,32 @@ The CLI also gives you the option to provide a list of preferred languages:
 youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en  
 ```
 You can also specify if you want to exclude automatically generated or manually created subtitles:
 ```  
 youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-generated
 youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-manually-created
 ```
 If you would prefer to write it into a file or pipe it into another application, you can also output the results as JSON using the following line:  
 ```  
 youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --json > transcripts.json  
 ```  
-### Proxy
+Translating transcripts using the CLI is also possible:
 ```  
 youtube_transcript_api <first_video_id> <second_video_id> ... --languages en --translate de
 ```  
 If you are not sure which languages are available for a given video, you can list all available transcripts with:
 ```  
 youtube_transcript_api --list-transcripts <first_video_id>
 ```  
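The flags above map onto `argparse` options defined in `_cli.py` further down in this diff. The reduced sketch below keeps only the options discussed here and shows how such command lines are parsed. The `nargs` and `default` values for `--languages` are assumptions, because that part of the parser is not visible in the diff:

```python
import argparse


def build_parser():
    # Reduced, hypothetical copy of the CLI parser: only the options discussed above.
    parser = argparse.ArgumentParser()
    parser.add_argument('--list-transcripts', action='store_const', const=True, default=False)
    parser.add_argument('video_ids', nargs='+', type=str)
    # Assumption: nargs/default for --languages are not shown in the diff.
    parser.add_argument('--languages', nargs='*', default=['en'], type=str)
    parser.add_argument('--translate', default='')
    return parser


args = build_parser().parse_args(['abc', 'def', '--languages', 'en', '--translate', 'de'])
print(args.video_ids, args.languages, args.translate)  # ['abc', 'def'] ['en'] de

args = build_parser().parse_args(['--list-transcripts', 'abc'])
print(args.list_transcripts, args.video_ids)  # True ['abc']
```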
 ## Proxy  
 You can specify an HTTPS/HTTP proxy, which will be used for the requests to YouTube:  
--- a/youtube_transcript_api/init.py
+++ b/youtube_transcript_api/init.py
@ -1,3 +1,11 @@
 from ._api import YouTubeTranscriptApi
 from ._transcripts import TranscriptList, Transcript
-from ._errors import TranscriptsDisabled, NoTranscriptFound, CouldNotRetrieveTranscript, VideoUnavailable
+from ._errors import (
    TranscriptsDisabled,
    NoTranscriptFound,
    CouldNotRetrieveTranscript,
    VideoUnavailable,
    NotTranslatable,
    TranslationLanguageNotAvailable,
    NoTranscriptAvailable,
 )
--- a/youtube_transcript_api/_api.py
+++ b/youtube_transcript_api/_api.py
@ -4,17 +4,68 @@ from ._transcripts import TranscriptListFetcher
 class YouTubeTranscriptApi():
    @classmethod
    def list_transcripts(cls, video_id, proxies=None):
        """
        Retrieves the list of transcripts which are available for a given video. It returns a `TranscriptList` object
        which is iterable and provides methods to filter the list of transcripts for specific languages. While iterating
        over the `TranscriptList` the individual transcripts are represented by `Transcript` objects, which provide
        metadata and can either be fetched by calling `transcript.fetch()` or translated by calling
        `transcript.translate('en')`. Example::
            # retrieve the available transcripts
            transcript_list = YouTubeTranscriptApi.list_transcripts('video_id')
            # iterate over all available transcripts
            for transcript in transcript_list:
                # the Transcript object provides metadata properties
                print(
                    transcript.video_id,
                    transcript.language,
                    transcript.language_code,
                    # whether it has been manually created or generated by YouTube
                    transcript.is_generated,
                    # a list of languages the transcript can be translated to
                    transcript.translation_languages,
                )
                # fetch the actual transcript data
                print(transcript.fetch())
                # translating the transcript will return another transcript object
                print(transcript.translate('en').fetch())
            # you can also directly filter for the language you are looking for, using the transcript list
            transcript = transcript_list.find_transcript(['de', 'en'])
            # or just filter for manually created transcripts
            transcript = transcript_list.find_manually_created_transcript(['de', 'en'])
            # or automatically generated ones
            transcript = transcript_list.find_generated_transcript(['de', 'en'])
        :param video_id: the youtube video id
        :type video_id: str
        :param proxies: a dictionary mapping of http and https proxies to be used for the network requests
        :type proxies: {'http': str, 'https': str} - http://docs.python-requests.org/en/master/user/advanced/#proxies
        :return: the list of available transcripts
        :rtype TranscriptList:
        """
        with requests.Session() as http_client:
            http_client.proxies = proxies if proxies else {}
            return TranscriptListFetcher(http_client).fetch(video_id)
    @classmethod
    def get_transcripts(cls, video_ids, languages=('en',), continue_after_error=False, proxies=None):
        """
        Retrieves the transcripts for a list of videos.
        :param video_ids: a list of youtube video ids
-        :type video_ids: [str]
+        :type video_ids: list[str]
        :param languages: A list of language codes in a descending priority. For example, if this is set to ['de', 'en']
        it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if it fails to
        do so.
-        :type languages: [str]
+        :type languages: list[str]
        :param continue_after_error: if this is set the execution won't be stopped, if an error occurs while retrieving
        one of the video transcripts
        :type continue_after_error: bool
@ -22,7 +73,7 @@ class YouTubeTranscriptApi():
        :type proxies: {'http': str, 'https': str} - http://docs.python-requests.org/en/master/user/advanced/#proxies
        :return: a tuple containing a dictionary mapping video ids onto their corresponding transcripts, and a list of
        video ids, which could not be retrieved
-        :rtype: ({str: [{'text': str, 'start': float, 'end': float}]}, [str]})
+        :rtype ({str: [{'text': str, 'start': float, 'duration': float}]}, [str]):
        """
        data = {}
        unretrievable_videos = []
@ -41,19 +92,19 @@ class YouTubeTranscriptApi():
    @classmethod
    def get_transcript(cls, video_id, languages=('en',), proxies=None):
        """
-        Retrieves the transcript for a single video.
+        Retrieves the transcript for a single video. This is just a shortcut for calling::
            YouTubeTranscriptApi.list_transcripts(video_id, proxies).find_transcript(languages).fetch()
        :param video_id: the youtube video id
        :type video_id: str
        :param languages: A list of language codes in a descending priority. For example, if this is set to ['de', 'en']
        it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if it fails to
        do so.
-        :type languages: [str]
+        :type languages: list[str]
        :param proxies: a dictionary mapping of http and https proxies to be used for the network requests
        :type proxies: {'http': str, 'https': str} - http://docs.python-requests.org/en/master/user/advanced/#proxies
        :return: a list of dictionaries containing the 'text', 'start' and 'duration' keys
-        :rtype: [{'text': str, 'start': float, 'end': float}]
+        :rtype [{'text': str, 'start': float, 'duration': float}]:
        """
-        with requests.Session() as http_client:
+        return cls.list_transcripts(video_id, proxies).find_transcript(languages).fetch()
-            http_client.proxies = proxies if proxies else {}
-            return TranscriptListFetcher(http_client).fetch(video_id).find_transcript(languages).fetch()
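The `get_transcripts` docstring describes its return value as a tuple: a dict of transcripts plus a list of video ids that could not be retrieved. The loop that builds it lies outside the visible hunks. Here is a minimal sketch of that documented contract. It assumes failures are only collected when `continue_after_error` is set and are re-raised otherwise. `fetch` is a hypothetical stand-in for `get_transcript`:

```python
def get_transcripts(video_ids, fetch, continue_after_error=False):
    # `fetch` is a hypothetical stand-in for YouTubeTranscriptApi.get_transcript.
    data = {}
    unretrievable_videos = []
    for video_id in video_ids:
        try:
            data[video_id] = fetch(video_id)
        except Exception:
            if not continue_after_error:
                raise
            unretrievable_videos.append(video_id)
    return data, unretrievable_videos


def fake_fetch(video_id):
    # hypothetical fetcher: fails for one id, returns a tiny transcript otherwise
    if video_id == 'broken':
        raise RuntimeError('could not retrieve transcript')
    return [{'text': 'hi', 'start': 0.0, 'duration': 1.0}]


data, failed = get_transcripts(['ok', 'broken'], fake_fetch, continue_after_error=True)
print(data)    # {'ok': [{'text': 'hi', 'start': 0.0, 'duration': 1.0}]}
print(failed)  # ['broken']
```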
--- a/youtube_transcript_api/_cli.py
+++ b/youtube_transcript_api/_cli.py
@ -14,22 +14,45 @@ class YouTubeTranscriptCli():
    def run(self):
        parsed_args = self._parse_args()
        if parsed_args.exclude_manually_created and parsed_args.exclude_generated:
            return ''
        proxies = None
        if parsed_args.http_proxy != '' or parsed_args.https_proxy != '':
            proxies = {"http": parsed_args.http_proxy, "https": parsed_args.https_proxy}
-        transcripts, unretrievable_videos = YouTubeTranscriptApi.get_transcripts(
+        transcripts = []
-            parsed_args.video_ids,
+        exceptions = []
-            languages=parsed_args.languages,
+
-            continue_after_error=True,
+        for video_id in parsed_args.video_ids:
-            proxies=proxies
+            try:
-        )
+                transcripts.append(self._fetch_transcript(parsed_args, proxies, video_id))
            except Exception as exception:
                exceptions.append(exception)
        return '\n\n'.join(
-            [str(YouTubeTranscriptApi.CouldNotRetrieveTranscript(video_id)) for video_id in unretrievable_videos]
+            [str(exception) for exception in exceptions]
            + ([json.dumps(transcripts) if parsed_args.json else pprint.pformat(transcripts)] if transcripts else [])
        )
    def _fetch_transcript(self, parsed_args, proxies, video_id):
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id, proxies=proxies)
        if parsed_args.list_transcripts:
            return str(transcript_list)
        if parsed_args.exclude_manually_created:
            transcript = transcript_list.find_generated_transcript(parsed_args.languages)
        elif parsed_args.exclude_generated:
            transcript = transcript_list.find_manually_created_transcript(parsed_args.languages)
        else:
            transcript = transcript_list.find_transcript(parsed_args.languages)
        if parsed_args.translate:
            transcript = transcript.translate(parsed_args.translate)
        return transcript.fetch()
    def _parse_args(self):
        parser = argparse.ArgumentParser(
            description=(
@ -38,6 +61,13 @@ class YouTubeTranscriptCli():
                'other selenium based solutions do!'
            )
        )
        parser.add_argument(
            '--list-transcripts',
            action='store_const',
            const=True,
            default=False,
            help='This will list the languages in which the given videos are available.',
        )
        parser.add_argument('video_ids', nargs='+', type=str, help='List of YouTube video IDs.')
        parser.add_argument(
            '--languages',
@ -46,11 +76,25 @@ class YouTubeTranscriptCli():
            type=str,
            help=(
                'A list of language codes in a descending priority. For example, if this is set to "de en" it will '
-                'first try to fetch the german transcript (de) and then fetch the english transcipt (en) if it fails '
+                'first try to fetch the German transcript (de) and then fetch the English transcript (en) if it fails '
                'to do so. As I can\'t provide a complete list of all working language codes with full certainty, you '
                'may have to play around with the language codes a bit, to find the one which is working for you!'
            ),
        )
        parser.add_argument(
            '--exclude-generated',
            action='store_const',
            const=True,
            default=False,
            help='If this flag is set, transcripts which have been generated by YouTube will not be retrieved.',
        )
        parser.add_argument(
            '--exclude-manually-created',
            action='store_const',
            const=True,
            default=False,
            help='If this flag is set, transcripts which have been manually created will not be retrieved.',
        )
        parser.add_argument(
            '--json',
            action='store_const',
@ -59,13 +103,24 @@ class YouTubeTranscriptCli():
            help='If this flag is set the output will be JSON formatted.',
        )
        parser.add_argument(
-            '--http-proxy', dest='http_proxy',
+            '--translate',
-            default='', metavar='URL',
+            default='',
            help=(
                'The language code for the language you want this transcript to be translated to. Use the '
                '--list-transcripts feature to find out which languages are translatable and which translation '
                'languages are available.'
            )
        )
        parser.add_argument(
            '--http-proxy',
            default='',
            metavar='URL',
            help='Use the specified HTTP proxy.'
        )
        parser.add_argument(
-            '--https-proxy', dest='https_proxy',
+            '--https-proxy',
-            default='', metavar='URL',
+            default='',
            metavar='URL',
            help='Use the specified HTTPS proxy.'
        )
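The new `run()` no longer delegates to `get_transcripts`. Instead, it fetches each video itself and collects any exceptions. The condensed sketch below reproduces that loop and the output formatting. It uses a hypothetical `fetch` callable in place of `_fetch_transcript`. Error messages come first, and the successfully fetched transcripts follow as one JSON or pretty-printed block:

```python
import json
import pprint


def run(video_ids, fetch, as_json=False):
    # `fetch` is a hypothetical stand-in for YouTubeTranscriptCli._fetch_transcript.
    transcripts = []
    exceptions = []
    for video_id in video_ids:
        try:
            transcripts.append(fetch(video_id))
        except Exception as exception:
            exceptions.append(exception)
    # errors first, then a single block with all retrieved transcripts (if any)
    return '\n\n'.join(
        [str(exception) for exception in exceptions]
        + ([json.dumps(transcripts) if as_json else pprint.pformat(transcripts)] if transcripts else [])
    )


def fake_fetch(video_id):
    if video_id == 'broken':
        raise ValueError('Could not retrieve a transcript for the video broken')
    return [{'text': 'hello', 'start': 0.0, 'duration': 1.5}]


print(run(['ok', 'broken'], fake_fetch, as_json=True))
```

One consequence of this design is that a single failing video no longer hides the results for the others. Because everything is joined into one string, the JSON output is only machine-readable when no errors occurred.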
--- a/youtube_transcript_api/_errors.py
+++ b/youtube_transcript_api/_errors.py
@ -43,6 +43,18 @@ class TranscriptsDisabled(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = 'Subtitles are disabled for this video'
 class NoTranscriptAvailable(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = 'No transcripts are available for this video'
 class NotTranslatable(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = 'The requested language is not translatable'
 class TranslationLanguageNotAvailable(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = 'The requested translation language is not available'
 class NoTranscriptFound(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = (
        'No transcripts were found for any of the requested language codes: {requested_language_codes}\n\n'
--- a/youtube_transcript_api/_transcripts.py
+++ b/youtube_transcript_api/_transcripts.py
@ -12,7 +12,14 @@ from xml.etree import ElementTree
 import re
 from ._html_unescaping import unescape
-from ._errors import VideoUnavailable, NoTranscriptFound, TranscriptsDisabled
+from ._errors import (
    VideoUnavailable,
    NoTranscriptFound,
    TranscriptsDisabled,
    NotTranslatable,
    TranslationLanguageNotAvailable,
    NoTranscriptAvailable,
 )
 from ._settings import WATCH_URL
@ -36,9 +43,14 @@ class TranscriptListFetcher():
            raise TranscriptsDisabled(video_id)
-        return json.loads(splitted_html[1].split(',"videoDetails')[0].replace('\n', ''))[
+        captions_json = json.loads(
-            'playerCaptionsTracklistRenderer'
+            splitted_html[1].split(',"videoDetails')[0].replace('\n', '')
-        ]
+        )['playerCaptionsTracklistRenderer']
        if 'captionTracks' not in captions_json:
            raise NoTranscriptAvailable(video_id)
        return captions_json
    def _fetch_html(self, video_id):
        return self._http_client.get(WATCH_URL.format(video_id=video_id)).text.replace(
@ -53,10 +65,7 @@ class TranscriptList():
    This object represents a list of transcripts. It can be iterated over to list all transcripts which are available
    for a given YouTube video. Also it provides functionality to search for a transcript in a given language.
    """
-
+    def __init__(self, video_id, manually_created_transcripts, generated_transcripts, translation_languages):
-    # TODO implement iterator
-    def __init__(self, video_id, manually_created_transcripts, generated_transcripts):
        """
        The constructor is only for internal use. Use the static build method instead.
@ -66,10 +75,13 @@ class TranscriptList():
        :type manually_created_transcripts: dict[str, Transcript]
        :param generated_transcripts: dict mapping language codes to the generated transcripts
        :type generated_transcripts: dict[str, Transcript]
        :param translation_languages: list of languages which translatable transcripts can be translated to
        :type translation_languages: list[dict[str, str]]
        """
        self.video_id = video_id
        self._manually_created_transcripts = manually_created_transcripts
        self._generated_transcripts = generated_transcripts
        self._translation_languages = translation_languages
    @staticmethod
    def build(http_client, video_id, captions_json):
@ -83,7 +95,7 @@ class TranscriptList():
        :param captions_json: the JSON parsed from the YouTube pages static HTML
        :type captions_json: dict
        :return: the created TranscriptList
-        :rtype TranscriptList
+        :rtype TranscriptList:
        """
        translation_languages = [
            {
@ -108,15 +120,19 @@ class TranscriptList():
                caption['name']['simpleText'],
                caption['languageCode'],
                caption.get('kind', '') == 'asr',
-                translation_languages if caption['isTranslatable'] else []
+                translation_languages if caption.get('isTranslatable', False) else []
            )
        return TranscriptList(
            video_id,
            manually_created_transcripts,
            generated_transcripts,
            translation_languages,
        )
    def __iter__(self):
        return iter(list(self._manually_created_transcripts.values()) + list(self._generated_transcripts.values()))
    def find_transcript(self, language_codes):
        """
        Finds a transcript for a given language code. Manually created transcripts are returned first and only if none
@ -126,9 +142,9 @@ class TranscriptList():
        :param language_codes: A list of language codes in a descending priority. For example, if this is set to
        ['de', 'en'] it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if
        it fails to do so.
-        :type languages: [str]
+        :type language_codes: list[str]
        :return: the found Transcript
-        :rtype: Transcript
+        :rtype Transcript:
        :raises: NoTranscriptFound
        """
        return self._find_transcript(language_codes, [self._manually_created_transcripts, self._generated_transcripts])
@ -140,9 +156,9 @@ class TranscriptList():
        :param language_codes: A list of language codes in a descending priority. For example, if this is set to
        ['de', 'en'] it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if
        it fails to do so.
-        :type languages: [str]
+        :type language_codes: list[str]
        :return: the found Transcript
-        :rtype: Transcript
+        :rtype Transcript:
        :raises: NoTranscriptFound
        """
        return self._find_transcript(language_codes, [self._generated_transcripts,])
@ -154,9 +170,9 @@ class TranscriptList():
        :param language_codes: A list of language codes in a descending priority. For example, if this is set to
        ['de', 'en'] it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if
        it fails to do so.
-        :type languages: [str]
+        :type language_codes: list[str]
        :return: the found Transcript
-        :rtype: Transcript
+        :rtype Transcript:
        :raises: NoTranscriptFound
        """
        return self._find_transcript(language_codes, [self._manually_created_transcripts,])
@ -179,22 +195,28 @@ class TranscriptList():
            '(MANUALLY CREATED)\n'
            '{available_manually_created_transcript_languages}\n\n'
            '(GENERATED)\n'
-            '{available_generated_transcripts}'
+            '{available_generated_transcripts}\n\n'
            '(TRANSLATION LANGUAGES)\n'
            '{available_translation_languages}'
        ).format(
            video_id=self.video_id,
            available_manually_created_transcript_languages=self._get_language_description(
-                self._manually_created_transcripts.values()
+                str(transcript) for transcript in self._manually_created_transcripts.values()
            ),
            available_generated_transcripts=self._get_language_description(
-                self._generated_transcripts.values()
+                str(transcript) for transcript in self._generated_transcripts.values()
            ),
            available_translation_languages=self._get_language_description(
                '{language_code} ("{language}")'.format(
                    language=translation_language['language'],
                    language_code=translation_language['language_code'],
                ) for translation_language in self._translation_languages
            )
        )
-    def _get_language_description(self, transcripts):
+    def _get_language_description(self, transcript_strings):
-        return '\n'.join(
+        description = '\n'.join(' - {transcript}'.format(transcript=transcript) for transcript in transcript_strings)
-            ' - {transcript}'.format(transcript=str(transcript))
+        return description if description else 'None'
-            for transcript in transcripts
-        ) if transcripts else 'None'
 class Transcript():
@ -220,45 +242,49 @@ class Transcript():
        self.language_code = language_code
        self.is_generated = is_generated
        self.translation_languages = translation_languages
        self._translation_languages_dict = {
            translation_language['language_code']: translation_language['language']
            for translation_language in translation_languages
        }
    def fetch(self):
        """
        Loads the actual transcript data.
        :return: a list of dictionaries containing the 'text', 'start' and 'duration' keys
-        :rtype: [{'text': str, 'start': float, 'end': float}]
+        :rtype [{'text': str, 'start': float, 'duration': float}]:
        """
        return _TranscriptParser().parse(
            self._http_client.get(self._url).text
        )
    def __str__(self):
-        return '{language_code} ("{language}")'.format(
+        return '{language_code} ("{language}"){translation_description}'.format(
            language=self.language,
            language_code=self.language_code,
            translation_description='[TRANSLATABLE]' if self.is_translatable else ''
        )
-# TODO integrate translations in future release
+    @property
-#     @property
+    def is_translatable(self):
-#     def is_translatable(self):
+        return len(self.translation_languages) > 0
-#         return len(self.translation_languages) > 0
+
-#
+    def translate(self, language_code):
-#
+        if not self.is_translatable:
-# class TranslatableTranscript(Transcript):
+            raise NotTranslatable(self.video_id)
-#     def __init__(self, http_client, url, translation_languages):
+
-#         super(TranslatableTranscript, self).__init__(http_client, url)
+        if language_code not in self._translation_languages_dict:
-#         self._translation_languages = translation_languages
+            raise TranslationLanguageNotAvailable(self.video_id)
-#         self._translation_language_codes = {language['language_code'] for language in translation_languages}
+
-#
+        return Transcript(
-#
+            self._http_client,
-#     def translate(self, language_code):
+            self.video_id,
-#         if language_code not in self._translation_language_codes:
+            '{url}&tlang={language_code}'.format(url=self._url, language_code=language_code),
-#             raise TranslatableTranscript.TranslationLanguageNotAvailable()
+            self._translation_languages_dict[language_code],
-#
+            language_code,
-#         return Transcript(
+            True,
-#             self._http_client,
+            [],
-#             '{url}&tlang={language_code}'.format(url=self._url, language_code=language_code)
+        )
-#         )
 class _TranscriptParser():
@ -269,7 +295,7 @@ class _TranscriptParser():
            {
                'text': re.sub(self.HTML_TAG_REGEX, '', unescape(xml_element.text)),
                'start': float(xml_element.attrib['start']),
-                'duration': float(xml_element.attrib['dur']),
+                'duration': float(xml_element.attrib.get('dur', '0.0')),
            }
            for xml_element in ElementTree.fromstring(plain_data)
            if xml_element.text is not None
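The `dur` fallback means an entry without that attribute no longer raises `KeyError`. The self-contained sketch below runs the parsing step on a small, hand-written sample shaped like YouTube's timedtext XML (the sample is an assumption). It swaps the module's own `unescape` helper for the standard library's `html.unescape`. The tag-stripping regex is also an assumption, since `HTML_TAG_REGEX` is not shown in the diff:

```python
import html
import re
from xml.etree import ElementTree

# Assumption: a simple regex for stripping inline tags such as <i>...</i>.
HTML_TAG_REGEX = re.compile(r'<[^>]*>', re.IGNORECASE)

SAMPLE = (
    '<transcript>'
    '<text start="0.0" dur="1.54">Hey, this is just a test</text>'
    '<text start="1.54" dur="4.16">this is &lt;i&gt;not&lt;/i&gt; the original</text>'
    '<text start="5.7">no dur attribute here</text>'
    '<text start="7.0" dur="1.0"></text>'
    '</transcript>'
)


def parse(plain_data):
    return [
        {
            'text': re.sub(HTML_TAG_REGEX, '', html.unescape(element.text)),
            'start': float(element.attrib['start']),
            # entries without `dur` get a duration of 0.0 instead of raising KeyError
            'duration': float(element.attrib.get('dur', '0.0')),
        }
        # empty elements have `text` None and are skipped
        for element in ElementTree.fromstring(plain_data)
        if element.text is not None
    ]


for entry in parse(SAMPLE):
    print(entry)
```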
--- a/youtube_transcript_api/test/assets/youtube_no_transcript_available.html.static
+++ b/youtube_transcript_api/test/assets/youtube_no_transcript_available.html.static
--- a/youtube_transcript_api/test/test_api.py
+++ b/youtube_transcript_api/test/test_api.py
@ -5,7 +5,15 @@ import os
 import httpretty
-from youtube_transcript_api import YouTubeTranscriptApi, VideoUnavailable, NoTranscriptFound, TranscriptsDisabled
+from youtube_transcript_api import (
    YouTubeTranscriptApi,
    TranscriptsDisabled,
    NoTranscriptFound,
    VideoUnavailable,
    NoTranscriptAvailable,
    NotTranslatable,
    TranslationLanguageNotAvailable,
 )
 def load_asset(filename):
@ -42,6 +50,51 @@ class TestYouTubeTranscriptApi(TestCase):
            ]
        )
    def test_list_transcripts(self):
        transcript_list = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8')
        language_codes = {transcript.language_code for transcript in transcript_list}
        self.assertEqual(language_codes, {'zh', 'de', 'en', 'hi', 'ja', 'ko', 'es', 'cs'})
    def test_list_transcripts__find_manually_created(self):
        transcript_list = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8')
        transcript = transcript_list.find_manually_created_transcript(['cs'])
        self.assertFalse(transcript.is_generated)
    def test_list_transcripts__find_generated(self):
        transcript_list = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8')
        with self.assertRaises(NoTranscriptFound):
            transcript_list.find_generated_transcript(['cs'])
        transcript = transcript_list.find_generated_transcript(['en'])
        self.assertTrue(transcript.is_generated)
    def test_translate_transcript(self):
        transcript = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8').find_transcript(['en'])
        translated_transcript = transcript.translate('af')
        self.assertEqual(translated_transcript.language_code, 'af')
        self.assertIn('&tlang=af', translated_transcript._url)
    def test_translate_transcript__translation_language_not_available(self):
        transcript = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8').find_transcript(['en'])
        with self.assertRaises(TranslationLanguageNotAvailable):
            transcript.translate('xyz')
    def test_translate_transcript__not_translatable(self):
        transcript = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8').find_transcript(['en'])
        transcript.translation_languages = []
        with self.assertRaises(NotTranslatable):
            transcript.translate('af')
    def test_get_transcript__correct_language_is_used(self):
        YouTubeTranscriptApi.get_transcript('GJLlxj_dtq8', ['de', 'en'])
        query_string = httpretty.last_request().querystring
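The three translation tests above pin down the contract of `Transcript.translate`: the result carries the target `language_code` and a URL with `&tlang=<code>` appended, an unknown target raises `TranslationLanguageNotAvailable`, and a transcript with an empty `translation_languages` list raises `NotTranslatable`. Below is a hypothetical, dependency-free stand-in that satisfies those assertions. The constructor signature, the assumed dict shape of `translation_languages`, the `is_translatable` property, and the choice to mark translations as generated are all assumptions for illustration, not the library's actual class. The URL is a placeholder.

```python
class NotTranslatable(Exception):
    """Raised when a transcript offers no translation languages at all."""


class TranslationLanguageNotAvailable(Exception):
    """Raised when the requested target language is not offered."""


class Transcript:
    """Hypothetical stand-in for the library's Transcript, reduced to what the tests exercise."""

    def __init__(self, url, language_code, is_generated, translation_languages):
        self._url = url
        self.language_code = language_code
        self.is_generated = is_generated
        # Assumed shape: a list of {'language': ..., 'language_code': ...} dicts.
        self.translation_languages = translation_languages

    @property
    def is_translatable(self):
        return len(self.translation_languages) > 0

    def translate(self, language_code):
        if not self.is_translatable:
            raise NotTranslatable(self._url)
        available = {lang['language_code'] for lang in self.translation_languages}
        if language_code not in available:
            raise TranslationLanguageNotAvailable(language_code)
        # The translated transcript reuses the source URL plus a tlang parameter.
        # Assumption: it is flagged as generated and offers no further translations.
        return Transcript(
            '{url}&tlang={language_code}'.format(url=self._url, language_code=language_code),
            language_code,
            True,
            [],
        )


english = Transcript(
    'https://example.invalid/timedtext?lang=en',  # placeholder URL, not a real endpoint
    'en',
    False,
    [{'language': 'Afrikaans', 'language_code': 'af'}],
)
afrikaans = english.translate('af')
print(afrikaans.language_code, afrikaans._url)  # af https://example.invalid/timedtext?lang=en&tlang=af
```

Because the sketch gives the translated transcript an empty language list, calling `translate` on it again raises `NotTranslatable`. That mirrors the `[]` argument visible in the first hunk of this section, but it is an inference, not something the tests check.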
@ -88,6 +141,16 @@ class TestYouTubeTranscriptApi(TestCase):
        with self.assertRaises(NoTranscriptFound):
            YouTubeTranscriptApi.get_transcript('GJLlxj_dtq8', languages=['cz'])
    def test_get_transcript__exception_if_no_transcript_available(self):
        httpretty.register_uri(
            httpretty.GET,
            'https://www.youtube.com/watch',
            body=load_asset('youtube_no_transcript_available.html.static')
        )
        with self.assertRaises(NoTranscriptAvailable):
            YouTubeTranscriptApi.get_transcript('MwBPvcYFY2E')
    def test_get_transcripts(self):
        video_id_1 = 'video_id_1'
        video_id_2 = 'video_id_2'
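The `find_*` tests in this file imply a lookup that walks the requested language codes in priority order and raises `NoTranscriptFound` when nothing matches. Storing manually created and generated transcripts separately also explains why `test_list_transcripts` lists `'en'` twice: the fixture video has both an English manual transcript and an English generated one, and the set literal collapses them. The sketch below is a hypothetical, simplified `TranscriptList` built on two dicts. The rule that a manually created transcript wins over a generated one for the same language is an assumption, and plain strings stand in for transcript objects.

```python
class NoTranscriptFound(Exception):
    """Raised when none of the requested language codes is available."""


class TranscriptList:
    """Hypothetical sketch: one dict per transcript kind, keyed by language code."""

    def __init__(self, manually_created, generated):
        self._manually_created = manually_created
        self._generated = generated

    def find_transcript(self, language_codes):
        # Assumption: manually created transcripts take precedence over generated ones.
        return self._find(language_codes, [self._manually_created, self._generated])

    def find_manually_created_transcript(self, language_codes):
        return self._find(language_codes, [self._manually_created])

    def find_generated_transcript(self, language_codes):
        return self._find(language_codes, [self._generated])

    def _find(self, language_codes, transcript_dicts):
        # The caller's language order is the priority order.
        for language_code in language_codes:
            for transcript_dict in transcript_dicts:
                if language_code in transcript_dict:
                    return transcript_dict[language_code]
        raise NoTranscriptFound(language_codes)


transcripts = TranscriptList(
    manually_created={'cs': 'cs (manually created)', 'en': 'en (manually created)'},
    generated={'en': 'en (generated)'},
)
print(transcripts.find_transcript(['de', 'en']))  # en (manually created)
print(transcripts.find_generated_transcript(['en']))  # en (generated)
```

As in `test_list_transcripts__find_generated`, asking this sketch for a generated Czech transcript raises `NoTranscriptFound`, because Czech exists only in the manually created dict.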
--- a/youtube_transcript_api/test/test_cli.py
+++ b/youtube_transcript_api/test/test_cli.py
@ -3,10 +3,27 @@ from mock import MagicMock
 import json
-from youtube_transcript_api._cli import YouTubeTranscriptCli, YouTubeTranscriptApi
+from youtube_transcript_api import YouTubeTranscriptApi, VideoUnavailable
 from youtube_transcript_api._cli import YouTubeTranscriptCli
 class TestYouTubeTranscriptCli(TestCase):
    def setUp(self):
        self.transcript_mock = MagicMock()
        self.transcript_mock.fetch = MagicMock(return_value=[
            {'text': 'Hey, this is just a test', 'start': 0.0, 'duration': 1.54},
            {'text': 'this is not the original transcript', 'start': 1.54, 'duration': 4.16},
            {'text': 'just something shorter, I made up for testing', 'start': 5.7, 'duration': 3.239}
        ])
        self.transcript_mock.translate = MagicMock(return_value=self.transcript_mock)
        self.transcript_list_mock = MagicMock()
        self.transcript_list_mock.find_generated_transcript = MagicMock(return_value=self.transcript_mock)
        self.transcript_list_mock.find_manually_created_transcript = MagicMock(return_value=self.transcript_mock)
        self.transcript_list_mock.find_transcript = MagicMock(return_value=self.transcript_mock)
        YouTubeTranscriptApi.list_transcripts = MagicMock(return_value=self.transcript_list_mock)
    def test_argument_parsing(self):
        parsed_args = YouTubeTranscriptCli('v1 v2 --json --languages de en'.split())._parse_args()
        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
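The argument-parsing tests in this file need positional video IDs that can be freely interleaved with `--languages`, `--translate`, `--list-transcripts`, the two `--exclude-*` switches, `--json`, and the proxy options. A minimal `argparse` sketch that accepts those command lines follows. It is a hypothetical `build_parser` helper rather than the CLI's actual `_parse_args`. The `['en']` default for `--languages` and the empty-string default for `--translate` are assumptions. The empty proxy defaults match what the proxy test asserts.

```python
import argparse


def build_parser():
    """Hypothetical parser reproducing only the options the CLI tests exercise."""
    parser = argparse.ArgumentParser(description='Sketch of the transcript CLI options.')
    parser.add_argument('video_ids', nargs='+', type=str)
    parser.add_argument('--list-transcripts', action='store_true')
    # nargs='*' stops at the next option, so '--languages de en --translate cz' works.
    parser.add_argument('--languages', nargs='*', default=['en'], type=str)  # default assumed
    parser.add_argument('--exclude-generated', action='store_true')
    parser.add_argument('--exclude-manually-created', action='store_true')
    parser.add_argument('--json', action='store_true')
    parser.add_argument('--translate', default='')  # default assumed
    parser.add_argument('--http-proxy', default='', metavar='URL')
    parser.add_argument('--https-proxy', default='', metavar='URL')
    return parser


args = build_parser().parse_args('v1 v2 --translate cz --languages de en'.split())
print(args.video_ids, args.languages, args.translate)
```

`argparse` turns dashes in option names into underscores, which is why the tests read `parsed_args.list_transcripts` and `parsed_args.exclude_manually_created`.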
@ -106,32 +123,107 @@ class TestYouTubeTranscriptCli(TestCase):
        self.assertEqual(parsed_args.http_proxy, '')
        self.assertEqual(parsed_args.https_proxy, '')
    def test_argument_parsing__list_transcripts(self):
        parsed_args = YouTubeTranscriptCli('--list-transcripts v1 v2'.split())._parse_args()
        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
        self.assertTrue(parsed_args.list_transcripts)
        parsed_args = YouTubeTranscriptCli('v1 v2 --list-transcripts'.split())._parse_args()
        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
        self.assertTrue(parsed_args.list_transcripts)
    def test_argument_parsing__translate(self):
        parsed_args = YouTubeTranscriptCli('v1 v2 --languages de en --translate cz'.split())._parse_args()
        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
        self.assertEqual(parsed_args.json, False)
        self.assertEqual(parsed_args.languages, ['de', 'en'])
        self.assertEqual(parsed_args.translate, 'cz')
        parsed_args = YouTubeTranscriptCli('v1 v2 --translate cz --languages de en'.split())._parse_args()
        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
        self.assertEqual(parsed_args.json, False)
        self.assertEqual(parsed_args.languages, ['de', 'en'])
        self.assertEqual(parsed_args.translate, 'cz')
    def test_argument_parsing__manually_or_generated(self):
        parsed_args = YouTubeTranscriptCli('v1 v2 --exclude-manually-created'.split())._parse_args()
        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
        self.assertTrue(parsed_args.exclude_manually_created)
        self.assertFalse(parsed_args.exclude_generated)
        parsed_args = YouTubeTranscriptCli('v1 v2 --exclude-generated'.split())._parse_args()
        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
        self.assertFalse(parsed_args.exclude_manually_created)
        self.assertTrue(parsed_args.exclude_generated)
        parsed_args = YouTubeTranscriptCli('v1 v2 --exclude-manually-created --exclude-generated'.split())._parse_args()
        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
        self.assertTrue(parsed_args.exclude_manually_created)
        self.assertTrue(parsed_args.exclude_generated)
    def test_run(self):
        YouTubeTranscriptApi.get_transcripts = MagicMock(return_value=([], []))
        YouTubeTranscriptCli('v1 v2 --languages de en'.split()).run()
-        YouTubeTranscriptApi.get_transcripts.assert_called_once_with(
-            ['v1', 'v2'],
-            languages=['de', 'en'],
-            continue_after_error=True,
-            proxies=None
+        YouTubeTranscriptApi.list_transcripts.assert_any_call('v1', proxies=None)
+        YouTubeTranscriptApi.list_transcripts.assert_any_call('v2', proxies=None)
+
+        self.transcript_list_mock.find_transcript.assert_any_call(['de', 'en'])
+
    def test_run__failing_transcripts(self):
        YouTubeTranscriptApi.list_transcripts = MagicMock(side_effect=VideoUnavailable('video_id'))
        output = YouTubeTranscriptCli('v1 --languages de en'.split()).run()
        self.assertEqual(output, str(VideoUnavailable('video_id')))
    def test_run__exclude_generated(self):
        YouTubeTranscriptCli('v1 v2 --languages de en --exclude-generated'.split()).run()
        self.transcript_list_mock.find_manually_created_transcript.assert_any_call(['de', 'en'])
    def test_run__exclude_manually_created(self):
        YouTubeTranscriptCli('v1 v2 --languages de en --exclude-manually-created'.split()).run()
        self.transcript_list_mock.find_generated_transcript.assert_any_call(['de', 'en'])
    def test_run__exclude_manually_created_and_generated(self):
        self.assertEqual(
            YouTubeTranscriptCli('v1 v2 --languages de en --exclude-manually-created --exclude-generated'.split()).run(),
            ''
        )
    def test_run__translate(self):
        YouTubeTranscriptCli('v1 v2 --languages de en --translate cz'.split()).run()
        self.transcript_mock.translate.assert_any_call('cz')
    def test_run__list_transcripts(self):
        YouTubeTranscriptCli('--list-transcripts v1 v2'.split()).run()
        YouTubeTranscriptApi.list_transcripts.assert_any_call('v1', proxies=None)
        YouTubeTranscriptApi.list_transcripts.assert_any_call('v2', proxies=None)
    def test_run__json_output(self):
        YouTubeTranscriptApi.get_transcripts = MagicMock(return_value=([{'boolean': True}], []))
        output = YouTubeTranscriptCli('v1 v2 --languages de en --json'.split()).run()
        # will fail if output is not valid json
        json.loads(output)
    def test_run__proxies(self):
        YouTubeTranscriptApi.get_transcripts = MagicMock(return_value=([], []))
        YouTubeTranscriptCli(
-            'v1 v2 --languages de en --http-proxy http://user:pass@domain:port --https-proxy https://user:pass@domain:port'.split()).run()
+            (
                'v1 v2 --languages de en '
                '--http-proxy http://user:pass@domain:port '
                '--https-proxy https://user:pass@domain:port'
            ).split()
        ).run()
-        YouTubeTranscriptApi.get_transcripts.assert_called_once_with(
-            ['v1', 'v2'],
-            languages=['de', 'en'],
-            continue_after_error=True,
+        YouTubeTranscriptApi.list_transcripts.assert_any_call(
+            'v1',
+            proxies={'http': 'http://user:pass@domain:port', 'https': 'https://user:pass@domain:port'}
+        )
        YouTubeTranscriptApi.list_transcripts.assert_any_call(
            'v2',
            proxies={'http': 'http://user:pass@domain:port', 'https': 'https://user:pass@domain:port'}
        )
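Taken together, the `test_run__*` cases describe how the CLI picks a transcript. By default it calls `find_transcript`. With `--exclude-manually-created` it calls `find_generated_transcript`, and with `--exclude-generated` it calls `find_manually_created_transcript`. With both switches it produces empty output, and `--translate` routes the chosen transcript through `translate()` before `fetch()`. The following is a hypothetical `fetch_selected_transcript` helper expressing only that selection step, exercised with mocks wired the way `setUp()` wires them. The real `run()` also builds proxies, formats output, and collects errors, and none of that is modeled here. The tests import `MagicMock` from the third-party `mock` package; the sketch uses the standard-library `unittest.mock` equivalent.

```python
from unittest.mock import MagicMock


def fetch_selected_transcript(transcript_list, languages, exclude_manually_created=False,
                              exclude_generated=False, translate=''):
    """Hypothetical helper: choose, optionally translate, then fetch one transcript.

    Returns None when both transcript kinds are excluded, since nothing is left to pick.
    """
    if exclude_manually_created and exclude_generated:
        return None
    if exclude_manually_created:
        transcript = transcript_list.find_generated_transcript(languages)
    elif exclude_generated:
        transcript = transcript_list.find_manually_created_transcript(languages)
    else:
        transcript = transcript_list.find_transcript(languages)
    if translate:
        transcript = transcript.translate(translate)
    return transcript.fetch()


# Mocks wired like setUp(): translate() hands back the same transcript mock.
transcript_mock = MagicMock()
transcript_mock.fetch = MagicMock(
    return_value=[{'text': 'Hey, this is just a test', 'start': 0.0, 'duration': 1.54}]
)
transcript_mock.translate = MagicMock(return_value=transcript_mock)
transcript_list_mock = MagicMock()
transcript_list_mock.find_generated_transcript = MagicMock(return_value=transcript_mock)

lines = fetch_selected_transcript(
    transcript_list_mock, ['de', 'en'], exclude_manually_created=True, translate='cz'
)
transcript_list_mock.find_generated_transcript.assert_called_once_with(['de', 'en'])
transcript_mock.translate.assert_called_once_with('cz')
print(lines)
```

Because the translate mock returns the same object, `fetch()` still yields the fixture lines. This is why the CLI tests can assert on `translate.assert_any_call('cz')` without stubbing a second transcript.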