Merge pull request #1 from jdepoix/master

Updating to current master
2020-01-09 18:56:45 -08:00 · 2020-01-09 18:56:45 -08:00 · edefeeaf1d
parent eee2b9ad01 7dfe20fde4
commit edefeeaf1d
14 changed files with 6707 additions and 210 deletions
--- a/README.md
+++ b/README.md
@ -1,13 +1,9 @@
-# YouTube Transcript/Subtitle API (including automatically generated subtitles)

-[![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url)
-[![Build Status](https://travis-ci.org/jdepoix/youtube-transcript-api.svg)](https://travis-ci.org/jdepoix/youtube-transcript-api)
-[![Coverage Status](https://coveralls.io/repos/github/jdepoix/youtube-transcript-api/badge.svg?branch=master)](https://coveralls.io/github/jdepoix/youtube-transcript-api?branch=master)
-[![MIT license](http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](http://opensource.org/licenses/MIT)
-[![image](https://img.shields.io/pypi/v/youtube-transcript-api.svg)](https://pypi.org/project/youtube-transcript-api/)
-[![image](https://img.shields.io/pypi/pyversions/youtube-transcript-api.svg)](https://pypi.org/project/youtube-transcript-api/)
+# YouTube Transcript/Subtitle API (including automatically generated subtitles and subtitle translations)  
  
-This is an python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require a headless browser, like other selenium based solutions do!
+[![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url) [![Build Status](https://travis-ci.org/jdepoix/youtube-transcript-api.svg)](https://travis-ci.org/jdepoix/youtube-transcript-api) [![Coverage Status](https://coveralls.io/repos/github/jdepoix/youtube-transcript-api/badge.svg?branch=master)](https://coveralls.io/github/jdepoix/youtube-transcript-api?branch=master) [![MIT license](http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](http://opensource.org/licenses/MIT) [![image](https://img.shields.io/pypi/v/youtube-transcript-api.svg)](https://pypi.org/project/youtube-transcript-api/) [![image](https://img.shields.io/pypi/pyversions/youtube-transcript-api.svg)](https://pypi.org/project/youtube-transcript-api/)
+
+This is a Python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles, and does not require a headless browser, unlike other Selenium-based solutions!

 ## Install

@ -23,13 +19,11 @@ If you want to use it from source, you'll have to install the dependencies manua
 pip install -r requirements.txt
 ```

-## How to use it
+You can either integrate this module [into an existing application](#api), or just use it via a [CLI](#cli).

-You could either integrate this module into an existing application, or just use it via an CLI
+## API

-### In code
-
-To get a transcript for a given video you can do:
+The easiest way to get a transcript for a given video is to execute:

 ```python
 from youtube_transcript_api import YouTubeTranscriptApi
@ -55,15 +49,15 @@ This will return a list of dictionaries looking somewhat like this:
 ]
 ```

-You can also add the `languages` param if you want to make sure the transcripts are retrieved in your desired language (it usually defaults to english).
+You can also add the `languages` param if you want to make sure the transcripts are retrieved in your desired language (it defaults to English).

 ```python
 YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])
 ```

-It's a list of language codes in a descending priority. In this example it will first try to fetch the german transcript (`'de'`) and then fetch the english transcript (`'en'`) if it fails to do so. As I can't provide a complete list of all working language codes with full certainty, you may have to play around with the language codes a bit, to find the one which is working for you!
+It's a list of language codes in descending priority. In this example it will first try to fetch the German transcript (`'de'`) and then fetch the English transcript (`'en'`) if it fails to do so. If you want to find out which languages are available first, [have a look at `list_transcripts()`](#list-available-transcripts).

-To get transcripts for a list fo video ids you can call:
+To get transcripts for a list of video ids you can call:

 ```python
 YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])
@ -71,7 +65,100 @@ YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])

 `languages` also is optional here.

-### CLI
+### List available transcripts
+
+If you want to list all transcripts which are available for a given video you can call:
+
+```python
+transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
+```
+
+This will return a `TranscriptList` object which is iterable and provides methods to filter the list of transcripts for specific languages and types, like:
+
+```python
+transcript = transcript_list.find_transcript(['de', 'en'])
+```
+
+By default this module prefers manually created transcripts over automatically generated ones when both are available in the requested language. The `TranscriptList` allows you to bypass this default behaviour by searching for specific transcript types:
+
+```python
+# filter for manually created transcripts
+transcript = transcript_list.find_manually_created_transcript(['de', 'en'])
+
+# or automatically generated ones
+transcript = transcript_list.find_generated_transcript(['de', 'en'])
+```
+
+The methods `find_transcript`, `find_manually_created_transcript` and `find_generated_transcript` return `Transcript` objects. They contain metadata regarding the transcript:
+
+```python
+print(
+    transcript.video_id,
+    transcript.language,
+    transcript.language_code,
+    # whether it has been manually created or generated by YouTube
+    transcript.is_generated,
+    # whether this transcript can be translated or not
+    transcript.is_translatable,
+    # a list of languages the transcript can be translated to
+    transcript.translation_languages,
+)
+```
+
+and provide a `fetch()` method, which allows you to fetch the actual transcript data:
+
+```python
+transcript.fetch()
+```
+
+### Translate transcript
+
+YouTube has a feature which allows you to automatically translate subtitles. This module also makes it possible to access this feature. To do so `Transcript` objects provide a `translate()` method, which returns a new translated `Transcript` object:
+
+```python
+transcript = transcript_list.find_transcript(['en'])
+translated_transcript = transcript.translate('de')
+print(translated_transcript.fetch())
+```
+
+### By example
+```python
+# retrieve the available transcripts
+transcript_list = YouTubeTranscriptApi.list_transcripts('video_id')
+
+# iterate over all available transcripts
+for transcript in transcript_list:
+
+    # the Transcript object provides metadata properties
+    print(
+        transcript.video_id,
+        transcript.language,
+        transcript.language_code,
+        # whether it has been manually created or generated by YouTube
+        transcript.is_generated,
+        # whether this transcript can be translated or not
+        transcript.is_translatable,
+        # a list of languages the transcript can be translated to
+        transcript.translation_languages,
+    )
+
+    # fetch the actual transcript data
+    print(transcript.fetch())
+
+    # translating the transcript will return another transcript object
+    print(transcript.translate('en').fetch())
+
+# you can also directly filter for the language you are looking for, using the transcript list
+transcript = transcript_list.find_transcript(['de', 'en'])  
+  
+# or just filter for manually created transcripts  
+transcript = transcript_list.find_manually_created_transcript(['de', 'en'])  
+  
+# or automatically generated ones  
+transcript = transcript_list.find_generated_transcript(['de', 'en'])
+```
+  
+## CLI  
  
 Execute the CLI script using the video ids as parameters and the results will be printed out to the command line:  
  
@ -85,13 +172,32 @@ The CLI also gives you the option to provide a list of preferred languages:
 youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en  
 ```

+You can also specify if you want to exclude automatically generated or manually created subtitles:
+
+```  
+youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-generated
+youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-manually-created
+```
+  
 If you would prefer to write it into a file or pipe it into another application, you can also output the results as json using the following line:  
  
 ```  
 youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --json > transcripts.json  
 ```  

-### Proxy
+Translating transcripts using the CLI is also possible:
+
+```  
+youtube_transcript_api <first_video_id> <second_video_id> ... --languages en --translate de
+```  
+
+If you are not sure which languages are available for a given video, you can list all available transcripts by calling:
+
+```  
+youtube_transcript_api --list-transcripts <first_video_id>
+```  
+  
+## Proxy  
  
 You can specify a https/http proxy, which will be used during the requests to YouTube:  
  
--- a/setup.py
+++ b/setup.py
@ -24,7 +24,7 @@ def get_test_suite():

 setuptools.setup(
    name="youtube_transcript_api",
-    version="0.1.9",
+    version="0.2.1",
    author="Jonas Depoix",
    author_email="jonas.depoix@web.de",
    description="This is an python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require a headless browser, like other selenium based solutions do!",
--- a/youtube_transcript_api/__init__.py
+++ b/youtube_transcript_api/__init__.py
@ -1 +1,11 @@
 from ._api import YouTubeTranscriptApi
+from ._transcripts import TranscriptList, Transcript
+from ._errors import (
+    TranscriptsDisabled,
+    NoTranscriptFound,
+    CouldNotRetrieveTranscript,
+    VideoUnavailable,
+    NotTranslatable,
+    TranslationLanguageNotAvailable,
+    NoTranscriptAvailable,
+)
--- a/youtube_transcript_api/_api.py
+++ b/youtube_transcript_api/_api.py
@ -1,56 +1,71 @@
-import sys
-
-# This can only be tested by using different python versions, therefore it is not covered by coverage.py
-if sys.version_info.major == 2: # pragma: no cover
-    reload(sys)
-    sys.setdefaultencoding('utf-8')
-
-from xml.etree import ElementTree
-
-import re
-
 import requests

-from ._html_unescaping import unescape
+from ._transcripts import TranscriptListFetcher


 class YouTubeTranscriptApi():
-    class CouldNotRetrieveTranscript(Exception):
+    @classmethod
+    def list_transcripts(cls, video_id, proxies=None):
        """
-        Raised if a transcript could not be retrieved.
+        Retrieves the list of transcripts which are available for a given video. It returns a `TranscriptList` object
+        which is iterable and provides methods to filter the list of transcripts for specific languages. While iterating
+        over the `TranscriptList` the individual transcripts are represented by `Transcript` objects, which provide
+        metadata and can either be fetched by calling `transcript.fetch()` or translated by calling
+        `transcript.translate('en')`. Example::
+
+            # retrieve the available transcripts
+            transcript_list = YouTubeTranscriptApi.list_transcripts('video_id')
+
+            # iterate over all available transcripts
+            for transcript in transcript_list:
+                # the Transcript object provides metadata properties
+                print(
+                    transcript.video_id,
+                    transcript.language,
+                    transcript.language_code,
+                    # whether it has been manually created or generated by YouTube
+                    transcript.is_generated,
+                    # a list of languages the transcript can be translated to
+                    transcript.translation_languages,
+                )
+
+                # fetch the actual transcript data
+                print(transcript.fetch())
+
+                # translating the transcript will return another transcript object
+                print(transcript.translate('en').fetch())
+
+            # you can also directly filter for the language you are looking for, using the transcript list
+            transcript = transcript_list.find_transcript(['de', 'en'])
+
+            # or just filter for manually created transcripts
+            transcript = transcript_list.find_manually_created_transcript(['de', 'en'])
+
+            # or automatically generated ones
+            transcript = transcript_list.find_generated_transcript(['de', 'en'])
+
+        :param video_id: the youtube video id
+        :type video_id: str
+        :param proxies: a dictionary mapping of http and https proxies to be used for the network requests
+        :type proxies: {'http': str, 'https': str} - http://docs.python-requests.org/en/master/user/advanced/#proxies
+        :return: the list of available transcripts
+        :rtype: TranscriptList
        """
-
-        ERROR_MESSAGE = (
-            'Could not get the transcript for the video {video_url}! '
-            'This usually happens if one of the following things is the case:\n'
-            ' - subtitles have been disabled by the uploader\n'
-            ' - none of the language codes you provided are valid\n'
-            ' - none of the languages you provided are supported by the video\n'
-            ' - the video is no longer available.\n\n'
-            'If none of these things is the case, please create an issue at '
-            'https://github.com/jdepoix/youtube-transcript-api/issues.'
-            'Please add which version of youtube_transcript_api you are using and make sure that there '
-            'are no open issues which already describe your problem!'
-        )
-
-        def __init__(self, video_id):
-            super(YouTubeTranscriptApi.CouldNotRetrieveTranscript, self).__init__(
-                self.ERROR_MESSAGE.format(video_url=_TranscriptFetcher.WATCH_URL.format(video_id=video_id))
-            )
-            self.video_id = video_id
+        with requests.Session() as http_client:
+            http_client.proxies = proxies if proxies else {}
+            return TranscriptListFetcher(http_client).fetch(video_id)

    @classmethod
-    def get_transcripts(cls, video_ids, languages=None, continue_after_error=False, proxies=None):
+    def get_transcripts(cls, video_ids, languages=('en',), continue_after_error=False, proxies=None):
        """
        Retrieves the transcripts for a list of videos.

        :param video_ids: a list of youtube video ids
-        :type video_ids: [str]
+        :type video_ids: list[str]
        :param languages: A list of language codes in a descending priority. For example, if this is set to ['de', 'en']
-        it will first try to fetch the german transcript (de) and then fetch the english transcipt (en) if it fails to
-        do so. As I can't provide a complete list of all working language codes with full certainty, you may have to
-        play around with the language codes a bit, to find the one which is working for you!
-        :type languages: [str]
+        it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if it fails to
+        do so.
+        :type languages: list[str]
        :param continue_after_error: if this is set the execution won't be stopped, if an error occurs while retrieving
        one of the video transcripts
        :type continue_after_error: bool
@ -58,7 +73,7 @@ class YouTubeTranscriptApi():
        :type proxies: {'http': str, 'https': str} - http://docs.python-requests.org/en/master/user/advanced/#proxies
        :return: a tuple containing a dictionary mapping video ids onto their corresponding transcripts, and a list of
        video ids, which could not be retrieved
-        :rtype: ({str: [{'text': str, 'start': float, 'end': float}]}, [str]}
+        :rtype: ({str: [{'text': str, 'start': float, 'duration': float}]}, [str])
        """
        data = {}
        unretrievable_videos = []
@ -75,90 +90,21 @@ class YouTubeTranscriptApi():
        return data, unretrievable_videos

    @classmethod
-    def get_transcript(cls, video_id, languages=None, proxies=None):
+    def get_transcript(cls, video_id, languages=('en',), proxies=None):
        """
-        Retrieves the transcript for a single video.
+        Retrieves the transcript for a single video. This is just a shortcut for calling::
+
+            YouTubeTranscriptApi.list_transcripts(video_id, proxies).find_transcript(languages).fetch()

        :param video_id: the youtube video id
        :type video_id: str
        :param languages: A list of language codes in a descending priority. For example, if this is set to ['de', 'en']
        it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if it fails to
-        do so. As I can't provide a complete list of all working language codes with full certainty, you may have to
-        play around with the language codes a bit, to find the one which is working for you!
-        :type languages: [str]
+        do so.
+        :type languages: list[str]
        :param proxies: a dictionary mapping of http and https proxies to be used for the network requests
        :type proxies: {'http': str, 'https': str} - http://docs.python-requests.org/en/master/user/advanced/#proxies
        :return: a list of dictionaries containing the 'text', 'start' and 'duration' keys
-        :rtype: [{'text': str, 'start': float, 'end': float}]
+        :rtype: [{'text': str, 'start': float, 'duration': float}]
        """
-        try:
-            return _TranscriptParser(_TranscriptFetcher(video_id, languages, proxies).fetch()).parse()
-        except Exception:
-            raise YouTubeTranscriptApi.CouldNotRetrieveTranscript(video_id)
-
-
-class _TranscriptFetcher():
-    WATCH_URL = 'https://www.youtube.com/watch?v={video_id}'
-    API_BASE_URL = 'https://www.youtube.com/api/{api_url}'
-    LANGUAGE_REGEX = re.compile(r'(&lang=.*&)|(&lang=.*)')
-    TIMEDTEXT_STRING = 'timedtext?v='
-
-    def __init__(self, video_id, languages, proxies):
-        self.video_id = video_id
-        self.languages = languages
-        self.proxies = proxies
-
-    def fetch(self):
-        if self.proxies:
-            fetched_site = requests.get(self.WATCH_URL.format(video_id=self.video_id), proxies=self.proxies).text
-        else:
-            fetched_site = requests.get(self.WATCH_URL.format(video_id=self.video_id)).text
-        timedtext_splits = fetched_site.split(self.TIMEDTEXT_STRING)
-        timedtext_url_start = (
-            timedtext_splits[2].find(self.TIMEDTEXT_STRING)
-            + len(timedtext_splits[0])
-            + len(timedtext_splits[1])
-            + len(self.TIMEDTEXT_STRING) + 1
-        )
-
-        for language in (self.languages if self.languages else [None,]):
-            response = self._execute_api_request(fetched_site, timedtext_url_start, language)
-            if response:
-                return response
-
-        return None
-
-    def _execute_api_request(self, fetched_site, timedtext_url_start, language):
-        url = self.API_BASE_URL.format(
-            api_url=fetched_site[
-                timedtext_url_start:timedtext_url_start + fetched_site[timedtext_url_start:].find('"')
-            ].replace(
-                '\\u0026', '&'
-            ).replace(
-                '\\', ''
-            )
-        )
-        if language:
-            url = re.sub(self.LANGUAGE_REGEX, '&lang={language}&'.format(language=language), url)
-        if self.proxies:
-            return requests.get(url, proxies=self.proxies).text
-        else:
-            return requests.get(url).text
-
-
-class _TranscriptParser():
-    HTML_TAG_REGEX = re.compile(r'<[^>]*>', re.IGNORECASE)
-
-    def __init__(self, plain_data):
-        self.plain_data = plain_data
-
-    def parse(self):
-        return [
-            {
-                'text': re.sub(self.HTML_TAG_REGEX, '', unescape(xml_element.text)),
-                'start': float(xml_element.attrib['start']),
-                'duration': float(xml_element.attrib['dur']),
-            }
-            for xml_element in ElementTree.fromstring(self.plain_data)
-            if xml_element.text is not None
-        ]
+        return cls.list_transcripts(video_id, proxies).find_transcript(languages).fetch()
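Review note on `get_transcript`: the shortcut relies on `find_transcript` honouring both the language priority and the README's "manually created before generated" rule. The body of `find_transcript` is not part of this excerpt, so the sketch below is only an assumption about the lookup order (languages first, then transcript type). The function and the plain dictionaries are hypothetical stand-ins for the library's `TranscriptList` and `Transcript` objects.

```python
# Hypothetical stand-in for the lookup order described in the README.
# Plain dicts (language code -> label) replace the library's Transcript objects.
def find_transcript(language_codes, manually_created, generated):
    """Return the first match, trying languages in order and preferring manual over generated."""
    for language_code in language_codes:
        # assumption: for each language, manually created transcripts are checked first
        for transcripts in (manually_created, generated):
            if language_code in transcripts:
                return transcripts[language_code]
    raise LookupError('no transcript for any of: {}'.format(language_codes))


manually_created = {'en': 'manual English'}
generated = {'en': 'generated English', 'de': 'generated German'}

# 'de' has the highest priority, and only a generated German transcript exists
print(find_transcript(['de', 'en'], manually_created, generated))
# for 'en' both kinds exist, so the manually created one is returned
print(find_transcript(['en'], manually_created, generated))
```

Under this assumption a generated transcript in a higher-priority language is returned before a manually created one in a lower-priority language.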
--- a/youtube_transcript_api/_cli.py
+++ b/youtube_transcript_api/_cli.py
@ -14,22 +14,45 @@ class YouTubeTranscriptCli():
    def run(self):
        parsed_args = self._parse_args()

+        if parsed_args.exclude_manually_created and parsed_args.exclude_generated:
+            return ''
+
        proxies = None
        if parsed_args.http_proxy != '' or parsed_args.https_proxy != '':
            proxies = {"http": parsed_args.http_proxy, "https": parsed_args.https_proxy}

-        transcripts, unretrievable_videos = YouTubeTranscriptApi.get_transcripts(
-            parsed_args.video_ids,
-            languages=parsed_args.languages,
-            continue_after_error=True,
-            proxies=proxies
-        )
+        transcripts = []
+        exceptions = []
+
+        for video_id in parsed_args.video_ids:
+            try:
+                transcripts.append(self._fetch_transcript(parsed_args, proxies, video_id))
+            except Exception as exception:
+                exceptions.append(exception)

        return '\n\n'.join(
-            [str(YouTubeTranscriptApi.CouldNotRetrieveTranscript(video_id)) for video_id in unretrievable_videos]
+            [str(exception) for exception in exceptions]
            + ([json.dumps(transcripts) if parsed_args.json else pprint.pformat(transcripts)] if transcripts else [])
        )

+    def _fetch_transcript(self, parsed_args, proxies, video_id):
+        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id, proxies=proxies)
+
+        if parsed_args.list_transcripts:
+            return str(transcript_list)
+
+        if parsed_args.exclude_manually_created:
+            transcript = transcript_list.find_generated_transcript(parsed_args.languages)
+        elif parsed_args.exclude_generated:
+            transcript = transcript_list.find_manually_created_transcript(parsed_args.languages)
+        else:
+            transcript = transcript_list.find_transcript(parsed_args.languages)
+
+        if parsed_args.translate:
+            transcript = transcript.translate(parsed_args.translate)
+
+        return transcript.fetch()
+
    def _parse_args(self):
        parser = argparse.ArgumentParser(
            description=(
@ -38,19 +61,40 @@ class YouTubeTranscriptCli():
                'other selenium based solutions do!'
            )
        )
+        parser.add_argument(
+            '--list-transcripts',
+            action='store_const',
+            const=True,
+            default=False,
+            help='This will list the languages in which the given videos are available.',
+        )
        parser.add_argument('video_ids', nargs='+', type=str, help='List of YouTube video IDs.')
        parser.add_argument(
            '--languages',
            nargs='*',
-            default=[],
+            default=['en',],
            type=str,
            help=(
                'A list of language codes in a descending priority. For example, if this is set to "de en" it will '
-                'first try to fetch the german transcript (de) and then fetch the english transcipt (en) if it fails '
+                'first try to fetch the german transcript (de) and then fetch the english transcript (en) if it fails '
                'to do so. As I can\'t provide a complete list of all working language codes with full certainty, you '
                'may have to play around with the language codes a bit, to find the one which is working for you!'
            ),
        )
+        parser.add_argument(
+            '--exclude-generated',
+            action='store_const',
+            const=True,
+            default=False,
+            help='If this flag is set transcripts which have been generated by YouTube will not be retrieved.',
+        )
+        parser.add_argument(
+            '--exclude-manually-created',
+            action='store_const',
+            const=True,
+            default=False,
+            help='If this flag is set transcripts which have been manually created will not be retrieved.',
+        )
        parser.add_argument(
            '--json',
            action='store_const',
@ -59,13 +103,24 @@ class YouTubeTranscriptCli():
            help='If this flag is set the output will be JSON formatted.',
        )
        parser.add_argument(
-            '--http-proxy', dest='http_proxy',
-            default='', metavar='URL',
+            '--translate',
+            default='',
+            help=(
+                'The language code for the language you want this transcript to be translated to. Use the '
+                '--list-transcripts feature to find out which languages are translatable and which translation '
+                'languages are available.'
+            )
+        )
+        parser.add_argument(
+            '--http-proxy',
+            default='',
+            metavar='URL',
            help='Use the specified HTTP proxy.'
        )
        parser.add_argument(
-            '--https-proxy', dest='https_proxy',
-            default='', metavar='URL',
+            '--https-proxy',
+            default='',
+            metavar='URL',
            help='Use the specified HTTPS proxy.'
        )

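Review note on the new CLI flags: the sketch below is a reduced, illustrative parser (not the full `_parse_args`) showing how the exclusion flags map onto the `TranscriptList` methods used in `_fetch_transcript`. It uses `action='store_true'`, which argparse treats as shorthand for `store_const` with `const=True, default=False`. The helper `select_finder` and the video ids `abc` and `xyz` are hypothetical.

```python
import argparse


def build_parser():
    # Reduced, illustrative parser: only the flags relevant to transcript selection.
    parser = argparse.ArgumentParser()
    parser.add_argument('video_ids', nargs='+', type=str)
    parser.add_argument('--languages', nargs='*', default=['en'], type=str)
    # store_true is shorthand for action='store_const', const=True, default=False
    parser.add_argument('--exclude-generated', action='store_true')
    parser.add_argument('--exclude-manually-created', action='store_true')
    parser.add_argument('--translate', default='')
    return parser


def select_finder(args):
    """Name the TranscriptList method the CLI would call for these arguments."""
    if args.exclude_manually_created and args.exclude_generated:
        return None  # nothing left to fetch; run() returns '' in this case
    if args.exclude_manually_created:
        return 'find_generated_transcript'
    if args.exclude_generated:
        return 'find_manually_created_transcript'
    return 'find_transcript'


args = build_parser().parse_args(
    ['abc', 'xyz', '--languages', 'de', 'en', '--exclude-generated']
)
print(args.video_ids, args.languages, select_finder(args))
```

Note that the video ids come before `--languages`, as in the README examples; because `--languages` accepts any number of values, ids placed directly after it would be read as language codes.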
--- a/youtube_transcript_api/_errors.py
+++ b/youtube_transcript_api/_errors.py
@ -0,0 +1,74 @@
+from ._settings import WATCH_URL
+
+
+class CouldNotRetrieveTranscript(Exception):
+    """
+    Raised if a transcript could not be retrieved.
+    """
+    ERROR_MESSAGE = '\nCould not retrieve a transcript for the video {video_url}!'
+    CAUSE_MESSAGE_INTRO = ' This is most likely caused by:\n\n{cause}'
+    CAUSE_MESSAGE = ''
+    GITHUB_REFERRAL = (
+        '\n\nIf you are sure that the described cause is not responsible for this error '
+        'and that a transcript should be retrievable, please create an issue at '
+        'https://github.com/jdepoix/youtube-transcript-api/issues. '
+        'Please add which version of youtube_transcript_api you are using '
+        'and provide the information needed to replicate the error. '
+        'Also make sure that there are no open issues which already describe your problem!'
+    )
+
+    def __init__(self, video_id):
+        self.video_id = video_id
+        super(CouldNotRetrieveTranscript, self).__init__(self._build_error_message())
+
+    def _build_error_message(self):
+        cause = self.cause
+        error_message = self.ERROR_MESSAGE.format(video_url=WATCH_URL.format(video_id=self.video_id))
+
+        if cause:
+            error_message += self.CAUSE_MESSAGE_INTRO.format(cause=cause) + self.GITHUB_REFERRAL
+
+        return error_message
+
+    @property
+    def cause(self):
+        return self.CAUSE_MESSAGE
+
+
+class VideoUnavailable(CouldNotRetrieveTranscript):
+    CAUSE_MESSAGE = 'The video is no longer available'
+
+
+class TranscriptsDisabled(CouldNotRetrieveTranscript):
+    CAUSE_MESSAGE = 'Subtitles are disabled for this video'
+
+
+class NoTranscriptAvailable(CouldNotRetrieveTranscript):
+    CAUSE_MESSAGE = 'No transcripts are available for this video'
+
+
+class NotTranslatable(CouldNotRetrieveTranscript):
+    CAUSE_MESSAGE = 'The requested language is not translatable'
+
+
+class TranslationLanguageNotAvailable(CouldNotRetrieveTranscript):
+    CAUSE_MESSAGE = 'The requested translation language is not available'
+
+
+class NoTranscriptFound(CouldNotRetrieveTranscript):
+    CAUSE_MESSAGE = (
+        'No transcripts were found for any of the requested language codes: {requested_language_codes}\n\n'
+        '{transcript_data}'
+    )
+
+    def __init__(self, video_id, requested_language_codes, transcript_data):
+        self._requested_language_codes = requested_language_codes
+        self._transcript_data = transcript_data
+        super(NoTranscriptFound, self).__init__(video_id)
+
+    @property
+    def cause(self):
+        return self.CAUSE_MESSAGE.format(
+            requested_language_codes=self._requested_language_codes,
+            transcript_data=str(self._transcript_data),
+        )
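Review note on the new error hierarchy: since every specific error derives from `CouldNotRetrieveTranscript`, callers can handle one cause explicitly and fall back to the base class for the rest. The sketch below uses minimal stand-in classes with the same names so that it runs without the package or network access; `describe_failure` and `always_unavailable` are hypothetical helpers, and the real classes additionally build the detailed messages shown in the diff.

```python
# Stand-ins mirroring the hierarchy in _errors.py (message building omitted).
class CouldNotRetrieveTranscript(Exception):
    def __init__(self, video_id):
        self.video_id = video_id
        super(CouldNotRetrieveTranscript, self).__init__(video_id)


class TranscriptsDisabled(CouldNotRetrieveTranscript):
    pass


class VideoUnavailable(CouldNotRetrieveTranscript):
    pass


def describe_failure(fetch, video_id):
    """Call fetch(video_id) and map failures to a short status string."""
    try:
        fetch(video_id)
        return 'ok'
    except TranscriptsDisabled:
        # handle one specific cause separately
        return 'disabled'
    except CouldNotRetrieveTranscript as error:
        # catches VideoUnavailable and any other subclass
        return 'failed: {}'.format(type(error).__name__)


def always_unavailable(video_id):
    # hypothetical fetcher used only for this demonstration
    raise VideoUnavailable(video_id)


print(describe_failure(always_unavailable, 'abc'))
```

The specific `except` clause has to come before the base-class clause, otherwise the base class would catch everything first.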
--- a/youtube_transcript_api/_settings.py
+++ b/youtube_transcript_api/_settings.py
@ -0,0 +1 @@
+WATCH_URL = 'https://www.youtube.com/watch?v={video_id}'
--- a/youtube_transcript_api/_transcripts.py
+++ b/youtube_transcript_api/_transcripts.py
@ -0,0 +1,302 @@
+import sys
+
+# This can only be tested by using different python versions, therefore it is not covered by coverage.py
+if sys.version_info.major == 2: # pragma: no cover
+    reload(sys)
+    sys.setdefaultencoding('utf-8')
+
+import json
+
+from xml.etree import ElementTree
+
+import re
+
+from ._html_unescaping import unescape
+from ._errors import (
+    VideoUnavailable,
+    NoTranscriptFound,
+    TranscriptsDisabled,
+    NotTranslatable,
+    TranslationLanguageNotAvailable,
+    NoTranscriptAvailable,
+)
+from ._settings import WATCH_URL
+
+
+class TranscriptListFetcher():
+    def __init__(self, http_client):
+        self._http_client = http_client
+
+    def fetch(self, video_id):
+        return TranscriptList.build(
+            self._http_client,
+            video_id,
+            self._extract_captions_json(self._fetch_html(video_id), video_id)
+        )
+
+    def _extract_captions_json(self, html, video_id):
+        splitted_html = html.split('"captions":')
+
+        if len(splitted_html) <= 1:
+            if '"playabilityStatus":' not in html:
+                raise VideoUnavailable(video_id)
+
+            raise TranscriptsDisabled(video_id)
+
+        captions_json = json.loads(
+            splitted_html[1].split(',"videoDetails')[0].replace('\n', '')
+        )['playerCaptionsTracklistRenderer']
+
+        if 'captionTracks' not in captions_json:
+            raise NoTranscriptAvailable(video_id)
+
+        return captions_json
+
+    def _fetch_html(self, video_id):
+        return self._http_client.get(WATCH_URL.format(video_id=video_id)).text.replace(
+            '\\u0026', '&'
+        ).replace(
+            '\\', ''
+        )
+
+
+class TranscriptList():
+    """
+    This object represents a list of transcripts. It can be iterated over to list all transcripts which are available
+    for a given YouTube video. Also it provides functionality to search for a transcript in a given language.
+    """
+    def __init__(self, video_id, manually_created_transcripts, generated_transcripts, translation_languages):
+        """
+        The constructor is only for internal use. Use the static build method instead.
+
+        :param video_id: the id of the video this TranscriptList is for
+        :type video_id: str
+        :param manually_created_transcripts: dict mapping language codes to the manually created transcripts
+        :type manually_created_transcripts: dict[str, Transcript]
+        :param generated_transcripts: dict mapping language codes to the generated transcripts
+        :type generated_transcripts: dict[str, Transcript]
+        :param translation_languages: list of languages which translatable transcripts can be translated into
+        :type translation_languages: list[dict[str, str]]
+        """
+        self.video_id = video_id
+        self._manually_created_transcripts = manually_created_transcripts
+        self._generated_transcripts = generated_transcripts
+        self._translation_languages = translation_languages
+
+    @staticmethod
+    def build(http_client, video_id, captions_json):
+        """
+        Factory method for TranscriptList.
+
+        :param http_client: http client which is used to make the transcript retrieving http calls
+        :type http_client: requests.Session
+        :param video_id: the id of the video this TranscriptList is for
+        :type video_id: str
+        :param captions_json: the JSON parsed from the YouTube page's static HTML
+        :type captions_json: dict
+        :return: the created TranscriptList
+        :rtype: TranscriptList
+        """
+        translation_languages = [
+            {
+                'language': translation_language['languageName']['simpleText'],
+                'language_code': translation_language['languageCode'],
+            } for translation_language in captions_json['translationLanguages']
+        ]
+
+        manually_created_transcripts = {}
+        generated_transcripts = {}
+
+        for caption in captions_json['captionTracks']:
+            if caption.get('kind', '') == 'asr':
+                transcript_dict = generated_transcripts
+            else:
+                transcript_dict = manually_created_transcripts
+
+            transcript_dict[caption['languageCode']] = Transcript(
+                http_client,
+                video_id,
+                caption['baseUrl'],
+                caption['name']['simpleText'],
+                caption['languageCode'],
+                caption.get('kind', '') == 'asr',
+                translation_languages if caption.get('isTranslatable', False) else []
+            )
+
+        return TranscriptList(
+            video_id,
+            manually_created_transcripts,
+            generated_transcripts,
+            translation_languages,
+        )
+
+    def __iter__(self):
+        return iter(list(self._manually_created_transcripts.values()) + list(self._generated_transcripts.values()))
+
+    def find_transcript(self, language_codes):
+        """
+        Finds a transcript for a given language code. Manually created transcripts are returned first and only if none
+        are found, generated transcripts are used. If you only want generated transcripts use
+        find_generated_transcript instead.
+
+        :param language_codes: A list of language codes in a descending priority. For example, if this is set to
+        ['de', 'en'] it will first try to fetch the German transcript (de) and then fetch the English transcript (en) if
+        it fails to do so.
+        :type language_codes: list[str]
+        :return: the found Transcript
+        :rtype: Transcript
+        :raises: NoTranscriptFound
+        """
+        return self._find_transcript(language_codes, [self._manually_created_transcripts, self._generated_transcripts])
+
+    def find_generated_transcript(self, language_codes):
+        """
+        Finds an automatically generated transcript for a given language code.
+
+        :param language_codes: A list of language codes in a descending priority. For example, if this is set to
+        ['de', 'en'] it will first try to fetch the German transcript (de) and then fetch the English transcript (en) if
+        it fails to do so.
+        :type language_codes: list[str]
+        :return: the found Transcript
+        :rtype: Transcript
+        :raises: NoTranscriptFound
+        """
+        return self._find_transcript(language_codes, [self._generated_transcripts,])
+
+    def find_manually_created_transcript(self, language_codes):
+        """
+        Finds a manually created transcript for a given language code.
+
+        :param language_codes: A list of language codes in a descending priority. For example, if this is set to
+        ['de', 'en'] it will first try to fetch the German transcript (de) and then fetch the English transcript (en) if
+        it fails to do so.
+        :type language_codes: list[str]
+        :return: the found Transcript
+        :rtype: Transcript
+        :raises: NoTranscriptFound
+        """
+        return self._find_transcript(language_codes, [self._manually_created_transcripts,])
+
+    def _find_transcript(self, language_codes, transcript_dicts):
+        for language_code in language_codes:
+            for transcript_dict in transcript_dicts:
+                if language_code in transcript_dict:
+                    return transcript_dict[language_code]
+
+        raise NoTranscriptFound(
+            self.video_id,
+            language_codes,
+            self
+        )
+
+    def __str__(self):
+        return (
+            'For this video ({video_id}) transcripts are available in the following languages:\n\n'
+            '(MANUALLY CREATED)\n'
+            '{available_manually_created_transcript_languages}\n\n'
+            '(GENERATED)\n'
+            '{available_generated_transcripts}\n\n'
+            '(TRANSLATION LANGUAGES)\n'
+            '{available_translation_languages}'
+        ).format(
+            video_id=self.video_id,
+            available_manually_created_transcript_languages=self._get_language_description(
+                str(transcript) for transcript in self._manually_created_transcripts.values()
+            ),
+            available_generated_transcripts=self._get_language_description(
+                str(transcript) for transcript in self._generated_transcripts.values()
+            ),
+            available_translation_languages=self._get_language_description(
+                '{language_code} ("{language}")'.format(
+                    language=translation_language['language'],
+                    language_code=translation_language['language_code'],
+                ) for translation_language in self._translation_languages
+            )
+        )
+
+    def _get_language_description(self, transcript_strings):
+        description = '\n'.join(' - {transcript}'.format(transcript=transcript) for transcript in transcript_strings)
+        return description if description else 'None'
+
+
+class Transcript():
+    def __init__(self, http_client, video_id, url, language, language_code, is_generated, translation_languages):
+        """
+        You probably don't want to initialize this directly. Usually you'll access Transcript objects using a
+        TranscriptList.
+
+        :param http_client: http client which is used to make the transcript retrieving http calls
+        :type http_client: requests.Session
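+        :param video_id: the id of the video this Transcript is for
+        :type video_id: str
+        :param url: the url which needs to be called to fetch the transcript
+        :param language: the name of the language this transcript uses
+        :param language_code: the language code of the language this transcript uses
+        :param is_generated: whether this transcript was automatically generated
+        :param translation_languages: list of languages this transcript can be translated into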
+        """
+        self._http_client = http_client
+        self.video_id = video_id
+        self._url = url
+        self.language = language
+        self.language_code = language_code
+        self.is_generated = is_generated
+        self.translation_languages = translation_languages
+        self._translation_languages_dict = {
+            translation_language['language_code']: translation_language['language']
+            for translation_language in translation_languages
+        }
+
+    def fetch(self):
+        """
+        Loads the actual transcript data.
+
+        :return: a list of dictionaries containing the 'text', 'start' and 'duration' keys
+        :rtype: [{'text': str, 'start': float, 'duration': float}]
+        """
+        return _TranscriptParser().parse(
+            self._http_client.get(self._url).text
+        )
+
+    def __str__(self):
+        return '{language_code} ("{language}"){translation_description}'.format(
+            language=self.language,
+            language_code=self.language_code,
+            translation_description='[TRANSLATABLE]' if self.is_translatable else ''
+        )
+
+    @property
+    def is_translatable(self):
+        return len(self.translation_languages) > 0
+
+    def translate(self, language_code):
+        if not self.is_translatable:
+            raise NotTranslatable(self.video_id)
+
+        if language_code not in self._translation_languages_dict:
+            raise TranslationLanguageNotAvailable(self.video_id)
+
+        return Transcript(
+            self._http_client,
+            self.video_id,
+            '{url}&tlang={language_code}'.format(url=self._url, language_code=language_code),
+            self._translation_languages_dict[language_code],
+            language_code,
+            True,
+            [],
+        )
+
+
+class _TranscriptParser():
+    HTML_TAG_REGEX = re.compile(r'<[^>]*>', re.IGNORECASE)
+
+    def parse(self, plain_data):
+        return [
+            {
+                'text': re.sub(self.HTML_TAG_REGEX, '', unescape(xml_element.text)),
+                'start': float(xml_element.attrib['start']),
+                'duration': float(xml_element.attrib.get('dur', '0.0')),
+            }
+            for xml_element in ElementTree.fromstring(plain_data)
+            if xml_element.text is not None
+        ]
--- a/youtube_transcript_api/test/assets/youtube_no_transcript_available.html.static
+++ b/youtube_transcript_api/test/assets/youtube_no_transcript_available.html.static
--- a/youtube_transcript_api/test/assets/youtube_transcripts_disabled.html.static
+++ b/youtube_transcript_api/test/assets/youtube_transcripts_disabled.html.static
--- a/youtube_transcript_api/test/assets/youtube_video_unavailable.html.static
+++ b/youtube_transcript_api/test/assets/youtube_video_unavailable.html.static
--- a/youtube_transcript_api/test/assets/youtube_ww1_nl_en.html.static
+++ b/youtube_transcript_api/test/assets/youtube_ww1_nl_en.html.static
--- a/youtube_transcript_api/test/test_api.py
+++ b/youtube_transcript_api/test/test_api.py
@ -5,7 +5,15 @@ import os

 import httpretty

-from youtube_transcript_api._api import YouTubeTranscriptApi
+from youtube_transcript_api import (
+    YouTubeTranscriptApi,
+    TranscriptsDisabled,
+    NoTranscriptFound,
+    VideoUnavailable,
+    NoTranscriptAvailable,
+    NotTranslatable,
+    TranslationLanguageNotAvailable,
+)


 def load_asset(filename):
@ -42,6 +50,51 @@ class TestYouTubeTranscriptApi(TestCase):
            ]
        )

+    def test_list_transcripts(self):
+        transcript_list = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8')
+
+        language_codes = {transcript.language_code for transcript in transcript_list}
+
+        self.assertEqual(language_codes, {'zh', 'de', 'en', 'hi', 'ja', 'ko', 'es', 'cs'})
+
+    def test_list_transcripts__find_manually_created(self):
+        transcript_list = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8')
+        transcript = transcript_list.find_manually_created_transcript(['cs'])
+
+        self.assertFalse(transcript.is_generated)
+
+
+    def test_list_transcripts__find_generated(self):
+        transcript_list = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8')
+
+        with self.assertRaises(NoTranscriptFound):
+            transcript_list.find_generated_transcript(['cs'])
+
+        transcript = transcript_list.find_generated_transcript(['en'])
+
+        self.assertTrue(transcript.is_generated)
+
+    def test_translate_transcript(self):
+        transcript = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8').find_transcript(['en'])
+
+        translated_transcript = transcript.translate('af')
+
+        self.assertEqual(translated_transcript.language_code, 'af')
+        self.assertIn('&tlang=af', translated_transcript._url)
+
+    def test_translate_transcript__translation_language_not_available(self):
+        transcript = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8').find_transcript(['en'])
+
+        with self.assertRaises(TranslationLanguageNotAvailable):
+            transcript.translate('xyz')
+
+    def test_translate_transcript__not_translatable(self):
+        transcript = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8').find_transcript(['en'])
+        transcript.translation_languages = []
+
+        with self.assertRaises(NotTranslatable):
+            transcript.translate('af')
+
    def test_get_transcript__correct_language_is_used(self):
        YouTubeTranscriptApi.get_transcript('GJLlxj_dtq8', ['de', 'en'])
        query_string = httpretty.last_request().querystring
@ -53,26 +106,50 @@ class TestYouTubeTranscriptApi(TestCase):
    def test_get_transcript__fallback_language_is_used(self):
        httpretty.register_uri(
            httpretty.GET,
-            'https://www.youtube.com/api/timedtext',
-            body=''
+            'https://www.youtube.com/watch',
+            body=load_asset('youtube_ww1_nl_en.html.static')
        )

-        YouTubeTranscriptApi.get_transcript('GJLlxj_dtq8', ['de', 'en'])
+        YouTubeTranscriptApi.get_transcript('F1xioXWb8CY', ['de', 'en'])
        query_string = httpretty.last_request().querystring

        self.assertIn('lang', query_string)
        self.assertEqual(len(query_string['lang']), 1)
        self.assertEqual(query_string['lang'][0], 'en')

-    def test_get_transcript__exception_is_raised_when_not_available(self):
+    def test_get_transcript__exception_if_video_unavailable(self):
        httpretty.register_uri(
            httpretty.GET,
-            'https://www.youtube.com/api/timedtext',
-            body=''
+            'https://www.youtube.com/watch',
+            body=load_asset('youtube_video_unavailable.html.static')
        )

-        with self.assertRaises(YouTubeTranscriptApi.CouldNotRetrieveTranscript):
-            YouTubeTranscriptApi.get_transcript('GJLlxj_dtq8')
+        with self.assertRaises(VideoUnavailable):
+            YouTubeTranscriptApi.get_transcript('abc')
+
+    def test_get_transcript__exception_if_transcripts_disabled(self):
+        httpretty.register_uri(
+            httpretty.GET,
+            'https://www.youtube.com/watch',
+            body=load_asset('youtube_transcripts_disabled.html.static')
+        )
+
+        with self.assertRaises(TranscriptsDisabled):
+            YouTubeTranscriptApi.get_transcript('dsMFmonKDD4')
+
+    def test_get_transcript__exception_if_language_unavailable(self):
+        with self.assertRaises(NoTranscriptFound):
+            YouTubeTranscriptApi.get_transcript('GJLlxj_dtq8', languages=['cz'])
+
+    def test_get_transcript__exception_if_no_transcript_available(self):
+        httpretty.register_uri(
+            httpretty.GET,
+            'https://www.youtube.com/watch',
+            body=load_asset('youtube_no_transcript_available.html.static')
+        )
+
+        with self.assertRaises(NoTranscriptAvailable):
+            YouTubeTranscriptApi.get_transcript('MwBPvcYFY2E')

    def test_get_transcripts(self):
        video_id_1 = 'video_id_1'
@ -99,8 +176,8 @@ class TestYouTubeTranscriptApi(TestCase):

        YouTubeTranscriptApi.get_transcripts(['video_id_1', 'video_id_2'], continue_after_error=True)

-        YouTubeTranscriptApi.get_transcript.assert_any_call(video_id_1, None, None)
-        YouTubeTranscriptApi.get_transcript.assert_any_call(video_id_2, None, None)
+        YouTubeTranscriptApi.get_transcript.assert_any_call(video_id_1, ('en',), None)
+        YouTubeTranscriptApi.get_transcript.assert_any_call(video_id_2, ('en',), None)

    def test_get_transcript__with_proxies(self):
        proxies = {'http': '', 'https:': ''}
@ -118,4 +195,4 @@ class TestYouTubeTranscriptApi(TestCase):
        )
        YouTubeTranscriptApi.get_transcript = MagicMock()
        YouTubeTranscriptApi.get_transcripts(['GJLlxj_dtq8'], proxies=proxies)
-        YouTubeTranscriptApi.get_transcript.assert_any_call('GJLlxj_dtq8', None, proxies)
+        YouTubeTranscriptApi.get_transcript.assert_any_call('GJLlxj_dtq8', ('en',), proxies)
--- a/youtube_transcript_api/test/test_cli.py
+++ b/youtube_transcript_api/test/test_cli.py
@ -3,10 +3,27 @@ from mock import MagicMock

 import json

-from youtube_transcript_api._cli import YouTubeTranscriptCli, YouTubeTranscriptApi
+from youtube_transcript_api import YouTubeTranscriptApi, VideoUnavailable
+from youtube_transcript_api._cli import YouTubeTranscriptCli


 class TestYouTubeTranscriptCli(TestCase):
+    def setUp(self):
+        self.transcript_mock = MagicMock()
+        self.transcript_mock.fetch = MagicMock(return_value=[
+            {'text': 'Hey, this is just a test', 'start': 0.0, 'duration': 1.54},
+            {'text': 'this is not the original transcript', 'start': 1.54, 'duration': 4.16},
+            {'text': 'just something shorter, I made up for testing', 'start': 5.7, 'duration': 3.239}
+        ])
+        self.transcript_mock.translate = MagicMock(return_value=self.transcript_mock)
+
+        self.transcript_list_mock = MagicMock()
+        self.transcript_list_mock.find_generated_transcript = MagicMock(return_value=self.transcript_mock)
+        self.transcript_list_mock.find_manually_created_transcript = MagicMock(return_value=self.transcript_mock)
+        self.transcript_list_mock.find_transcript = MagicMock(return_value=self.transcript_mock)
+
+        YouTubeTranscriptApi.list_transcripts = MagicMock(return_value=self.transcript_list_mock)
+
    def test_argument_parsing(self):
        parsed_args = YouTubeTranscriptCli('v1 v2 --json --languages de en'.split())._parse_args()
        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
@ -60,7 +77,7 @@ class TestYouTubeTranscriptCli(TestCase):
        parsed_args = YouTubeTranscriptCli('v1 v2'.split())._parse_args()
        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
        self.assertEqual(parsed_args.json, False)
-        self.assertEqual(parsed_args.languages, [])
+        self.assertEqual(parsed_args.languages, ['en'])

    def test_argument_parsing__fail_without_video_ids(self):
        with self.assertRaises(SystemExit):
@ -70,12 +87,12 @@ class TestYouTubeTranscriptCli(TestCase):
        parsed_args = YouTubeTranscriptCli('v1 v2 --json'.split())._parse_args()
        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
        self.assertEqual(parsed_args.json, True)
-        self.assertEqual(parsed_args.languages, [])
+        self.assertEqual(parsed_args.languages, ['en'])

        parsed_args = YouTubeTranscriptCli('--json v1 v2'.split())._parse_args()
        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
        self.assertEqual(parsed_args.json, True)
-        self.assertEqual(parsed_args.languages, [])
+        self.assertEqual(parsed_args.languages, ['en'])

    def test_argument_parsing__languages(self):
        parsed_args = YouTubeTranscriptCli('v1 v2 --languages de en'.split())._parse_args()
@ -106,32 +123,107 @@ class TestYouTubeTranscriptCli(TestCase):
        self.assertEqual(parsed_args.http_proxy, '')
        self.assertEqual(parsed_args.https_proxy, '')

+    def test_argument_parsing__list_transcripts(self):
+        parsed_args = YouTubeTranscriptCli('--list-transcripts v1 v2'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertTrue(parsed_args.list_transcripts)
+
+        parsed_args = YouTubeTranscriptCli('v1 v2 --list-transcripts'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertTrue(parsed_args.list_transcripts)
+
+    def test_argument_parsing__translate(self):
+        parsed_args = YouTubeTranscriptCli('v1 v2 --languages de en --translate cz'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertEqual(parsed_args.json, False)
+        self.assertEqual(parsed_args.languages, ['de', 'en'])
+        self.assertEqual(parsed_args.translate, 'cz')
+
+        parsed_args = YouTubeTranscriptCli('v1 v2 --translate cz --languages de en'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertEqual(parsed_args.json, False)
+        self.assertEqual(parsed_args.languages, ['de', 'en'])
+        self.assertEqual(parsed_args.translate, 'cz')
+
+    def test_argument_parsing__manually_or_generated(self):
+        parsed_args = YouTubeTranscriptCli('v1 v2 --exclude-manually-created'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertTrue(parsed_args.exclude_manually_created)
+        self.assertFalse(parsed_args.exclude_generated)
+
+        parsed_args = YouTubeTranscriptCli('v1 v2 --exclude-generated'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertFalse(parsed_args.exclude_manually_created)
+        self.assertTrue(parsed_args.exclude_generated)
+
+        parsed_args = YouTubeTranscriptCli('v1 v2 --exclude-manually-created --exclude-generated'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertTrue(parsed_args.exclude_manually_created)
+        self.assertTrue(parsed_args.exclude_generated)
+
    def test_run(self):
-        YouTubeTranscriptApi.get_transcripts = MagicMock(return_value=([], []))
        YouTubeTranscriptCli('v1 v2 --languages de en'.split()).run()

-        YouTubeTranscriptApi.get_transcripts.assert_called_once_with(
-            ['v1', 'v2'],
-            languages=['de', 'en'],
-            continue_after_error=True,
-            proxies=None
+        YouTubeTranscriptApi.list_transcripts.assert_any_call('v1', proxies=None)
+        YouTubeTranscriptApi.list_transcripts.assert_any_call('v2', proxies=None)
+
+        self.transcript_list_mock.find_transcript.assert_any_call(['de', 'en'])
+
+    def test_run__failing_transcripts(self):
+        YouTubeTranscriptApi.list_transcripts = MagicMock(side_effect=VideoUnavailable('video_id'))
+
+        output = YouTubeTranscriptCli('v1 --languages de en'.split()).run()
+
+        self.assertEqual(output, str(VideoUnavailable('video_id')))
+
+    def test_run__exclude_generated(self):
+        YouTubeTranscriptCli('v1 v2 --languages de en --exclude-generated'.split()).run()
+
+        self.transcript_list_mock.find_manually_created_transcript.assert_any_call(['de', 'en'])
+
+    def test_run__exclude_manually_created(self):
+        YouTubeTranscriptCli('v1 v2 --languages de en --exclude-manually-created'.split()).run()
+
+        self.transcript_list_mock.find_generated_transcript.assert_any_call(['de', 'en'])
+
+    def test_run__exclude_manually_created_and_generated(self):
+        self.assertEqual(
+            YouTubeTranscriptCli('v1 v2 --languages de en --exclude-manually-created --exclude-generated'.split()).run(),
+            ''
        )

+    def test_run__translate(self):
+        YouTubeTranscriptCli('v1 v2 --languages de en --translate cz'.split()).run()
+
+        self.transcript_mock.translate.assert_any_call('cz')
+
+    def test_run__list_transcripts(self):
+        YouTubeTranscriptCli('--list-transcripts v1 v2'.split()).run()
+
+        YouTubeTranscriptApi.list_transcripts.assert_any_call('v1', proxies=None)
+        YouTubeTranscriptApi.list_transcripts.assert_any_call('v2', proxies=None)
+
    def test_run__json_output(self):
-        YouTubeTranscriptApi.get_transcripts = MagicMock(return_value=([{'boolean': True}], []))
        output = YouTubeTranscriptCli('v1 v2 --languages de en --json'.split()).run()

        # will fail if output is not valid json
        json.loads(output)

    def test_run__proxies(self):
-        YouTubeTranscriptApi.get_transcripts = MagicMock(return_value=([], []))
        YouTubeTranscriptCli(
-            'v1 v2 --languages de en --http-proxy http://user:pass@domain:port --https-proxy https://user:pass@domain:port'.split()).run()
+            (
+                'v1 v2 --languages de en '
+                '--http-proxy http://user:pass@domain:port '
+                '--https-proxy https://user:pass@domain:port'
+            ).split()
+        ).run()

-        YouTubeTranscriptApi.get_transcripts.assert_called_once_with(
-            ['v1', 'v2'],
-            languages=['de', 'en'],
-            continue_after_error=True,
+        YouTubeTranscriptApi.list_transcripts.assert_any_call(
+            'v1',
+            proxies={'http': 'http://user:pass@domain:port', 'https': 'https://user:pass@domain:port'}
+        )
+
+        YouTubeTranscriptApi.list_transcripts.assert_any_call(
+            'v2',
            proxies={'http': 'http://user:pass@domain:port', 'https': 'https://user:pass@domain:port'}
        )
@ -0,0 +1 @@
+WATCH_URL = 'https://www.youtube.com/watch?v={video_id}'
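
Note on the lookup order in `TranscriptList._find_transcript`: the outer loop runs over the language codes and the inner loop over the transcript dicts, so language priority outranks the manual/generated preference. The following is a minimal standalone sketch of that order, not the library code itself. The free function `find_transcript`, the plain string values, and the `LookupError` are stand-ins assumed here for illustration; the real method returns `Transcript` objects and raises `NoTranscriptFound`.

```python
def find_transcript(language_codes, transcript_dicts):
    """Return the first match, trying each language across all dicts in order."""
    for language_code in language_codes:
        for transcript_dict in transcript_dicts:
            if language_code in transcript_dict:
                return transcript_dict[language_code]
    # Stand-in for NoTranscriptFound in the real code.
    raise LookupError('no transcript found for {}'.format(language_codes))


# Hypothetical data: strings stand in for Transcript objects.
manually_created = {'en': 'manual-en'}
generated = {'de': 'generated-de', 'en': 'generated-en'}

# 'de' has priority, so the generated German transcript is chosen
# over the manually created English one.
first = find_transcript(['de', 'en'], [manually_created, generated])

# With only 'en' requested, the manually created transcript is preferred.
second = find_transcript(['en'], [manually_created, generated])

print(first, second)
```

Under these assumptions, a caller who prefers a manual transcript in any language over a generated one in the top-priority language would need to call `find_manually_created_transcript` first and fall back to `find_generated_transcript` on failure.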