Merge pull request #42 from jdepoix/feature/translating-transcripts
Feature/translating transcripts
commit 68951600d9

287 README.md

@@ -1,121 +1,232 @@
# YouTube Transcript/Subtitle API (including automatically generated subtitles)

[](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url)
[](https://travis-ci.org/jdepoix/youtube-transcript-api)
[](https://coveralls.io/github/jdepoix/youtube-transcript-api?branch=master)
[](http://opensource.org/licenses/MIT)
[](https://pypi.org/project/youtube-transcript-api/)
[](https://pypi.org/project/youtube-transcript-api/)
# YouTube Transcript/Subtitle API (including automatically generated subtitles and subtitle translations)

[](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url)
[](https://travis-ci.org/jdepoix/youtube-transcript-api)
[](https://coveralls.io/github/jdepoix/youtube-transcript-api?branch=master)
[](http://opensource.org/licenses/MIT)
[](https://pypi.org/project/youtube-transcript-api/)
[](https://pypi.org/project/youtube-transcript-api/)

This is a Python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles, and does not require a headless browser like other Selenium-based solutions do!

## Install

It is recommended to [install this module by using pip](https://pypi.org/project/youtube-transcript-api/):

```
pip install youtube_transcript_api
```

If you want to use it from source, you'll have to install the dependencies manually:

```
pip install -r requirements.txt
```

This is a Python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles and does not require a headless browser like other Selenium-based solutions do!
You can either integrate this module [into an existing application](#api) or just use it via a [CLI](#cli).

## API

The easiest way to get a transcript for a given video is to execute:

```python
from youtube_transcript_api import YouTubeTranscriptApi

YouTubeTranscriptApi.get_transcript(video_id)
```

This will return a list of dictionaries looking somewhat like this:

```python
[
    {
        'text': 'Hey there',
        'start': 7.58,
        'duration': 6.13
    },
    {
        'text': 'how are you',
        'start': 14.08,
        'duration': 7.58
    },
    # ...
]
```
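Because each entry is a plain dictionary with `'text'`, `'start'` and `'duration'` keys, post-processing needs no special tooling. A minimal sketch, assuming only that shape; the helpers `with_end_times` and `to_plain_text` are hypothetical and not part of this module, and the data is the sample output above:

```python
# Sample entries in the shape shown above (values copied from the example output).
transcript = [
    {'text': 'Hey there', 'start': 7.58, 'duration': 6.13},
    {'text': 'how are you', 'start': 14.08, 'duration': 7.58},
]


# Hypothetical helper, not part of youtube_transcript_api.
def with_end_times(entries):
    """Return (start, end, text) tuples, with end = start + duration rounded to 2 decimals."""
    return [
        (entry['start'], round(entry['start'] + entry['duration'], 2), entry['text'])
        for entry in entries
    ]


# Hypothetical helper, not part of youtube_transcript_api.
def to_plain_text(entries):
    """Join all caption snippets into one space-separated string."""
    return ' '.join(entry['text'] for entry in entries)


print(with_end_times(transcript))
print(to_plain_text(transcript))
```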

You can also add the `languages` param if you want to make sure the transcripts are retrieved in your desired language (it defaults to English).

```python
YouTubeTranscriptApi.get_transcript(video_id, languages=['de', 'en'])
```

It's a list of language codes in descending priority. In this example it will first try to fetch the German transcript (`'de'`) and then fetch the English transcript (`'en'`) if it fails to do so. If you want to find out which languages are available first, [have a look at `list_transcripts()`](#list-available-transcripts).
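The priority rule can be pictured as two nested loops: the outer one walks the requested language codes in order, the inner one searches the transcript groups in order. The following is a minimal sketch of that lookup, assuming transcripts are kept in dictionaries keyed by language code with manually created ones searched before generated ones (as `find_transcript()` passes `[manually created, generated]` to its search helper in this change). `find_by_priority` and `NoTranscriptFoundSketch` are hypothetical names, and the string values stand in for `Transcript` objects:

```python
class NoTranscriptFoundSketch(Exception):
    """Hypothetical stand-in for the module's NoTranscriptFound error."""


def find_by_priority(language_codes, transcript_dicts):
    # Try each requested language in order of preference ...
    for language_code in language_codes:
        # ... and, for each language, search the dictionaries in the order given.
        for transcript_dict in transcript_dicts:
            if language_code in transcript_dict:
                return transcript_dict[language_code]
    raise NoTranscriptFoundSketch(language_codes)


manually_created = {'en': 'English (manually created)'}
generated = {'de': 'German (auto-generated)', 'en': 'English (auto-generated)'}

# 'de' only exists as a generated transcript, and 'de' is asked for first, so it wins.
print(find_by_priority(['de', 'en'], [manually_created, generated]))
# With 'en' first, the manually created transcript beats the generated one.
print(find_by_priority(['en', 'de'], [manually_created, generated]))
```

Passing only one of the two dictionaries corresponds to the type-specific searches (`find_manually_created_transcript()` / `find_generated_transcript()`).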

To get transcripts for a list of video ids you can call:

```python
YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])
```

`languages` is also optional here.
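`get_transcripts()` returns a tuple: a dictionary mapping video ids onto their transcripts, plus a list of video ids which could not be retrieved. Per its docstring, `continue_after_error=True` keeps going when one video fails instead of stopping. The loop below is a minimal sketch of that contract; `get_many` and `fake_fetch` are hypothetical, and `fake_fetch` replaces the real network call so the example runs offline:

```python
def get_many(video_ids, fetch_one, continue_after_error=False):
    """Sketch of batch retrieval: returns (transcripts_by_id, unretrievable_ids)."""
    data = {}
    unretrievable_videos = []
    for video_id in video_ids:
        try:
            data[video_id] = fetch_one(video_id)
        except Exception:
            if not continue_after_error:
                raise  # stop at the first failure
            unretrievable_videos.append(video_id)
    return data, unretrievable_videos


def fake_fetch(video_id):
    # Hypothetical stand-in: pretend every id starting with 'bad' has no transcript.
    if video_id.startswith('bad'):
        raise LookupError('no transcript for ' + video_id)
    return [{'text': 'hello', 'start': 0.0, 'duration': 1.5}]


transcripts, failed = get_many(['abc', 'bad1', 'xyz'], fake_fetch, continue_after_error=True)
print(sorted(transcripts), failed)
```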

## Install
### List available transcripts

It is recommended to [install this module by using pip](https://pypi.org/project/youtube-transcript-api/):

```
pip install youtube_transcript_api
```

If you want to use it from source, you'll have to install the dependencies manually:

```
pip install -r requirements.txt
```

## How to use it

You can either integrate this module into an existing application or just use it via a CLI.

### In code

To get a transcript for a given video you can do:
If you want to list all transcripts which are available for a given video, you can call:

```python
from youtube_transcript_api import YouTubeTranscriptApi

YouTubeTranscriptApi.get_transcript(video_id)
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
```

This will return a list of dictionaries looking somewhat like this:
This will return a `TranscriptList` object which is iterable and provides methods to filter the list of transcripts for specific languages and types, like:

```python
[
    {
        'text': 'Hey there',
        'start': 7.58,
        'duration': 6.13
    },
    {
        'text': 'how are you',
        'start': 14.08,
        'duration': 7.58
    },
    # ...
]
transcript = transcript_list.find_transcript(['de', 'en'])
```

You can also add the `languages` param if you want to make sure the transcripts are retrieved in your desired language (it defaults to English).
By default this module always picks manually created transcripts over automatically generated ones if a transcript in the requested language is available in both forms. The `TranscriptList` allows you to bypass this default behaviour by searching for specific transcript types:

```python
YouTubeTranscriptApi.get_transcript(video_id, languages=['de', 'en'])
# filter for manually created transcripts
transcript = transcript_list.find_manually_created_transcript(['de', 'en'])

# or automatically generated ones
transcript = transcript_list.find_generated_transcript(['de', 'en'])
```

It's a list of language codes in descending priority. In this example it will first try to fetch the German transcript (`'de'`) and then fetch the English transcript (`'en'`) if it fails to do so. As I can't provide a complete list of all working language codes with full certainty, you may have to play around with the language codes a bit to find the one which works for you!

To get transcripts for a list of video ids you can call:
The methods `find_transcript`, `find_manually_created_transcript` and `find_generated_transcript` return `Transcript` objects. They contain metadata regarding the transcript:

```python
YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])
print(
    transcript.video_id,
    transcript.language,
    transcript.language_code,
    # whether it has been manually created or generated by YouTube
    transcript.is_generated,
    # whether this transcript can be translated or not
    transcript.is_translatable,
    # a list of languages the transcript can be translated to
    transcript.translation_languages,
)
```

`languages` is also optional here.

### CLI

Execute the CLI script using the video ids as parameters and the results will be printed out to the command line:

```
youtube_transcript_api <first_video_id> <second_video_id> ...
```

The CLI also gives you the option to provide a list of preferred languages:

```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en
```

If you would prefer to write it into a file or pipe it into another application, you can also output the results as JSON using the following line:

```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --json > transcripts.json
```

### Proxy

You can specify an HTTPS/HTTP proxy, which will be used during the requests to YouTube:
and provide the method which allows you to fetch the actual transcript data:

```python
from youtube_transcript_api import YouTubeTranscriptApi

YouTubeTranscriptApi.get_transcript(video_id, proxies={"http": "http://user:pass@domain:port", "https": "https://user:pass@domain:port"})
transcript.fetch()
```

As the `proxies` dict is passed on to the `requests.get(...)` call, it follows the [format used by the requests library](http://docs.python-requests.org/en/master/user/advanced/#proxies).
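In this change the CLI builds that `proxies` dict from its `--http-proxy`/`--https-proxy` options and leaves it as `None` when neither is given; `list_transcripts()` then assigns it to a `requests.Session`. A minimal, offline sketch of the mapping step (the function name `build_proxies` is hypothetical; the `user:pass@domain:port` URLs are placeholders):

```python
def build_proxies(http_proxy='', https_proxy=''):
    """Return a requests-style proxies dict, or None if no proxy was given (sketch)."""
    if http_proxy == '' and https_proxy == '':
        return None
    # requests keys the dict by the URL scheme of the request being proxied.
    return {'http': http_proxy, 'https': https_proxy}


print(build_proxies())
print(build_proxies('http://user:pass@domain:port', 'https://user:pass@domain:port'))
```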
### Translate transcript

Using the CLI:
YouTube has a feature which allows you to automatically translate subtitles. This module also makes it possible to access this feature. To do so, `Transcript` objects provide a `translate()` method, which returns a new translated `Transcript` object:

```
youtube_transcript_api <first_video_id> <second_video_id> --http-proxy http://user:pass@domain:port --https-proxy https://user:pass@domain:port
```python
transcript = transcript_list.find_transcript(['en'])
translated_transcript = transcript.translate('de')
print(translated_transcript.fetch())
```

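Judging by the new `NotTranslatable` and `TranslationLanguageNotAvailable` errors, a translation request can fail either because the transcript is not translatable at all (its `translation_languages` list is empty, which is what `TranscriptList.build()` assigns when `isTranslatable` is missing) or because the target language is not offered. The following is an offline sketch of that check under those assumptions; `check_translation` and the two `...Sketch` exception classes are hypothetical stand-ins, not the module's code, and the sample language list is made up:

```python
class NotTranslatableSketch(Exception):
    """Hypothetical stand-in for NotTranslatable."""


class TranslationLanguageNotAvailableSketch(Exception):
    """Hypothetical stand-in for TranslationLanguageNotAvailable."""


def check_translation(translation_languages, language_code):
    """Return the language name for language_code, or raise (sketch)."""
    if not translation_languages:
        raise NotTranslatableSketch()
    # Same lookup-table shape as Transcript._translation_languages_dict in this diff.
    languages_by_code = {
        entry['language_code']: entry['language'] for entry in translation_languages
    }
    if language_code not in languages_by_code:
        raise TranslationLanguageNotAvailableSketch(language_code)
    return languages_by_code[language_code]


translation_languages = [
    {'language': 'German', 'language_code': 'de'},
    {'language': 'French', 'language_code': 'fr'},
]
print(check_translation(translation_languages, 'de'))
```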
### By example
```python
# retrieve the available transcripts
transcript_list = YouTubeTranscriptApi.list_transcripts('video_id')

# iterate over all available transcripts
for transcript in transcript_list:

## Warning
    # the Transcript object provides metadata properties
    print(
        transcript.video_id,
        transcript.language,
        transcript.language_code,
        # whether it has been manually created or generated by YouTube
        transcript.is_generated,
        # whether this transcript can be translated or not
        transcript.is_translatable,
        # a list of languages the transcript can be translated to
        transcript.translation_languages,
    )

    # fetch the actual transcript data
    print(transcript.fetch())

    # translating the transcript will return another transcript object
    print(transcript.translate('en').fetch())

# you can also directly filter for the language you are looking for, using the transcript list
transcript = transcript_list.find_transcript(['de', 'en'])

# or just filter for manually created transcripts
transcript = transcript_list.find_manually_created_transcript(['de', 'en'])

# or automatically generated ones
transcript = transcript_list.find_generated_transcript(['de', 'en'])
```

## CLI

Execute the CLI script using the video ids as parameters and the results will be printed out to the command line:

```
youtube_transcript_api <first_video_id> <second_video_id> ...
```

The CLI also gives you the option to provide a list of preferred languages:

```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en
```

This code uses an undocumented part of the YouTube API, which is called by the YouTube web-client. So there is no guarantee that it won't stop working tomorrow if they change how things work. I will, however, do my best to get things working again as soon as possible if that happens. So if it stops working, let me know!
You can also specify if you want to exclude automatically generated or manually created subtitles:

## Donation
```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-generated
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-manually-created
```
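In this change the CLI returns an empty result if both exclude flags are set, and otherwise picks one of the three `TranscriptList` search methods depending on them. The sketch below mirrors that dispatch with `argparse`, using the flag names from this diff; it only decides *which* method would be called and performs no network access. `parse` and `choose_search_method` are hypothetical, and the `nargs='*'`/`default=['en']` values for `--languages` are assumptions, since those lines fall outside the diff hunks:

```python
import argparse


def parse(argv):
    # Subset of the options defined by YouTubeTranscriptCli._parse_args() in this change.
    parser = argparse.ArgumentParser()
    parser.add_argument('video_ids', nargs='+', type=str)
    parser.add_argument('--languages', nargs='*', default=['en'], type=str)  # assumed nargs/default
    parser.add_argument('--exclude-generated', action='store_const', const=True, default=False)
    parser.add_argument('--exclude-manually-created', action='store_const', const=True, default=False)
    parser.add_argument('--translate', default='')
    return parser.parse_args(argv)


def choose_search_method(args):
    """Return the name of the TranscriptList method the CLI would call (sketch)."""
    if args.exclude_manually_created and args.exclude_generated:
        return None  # nothing left to search for
    if args.exclude_manually_created:
        return 'find_generated_transcript'
    if args.exclude_generated:
        return 'find_manually_created_transcript'
    return 'find_transcript'


args = parse(['abc123', '--languages', 'de', 'en', '--exclude-generated'])
print(args.languages, choose_search_method(args))
```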

If you would prefer to write it into a file or pipe it into another application, you can also output the results as JSON using the following line:

```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --json > transcripts.json
```
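The `--json` flag only changes how the collected transcripts are serialized: the CLI code in this change uses `json.dumps(...)` when it is set and `pprint.pformat(...)` otherwise. A small sketch of that choice on sample data (the `format_output` name is hypothetical):

```python
import json
import pprint


def format_output(transcripts, as_json):
    # Mirrors the CLI's choice between machine-readable JSON and a pretty-printed repr.
    return json.dumps(transcripts) if as_json else pprint.pformat(transcripts)


# One transcript (a list of entries) per requested video.
sample = [[{'text': 'Hey there', 'start': 7.58, 'duration': 6.13}]]
print(format_output(sample, as_json=True))
print(format_output(sample, as_json=False))
```

The JSON form round-trips through `json.loads`, which is what makes it suitable for files and pipes; the `pprint` form is meant for reading in a terminal.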

If this project makes you happy by reducing your development time, you can make me happy by treating me to a cup of coffee :)
Translating transcripts using the CLI is also possible:

[](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url)
```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages en --translate de
```

If you are not sure which languages are available for a given video, you can list all available transcripts by calling:

```
youtube_transcript_api --list-transcripts <first_video_id>
```
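With `--list-transcripts` the CLI returns `str(transcript_list)`, which groups the entries under `(MANUALLY CREATED)`, `(GENERATED)` and `(TRANSLATION LANGUAGES)` headings and prints `None` for an empty group. A minimal sketch of the per-group formatting, adapted from the new `_get_language_description()` in this change (the sample entries are made up):

```python
def get_language_description(transcript_strings):
    # One " - <entry>" line per transcript; 'None' if the group is empty.
    description = '\n'.join(' - {transcript}'.format(transcript=transcript) for transcript in transcript_strings)
    return description if description else 'None'


translation_languages = [
    {'language': 'German', 'language_code': 'de'},
    {'language': 'French', 'language_code': 'fr'},
]

print('(MANUALLY CREATED)')
print(get_language_description([]))
print('(TRANSLATION LANGUAGES)')
print(get_language_description(
    '{language_code} ("{language}")'.format(**entry) for entry in translation_languages
))
```

Testing the joined string rather than the input matters here: the new callers pass generator expressions, and a generator object is always truthy, so the old `if transcripts` check would never fall back to `'None'`.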

## Proxy

You can specify an HTTPS/HTTP proxy, which will be used during the requests to YouTube:

```python
from youtube_transcript_api import YouTubeTranscriptApi

YouTubeTranscriptApi.get_transcript(video_id, proxies={"http": "http://user:pass@domain:port", "https": "https://user:pass@domain:port"})
```

As the `proxies` dict is passed on to the `requests.get(...)` call, it follows the [format used by the requests library](http://docs.python-requests.org/en/master/user/advanced/#proxies).

Using the CLI:

```
youtube_transcript_api <first_video_id> <second_video_id> --http-proxy http://user:pass@domain:port --https-proxy https://user:pass@domain:port
```

## Warning

This code uses an undocumented part of the YouTube API, which is called by the YouTube web-client. So there is no guarantee that it won't stop working tomorrow if they change how things work. I will, however, do my best to get things working again as soon as possible if that happens. So if it stops working, let me know!

## Donation

If this project makes you happy by reducing your development time, you can make me happy by treating me to a cup of coffee :)

[](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url)

@@ -1,3 +1,11 @@
from ._api import YouTubeTranscriptApi
from ._transcripts import TranscriptList, Transcript
from ._errors import TranscriptsDisabled, NoTranscriptFound, CouldNotRetrieveTranscript, VideoUnavailable
from ._errors import (
    TranscriptsDisabled,
    NoTranscriptFound,
    CouldNotRetrieveTranscript,
    VideoUnavailable,
    NotTranslatable,
    TranslationLanguageNotAvailable,
    NoTranscriptAvailable,
)

@@ -4,17 +4,68 @@ from ._transcripts import TranscriptListFetcher


class YouTubeTranscriptApi():
    @classmethod
    def list_transcripts(cls, video_id, proxies=None):
        """
        Retrieves the list of transcripts which are available for a given video. It returns a `TranscriptList` object
        which is iterable and provides methods to filter the list of transcripts for specific languages. While iterating
        over the `TranscriptList` the individual transcripts are represented by `Transcript` objects, which provide
        metadata and can either be fetched by calling `transcript.fetch()` or translated by calling
        `transcript.translate('en')`. Example::

            # retrieve the available transcripts
            transcript_list = YouTubeTranscriptApi.list_transcripts('video_id')

            # iterate over all available transcripts
            for transcript in transcript_list:
                # the Transcript object provides metadata properties
                print(
                    transcript.video_id,
                    transcript.language,
                    transcript.language_code,
                    # whether it has been manually created or generated by YouTube
                    transcript.is_generated,
                    # a list of languages the transcript can be translated to
                    transcript.translation_languages,
                )

                # fetch the actual transcript data
                print(transcript.fetch())

                # translating the transcript will return another transcript object
                print(transcript.translate('en').fetch())

            # you can also directly filter for the language you are looking for, using the transcript list
            transcript = transcript_list.find_transcript(['de', 'en'])

            # or just filter for manually created transcripts
            transcript = transcript_list.find_manually_created_transcript(['de', 'en'])

            # or automatically generated ones
            transcript = transcript_list.find_generated_transcript(['de', 'en'])

        :param video_id: the youtube video id
        :type video_id: str
        :param proxies: a dictionary mapping of http and https proxies to be used for the network requests
        :type proxies: {'http': str, 'https': str} - http://docs.python-requests.org/en/master/user/advanced/#proxies
        :return: the list of available transcripts
        :rtype TranscriptList:
        """
        with requests.Session() as http_client:
            http_client.proxies = proxies if proxies else {}
            return TranscriptListFetcher(http_client).fetch(video_id)

    @classmethod
    def get_transcripts(cls, video_ids, languages=('en',), continue_after_error=False, proxies=None):
        """
        Retrieves the transcripts for a list of videos.

        :param video_ids: a list of youtube video ids
        :type video_ids: [str]
        :type video_ids: list[str]
        :param languages: A list of language codes in a descending priority. For example, if this is set to ['de', 'en']
        it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if it fails to
        do so.
        :type languages: [str]
        :type languages: list[str]
        :param continue_after_error: if this is set the execution won't be stopped, if an error occurs while retrieving
        one of the video transcripts
        :type continue_after_error: bool

@@ -22,7 +73,7 @@ class YouTubeTranscriptApi():
        :type proxies: {'http': str, 'https': str} - http://docs.python-requests.org/en/master/user/advanced/#proxies
        :return: a tuple containing a dictionary mapping video ids onto their corresponding transcripts, and a list of
        video ids, which could not be retrieved
        :rtype: ({str: [{'text': str, 'start': float, 'end': float}]}, [str]})
        :rtype ({str: [{'text': str, 'start': float, 'end': float}]}, [str]}):
        """
        data = {}
        unretrievable_videos = []

@@ -41,19 +92,19 @@ class YouTubeTranscriptApi():
    @classmethod
    def get_transcript(cls, video_id, languages=('en',), proxies=None):
        """
        Retrieves the transcript for a single video.
        Retrieves the transcript for a single video. This is just a shortcut for calling::

            YouTubeTranscriptApi.list_transcripts(video_id, proxies).find_transcript(languages).fetch()

        :param video_id: the youtube video id
        :type video_id: str
        :param languages: A list of language codes in a descending priority. For example, if this is set to ['de', 'en']
        it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if it fails to
        do so.
        :type languages: [str]
        :type languages: list[str]
        :param proxies: a dictionary mapping of http and https proxies to be used for the network requests
        :type proxies: {'http': str, 'https': str} - http://docs.python-requests.org/en/master/user/advanced/#proxies
        :return: a list of dictionaries containing the 'text', 'start' and 'duration' keys
        :rtype: [{'text': str, 'start': float, 'end': float}]
        :rtype [{'text': str, 'start': float, 'end': float}]:
        """
        with requests.Session() as http_client:
            http_client.proxies = proxies if proxies else {}
            return TranscriptListFetcher(http_client).fetch(video_id).find_transcript(languages).fetch()
        return cls.list_transcripts(video_id, proxies).find_transcript(languages).fetch()

@@ -14,22 +14,45 @@ class YouTubeTranscriptCli():
    def run(self):
        parsed_args = self._parse_args()

        if parsed_args.exclude_manually_created and parsed_args.exclude_generated:
            return ''

        proxies = None
        if parsed_args.http_proxy != '' or parsed_args.https_proxy != '':
            proxies = {"http": parsed_args.http_proxy, "https": parsed_args.https_proxy}

        transcripts, unretrievable_videos = YouTubeTranscriptApi.get_transcripts(
            parsed_args.video_ids,
            languages=parsed_args.languages,
            continue_after_error=True,
            proxies=proxies
        )
        transcripts = []
        exceptions = []

        for video_id in parsed_args.video_ids:
            try:
                transcripts.append(self._fetch_transcript(parsed_args, proxies, video_id))
            except Exception as exception:
                exceptions.append(exception)

        return '\n\n'.join(
            [str(YouTubeTranscriptApi.CouldNotRetrieveTranscript(video_id)) for video_id in unretrievable_videos]
            [str(exception) for exception in exceptions]
            + ([json.dumps(transcripts) if parsed_args.json else pprint.pformat(transcripts)] if transcripts else [])
        )

    def _fetch_transcript(self, parsed_args, proxies, video_id):
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id, proxies=proxies)

        if parsed_args.list_transcripts:
            return str(transcript_list)

        if parsed_args.exclude_manually_created:
            transcript = transcript_list.find_generated_transcript(parsed_args.languages)
        elif parsed_args.exclude_generated:
            transcript = transcript_list.find_manually_created_transcript(parsed_args.languages)
        else:
            transcript = transcript_list.find_transcript(parsed_args.languages)

        if parsed_args.translate:
            transcript = transcript.translate(parsed_args.translate)

        return transcript.fetch()

    def _parse_args(self):
        parser = argparse.ArgumentParser(
            description=(

@@ -38,6 +61,13 @@ class YouTubeTranscriptCli():
                'other selenium based solutions do!'
            )
        )
        parser.add_argument(
            '--list-transcripts',
            action='store_const',
            const=True,
            default=False,
            help='This will list the languages in which the given videos are available in.',
        )
        parser.add_argument('video_ids', nargs='+', type=str, help='List of YouTube video IDs.')
        parser.add_argument(
            '--languages',

@@ -46,11 +76,25 @@ class YouTubeTranscriptCli():
            type=str,
            help=(
                'A list of language codes in a descending priority. For example, if this is set to "de en" it will '
                'first try to fetch the german transcript (de) and then fetch the english transcipt (en) if it fails '
                'first try to fetch the german transcript (de) and then fetch the english transcript (en) if it fails '
                'to do so. As I can\'t provide a complete list of all working language codes with full certainty, you '
                'may have to play around with the language codes a bit, to find the one which is working for you!'
            ),
        )
        parser.add_argument(
            '--exclude-generated',
            action='store_const',
            const=True,
            default=False,
            help='If this flag is set transcripts which have been generated by YouTube will not be retrieved.',
        )
        parser.add_argument(
            '--exclude-manually-created',
            action='store_const',
            const=True,
            default=False,
            help='If this flag is set transcripts which have been manually created will not be retrieved.',
        )
        parser.add_argument(
            '--json',
            action='store_const',

@@ -59,13 +103,24 @@ class YouTubeTranscriptCli():
            help='If this flag is set the output will be JSON formatted.',
        )
        parser.add_argument(
            '--http-proxy', dest='http_proxy',
            default='', metavar='URL',
            '--translate',
            default='',
            help=(
                'The language code for the language you want this transcript to be translated to. Use the '
                '--list-transcripts feature to find out which languages are translatable and which translation '
                'languages are available.'
            )
        )
        parser.add_argument(
            '--http-proxy',
            default='',
            metavar='URL',
            help='Use the specified HTTP proxy.'
        )
        parser.add_argument(
            '--https-proxy', dest='https_proxy',
            default='', metavar='URL',
            '--https-proxy',
            default='',
            metavar='URL',
            help='Use the specified HTTPS proxy.'
        )

@@ -11,7 +11,7 @@ class CouldNotRetrieveTranscript(Exception):
    GITHUB_REFERRAL = (
        '\n\nIf you are sure that the described cause is not responsible for this error '
        'and that a transcript should be retrievable, please create an issue at '
        'https://github.com/jdepoix/youtube-transcript-api/issues.'
        'https://github.com/jdepoix/youtube-transcript-api/issues. '
        'Please add which version of youtube_transcript_api you are using '
        'and provide the information needed to replicate the error. '
        'Also make sure that there are no open issues which already describe your problem!'

@@ -43,6 +43,18 @@ class TranscriptsDisabled(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = 'Subtitles are disabled for this video'


class NoTranscriptAvailable(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = 'No transcripts are available for this video'


class NotTranslatable(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = 'The requested language is not translatable'


class TranslationLanguageNotAvailable(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = 'The requested translation language is not available'


class NoTranscriptFound(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = (
        'No transcripts were found for any of the requested language codes: {requested_language_codes}\n\n'

@@ -12,7 +12,14 @@ from xml.etree import ElementTree
import re

from ._html_unescaping import unescape
from ._errors import VideoUnavailable, NoTranscriptFound, TranscriptsDisabled
from ._errors import (
    VideoUnavailable,
    NoTranscriptFound,
    TranscriptsDisabled,
    NotTranslatable,
    TranslationLanguageNotAvailable,
    NoTranscriptAvailable,
)
from ._settings import WATCH_URL

@@ -36,9 +43,14 @@ class TranscriptListFetcher():

            raise TranscriptsDisabled(video_id)

        return json.loads(splitted_html[1].split(',"videoDetails')[0].replace('\n', ''))[
            'playerCaptionsTracklistRenderer'
        ]
        captions_json = json.loads(
            splitted_html[1].split(',"videoDetails')[0].replace('\n', '')
        )['playerCaptionsTracklistRenderer']

        if 'captionTracks' not in captions_json:
            raise NoTranscriptAvailable(video_id)

        return captions_json

    def _fetch_html(self, video_id):
        return self._http_client.get(WATCH_URL.format(video_id=video_id)).text.replace(

@@ -53,10 +65,7 @@ class TranscriptList():
    This object represents a list of transcripts. It can be iterated over to list all transcripts which are available
    for a given YouTube video. Also it provides functionality to search for a transcript in a given language.
    """

    # TODO implement iterator

    def __init__(self, video_id, manually_created_transcripts, generated_transcripts):
    def __init__(self, video_id, manually_created_transcripts, generated_transcripts, translation_languages):
        """
        The constructor is only for internal use. Use the static build method instead.

@@ -66,10 +75,13 @@ class TranscriptList():
        :type manually_created_transcripts: dict[str, Transcript]
        :param generated_transcripts: dict mapping language codes to the generated transcripts
        :type generated_transcripts: dict[str, Transcript]
        :param translation_languages: list of languages which can be used for translatable languages
        :type translation_languages: list[dict[str, str]]
        """
        self.video_id = video_id
        self._manually_created_transcripts = manually_created_transcripts
        self._generated_transcripts = generated_transcripts
        self._translation_languages = translation_languages

    @staticmethod
    def build(http_client, video_id, captions_json):

@@ -83,7 +95,7 @@ class TranscriptList():
        :param captions_json: the JSON parsed from the YouTube pages static HTML
        :type captions_json: dict
        :return: the created TranscriptList
        :rtype TranscriptList
        :rtype TranscriptList:
        """
        translation_languages = [
            {

@@ -108,15 +120,19 @@ class TranscriptList():
                caption['name']['simpleText'],
                caption['languageCode'],
                caption.get('kind', '') == 'asr',
                translation_languages if caption['isTranslatable'] else []
                translation_languages if caption.get('isTranslatable', False) else []
            )

        return TranscriptList(
            video_id,
            manually_created_transcripts,
            generated_transcripts,
            translation_languages,
        )

    def __iter__(self):
        return iter(list(self._manually_created_transcripts.values()) + list(self._generated_transcripts.values()))

    def find_transcript(self, language_codes):
        """
        Finds a transcript for a given language code. Manually created transcripts are returned first and only if none

@@ -126,9 +142,9 @@ class TranscriptList():
        :param language_codes: A list of language codes in a descending priority. For example, if this is set to
        ['de', 'en'] it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if
        it fails to do so.
        :type languages: [str]
        :type languages: list[str]
        :return: the found Transcript
        :rtype: Transcript
        :rtype Transcript:
        :raises: NoTranscriptFound
        """
        return self._find_transcript(language_codes, [self._manually_created_transcripts, self._generated_transcripts])

@@ -140,9 +156,9 @@ class TranscriptList():
        :param language_codes: A list of language codes in a descending priority. For example, if this is set to
        ['de', 'en'] it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if
        it fails to do so.
        :type languages: [str]
        :type languages: list[str]
        :return: the found Transcript
        :rtype: Transcript
        :rtype Transcript:
        :raises: NoTranscriptFound
        """
        return self._find_transcript(language_codes, [self._generated_transcripts,])
@@ -154,9 +170,9 @@ class TranscriptList():
         :param language_codes: A list of language codes in a descending priority. For example, if this is set to
         ['de', 'en'] it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if
         it fails to do so.
-        :type languages: [str]
+        :type languages: list[str]
         :return: the found Transcript
-        :rtype: Transcript
+        :rtype Transcript:
         :raises: NoTranscriptFound
         """
         return self._find_transcript(language_codes, [self._manually_created_transcripts,])
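The three `find_*` methods above differ only in which transcript dictionaries they hand to `_find_transcript`, whose body is not part of this diff. The sketch below is a standalone approximation that assumes the language list is the outer loop and the dictionary list the inner one, so a higher-priority language wins even if it only exists as a generated transcript. `find_transcript`, `NoTranscriptFound` and the sample dictionaries are local stand-ins, not the library's objects.

```python
from typing import Dict, List, Sequence


class NoTranscriptFound(Exception):
    """Local stand-in for the library exception of the same name."""


def find_transcript(language_codes: Sequence[str], transcript_dicts: List[Dict[str, str]]) -> str:
    # Assumed order: outer loop over language priority,
    # inner loop over transcript kinds (e.g. manually created before generated).
    for language_code in language_codes:
        for transcript_dict in transcript_dicts:
            if language_code in transcript_dict:
                return transcript_dict[language_code]
    raise NoTranscriptFound(language_codes)


manually_created = {'de': 'de (manual)'}
generated = {'en': 'en (generated)', 'de': 'de (generated)'}

print(find_transcript(['de', 'en'], [manually_created, generated]))  # de (manual)
print(find_transcript(['en', 'de'], [manually_created, generated]))  # en (generated)
print(find_transcript(['de'], [generated]))                          # de (generated)
```

Restricting the dictionary list, as `find_generated_transcript` and `find_manually_created_transcript` do, is all it takes to limit the search to one kind of transcript.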
@@ -179,22 +195,28 @@ class TranscriptList():
             '(MANUALLY CREATED)\n'
             '{available_manually_created_transcript_languages}\n\n'
             '(GENERATED)\n'
-            '{available_generated_transcripts}'
+            '{available_generated_transcripts}\n\n'
+            '(TRANSLATION LANGUAGES)\n'
+            '{available_translation_languages}'
         ).format(
             video_id=self.video_id,
             available_manually_created_transcript_languages=self._get_language_description(
-                self._manually_created_transcripts.values()
+                str(transcript) for transcript in self._manually_created_transcripts.values()
             ),
             available_generated_transcripts=self._get_language_description(
-                self._generated_transcripts.values()
+                str(transcript) for transcript in self._generated_transcripts.values()
             ),
+            available_translation_languages=self._get_language_description(
+                '{language_code} ("{language}")'.format(
+                    language=translation_language['language'],
+                    language_code=translation_language['language_code'],
+                ) for translation_language in self._translation_languages
+            )
         )
 
-    def _get_language_description(self, transcripts):
-        return '\n'.join(
-            ' - {transcript}'.format(transcript=str(transcript))
-            for transcript in transcripts
-        ) if transcripts else 'None'
+    def _get_language_description(self, transcript_strings):
+        description = '\n'.join(' - {transcript}'.format(transcript=transcript) for transcript in transcript_strings)
+        return description if description else 'None'
 
 
 class Transcript():
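In the new `__str__`, `_get_language_description` receives generator expressions instead of dict views. A dict view is falsy when empty, but a generator object is truthy even when it yields nothing, so the old `if transcripts else 'None'` test would stop detecting empty sections; the new version joins first and checks the resulting string. The sketch below reproduces both variants as free functions to show the difference; `naive_description` is a hypothetical name for the old behaviour.

```python
def get_language_description(transcript_strings):
    # Join first, then test the joined *string*: this works for lists,
    # dict views and generators alike.
    description = '\n'.join(' - {transcript}'.format(transcript=transcript) for transcript in transcript_strings)
    return description if description else 'None'


def naive_description(transcripts):
    # Old-style check on the input itself: fine for lists and dict views,
    # but an empty generator is still truthy, so '' comes back instead of 'None'.
    return '\n'.join(' - {}'.format(t) for t in transcripts) if transcripts else 'None'


print(repr(naive_description(s for s in [])))   # ''
print(get_language_description(s for s in []))  # None
print(get_language_description(['de ("Deutsch")', 'en ("English")']))
```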
@@ -220,45 +242,49 @@ class Transcript():
         self.language_code = language_code
         self.is_generated = is_generated
         self.translation_languages = translation_languages
+        self._translation_languages_dict = {
+            translation_language['language_code']: translation_language['language']
+            for translation_language in translation_languages
+        }
 
     def fetch(self):
         """
         Loads the actual transcript data.
 
         :return: a list of dictionaries containing the 'text', 'start' and 'duration' keys
-        :rtype: [{'text': str, 'start': float, 'end': float}]
+        :rtype [{'text': str, 'start': float, 'end': float}]:
         """
         return _TranscriptParser().parse(
             self._http_client.get(self._url).text
         )
 
     def __str__(self):
-        return '{language_code} ("{language}")'.format(
+        return '{language_code} ("{language}"){translation_description}'.format(
             language=self.language,
             language_code=self.language_code,
+            translation_description='[TRANSLATABLE]' if self.is_translatable else ''
         )
 
-    # TODO integrate translations in future release
-    # @property
-    # def is_translatable(self):
-    #     return len(self.translation_languages) > 0
-    #
-    #
-    # class TranslatableTranscript(Transcript):
-    #     def __init__(self, http_client, url, translation_languages):
-    #         super(TranslatableTranscript, self).__init__(http_client, url)
-    #         self._translation_languages = translation_languages
-    #         self._translation_language_codes = {language['language_code'] for language in translation_languages}
-    #
-    #
-    #     def translate(self, language_code):
-    #         if language_code not in self._translation_language_codes:
-    #             raise TranslatableTranscript.TranslationLanguageNotAvailable()
-    #
-    #         return Transcript(
-    #             self._http_client,
-    #             '{url}&tlang={language_code}'.format(url=self._url, language_code=language_code)
-    #         )
+    @property
+    def is_translatable(self):
+        return len(self.translation_languages) > 0
+
+    def translate(self, language_code):
+        if not self.is_translatable:
+            raise NotTranslatable(self.video_id)
+
+        if language_code not in self._translation_languages_dict:
+            raise TranslationLanguageNotAvailable(self.video_id)
+
+        return Transcript(
+            self._http_client,
+            self.video_id,
+            '{url}&tlang={language_code}'.format(url=self._url, language_code=language_code),
+            self._translation_languages_dict[language_code],
+            language_code,
+            True,
+            [],
+        )
 
 
 class _TranscriptParser():
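The new `Transcript.translate` builds a translated transcript by appending `&tlang=<code>` to the caption URL and looking up the language name in `_translation_languages_dict`. The class below is a trimmed, self-contained model of that logic: it drops the HTTP client and `fetch`, and `TranscriptSketch`, the sample URL and the sample translation language are made up for illustration.

```python
class NotTranslatable(Exception):
    pass


class TranslationLanguageNotAvailable(Exception):
    pass


class TranscriptSketch:
    """Hypothetical, trimmed-down model of Transcript (no HTTP client, no fetch)."""

    def __init__(self, video_id, url, language, language_code, is_generated, translation_languages):
        self.video_id = video_id
        self._url = url
        self.language = language
        self.language_code = language_code
        self.is_generated = is_generated
        self.translation_languages = translation_languages
        # Same lookup table as in the diff: language_code -> language name.
        self._translation_languages_dict = {
            tl['language_code']: tl['language'] for tl in translation_languages
        }

    @property
    def is_translatable(self):
        return len(self.translation_languages) > 0

    def translate(self, language_code):
        if not self.is_translatable:
            raise NotTranslatable(self.video_id)
        if language_code not in self._translation_languages_dict:
            raise TranslationLanguageNotAvailable(self.video_id)
        # New object: same video, URL extended with &tlang, marked as generated,
        # and with no further translation languages of its own.
        return TranscriptSketch(
            self.video_id,
            '{url}&tlang={language_code}'.format(url=self._url, language_code=language_code),
            self._translation_languages_dict[language_code],
            language_code,
            True,
            [],
        )


en = TranscriptSketch(
    'GJLlxj_dtq8',
    'https://example.invalid/timedtext?v=GJLlxj_dtq8&lang=en',  # placeholder URL
    'English', 'en', False,
    [{'language': 'Afrikaans', 'language_code': 'af'}],
)
af = en.translate('af')
print(af.language_code, af.language, af.is_translatable)  # af Afrikaans False
```

Because the translated object is created with an empty translation list (`True, []` in the diff), translating it a second time raises `NotTranslatable` in this model.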
@@ -269,7 +295,7 @@ class _TranscriptParser():
             {
                 'text': re.sub(self.HTML_TAG_REGEX, '', unescape(xml_element.text)),
                 'start': float(xml_element.attrib['start']),
-                'duration': float(xml_element.attrib['dur']),
+                'duration': float(xml_element.attrib.get('dur', '0.0')),
             }
             for xml_element in ElementTree.fromstring(plain_data)
             if xml_element.text is not None

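The parser change swaps `attrib['dur']` for `attrib.get('dur', '0.0')`, so a `<text>` element without a duration yields `0.0` instead of raising `KeyError`. The sketch below runs the same list comprehension over a hand-written XML snippet; the snippet and the `HTML_TAG_REGEX` pattern are assumptions, since neither appears in this diff.

```python
import re
from html import unescape
from xml.etree import ElementTree

# Assumed tag-stripping pattern; the real HTML_TAG_REGEX is not shown in the diff.
HTML_TAG_REGEX = re.compile(r'<[^>]*>', re.IGNORECASE)

# Hand-written sample: the second <text> has no "dur", the third has no text at all.
PLAIN_DATA = (
    '<transcript>'
    '<text start="0.0" dur="1.54">Hey, this is &lt;b&gt;just&lt;/b&gt; a test</text>'
    '<text start="1.54">no dur attribute here</text>'
    '<text start="5.7" dur="3.239"></text>'
    '</transcript>'
)


def parse(plain_data):
    return [
        {
            # Unescape entities, then strip any HTML tags left in the caption text.
            'text': re.sub(HTML_TAG_REGEX, '', unescape(xml_element.text)),
            'start': float(xml_element.attrib['start']),
            # New behaviour: a missing "dur" defaults to 0.0.
            'duration': float(xml_element.attrib.get('dur', '0.0')),
        }
        for xml_element in ElementTree.fromstring(plain_data)
        if xml_element.text is not None  # empty elements are skipped
    ]


for entry in parse(PLAIN_DATA):
    print(entry)
```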
File diff suppressed because one or more lines are too long

@@ -5,7 +5,15 @@ import os
 
 import httpretty
 
-from youtube_transcript_api import YouTubeTranscriptApi, VideoUnavailable, NoTranscriptFound, TranscriptsDisabled
+from youtube_transcript_api import (
+    YouTubeTranscriptApi,
+    TranscriptsDisabled,
+    NoTranscriptFound,
+    VideoUnavailable,
+    NoTranscriptAvailable,
+    NotTranslatable,
+    TranslationLanguageNotAvailable,
+)
 
 
 def load_asset(filename):
@@ -42,6 +50,51 @@ class TestYouTubeTranscriptApi(TestCase):
             ]
         )
 
+    def test_list_transcripts(self):
+        transcript_list = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8')
+
+        language_codes = {transcript.language_code for transcript in transcript_list}
+
+        self.assertEqual(language_codes, {'zh', 'de', 'en', 'hi', 'ja', 'ko', 'es', 'cs', 'en'})
+
+    def test_list_transcripts__find_manually_created(self):
+        transcript_list = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8')
+        transcript = transcript_list.find_manually_created_transcript(['cs'])
+
+        self.assertFalse(transcript.is_generated)
+
+
+    def test_list_transcripts__find_generated(self):
+        transcript_list = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8')
+
+        with self.assertRaises(NoTranscriptFound):
+            transcript_list.find_generated_transcript(['cs'])
+
+        transcript = transcript_list.find_generated_transcript(['en'])
+
+        self.assertTrue(transcript.is_generated)
+
+    def test_translate_transcript(self):
+        transcript = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8').find_transcript(['en'])
+
+        translated_transcript = transcript.translate('af')
+
+        self.assertEqual(translated_transcript.language_code, 'af')
+        self.assertIn('&tlang=af', translated_transcript._url)
+
+    def test_translate_transcript__translation_language_not_available(self):
+        transcript = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8').find_transcript(['en'])
+
+        with self.assertRaises(TranslationLanguageNotAvailable):
+            transcript.translate('xyz')
+
+    def test_translate_transcript__not_translatable(self):
+        transcript = YouTubeTranscriptApi.list_transcripts('GJLlxj_dtq8').find_transcript(['en'])
+        transcript.translation_languages = []
+
+        with self.assertRaises(NotTranslatable):
+            transcript.translate('af')
+
     def test_get_transcript__correct_language_is_used(self):
         YouTubeTranscriptApi.get_transcript('GJLlxj_dtq8', ['de', 'en'])
         query_string = httpretty.last_request().querystring
@@ -88,6 +141,16 @@ class TestYouTubeTranscriptApi(TestCase):
         with self.assertRaises(NoTranscriptFound):
             YouTubeTranscriptApi.get_transcript('GJLlxj_dtq8', languages=['cz'])
 
+    def test_get_transcript__exception_if_no_transcript_available(self):
+        httpretty.register_uri(
+            httpretty.GET,
+            'https://www.youtube.com/watch',
+            body=load_asset('youtube_no_transcript_available.html.static')
+        )
+
+        with self.assertRaises(NoTranscriptAvailable):
+            YouTubeTranscriptApi.get_transcript('MwBPvcYFY2E')
+
     def test_get_transcripts(self):
         video_id_1 = 'video_id_1'
         video_id_2 = 'video_id_2'

@@ -3,10 +3,27 @@ from mock import MagicMock
 
 import json
 
-from youtube_transcript_api._cli import YouTubeTranscriptCli, YouTubeTranscriptApi
+from youtube_transcript_api import YouTubeTranscriptApi, VideoUnavailable
+from youtube_transcript_api._cli import YouTubeTranscriptCli
 
 
 class TestYouTubeTranscriptCli(TestCase):
+    def setUp(self):
+        self.transcript_mock = MagicMock()
+        self.transcript_mock.fetch = MagicMock(return_value=[
+            {'text': 'Hey, this is just a test', 'start': 0.0, 'duration': 1.54},
+            {'text': 'this is not the original transcript', 'start': 1.54, 'duration': 4.16},
+            {'text': 'just something shorter, I made up for testing', 'start': 5.7, 'duration': 3.239}
+        ])
+        self.transcript_mock.translate = MagicMock(return_value=self.transcript_mock)
+
+        self.transcript_list_mock = MagicMock()
+        self.transcript_list_mock.find_generated_transcript = MagicMock(return_value=self.transcript_mock)
+        self.transcript_list_mock.find_manually_created_transcript = MagicMock(return_value=self.transcript_mock)
+        self.transcript_list_mock.find_transcript = MagicMock(return_value=self.transcript_mock)
+
+        YouTubeTranscriptApi.list_transcripts = MagicMock(return_value=self.transcript_list_mock)
+
     def test_argument_parsing(self):
         parsed_args = YouTubeTranscriptCli('v1 v2 --json --languages de en'.split())._parse_args()
         self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
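The new `setUp` wires the mocks so that `list_transcripts(...)` returns a transcript-list mock whose `find_*` methods return a transcript mock, and `translate` hands back that same transcript mock. The snippet below rebuilds this chain with the standard library's `unittest.mock` (the test module itself imports the standalone `mock` package) and walks the call path the CLI tests assert on; `list_transcripts` here is a plain local mock, not the patched class attribute.

```python
from unittest.mock import MagicMock

transcript_mock = MagicMock()
transcript_mock.fetch = MagicMock(return_value=[
    {'text': 'Hey, this is just a test', 'start': 0.0, 'duration': 1.54},
])
# translate() returns the same mock, so a "translated" transcript can still be fetched.
transcript_mock.translate = MagicMock(return_value=transcript_mock)

transcript_list_mock = MagicMock()
transcript_list_mock.find_transcript = MagicMock(return_value=transcript_mock)

# Local stand-in for YouTubeTranscriptApi.list_transcripts.
list_transcripts = MagicMock(return_value=transcript_list_mock)

# The call chain exercised by test_run and test_run__translate.
data = list_transcripts('v1', proxies=None).find_transcript(['de', 'en']).translate('cz').fetch()

list_transcripts.assert_any_call('v1', proxies=None)
transcript_list_mock.find_transcript.assert_any_call(['de', 'en'])
transcript_mock.translate.assert_any_call('cz')
print(data[0]['text'])  # Hey, this is just a test
```

One design consequence worth noting: `setUp` assigns `YouTubeTranscriptApi.list_transcripts` directly on the class and never restores it, so the replacement can leak into tests that run later in the same process. Starting a `unittest.mock.patch.object(YouTubeTranscriptApi, 'list_transcripts', ...)` patcher in `setUp` and registering `self.addCleanup(patcher.stop)` would undo it automatically.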
@@ -106,32 +123,107 @@ class TestYouTubeTranscriptCli(TestCase):
         self.assertEqual(parsed_args.http_proxy, '')
         self.assertEqual(parsed_args.https_proxy, '')
 
+    def test_argument_parsing__list_transcripts(self):
+        parsed_args = YouTubeTranscriptCli('--list-transcripts v1 v2'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertTrue(parsed_args.list_transcripts)
+
+        parsed_args = YouTubeTranscriptCli('v1 v2 --list-transcripts'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertTrue(parsed_args.list_transcripts)
+
+    def test_argument_parsing__translate(self):
+        parsed_args = YouTubeTranscriptCli('v1 v2 --languages de en --translate cz'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertEqual(parsed_args.json, False)
+        self.assertEqual(parsed_args.languages, ['de', 'en'])
+        self.assertEqual(parsed_args.translate, 'cz')
+
+        parsed_args = YouTubeTranscriptCli('v1 v2 --translate cz --languages de en'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertEqual(parsed_args.json, False)
+        self.assertEqual(parsed_args.languages, ['de', 'en'])
+        self.assertEqual(parsed_args.translate, 'cz')
+
+    def test_argument_parsing__manually_or_generated(self):
+        parsed_args = YouTubeTranscriptCli('v1 v2 --exclude-manually-created'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertTrue(parsed_args.exclude_manually_created)
+        self.assertFalse(parsed_args.exclude_generated)
+
+        parsed_args = YouTubeTranscriptCli('v1 v2 --exclude-generated'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertFalse(parsed_args.exclude_manually_created)
+        self.assertTrue(parsed_args.exclude_generated)
+
+        parsed_args = YouTubeTranscriptCli('v1 v2 --exclude-manually-created --exclude-generated'.split())._parse_args()
+        self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
+        self.assertTrue(parsed_args.exclude_manually_created)
+        self.assertTrue(parsed_args.exclude_generated)
+
     def test_run(self):
-        YouTubeTranscriptApi.get_transcripts = MagicMock(return_value=([], []))
         YouTubeTranscriptCli('v1 v2 --languages de en'.split()).run()
 
-        YouTubeTranscriptApi.get_transcripts.assert_called_once_with(
-            ['v1', 'v2'],
-            languages=['de', 'en'],
-            continue_after_error=True,
-            proxies=None
+        YouTubeTranscriptApi.list_transcripts.assert_any_call('v1', proxies=None)
+        YouTubeTranscriptApi.list_transcripts.assert_any_call('v2', proxies=None)
+
+        self.transcript_list_mock.find_transcript.assert_any_call(['de', 'en'])
+
+    def test_run__failing_transcripts(self):
+        YouTubeTranscriptApi.list_transcripts = MagicMock(side_effect=VideoUnavailable('video_id'))
+
+        output = YouTubeTranscriptCli('v1 --languages de en'.split()).run()
+
+        self.assertEqual(output, str(VideoUnavailable('video_id')))
+
+    def test_run__exclude_generated(self):
+        YouTubeTranscriptCli('v1 v2 --languages de en --exclude-generated'.split()).run()
+
+        self.transcript_list_mock.find_manually_created_transcript.assert_any_call(['de', 'en'])
+
+    def test_run__exclude_manually_created(self):
+        YouTubeTranscriptCli('v1 v2 --languages de en --exclude-manually-created'.split()).run()
+
+        self.transcript_list_mock.find_generated_transcript.assert_any_call(['de', 'en'])
+
+    def test_run__exclude_manually_created_and_generated(self):
+        self.assertEqual(
+            YouTubeTranscriptCli('v1 v2 --languages de en --exclude-manually-created --exclude-generated'.split()).run(),
+            ''
         )
 
+    def test_run__translate(self):
+        YouTubeTranscriptCli('v1 v2 --languages de en --translate cz'.split()).run(),
+
+        self.transcript_mock.translate.assert_any_call('cz')
+
+    def test_run__list_transcripts(self):
+        YouTubeTranscriptCli('--list-transcripts v1 v2'.split()).run()
+
+        YouTubeTranscriptApi.list_transcripts.assert_any_call('v1', proxies=None)
+        YouTubeTranscriptApi.list_transcripts.assert_any_call('v2', proxies=None)
+
     def test_run__json_output(self):
-        YouTubeTranscriptApi.get_transcripts = MagicMock(return_value=([{'boolean': True}], []))
         output = YouTubeTranscriptCli('v1 v2 --languages de en --json'.split()).run()
 
         # will fail if output is not valid json
         json.loads(output)
 
     def test_run__proxies(self):
-        YouTubeTranscriptApi.get_transcripts = MagicMock(return_value=([], []))
         YouTubeTranscriptCli(
-            'v1 v2 --languages de en --http-proxy http://user:pass@domain:port --https-proxy https://user:pass@domain:port'.split()).run()
+            (
+                'v1 v2 --languages de en '
+                '--http-proxy http://user:pass@domain:port '
+                '--https-proxy https://user:pass@domain:port'
+            ).split()
+        ).run()
 
-        YouTubeTranscriptApi.get_transcripts.assert_called_once_with(
-            ['v1', 'v2'],
-            languages=['de', 'en'],
-            continue_after_error=True,
+        YouTubeTranscriptApi.list_transcripts.assert_any_call(
+            'v1',
             proxies={'http': 'http://user:pass@domain:port', 'https': 'https://user:pass@domain:port'}
         )
+
+        YouTubeTranscriptApi.list_transcripts.assert_any_call(
+            'v2',
+            proxies={'http': 'http://user:pass@domain:port', 'https': 'https://user:pass@domain:port'}
+        )
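The CLI implementation itself is not part of this diff, but the new tests pin down its behaviour: the flags may appear in any order relative to the video IDs, `--exclude-generated` routes to `find_manually_created_transcript`, `--exclude-manually-created` to `find_generated_transcript`, excluding both yields empty output, and `--translate` calls `translate` on whichever transcript was found. The sketch below is a hypothetical reconstruction of that behaviour with `argparse`; `build_parser`, `select_transcript` and all default values are assumptions, not the real `YouTubeTranscriptCli` code.

```python
import argparse
from unittest.mock import MagicMock


def build_parser():
    # Hypothetical flag set mirroring the tests; the defaults are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument('video_ids', nargs='+', type=str)
    parser.add_argument('--list-transcripts', action='store_true')
    parser.add_argument('--languages', nargs='*', default=['en'], type=str)
    parser.add_argument('--exclude-generated', action='store_true')
    parser.add_argument('--exclude-manually-created', action='store_true')
    parser.add_argument('--translate', default='')
    parser.add_argument('--json', action='store_true')
    return parser


def select_transcript(transcript_list, args):
    # Excluding both kinds leaves nothing to fetch (the tests expect '' output).
    if args.exclude_generated and args.exclude_manually_created:
        return None
    if args.exclude_generated:
        transcript = transcript_list.find_manually_created_transcript(args.languages)
    elif args.exclude_manually_created:
        transcript = transcript_list.find_generated_transcript(args.languages)
    else:
        transcript = transcript_list.find_transcript(args.languages)
    # --translate is applied on top of whichever transcript was found.
    if args.translate:
        transcript = transcript.translate(args.translate)
    return transcript


parser = build_parser()
args = parser.parse_args('v1 v2 --translate cz --languages de en'.split())
transcript_list = MagicMock()  # stands in for one video's TranscriptList
select_transcript(transcript_list, args)
transcript_list.find_transcript.assert_called_once_with(['de', 'en'])
transcript_list.find_transcript.return_value.translate.assert_called_once_with('cz')
```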