updated README
parent 936ef3c1d0
commit aa34a2ceb3
133
README.md
|
@ -1,4 +1,5 @@
|
|||
# YouTube Transcript/Subtitle API (including automatically generated subtitles)
|
||||
|
||||
# YouTube Transcript/Subtitle API (including automatically generated subtitles and subtitle translations)
|
||||
|
||||
[](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url)
|
||||
[](https://travis-ci.org/jdepoix/youtube-transcript-api)
|
||||
|
@ -7,7 +8,7 @@
|
|||
[](https://pypi.org/project/youtube-transcript-api/)
|
||||
[](https://pypi.org/project/youtube-transcript-api/)
|
||||
|
||||
This is an python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require a headless browser, like other selenium based solutions do!
|
||||
This is a Python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles, and does not require a headless browser, as Selenium-based solutions do!
|
||||
|
||||
## Install
|
||||
|
||||
|
@ -23,13 +24,11 @@ If you want to use it from source, you'll have to install the dependencies manua
|
|||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## How to use it
|
||||
You can either integrate this module [into an existing application](#api), or just use it via a [CLI](#cli).
|
||||
|
||||
You could either integrate this module into an existing application, or just use it via an CLI
|
||||
## API
|
||||
|
||||
### In code
|
||||
|
||||
To get a transcript for a given video you can do:
|
||||
The easiest way to get a transcript for a given video is to execute:
|
||||
|
||||
```python
|
||||
from youtube_transcript_api import YouTubeTranscriptApi
|
||||
|
@ -61,9 +60,9 @@ You can also add the `languages` param if you want to make sure the transcripts
|
|||
YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])
|
||||
```
|
||||
|
||||
It's a list of language codes in a descending priority. In this example it will first try to fetch the german transcript (`'de'`) and then fetch the english transcript (`'en'`) if it fails to do so. As I can't provide a complete list of all working language codes with full certainty, you may have to play around with the language codes a bit, to find the one which is working for you!
|
||||
It's a list of language codes in descending priority. In this example it will first try to fetch the German transcript (`'de'`) and then fall back to the English transcript (`'en'`) if that fails. If you want to find out which languages are available first, [have a look at `list_transcripts()`](#list-available-transcripts).
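
To illustrate what "descending priority" means, here is a minimal sketch of the fallback logic. It is not the module's actual implementation; `pick_language` and the set of available codes are hypothetical names used only for illustration.

```python
def pick_language(available, preferred):
    """Return the first code in `preferred` that is also in `available`.

    Hypothetical helper: it only mirrors the priority order described
    above and does not talk to YouTube.
    """
    for code in preferred:
        if code in available:
            return code
    # none of the preferred languages is available
    return None


# 'de' is tried first; it is missing here, so 'en' is chosen
print(pick_language({'en', 'fr'}, ['de', 'en']))
```

With `languages=['de', 'en']`, a video that only has English and French transcripts would therefore yield the English one.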
|
||||
|
||||
To get transcripts for a list fo video ids you can call:
|
||||
To get transcripts for a list of video ids you can call:
|
||||
|
||||
```python
|
||||
YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])
|
||||
|
@ -71,7 +70,100 @@ YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])
|
|||
|
||||
`languages` is also optional here.
|
||||
|
||||
### CLI
|
||||
### List available transcripts
|
||||
|
||||
If you want to list all transcripts which are available for a given video, you can call:
|
||||
|
||||
```python
|
||||
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
|
||||
```
|
||||
|
||||
This will return a `TranscriptList` object which is iterable and provides methods to filter the list of transcripts for specific languages and types, like:
|
||||
|
||||
```python
|
||||
transcript = transcript_list.find_transcript(['de', 'en'])
|
||||
```
|
||||
|
||||
By default, this module always picks manually created transcripts over automatically generated ones if a transcript in the requested language is available in both forms. The `TranscriptList` allows you to bypass this default behaviour by searching for specific transcript types:
|
||||
|
||||
```python
|
||||
# filter for manually created transcripts
|
||||
transcript = transcript_list.find_manually_created_transcript(['de', 'en'])
|
||||
|
||||
# or automatically generated ones
|
||||
transcript = transcript_list.find_generated_transcript(['de', 'en'])
|
||||
```
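
The default preference described above can be sketched without the library. The following is an illustration only, assuming transcripts are represented as plain dicts with `language_code` and `is_generated` keys; `pick_transcript` is a hypothetical helper, not part of the module.

```python
def pick_transcript(transcripts, language_codes):
    """Pick a transcript by language priority, preferring manual ones.

    `transcripts` is a list of dicts (an assumption made for this sketch),
    each with a 'language_code' and an 'is_generated' flag.
    """
    for code in language_codes:
        matches = [t for t in transcripts if t['language_code'] == code]
        # False sorts before True, so manually created entries come first
        matches.sort(key=lambda t: t['is_generated'])
        if matches:
            return matches[0]
    # no transcript in any of the requested languages
    return None


available = [
    {'language_code': 'en', 'is_generated': True},
    {'language_code': 'en', 'is_generated': False},
    {'language_code': 'de', 'is_generated': True},
]
print(pick_transcript(available, ['en']))
```

Here the manually created English entry wins over the generated one, while a request for `['de', 'en']` would return the generated German entry, because language priority is checked first.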
|
||||
|
||||
The methods `find_transcript`, `find_manually_created_transcript` and `find_generated_transcript` return `Transcript` objects. They contain metadata regarding the transcript:
|
||||
|
||||
```python
print(
    transcript.video_id,
    transcript.language,
    transcript.language_code,
    # whether it has been manually created or generated by YouTube
    transcript.is_generated,
    # whether this transcript can be translated or not
    transcript.is_translatable,
    # a list of languages the transcript can be translated to
    transcript.translation_languages,
)
```
|
||||
|
||||
and provide a `fetch()` method, which allows you to fetch the actual transcript data:
|
||||
|
||||
```python
|
||||
transcript.fetch()
|
||||
```
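
As a rough sketch of how the fetched data might be post-processed: the example below assumes that `fetch()` returns a list of dicts with `text`, `start` and `duration` keys. That shape is not shown in this part of the README, so treat it as an assumption. `to_plain_text` and `format_timestamp` are hypothetical helpers, and the sample data is made up.

```python
def to_plain_text(snippets):
    """Join the text of all snippets into a single string."""
    return ' '.join(snippet['text'].strip() for snippet in snippets)


def format_timestamp(seconds):
    """Render a start time in seconds as MM:SS (fractions are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    return f'{minutes:02d}:{secs:02d}'


# made-up sample in the assumed shape of the fetched data
sample = [
    {'text': 'Hey there', 'start': 7.58, 'duration': 6.13},
    {'text': 'how are you', 'start': 14.08, 'duration': 7.58},
]

print(to_plain_text(sample))
for snippet in sample:
    print(format_timestamp(snippet['start']), snippet['text'])
```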
|
||||
|
||||
### Translate transcript
|
||||
|
||||
YouTube has a feature which allows you to automatically translate subtitles. This module also makes it possible to access this feature. To do so, `Transcript` objects provide a `translate()` method, which returns a new translated `Transcript` object:
|
||||
|
||||
```python
|
||||
transcript = transcript_list.find_transcript(['en'])
|
||||
translated_transcript = transcript.translate('de')
|
||||
print(translated_transcript.fetch())
|
||||
```
|
||||
|
||||
### By example
|
||||
```python
from youtube_transcript_api import YouTubeTranscriptApi

# retrieve the available transcripts
transcript_list = YouTubeTranscriptApi.list_transcripts('video_id')

# iterate over all available transcripts
for transcript in transcript_list:

    # the Transcript object provides metadata properties
    print(
        transcript.video_id,
        transcript.language,
        transcript.language_code,
        # whether it has been manually created or generated by YouTube
        transcript.is_generated,
        # whether this transcript can be translated or not
        transcript.is_translatable,
        # a list of languages the transcript can be translated to
        transcript.translation_languages,
    )

    # fetch the actual transcript data
    print(transcript.fetch())

    # translating the transcript will return another transcript object
    print(transcript.translate('en').fetch())

# you can also directly filter for the language you are looking for, using the transcript list
transcript = transcript_list.find_transcript(['de', 'en'])

# or just filter for manually created transcripts
transcript = transcript_list.find_manually_created_transcript(['de', 'en'])

# or automatically generated ones
transcript = transcript_list.find_generated_transcript(['de', 'en'])
```
|
||||
|
||||
## CLI
|
||||
|
||||
Execute the CLI script using the video ids as parameters and the results will be printed out to the command line:
|
||||
|
||||
|
@ -85,13 +177,32 @@ The CLI also gives you the option to provide a list of preferred languages:
|
|||
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en
|
||||
```
|
||||
|
||||
You can also specify if you want to exclude automatically generated or manually created subtitles:
|
||||
|
||||
```
|
||||
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-generated
|
||||
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-manually-created
|
||||
```
|
||||
|
||||
If you would prefer to write it into a file or pipe it into another application, you can also output the results as JSON using the following line:
|
||||
|
||||
```
|
||||
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --json > transcripts.json
|
||||
```
|
||||
|
||||
### Proxy
|
||||
Translating transcripts using the CLI is also possible:
|
||||
|
||||
```
|
||||
youtube_transcript_api <first_video_id> <second_video_id> ... --languages en --translate de
|
||||
```
|
||||
|
||||
If you are not sure which languages are available for a given video, you can call:
|
||||
|
||||
```
|
||||
youtube_transcript_api --list-transcripts <first_video_id>
|
||||
```
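
If you want to drive the CLI from a script, the flags shown above can be assembled programmatically. The following is a minimal sketch: `build_command` is a hypothetical helper, it only uses the flags that appear in the examples above, and it builds the argument list without running anything.

```python
def build_command(video_ids, languages=None, translate=None, as_json=False,
                  exclude_generated=False, exclude_manually_created=False):
    """Build an argument list for the `youtube_transcript_api` CLI."""
    command = ['youtube_transcript_api', *video_ids]
    if languages:
        command += ['--languages', *languages]
    if exclude_generated:
        command.append('--exclude-generated')
    if exclude_manually_created:
        command.append('--exclude-manually-created')
    if translate:
        command += ['--translate', translate]
    if as_json:
        command.append('--json')
    return command


# VIDEO_ID is a placeholder, not a real video id
print(' '.join(build_command(['VIDEO_ID'], languages=['en'], translate='de')))
```

The resulting list could then be handed to something like `subprocess.run`, which accepts the command as a list of arguments.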
|
||||
|
||||
## Proxy
|
||||
|
||||
You can specify an HTTPS/HTTP proxy, which will be used for the requests to YouTube:
|
||||
|
||||
|
|