ucast/notes/YouTubeDownloading.ipynb


Get all videos of a channel

import scrapetube

channel_url = 'https://www.youtube.com/channel/UCGiJh0NZ52wRhYKYnuZI08Q'

videos = list(scrapetube.get_channel(channel_url=channel_url))

for v in videos:
    print(v['title']['runs'][0]['text'], v['videoId'])
ThetaDev @ Embedded World 2019 ZPxEr4YdWt8
Easter special: 3D printed Bunny _I5IFObm_-k
ThetaDevlog#2 - MySensors singleLED mmEDPbbSnaY
ThetaDevlog#1 - MySensors Smart Home! Cda4zS-1j-k
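
The nested lookup `v['title']['runs'][0]['text']` used above raises an exception if any part of that structure is missing. Here is a minimal sketch of a more defensive accessor, assuming entries have the layout shown in the loop above. The helper name `video_summary` and the sample entry are hypothetical and not part of scrapetube.

```python
def video_summary(video):
    """Return (title, video_id) from a video entry; missing parts become None."""
    # Fall back to a single empty run if 'title' or 'runs' is absent or empty
    runs = video.get('title', {}).get('runs') or [{}]
    return runs[0].get('text'), video.get('videoId')


# Hypothetical entry with the same shape as the ones printed above
sample = {
    'videoId': 'ZPxEr4YdWt8',
    'title': {'runs': [{'text': 'ThetaDev @ Embedded World 2019'}]},
}
print(*video_summary(sample))
```

With the real data, the loop body would become `print(*video_summary(v))`.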

Get channel feed

YouTube provides an XML feed for each channel. The feed contains only the latest 15 videos, so it is not suitable for downloading an entire channel (use the method described above for that). Fetching the feed is much faster, however, which makes it the preferred way to check subscribed channels for new videos.

import feedparser

channel_id = 'UCGiJh0NZ52wRhYKYnuZI08Q'

feed_url = f'https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}'
feed = feedparser.parse(feed_url)

for entry in feed['entries']:
    print(entry['title'], entry['yt_videoid'])
ThetaDev @ Embedded World 2019 ZPxEr4YdWt8
Easter special: 3D printed Bunny _I5IFObm_-k
ThetaDevlog#2 - MySensors singleLED mmEDPbbSnaY
ThetaDevlog#1 - MySensors Smart Home! Cda4zS-1j-k
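
To use the feed for spotting new uploads, the entries can be compared against the video IDs that are already known. The following is only a sketch under assumptions: the function name `new_entries` and the `seen_ids` set are hypothetical, and it assumes the feed lists the newest video first, as in the output above. It works on plain dictionaries, so it can be tried without a network request.

```python
def new_entries(entries, seen_ids):
    """Return feed entries whose video ID is not in seen_ids, oldest first."""
    fresh = [e for e in entries if e['yt_videoid'] not in seen_ids]
    # Assumption: the feed is ordered newest first, so reverse for chronological order
    return list(reversed(fresh))


# Sample entries shaped like feed['entries'] above (only the fields used here)
entries = [
    {'title': 'ThetaDev @ Embedded World 2019', 'yt_videoid': 'ZPxEr4YdWt8'},
    {'title': 'Easter special: 3D printed Bunny', 'yt_videoid': '_I5IFObm_-k'},
    {'title': 'ThetaDevlog#2 - MySensors singleLED', 'yt_videoid': 'mmEDPbbSnaY'},
]
seen_ids = {'mmEDPbbSnaY'}

for entry in new_entries(entries, seen_ids):
    print(entry['title'], entry['yt_videoid'])
```

In practice `entries` would be `feed['entries']` from the cell above, and `seen_ids` would come from wherever downloaded videos are recorded.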

Get channel metadata (ID, name, description, avatar)

The channel ID is not always part of the channel URL: large channels may have vanity URLs that start with /c/ (e.g. https://www.youtube.com/c/LinusTechTips). The ID and the other metadata can instead be read from the initial data embedded in the channel page.

from scrapetube import scrapetube
import requests
import json

channel_url = 'https://www.youtube.com/channel/UCGiJh0NZ52wRhYKYnuZI08Q'
channel_url2 = 'https://www.youtube.com/c/MrBeast6000'

session = requests.Session()
session.headers["User-Agent"] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36"
)

url = f"{channel_url}/videos?view=0&flow=grid"

# Fetch the channel page, then parse the ytInitialData JSON object embedded in it
html = scrapetube.get_initial_data(session, url)
data = json.loads(
    scrapetube.get_json_from_html(html, "var ytInitialData = ", 0, "};") + "}"
)
metadata = data['metadata']['channelMetadataRenderer']

channel_id = metadata['externalId']
name = metadata['title']
description = metadata['description']
avatar = metadata['avatar']['thumbnails'][0]['url']

print('Channel ID:', channel_id)
print('Name:', name)
print('Description:', description)
print('Avatar:', avatar)
Channel ID: UCGiJh0NZ52wRhYKYnuZI08Q
Name: ThetaDev
Description: I'm ThetaDev. I love creating cool projects using electronics, 3D printers and other awesome tech-based stuff.
Avatar: https://yt3.ggpht.com/ytc/AKedOLSnFfmpibLLoqyaYdsF6bJ-zaLPzomII__FrJve1w=s900-c-k-c0x00ffffff-no-rj
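
The vanity-URL caveat above means a page request is only needed when the URL does not already carry the ID. Here is a minimal sketch of that check, assuming only the two URL shapes mentioned in this section (/channel/<id> and /c/<name>). The function name `channel_id_from_url` is hypothetical.

```python
from urllib.parse import urlparse


def channel_id_from_url(url):
    """Return the ID from a /channel/<id> URL, or None if the URL does not contain it."""
    parts = [p for p in urlparse(url).path.split('/') if p]
    if len(parts) >= 2 and parts[0] == 'channel':
        return parts[1]
    # Vanity URLs (/c/<name>) do not contain the ID; fetch the metadata instead
    return None


print(channel_id_from_url('https://www.youtube.com/channel/UCGiJh0NZ52wRhYKYnuZI08Q'))
print(channel_id_from_url('https://www.youtube.com/c/MrBeast6000'))
```

When the function returns None, the metadata lookup shown above is still required to obtain `externalId`.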

Download video

Videos are downloaded using yt-dlp. We also use the SponsorBlock database to cut out sponsor segments.

from operator import itemgetter
from yt_dlp import YoutubeDL

video_id = 'mmEDPbbSnaY'

ydl_params = {
    'format': 'bestaudio',
    'postprocessors': [
        {
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3'
        },
        {
            'key': 'SponsorBlock',
            'categories': ['sponsor'],
            'when': 'after_filter'
        },
        {
            'key': 'ModifyChapters',
            'remove_sponsor_segments': ['sponsor']
        }
    ]
}


def get_thumbnail_url(vinfo):
    """Get the best quality thumbnail"""
    return max(vinfo['thumbnails'], key=itemgetter('preference'))['url']


with YoutubeDL(ydl_params) as ydl:
    # extract_info downloads the video and returns its metadata
    vinfo = ydl.extract_info(video_id)

    title = vinfo['fulltitle']
    thumbnail = get_thumbnail_url(vinfo)
    channel_name = vinfo['uploader']
    description = vinfo['description']

print('Video:', title)
print('Channel:', channel_name)
print('Thumbnail:', thumbnail)
print('Description', description)
[youtube] mmEDPbbSnaY: Downloading webpage
[youtube] mmEDPbbSnaY: Downloading android player API JSON
[youtube] mmEDPbbSnaY: Downloading MPD manifest
[youtube] mmEDPbbSnaY: Downloading MPD manifest
[SponsorBlock] Fetching SponsorBlock segments
[SponsorBlock] No segments were found in the SponsorBlock database
[info] mmEDPbbSnaY: Downloading 1 format(s): 251
[download] Destination: ThetaDevlog#2 - MySensors singleLED [mmEDPbbSnaY].webm
[download] 100% of 7.80MiB in 00:00                  
[ExtractAudio] Destination: ThetaDevlog#2 - MySensors singleLED [mmEDPbbSnaY].mp3
Deleting original file ThetaDevlog#2 - MySensors singleLED [mmEDPbbSnaY].webm (pass -k to keep)
[ModifyChapters] SponsorBlock information is unavailable
Video: ThetaDevlog#2 - MySensors singleLED
Channel: ThetaDev
Thumbnail: https://i.ytimg.com/vi_webp/mmEDPbbSnaY/maxresdefault.webp
Description The PCBs and components for the MySensors smart home devices arrived!
In this video I'll show you how to build the singleLED controller to switch/dim your 12V led lights. Detailed building instructions can be found on OpenHardware or GitHub.

__PROJECT_LINKS___________________________
OpenHardware: https://www.openhardware.io/view/563
GitHub: https://github.com/Theta-Dev/MySensors-singleLED

Programming adapter: https://thdev.org/?Projects___misc___micro_JST
Board definitions: http://files.thdev.org/arduino/atmega.zip

__COMPONENT_SUPPLIERS__________________
Electronic components: https://www.aliexpress.com/
PCBs: http://www.allpcb.com/
3D printing filament: https://www.dasfilament.de/
______________________________________________
My website: https://thdev.org
Twitter: https://twitter.com/Theta_Dev
______________________________________________
Music by Bartlebeats: https://bartlebeats.bandcamp.com
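
If the audio codec or the SponsorBlock categories need to vary per download, the options dictionary can be built by a small function. This is a sketch that only rearranges the keys already used in `ydl_params` above. The function name `build_ydl_params` and its parameters are hypothetical, and skipping the SponsorBlock steps for an empty category list is an assumption about the desired behaviour.

```python
def build_ydl_params(codec='mp3', categories=('sponsor',)):
    """Build yt-dlp options equivalent to the ydl_params dictionary above."""
    categories = list(categories)
    postprocessors = [{'key': 'FFmpegExtractAudio', 'preferredcodec': codec}]
    if categories:
        # Same SponsorBlock and ModifyChapters entries as in the original dictionary
        postprocessors += [
            {'key': 'SponsorBlock', 'categories': categories, 'when': 'after_filter'},
            {'key': 'ModifyChapters', 'remove_sponsor_segments': categories},
        ]
    return {'format': 'bestaudio', 'postprocessors': postprocessors}


params = build_ydl_params()
print([pp['key'] for pp in params['postprocessors']])
```

The result can be passed to `YoutubeDL(...)` in place of `ydl_params`.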