{ "cells": [ { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Get all videos of a channel" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ThetaDev @ Embedded World 2019 ZPxEr4YdWt8\n", "Easter special: 3D printed Bunny _I5IFObm_-k\n", "ThetaDevlog#2 - MySensors singleLED mmEDPbbSnaY\n", "ThetaDevlog#1 - MySensors Smart Home! Cda4zS-1j-k\n" ] } ], "source": [ "import scrapetube\n", "\n", "channel_url = 'https://www.youtube.com/channel/UCGiJh0NZ52wRhYKYnuZI08Q'\n", "\n", "# get_channel yields the channel's videos one at a time, loading more pages as needed; list() collects all of them\n", "videos = list(scrapetube.get_channel(channel_url=channel_url))\n", "\n", "# Each item is raw YouTube JSON, so the title text is nested under 'runs'\n", "for v in videos:\n", " print(v['title']['runs'][0]['text'], v['videoId'])" ] }, { "cell_type": "markdown", "source": [ "### Get channel feed\n", "\n", "YouTube provides an XML feed for each channel. The feed only contains the 15 most recent videos, so it is not suitable for downloading an entire channel (use the method described above for that).\n", "However, fetching the feed is much faster, which makes it the preferred way to check subscribed channels for new videos." ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 17, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ThetaDev @ Embedded World 2019 ZPxEr4YdWt8\n", "Easter special: 3D printed Bunny _I5IFObm_-k\n", "ThetaDevlog#2 - MySensors singleLED mmEDPbbSnaY\n", "ThetaDevlog#1 - MySensors Smart Home! 
Cda4zS-1j-k\n" ] } ], "source": [ "import feedparser\n", "\n", "channel_id = 'UCGiJh0NZ52wRhYKYnuZI08Q'\n", "\n", "feed_url = f'https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}'\n", "feed = feedparser.parse(feed_url)\n", "\n", "for entry in feed['entries']:\n", " print(entry['title'], entry['yt_videoid'])" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "### Get channel metadata (ID, name, description, avatar)\n", "\n", "The channel ID is not always part of the channel URL, because some (mostly large) channels have vanity URLs that start with ``/c/`` (e.g. https://www.youtube.com/c/LinusTechTips).\n", "To get the ID reliably, we download the channel page and read the ``channelMetadataRenderer`` object from the ``ytInitialData`` JSON that YouTube embeds in it." ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 1, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Channel ID: UCGiJh0NZ52wRhYKYnuZI08Q\n", "Name: ThetaDev\n", "Description: I'm ThetaDev. I love creating cool projects using electronics, 3D printers and other awesome tech-based stuff.\n", "Avatar: https://yt3.ggpht.com/ytc/AKedOLSnFfmpibLLoqyaYdsF6bJ-zaLPzomII__FrJve1w=s900-c-k-c0x00ffffff-no-rj" ] } ], "source": [ "# The scrapetube.scrapetube module exposes the helpers get_initial_data and get_json_from_html used below\n", "from scrapetube import scrapetube\n", "import requests\n", "import json\n", "\n", "channel_url = 'https://www.youtube.com/channel/UCGiJh0NZ52wRhYKYnuZI08Q'\n", "channel_url2 = 'https://www.youtube.com/c/MrBeast6000'  # example vanity URL, not used below; swap it in for channel_url to test vanity URLs\n", "\n", "session = requests.Session()\n", "session.headers[\n", " \"User-Agent\"\n", "] = \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36\"\n", "\n", "url = f\"{channel_url}/videos?view=0&flow=grid\"\n", "\n", "html = scrapetube.get_initial_data(session, url)\n", "# Cut the ytInitialData JSON out of the page; the '};' stop marker drops the closing brace, so it is appended again\n", "data = json.loads(\n", " scrapetube.get_json_from_html(html, \"var ytInitialData = \", 0, \"};\") + \"}\"\n", ")\n", "metadata = data['metadata']['channelMetadataRenderer']\n", "\n", "channel_id = metadata['externalId']\n", "name = metadata['title']\n", 
"description = metadata['description']\n", "avatar = metadata['avatar']['thumbnails'][0]['url']\n", "\n", "print('Channel ID:', channel_id)\n", "print('Name:', name)\n", "print('Description:', description)\n", "print('Avatar:', avatar)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "### Download video\n", "\n", "Videos are downloaded using ``yt-dlp``. This example keeps only the audio track: the ``bestaudio`` format selects the best audio-only stream and the ``FFmpegExtractAudio`` postprocessor converts it to MP3 (this requires FFmpeg).\n", "We also use the SponsorBlock database to cut out sponsor segments: the ``SponsorBlock`` postprocessor fetches the segments and ``ModifyChapters`` removes them from the file." ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 4, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[youtube] mmEDPbbSnaY: Downloading webpage\n", "[youtube] mmEDPbbSnaY: Downloading android player API JSON\n", "[youtube] mmEDPbbSnaY: Downloading MPD manifest\n", "[youtube] mmEDPbbSnaY: Downloading MPD manifest\n", "[SponsorBlock] Fetching SponsorBlock segments\n", "[SponsorBlock] No segments were found in the SponsorBlock database\n", "[info] mmEDPbbSnaY: Downloading 1 format(s): 251\n", "[download] Destination: ThetaDevlog#2 - MySensors singleLED [mmEDPbbSnaY].webm\n", "[download] 100% of 7.80MiB in 00:00 \n", "[ExtractAudio] Destination: ThetaDevlog#2 - MySensors singleLED [mmEDPbbSnaY].mp3\n", "Deleting original file ThetaDevlog#2 - MySensors singleLED [mmEDPbbSnaY].webm (pass -k to keep)\n", "[ModifyChapters] SponsorBlock information is unavailable\n", "Video: ThetaDevlog#2 - MySensors singleLED\n", "Channel: ThetaDev\n", "Thumbnail: https://i.ytimg.com/vi_webp/mmEDPbbSnaY/maxresdefault.webp\n", "Description: The PCBs and components for the MySensors smart home devices arrived!\n", "In this video I'll show you how to build the singleLED controller to switch/dim your 12V led lights. 
Detailed building instructions can be found on OpenHardware or GitHub.\n", "\n", "__PROJECT_LINKS___________________________\n", "OpenHardware: https://www.openhardware.io/view/563\n", "GitHub: https://github.com/Theta-Dev/MySensors-singleLED\n", "\n", "Programming adapter: https://thdev.org/?Projects___misc___micro_JST\n", "Board definitions: http://files.thdev.org/arduino/atmega.zip\n", "\n", "__COMPONENT_SUPPLIERS__________________\n", "Electronic components: https://www.aliexpress.com/\n", "PCBs: http://www.allpcb.com/\n", "3D printing filament: https://www.dasfilament.de/\n", "______________________________________________\n", "My website: https://thdev.org\n", "Twitter: https://twitter.com/Theta_Dev\n", "______________________________________________\n", "Music by Bartlebeats: https://bartlebeats.bandcamp.com\n" ] } ], "source": [ "from operator import itemgetter\n", "from yt_dlp import YoutubeDL\n", "\n", "video_id = 'mmEDPbbSnaY'\n", "\n", "ydl_params = {\n", " 'format': 'bestaudio',\n", " 'postprocessors': [\n", " {\n", " 'key': 'FFmpegExtractAudio',\n", " 'preferredcodec': 'mp3'\n", " },\n", " {\n", " 'key': 'SponsorBlock',\n", " 'categories': ['sponsor'],\n", " 'when': 'after_filter'\n", " },\n", " {\n", " 'key': 'ModifyChapters',\n", " 'remove_sponsor_segments': ['sponsor']\n", " }\n", " ]\n", "}\n", "\n", "\n", "def get_thumbnail_url(vinfo):\n", " \"\"\"Get the best quality thumbnail\"\"\"\n", " return max(vinfo['thumbnails'], key=itemgetter('preference'))['url']\n", "\n", "\n", "with YoutubeDL(ydl_params) as ydl:\n", " # extract_info downloads the video and returns its metadata\n", " vinfo = ydl.extract_info(video_id)\n", "\n", " title = vinfo['fulltitle']\n", " thumbnail = get_thumbnail_url(vinfo)\n", " channel_name = vinfo['uploader']\n", " description = vinfo['description']\n", "\n", "print('Video:', title)\n", "print('Channel:', channel_name)\n", "print('Thumbnail:', thumbnail)\n", "print('Description:', description)" ] }, { "cell_type": "code", 
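"execution_count": null, "outputs": [], "source": [ "# Shortcut sketch (not part of the workflow above): when a URL has the form\n", "# https://www.youtube.com/channel/<ID>, the ID can be read from the URL without any HTTP request.\n", "# Assumption: channel IDs are 'UC' followed by 22 characters from [A-Za-z0-9_-].\n", "# channel_id_from_url is a hypothetical helper; for vanity URLs (/c/...) it returns None,\n", "# so the ytInitialData method from the metadata section is still needed in that case.\n", "import re\n", "from typing import Optional\n", "\n", "# Hypothetical pattern based on the assumed ID format above\n", "CHANNEL_ID_RE = re.compile(r'/channel/(UC[A-Za-z0-9_-]{22})')\n", "\n", "\n", "def channel_id_from_url(url: str) -> Optional[str]:\n", "    # Return the channel ID if it appears in the URL, otherwise None\n", "    match = CHANNEL_ID_RE.search(url)\n", "    return match.group(1) if match else None\n", "\n", "\n", "print(channel_id_from_url('https://www.youtube.com/channel/UCGiJh0NZ52wRhYKYnuZI08Q'))  # -> UCGiJh0NZ52wRhYKYnuZI08Q\n", "print(channel_id_from_url('https://www.youtube.com/c/LinusTechTips'))  # -> None" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", 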
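"execution_count": null, "outputs": [], "source": [ "# Sketch of the use case from the feed section: checking a subscribed channel for new videos.\n", "# find_new_videos and known_ids are hypothetical; known_ids stands in for the IDs of videos\n", "# you have already downloaded (e.g. loaded from a file or database).\n", "# Since the feed only lists the 15 most recent videos, this assumes the channel is checked\n", "# often enough that no more than 15 videos are uploaded between two checks.\n", "import feedparser\n", "\n", "\n", "def find_new_videos(entries, known_ids):\n", "    # Return (video_id, title) pairs for feed entries whose ID is not in known_ids\n", "    return [(e['yt_videoid'], e['title']) for e in entries if e['yt_videoid'] not in known_ids]\n", "\n", "\n", "channel_id = 'UCGiJh0NZ52wRhYKYnuZI08Q'\n", "known_ids = {'ZPxEr4YdWt8', '_I5IFObm_-k'}  # hypothetical: IDs already downloaded\n", "\n", "feed_url = f'https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}'\n", "feed = feedparser.parse(feed_url)\n", "\n", "for video_id, title in find_new_videos(feed['entries'], known_ids):\n", "    print(title, video_id)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", 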
"execution_count": null, "outputs": [], "source": [], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" } }, "nbformat": 4, "nbformat_minor": 1 }