How do I extract the video ID from a YouTube link using Python?

Posted on 2024-10-06 05:41:03, 316 characters, 7 views, 0 comments

I know this can be easily done using PHP's parse_url and parse_str functions:

$subject = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1";
$url = parse_url($subject);
parse_str($url['query'], $query);
var_dump($query);

But how can I achieve this in Python? I can call urlparse, but what comes next?

Comments (14)

救赎№ 2024-10-13 05:41:04

Python has a library for parsing URLs.

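# Python 3: urlparse and parse_qs live in urllib.parse (Python 2 used `import urlparse`)
from urllib.parse import urlparse, parse_qs

url_data = urlparse("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1")
query = parse_qs(url_data.query)  # maps each parameter name to a list of values
video = query["v"][0]  # "z_AbfPXTKms"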
寄与心 2024-10-13 05:41:04

This is the Python3 version of Mikhail Kashkin's solution with added scenarios.

from urllib.parse import urlparse, parse_qs
from contextlib import suppress


# noinspection PyTypeChecker
def get_yt_id(url, ignore_playlist=False):
    # Examples:
    # - http://youtu.be/SA2iWivDJiE
    # - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    # - http://www.youtube.com/embed/SA2iWivDJiE
    # - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    query = urlparse(url)
    if query.hostname == 'youtu.be': return query.path[1:]
    if query.hostname in {'www.youtube.com', 'youtube.com', 'music.youtube.com'}:
        if not ignore_playlist:
            # use case: get playlist id not current video in playlist
            with suppress(KeyError):
                return parse_qs(query.query)['list'][0]
        if query.path == '/watch': return parse_qs(query.query)['v'][0]
        if query.path[:7] == '/watch/': return query.path.split('/')[2]
        if query.path[:7] == '/embed/': return query.path.split('/')[2]
        if query.path[:3] == '/v/': return query.path.split('/')[2]
    # returns None for invalid YouTube url

# unit test
import pytest
@pytest.mark.parametrize(
    'url,expected_id',
    (
        ('https://youtu.be/Dlxu28sQfkE', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/watch?v=Dlxu28sQfkE&feature=youtu.be', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/watch/Dlxu28sQfkE', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/embed/Dlxu28sQfkE', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/v/Dlxu28sQfkE', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/playlist?list=PLRbcUrcJVEmX_eaAsubNOWfE4SlhGqjW4', 'PLRbcUrcJVEmX_eaAsubNOWfE4SlhGqjW4'),
    ),
)
def test_yt_id(url, expected_id):
    assert get_yt_id(url) == expected_id
朕就是辣么酷 2024-10-13 05:41:04

Here is a regular expression that covers these cases:

((?<=(v|V)/)|(?<=be/)|(?<=(\?|\&)v=)|(?<=embed/))([\w-]+)
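
A minimal sketch of how a pattern like this might be applied in Python with re.search. The helper name extract_id is hypothetical, and the pattern below is a lightly simplified form of the one above (character classes and a non-capturing group, so the ID lands in group 1). It assumes the input is already known to be a YouTube URL, since the pattern does not check the host.

```python
import re

# Simplified form of the pattern above: each lookbehind anchors the match
# right after "v/", "be/", "?v=" or "&v=", or "embed/".
YT_ID_RE = re.compile(r"(?:(?<=[vV]/)|(?<=be/)|(?<=[?&]v=)|(?<=embed/))([\w-]+)")


def extract_id(url):
    """Return the first ID-like token found in url, or None (hypothetical helper)."""
    match = YT_ID_RE.search(url)
    return match.group(1) if match else None


for url in (
    "http://youtu.be/SA2iWivDJiE",
    "http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu",
    "http://www.youtube.com/embed/SA2iWivDJiE",
    "http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US",
):
    print(extract_id(url))
```

Because the lookbehinds consume no characters, the match itself is just the ID, which is why a single capturing group is enough here.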

小霸王臭丫头 2024-10-13 05:41:04

I use the pytube package, which handles this well. Install it with:

$ pip install pytube

#Examples
url1='http://youtu.be/SA2iWivDJiE'
url2='http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu'
url3='http://www.youtube.com/embed/SA2iWivDJiE'
url4='http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US'
url5='https://www.youtube.com/watch?v=rTHlyTphWP0&index=6&list=PLjeDyYvG6-40qawYNR4juzvSOg-ezZ2a6'
url6='youtube.com/watch?v=_lOT2p_FCvA'
url7='youtu.be/watch?v=_lOT2p_FCvA'
url8='https://www.youtube.com/watch?time_continue=9&v=n0g-Y0oo5Qs&feature=emb_logo'

urls=[url1,url2,url3,url4,url5,url6,url7,url8]

# Get the YouTube id of each URL
from pytube import extract
for url in urls:
    video_id = extract.video_id(url)  # avoid shadowing the built-in id()
    print(video_id)

Output

SA2iWivDJiE
_oPAwA_Udwc
SA2iWivDJiE
SA2iWivDJiE
rTHlyTphWP0
_lOT2p_FCvA
_lOT2p_FCvA
n0g-Y0oo5Qs
野心澎湃 2024-10-13 05:41:04
import re

# Capture everything after "v=" up to the next "&"
match = re.search(r"youtube\.com/.*v=([^&]*)", "http://www.youtube.com/watch?v=z_AbfPXTKms&test=123")
if match:
    result = match.group(1)
else:
    result = ""

Untested.

苯莒 2024-10-13 05:41:04

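You can use urlparse and slice the leading "v=" off the query string. This only works when v is the first and only query parameter:

from urllib.parse import urlparse

url_data = urlparse("https://www.youtube.com/watch?v=RG9TMn1FJzc")
print(url_data.query[2:])  # drops the leading "v="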
听不够的曲调 2024-10-13 05:41:04

Splitting strings is a really bad idea when those parameters could come in any order. Stick with urlparse:

from urllib.parse import parse_qs, urlparse

def video_id(url):
    """
    Get youtube video ID from url
    ID is last element of url path for youtu.be and shorts
    """
    id = None
    parse = urlparse(url)
    if parse.netloc.lower() == 'youtu.be' or parse.path.lower().startswith('/shorts'):
        id = parse.path.split('/')[-1:]
    elif parse.query:
        id = parse_qs(parse.query).get('v')
    return id[0] if id else ''
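
To illustrate the point about parameter order, here is a small sketch. The helper name v_param is hypothetical, and the URLs are variations made up for this example. parse_qs builds a dictionary from the query string, so v is found wherever it appears.

```python
from urllib.parse import parse_qs, urlparse


def v_param(url):
    """Return the value of the v query parameter, or '' if absent (hypothetical helper)."""
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else ""


# The same video ID with v in first, middle, and last position.
urls = [
    "https://www.youtube.com/watch?v=n0g-Y0oo5Qs&feature=emb_logo",
    "https://www.youtube.com/watch?time_continue=9&v=n0g-Y0oo5Qs&feature=emb_logo",
    "https://www.youtube.com/watch?feature=emb_logo&time_continue=9&v=n0g-Y0oo5Qs",
]

for url in urls:
    print(v_param(url))  # n0g-Y0oo5Qs each time
```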
晒暮凉 2024-10-13 05:41:04

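Here is a regex you could try for the YouTube video ID. It assumes the ID is exactly 11 characters long and that no "v" appears in the URL before "v=":

import re

url = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"  # example URL from the question
# regex for the YouTube ID: "^[^v]+v=(.{11}).*"
result = re.match(r'^[^v]+v=(.{11}).*', url)
print(result.group(1))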
烙印 2024-10-13 05:41:04

No need for regex. Split on ?, take the second, split on =, take the second, split on &, take the first.
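
The steps above can be sketched as follows. The function name id_by_splitting is hypothetical, and the approach assumes v is the first query parameter; it raises IndexError when the URL contains no ? or no =.

```python
def id_by_splitting(url):
    """Split on '?', then '=', then '&', as described above (hypothetical helper)."""
    query = url.split("?")[1]     # second piece when splitting on "?"
    value = query.split("=")[1]   # second piece when splitting on "="
    return value.split("&")[0]    # first piece when splitting on "&"


print(id_by_splitting("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"))  # z_AbfPXTKms
```

If another parameter comes before v, the second piece is that parameter's value instead, so this is only safe for URLs whose shape you control.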

一向肩并 2024-10-13 05:41:04

This takes a search query rather than a URL, but the results it returns include each video's id:

from youtube_search import YoutubeSearch
results = YoutubeSearch('search terms', max_results=10).to_json()
print(results)
℉絮湮 2024-10-13 05:41:04
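url = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"
parsed = url.split("?")
videoId = parsed[1]  # the piece after the "?", i.e. "v=z_AbfPXTKms&NR=1"
print(videoId)

This works for any link that carries the ID in its query string, but note that it prints the whole query string ("v=" plus any other parameters), not the bare ID, and it does not handle youtu.be or /embed/ links.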

九命猫 2024-10-13 05:41:04

I am very late, but I use this snippet to get the video id.

import re


def video_id(url: str) -> str:
    """Extract the ``video_id`` from a YouTube url.
    This function supports the following patterns:
    - :samp:`https://youtube.com/watch?v={video_id}`
    - :samp:`https://youtube.com/embed/{video_id}`
    - :samp:`https://youtu.be/{video_id}`
    :param str url:
        A YouTube url containing a video id.
    :rtype: str
    :returns:
        YouTube video id.
    """
    return regex_search(r"(?:v=|\/)([0-9A-Za-z_-]{11}).*", url, group=1)

def regex_search(pattern: str, string: str, group: int):
    """Shortcut method to search a string for a given pattern.
    :param str pattern:
        A regular expression pattern.
    :param str string:
        A target string to search.
    :param int group:
        Index of group to return.
    :rtype:
        str or tuple
    :returns:
        Substring pattern matches.
    """
    regex = re.compile(pattern)
    results = regex.search(string)
    if not results:
        return False

    return results.group(group)
为你鎻心 2024-10-13 05:41:04

I use this

def getId(videourl):
    vidid=videourl.find('watch?v=')
    Id = videourl[vidid+8:vidid+19]
    if vidid==-1:
        vidid=videourl.find('be/')
        Id=videourl[vidid+3:]
    return Id
御守 2024-10-13 05:41:03

I've written a YouTube ID parser that doesn't use regular expressions:

# Python 3: urlparse and parse_qs live in urllib.parse (Python 2 used `import urlparse`)
from urllib.parse import urlparse, parse_qs

def video_id(value):
    """
    Examples:
    - http://youtu.be/SA2iWivDJiE
    - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    - http://www.youtube.com/embed/SA2iWivDJiE
    - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    """
    query = urlparse(value)
    if query.hostname == 'youtu.be':
        return query.path[1:]
    if query.hostname in ('www.youtube.com', 'youtube.com'):
        if query.path == '/watch':
            p = parse_qs(query.query)
            return p['v'][0]
        if query.path[:7] == '/embed/':
            return query.path.split('/')[2]
        if query.path[:3] == '/v/':
            return query.path.split('/')[2]
    # fail?
    return None