Reading SHOUTcast/Icecast metadata from a radio stream with Python

Posted 2024-11-19 02:26:43 · 304 words · 5 views · 0 comments

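Has anyone successfully read SHOUTcast/Icecast metadata from a remote radio stream?

There are several libraries that can read metadata from local MP3 files, but none of them seem to be designed to work with radio streams (which are essentially never-ending MP3 files on a remote server).

Other posts suggest downloading a limited number of bytes from the start of the MP3 stream, but this usually just produces a pile of hex output without anything resembling text metadata.

Does anyone know of a more successful solution? Thanks.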

我要还你自由 2024-11-26 02:26:43

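#!/usr/bin/env python
# Python 2 (urllib2)
import urllib2
stream_url = 'http://pub1.di.fm/di_classictrance'
request = urllib2.Request(stream_url)
try:
    # Ask the server to interleave ICY metadata blocks with the audio
    request.add_header('Icy-MetaData', '1')
    response = urllib2.urlopen(request)
    # icy-metaint = number of audio bytes before each metadata block
    icy_metaint_header = response.headers.get('icy-metaint')
    if icy_metaint_header is not None:
        metaint = int(icy_metaint_header)
        # Read the first audio chunk plus the one-byte metadata length
        content = response.read(metaint + 1)
        # The length byte counts 16-byte units (0 = no metadata this time)
        length = ord(content[metaint]) * 16
        content += response.read(length)
        # Fragile: assumes the title is the first quoted value and
        # contains no apostrophes
        title = content[metaint:].split("'")[1]
        print title
except Exception as e:
    print 'Error:', e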

For more details, check this link.
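The answer above relies on how the ICY protocol interleaves metadata with audio: once the client sends `Icy-MetaData: 1`, the server reports an `icy-metaint` value and, after every `metaint` bytes of audio, inserts one length byte N followed by N × 16 bytes of NUL-padded text metadata. Without that request header the server sends plain audio, which is why reading the first bytes of a stream only yields MP3 data. Below is a minimal offline sketch of that demultiplexing on bytes that have already been downloaded; `split_icy_stream` and `parse_icy_metadata` are hypothetical helper names, and the parser assumes a value never contains the sequence `';`.

```python
import re
from typing import Iterator, Tuple

# Matches key='value'; pairs such as StreamTitle='Artist - Title';
# Assumption: a value never contains the two-character sequence "';".
META_PAIR = re.compile(rb"(\w+)='(.*?)';", re.DOTALL)


def split_icy_stream(data: bytes, metaint: int) -> Iterator[Tuple[str, bytes]]:
    """Yield ('audio', chunk) and ('meta', block) pieces from an ICY byte stream.

    After every `metaint` audio bytes the server inserts one length byte N,
    followed by N * 16 bytes of metadata (N == 0 means "no update").
    """
    pos = 0
    while pos < len(data):
        yield 'audio', data[pos:pos + metaint]
        pos += metaint
        if pos >= len(data):
            break
        length = data[pos] * 16  # length byte is in 16-byte units
        pos += 1
        if length:
            # Metadata is padded with NUL bytes up to a multiple of 16
            yield 'meta', data[pos:pos + length].rstrip(b'\x00')
        pos += length


def parse_icy_metadata(block: bytes) -> dict:
    """Turn b"StreamTitle='x';StreamUrl='y';" into {'StreamTitle': 'x', ...}."""
    return {key.decode('ascii'): value.decode('utf-8', errors='replace')
            for key, value in META_PAIR.findall(block)}
```

In a live client the same logic would run over successive `response.read()` results, keeping in mind that a metadata block can straddle two reads.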

时光是把杀猪刀 2024-11-26 02:26:43

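I took some of @dbogdan's code and built a library that I use every day on more than 4,000 streams.
It works well, is stable, and supports metadata such as song title, artist name, bitrate and content type.

You can find it at: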
https://github.com/Dirble/streamscrobbler-python
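The library above reports artist and title separately, but `StreamTitle` itself is a single string. Many stations (by no means all) format it as `Artist - Title`. The following is a hedged sketch of splitting on that convention; `split_artist_title` is a hypothetical helper, not taken from streamscrobbler-python, whose internals may work differently.

```python
from typing import Optional, Tuple


def split_artist_title(stream_title: str) -> Tuple[Optional[str], str]:
    """Split 'Artist - Title' on the first ' - '; return (None, title) otherwise.

    This is only a convention many stations follow, not part of any spec.
    """
    artist, sep, title = stream_title.partition(' - ')
    if not sep or not artist.strip() or not title.strip():
        # No usable separator: treat the whole string as the title
        return None, stream_title.strip()
    return artist.strip(), title.strip()
```

Splitting on the first separator keeps titles such as `A - B - C` intact as `('A', 'B - C')`, though a band name that itself contains ` - ` will be split wrongly.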

谁许谁一生繁华 2024-11-26 02:26:43

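For anyone else who ends up here 10 years later, here is a Python 3 version of @dbogdan's code. Note that content[metaint:].split("'")[1] is extremely unreliable. Also, as soon as you come across "Queensrÿche", a non-English title, and so on, the title will be full of raw bytes where the special characters should be. You can't decode the entire tag, so there is a bit of jumping through hoops to whittle it down to just the title.

I did not take the sample rate and bitrate from the response headers because they are frequently wrong; get that data from the MP3 frame header instead. To be fair, the tag Shoutcast/Icecast gives you can be wrong in every way! It only takes one encounter with "C C Revival - I hearded through the grapevine" (which really happened) to realize there is nothing official or reliable about these tags. This can cause serious problems if, for instance, you use Shout/Icecast tags as search parameters for MusicBrainz queries.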
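For the non-ASCII problem described above, one common heuristic (an assumption here, separate from the answer's own code) is to try UTF-8 first and fall back to Latin-1 (ISO-8859-1), which many older encoders use. `decode_icy_text` is a hypothetical helper. Because Latin-1 maps every byte to a character, the fallback never raises, but it produces mojibake if the station actually used some other encoding.

```python
def decode_icy_text(raw: bytes) -> str:
    """Decode metadata bytes as UTF-8, falling back to Latin-1.

    Assumption: stations that don't send valid UTF-8 are most often
    sending ISO-8859-1; other legacy encodings will decode incorrectly.
    """
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Latin-1 accepts any byte sequence, so this cannot fail
        return raw.decode('latin-1')
```

Valid UTF-8 is rarely produced by accident from Latin-1 text containing accented letters, which is why trying UTF-8 first is the usual order.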

from typing import Optional
from urllib import request as urequest
import re


# Matches StreamTitle=<value>; allowing optional escaping backslashes
SRCHTITLE = re.compile(br'StreamTitle=\\*(?P<title>[^;]*);').search

def get_stream_title(tag: bytes) -> str:
    title = ''
    if m := SRCHTITLE(tag):
        # decode, strip, unescape and remove surrounding quotes (may not even be the same type of quote);
        # errors='replace' turns undecodable bytes into U+FFFD instead of raising
        title = m.group('title').decode('utf-8', errors='replace').strip().replace('\\', '')[1:-1]
    return title

def id3(url: str) -> Optional[dict]:
    # Despite the name, this reads ICY (Shoutcast/Icecast) metadata, not an ID3 tag
    request = urequest.Request(url, headers={'Icy-MetaData': '1'})

    with urequest.urlopen(request) as resp:
        metaint = int(resp.headers.get('icy-metaint', '-1'))

        if metaint < 0:
            return None  # the server does not interleave metadata

        resp.read(metaint)  # this isn't seekable, so read past the first audio chunk
        length = resp.read(1)[0] * 16  # metadata length byte, in 16-byte units

        tagdata = dict(
            site_url=resp.headers.get('icy-url'),
            name=(resp.headers.get('icy-name') or '').title(),
            genre=(resp.headers.get('icy-genre') or '').title(),
            title=get_stream_title(resp.read(length)),
        )

        return tagdata
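The answer recommends taking bitrate and sample rate from the MP3 frame header rather than from the ICY response headers. Below is a minimal sketch of that for MPEG Layer III, assuming the input is audio bytes with any ICY metadata blocks already removed: it scans for the 11-bit frame sync and decodes the version, layer, bitrate-index and sample-rate fields. `find_mp3_frame_info` is a hypothetical name, and it accepts the first plausible header, so a stray 0xFF byte in the audio can cause a false match; a sturdier parser would compute the frame length and confirm that another header follows.

```python
from typing import Optional, Tuple

# Layer III bitrates in kbps, indexed by the 4-bit bitrate field.
# Index 0 ("free format") and 15 (invalid) are treated as unusable (None).
_BITRATES_V1_L3 = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
_BITRATES_V2_L3 = [None, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, None]

# Sample rates in Hz keyed by the 2-bit version field
# (3 = MPEG-1, 2 = MPEG-2, 0 = MPEG-2.5; 1 is reserved).
_SAMPLE_RATES = {
    3: [44100, 48000, 32000],
    2: [22050, 24000, 16000],
    0: [11025, 12000, 8000],
}


def find_mp3_frame_info(audio: bytes) -> Optional[Tuple[int, int]]:
    """Return (bitrate_kbps, sample_rate_hz) of the first plausible Layer III header."""
    for i in range(len(audio) - 3):  # a frame header is 4 bytes long
        b0, b1, b2 = audio[i], audio[i + 1], audio[i + 2]
        if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
            continue  # no 11-bit frame sync here
        version = (b1 >> 3) & 0b11
        layer = (b1 >> 1) & 0b11
        if version == 1 or layer != 0b01:
            continue  # reserved version, or not Layer III
        bitrate_index = b2 >> 4
        rate_index = (b2 >> 2) & 0b11
        table = _BITRATES_V1_L3 if version == 3 else _BITRATES_V2_L3
        bitrate = table[bitrate_index]
        if bitrate is None or rate_index == 3:
            continue  # free-format/invalid bitrate or reserved sample rate
        return bitrate, _SAMPLE_RATES[version][rate_index]
    return None
```

For example, the common header bytes `FF FB 90 64` decode as MPEG-1 Layer III at 128 kbps and 44.1 kHz.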

依靠 2024-11-26 02:26:43

Since MP3 is a proprietary format, the specification isn't that easy to come by. I think this site gives a good overview.

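In a regular MP3 file, the ID3v1 metadata tag sits at the very end of the file, occupying the last 128 bytes. That is actually a poor design. The ID3 system was added to MP3 as an afterthought, so I suppose there was no other way to do it without breaking backward compatibility. It means that if a radio stream is served like a never-ending MP3 file, it cannot carry ID3 tags in the usual sense.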
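For a complete local file, the 128-byte ID3v1 layout described above is simple to read: the bytes `TAG`, then fixed-width fields for title (30), artist (30), album (30), year (4), comment (30) and a one-byte genre number. A minimal sketch, assuming Latin-1 text (ID3v1 declares no encoding) and using the hypothetical function name `parse_id3v1`; as the answer points out, a live stream has no end, so this only applies to whole files.

```python
from typing import Optional


def parse_id3v1(data: bytes) -> Optional[dict]:
    """Parse an ID3v1 tag from the last 128 bytes of `data`, or return None."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b'TAG':
        return None  # no ID3v1 tag at the end of this data

    def text(raw: bytes) -> str:
        # Fields are fixed-width, padded with NUL bytes or spaces;
        # Latin-1 is an assumption since ID3v1 declares no encoding
        return raw.split(b'\x00', 1)[0].decode('latin-1').strip()

    return {
        'title': text(tag[3:33]),
        'artist': text(tag[33:63]),
        'album': text(tag[63:93]),
        'year': text(tag[93:97]),
        'comment': text(tag[97:127]),
        'genre': tag[127],  # numeric index into the ID3v1 genre list
    }
```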

I would check with the people running the radio station; maybe they put the ID3 tags somewhere non-standard?
