How to create a stable checksum of a media file?

Posted on 2024-08-24 07:53:19

How can I create a checksum of only the media data, excluding the metadata, to get a stable identifier for a media file? A cross-platform approach is preferred, ideally using a library that supports many formats, e.g. VLC, FFmpeg or MPlayer.

(The media files are audio and video in common formats; support for images would be nice to have too.)

Comments (4)

梅倚清风 2024-08-31 07:53:20

Well, it may be 11 years too late for an answer, but in case others like me stumble upon this...

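ffmpeg can output checksums for individual streams. With -codec copy the checksum covers only the encoded stream data, not the whole file. The same audio or video therefore produces the same checksum regardless of tags or other metadata, and usually regardless of the container format. Some codecs are stored differently in different containers (e.g. H.264 in MP4 versus MPEG-TS), so the same video can hash differently after a remux.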

Example for the video track of file $filename, writing the output to $filename.md5:

ffmpeg -i "$filename" -map 0:v -codec copy -f md5 "$filename.md5"

For audio, use -map 0:a.

To write to stdout instead of a file, use - as the output name. For example:

ffmpeg -i "$filename" -map 0:a -codec copy -hide_banner -loglevel warning -f md5 -
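The commands above can be wrapped in two reusable shell functions. This is a minimal sketch, not part of the original answer. stream_md5 and same_media are hypothetical names, and ffmpeg is assumed to be on PATH. Because of -codec copy, it compares the encoded streams as stored, not the decoded audio or video.

```shell
#!/bin/bash
# stream_md5 FILE [STREAM_SPEC]   (hypothetical helper)
# Print the MD5 of the selected streams (default "a" = all audio) as stored in
# FILE, ignoring tags and other container-level metadata.
stream_md5() {
    local file=$1 spec=${2:-a} out
    # Same ffmpeg invocation as above; -nostdin stops ffmpeg from reading the
    # caller's stdin (important inside "while read" loops).
    out=$(ffmpeg -nostdin -hide_banner -loglevel error \
        -i "$file" -map "0:$spec" -codec copy -f md5 -) || return
    # The md5 muxer writes a single line "MD5=<hex digest>"; drop the prefix.
    printf '%s\n' "${out#MD5=}"
}

# same_media FILE1 FILE2 [STREAM_SPEC]   (hypothetical helper)
# Exit status 0 if the selected streams hash identically, 1 if they differ,
# 2 if ffmpeg failed on either file.
same_media() {
    local a b
    a=$(stream_md5 "$1" "${3:-a}") || return 2
    b=$(stream_md5 "$2" "${3:-a}") || return 2
    [ "$a" = "$b" ]
}

# Usage:
#   same_media song.mp3 song-retagged.mp3 && echo "same audio"
#   stream_md5 movie.mkv v
```

Passing the stream specifier as a parameter lets the same helper cover both the 0:v and 0:a cases shown above.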
独自唱情﹋歌 2024-08-31 07:53:20

I don't know of any existing platform-independent software that will accomplish this, but I do know a way it could be done in a platform-independent language such as Java.

Essentially, we simply need to strip any metadata (tags) from the file, demultiplexing video files first. In theory, after demuxing and removing the metadata, one could hash the result and compare it against another file that has gone through the same process, matching identical files even when their tags differ. Unlike a fingerprint, this would not identify similar songs or movies, only identical ones. For example, you might want to keep 10 different versions or bitrates of a given song in your archive, but not 2 identical copies of any one of them.
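For the plain MP3 case, the detagging step described here can be done without a demuxing library. ID3v2 tags sit at the start of the file and ID3v1 tags occupy its last 128 bytes. The following bash sketch rests on those assumptions and is not a general solution:

- id3_stripped_md5 and hexbytes are hypothetical helpers.
- Only a leading ID3v2 tag and a trailing ID3v1 tag are removed; APEv2, Lyrics3 and other tag formats are left in place.
- GNU coreutils md5sum is assumed.

```shell
#!/bin/bash
# Hypothetical helpers, MP3 only: hash the bytes between a leading ID3v2 tag
# and a trailing ID3v1 tag. APEv2, Lyrics3 and other tags are NOT handled.

# Print bytes [offset, offset+count) of a file as one lowercase hex string.
hexbytes() { od -An -tx1 -j "$2" -N "$3" "$1" | tr -d ' \n'; }

id3_stripped_md5() {
    local f=$1 size start=0 end flags s0 s1 s2 s3
    size=$(wc -c < "$f")
    # ID3v2 header: "ID3" (49 44 33), 2 version bytes, 1 flags byte, then a
    # 4-byte syncsafe size (7 bits per byte) excluding the 10-byte header.
    if [ "$(hexbytes "$f" 0 3)" = "494433" ]; then
        read -r flags s0 s1 s2 s3 <<< "$(od -An -tu1 -j 5 -N 5 "$f")"
        start=$(( 10 + (s0 << 21 | s1 << 14 | s2 << 7 | s3) ))
        # Flag bit 4 (0x10) means a 10-byte footer follows the tag.
        if (( flags & 0x10 )); then start=$(( start + 10 )); fi
    fi
    end=$size
    # ID3v1: a fixed 128-byte block at the very end, starting with "TAG" (54 41 47).
    if (( size >= 128 )) && [ "$(hexbytes "$f" $(( size - 128 )) 3)" = "544147" ]; then
        end=$(( size - 128 ))
    fi
    if (( end < start )); then end=$start; fi
    # Hash only the bytes between the tags (GNU md5sum assumed).
    tail -c +$(( start + 1 )) "$f" | head -c $(( end - start )) | md5sum | cut -d ' ' -f 1
}

# Usage: id3_stripped_md5 song.mp3
```

The bytes left between the tags are the MPEG audio frames. Two copies that differ only in their ID3 tags should therefore give the same digest.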

The most troublesome part is removing the tags. There are many tag format specifications, and applications do not necessarily implement them the same way: the exact same audio file, given identical tags by two different applications, may not produce identical output files. This would only be fatal to the idea of an audio-only checksum if popular tagging software modified the binary audio portion of the file, or padded the audio in a non-standard way.

Taking a checksum is trivial, but offhand I'm not aware of any platform-independent libraries to demux and detag MPEG files. I know that in *nix environments mpgtx is a great command-line tool that can perform the demux and detag, but obviously that is not a platform-independent solution.

Maybe someone out there feels ambitious?

百善笑为先 2024-08-31 07:53:20

Here is a shell script wrapping mvik's ffmpeg-based answer. It prints the MD5 on success, or ffmpeg's stderr output on failure.

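#!/bin/bash

# Compute the MD5 of the audio stream of an MP3 file, ignoring ID3 tags.

# The problem with comparing MP3 files is that a simple change to the ID3 tags
# in one file will cause the two files to have differing MD5 sums.  This script
# avoids that problem by taking the MD5 of only the audio stream, ignoring the
# tags.

# Note that by virtue of using ffmpeg, this script happens to also work for any
# other audio file format supported by ffmpeg (not just MP3s).

set -e

if test $# -ne 1 ; then
    echo "usage: $0 FILE" >&2
    exit 2
fi

stdoutf=$( mktemp mp3md5.XXXXXX )
stderrf=$( mktemp mp3md5.XXXXXX )
# Delete the temporary files on every exit path.
trap 'rm -f "$stdoutf" "$stderrf"' EXIT

# -map 0:a selects the audio stream(s) only. Without it ffmpeg would also pick
# up embedded cover art (ID3 APIC pictures appear as a video stream) and hash
# it, so changing the cover would change the MD5.
set +e
ffmpeg -i "$1" -map 0:a -c:a copy -f md5 - >"$stdoutf" 2>"$stderrf"
ret=$?
set -e

if test $ret -ne 0 ; then
    # On failure, pass ffmpeg's diagnostics through on stderr.
    cat "$stderrf" >&2
else
    # The md5 muxer prints "MD5=<hex digest>"; keep only the digest.
    sed 's/^MD5=//' "$stdoutf"
fi

exit $ret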
静若繁花 2024-08-31 07:53:20

One possible solution I found seems to be with VLC:

./VLC -I rc snd.mp3 :sout='#std{mux=raw,access=file,dst=-}' vlc://quit | sha1sum