Detecting duplicate MP3 files with different ID3 tags?

Posted on 2025-01-01 17:01:36

How can I detect (preferably in Java) duplicate MP3 files with different ID3 tags? The files have the same encoding/format. It should work with both versions of ID3: ID3v1 and ID3v2.

This is my code so far, but it does not work with ID3v1 tags.

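import java.io.File;
import java.util.Iterator;
import java.util.Vector;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

try {
       // Backslashes are escaped, and the directory ends with a separator.
       String filepath = "c:\\tmp\\";

       Vector<String> mp3_files = new Vector<String>();
       mp3_files.add(filepath + "test_with_id3.mp3");
       mp3_files.add(filepath + "test_without_id3");

       Iterator<String> i_mp3fp = mp3_files.iterator();

       while (i_mp3fp.hasNext()){
          String mp3_fp = i_mp3fp.next();

          // Decoding MP3 through AudioSystem needs an MP3 service provider on the classpath.
          File file = new File(mp3_fp);
          AudioInputStream in = AudioSystem.getAudioInputStream(file);
          AudioFormat baseFormat = in.getFormat();

          // Target format: 16-bit signed little-endian PCM, so only decoded audio is hashed.
          AudioFormat decodedFormat = new AudioFormat(
             AudioFormat.Encoding.PCM_SIGNED,
             baseFormat.getSampleRate(), 16, baseFormat.getChannels(),
             baseFormat.getChannels() * 2, baseFormat.getSampleRate(),
             false);
          AudioInputStream din = AudioSystem.getAudioInputStream(decodedFormat, in);

          String md5 = org.apache.commons.codec.digest.DigestUtils.md5Hex( din );
          System.out.println("Name: "+mp3_fp+" | Hash: "+md5);
          din.close();
       }
} catch (Exception e) {
       e.printStackTrace();
}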

When I wrote this, I thought I had to compare MP3s with different encodings. Anyway, I think a better solution would be to read the MP3 files, ignore all the ID3 tags, compute a checksum, and compare the results. Is there a library for reading and filtering an MP3 file?

Thank you guys for your help!

Comments (2)

一张白纸 2025-01-08 17:01:36

Convert the files to raw PCM and MD5 the output.

While there is surely a way to do this in Java, I suspect it might be quicker to use FFmpeg + bash.

for file in *.mp3
do
  # md5 is the BSD/macOS tool; on Linux, md5sum plays the same role.
  ffmpeg -i "$file" -f s16le - | md5 > "$file.md5"
done

初懵 2025-01-08 17:01:36

I don't have any experience with the MP3 and ID3 tag formats, but a quick look at Wikipedia reveals the following:

ID3v1

The ID3v1 tag occupies 128 bytes, beginning with the string TAG. The tag was placed at the end of the file.

Just read the whole MP3 file, skipping the last 128 bytes when they begin with TAG.
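As a rough illustration of the ID3v1 check, here is a minimal sketch that assumes the whole file has already been read into a byte array. The class name `Id3v1Tag` and the method `audioEnd` are made up for this example and do not come from any library.

```java
/** Hypothetical helper (not part of any library) for locating a trailing ID3v1 tag. */
final class Id3v1Tag {
    /** An ID3v1 tag is always exactly 128 bytes long. */
    static final int TAG_SIZE = 128;

    /**
     * Returns the exclusive end offset of the file content once a trailing
     * ID3v1 tag, if present, is left out. Without a tag this is data.length.
     */
    static int audioEnd(byte[] data) {
        int tagStart = data.length - TAG_SIZE;
        boolean hasTag = tagStart >= 0
                && data[tagStart] == 'T'
                && data[tagStart + 1] == 'A'
                && data[tagStart + 2] == 'G';
        return hasTag ? tagStart : data.length;
    }
}
```

Checking for the TAG marker first matters because a file without an ID3v1 tag would otherwise lose its last 128 bytes of audio.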

ID3v2

3.1. ID3v2 header

The ID3v2 tag size is stored as a 32 bit synchsafe integer (section 6.2), making a total of 28 effective bits (representing up to 256MB).

The header format is pretty simple. If the file starts with an ID3v2 header, read the tag size from it and skip that many bytes, plus the 10 bytes of the header itself, since the stored size does not include them.
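The synchsafe size can be decoded with a few shifts. The sketch below assumes the first 10 bytes follow the ID3v2 header layout ("ID3", two version bytes, one flags byte, four size bytes) and that an ID3v2.4 footer, when flagged, adds another 10 bytes. `Id3v2Tag`, `synchsafe`, and `tagLength` are illustrative names, not an existing API.

```java
/** Hypothetical helper (not part of any library) for measuring a leading ID3v2 tag. */
final class Id3v2Tag {
    /** The ID3v2 header (and the optional v2.4 footer) is 10 bytes long. */
    static final int HEADER_SIZE = 10;

    /** Decodes a synchsafe integer: four bytes carrying 7 significant bits each. */
    static int synchsafe(byte b0, byte b1, byte b2, byte b3) {
        return ((b0 & 0x7F) << 21) | ((b1 & 0x7F) << 14) | ((b2 & 0x7F) << 7) | (b3 & 0x7F);
    }

    /** Returns how many leading bytes belong to an ID3v2 tag, or 0 if there is none. */
    static int tagLength(byte[] data) {
        if (data.length < HEADER_SIZE
                || data[0] != 'I' || data[1] != 'D' || data[2] != '3') {
            return 0;
        }
        // Bytes 6..9 hold the tag size, which excludes the header itself.
        int size = synchsafe(data[6], data[7], data[8], data[9]);
        // In ID3v2.4 (major version 4), flag bit 0x10 announces a 10-byte footer.
        boolean hasFooter = data[3] == 4 && (data[5] & 0x10) != 0;
        return HEADER_SIZE + size + (hasFooter ? HEADER_SIZE : 0);
    }
}
```

For example, size bytes `00 00 02 01` decode to (2 << 7) | 1 = 257, so 267 bytes would be skipped when no footer is present.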

Once you have the "raw" file, compare contents byte-by-byte or using a hash.
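For the hashing step, one possible sketch in Java uses the standard `java.security.MessageDigest` class. It assumes the tag-free audio bytes are already in memory; `DuplicateFinder`, `md5Hex`, and `groupByHash` are hypothetical names, and the `from`/`to` offsets stand for wherever the leading tag ends and the trailing tag begins.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of any library) that groups files by audio hash. */
final class DuplicateFinder {
    /** Returns the MD5 of data[from, to) as a 32-character lowercase hex string. */
    static String md5Hex(byte[] data, int from, int to) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, from, to - from);
            return String.format("%032x", new BigInteger(1, md.digest()));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Maps each hash to the names of all files whose tag-free bytes produce it. */
    static Map<String, List<String>> groupByHash(Map<String, byte[]> audioByName) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : audioByName.entrySet()) {
            byte[] audio = entry.getValue();
            String hash = md5Hex(audio, 0, audio.length);
            groups.computeIfAbsent(hash, k -> new ArrayList<>()).add(entry.getKey());
        }
        return groups;
    }
}
```

Any group holding more than one name is a set of duplicates. If an accidental hash collision is a concern, files in the same group can still be compared byte-by-byte afterwards.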
