Detect duplicate MP3 files with different ID3 tags?
How can I detect (preferably in Java) duplicate MP3 files that have different ID3 tags? The files have the same encoding and format. It should work with both versions of ID3: ID3v1 and ID3v2.
This is my code so far, but it does not work with ID3v1 tags.
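import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import org.apache.commons.codec.digest.DigestUtils;

try {
    // Backslashes must be escaped ("\t" is a tab), and the directory needs a trailing separator.
    String filepath = "c:\\tmp\\";
    List<String> mp3Files = List.of(
            filepath + "test_with_id3.mp3",
            filepath + "test_without_id3");
    for (String mp3Path : mp3Files) {
        File file = new File(mp3Path);
        // Reading MP3 through Java Sound requires an MP3 decoder provider on the classpath.
        try (AudioInputStream in = AudioSystem.getAudioInputStream(file)) {
            AudioFormat baseFormat = in.getFormat();
            // Target format: 16-bit signed little-endian PCM, frame size = channels * 2 bytes.
            AudioFormat decodedFormat = new AudioFormat(
                    AudioFormat.Encoding.PCM_SIGNED,
                    baseFormat.getSampleRate(), 16, baseFormat.getChannels(),
                    baseFormat.getChannels() * 2, baseFormat.getSampleRate(),
                    false);
            try (AudioInputStream din = AudioSystem.getAudioInputStream(decodedFormat, in)) {
                // Hash the decoded PCM samples rather than the raw file bytes.
                String md5 = DigestUtils.md5Hex(din);
                System.out.println("Name: " + mp3Path + " | Hash: " + md5);
            }
        }
    }
} catch (UnsupportedAudioFileException | IOException e) {
    e.printStackTrace();
}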
When I wrote this, I thought I had to compare MP3s with different encodings.
I now think a better solution would be to read the MP3 files, ignore all ID3 tags, compute a checksum of what remains, and compare the checksums. Is there a library for reading and filtering an MP3 file?
Thank you guys for your help!
Comments (2)
Convert the files to raw PCM and take an MD5 of the output.
While there is surely a way to do this in Java, I suspect it might be quicker to use FFmpeg + bash.
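To stay in the question's language, here is a rough sketch of the same idea driven from Java instead of bash: let FFmpeg decode the audio stream to raw 16-bit PCM on standard output and hash that stream. It assumes an `ffmpeg` binary is on the PATH. The class and method names (`PcmHasher`, `ffmpegCommand`, `md5Hex`, `pcmMd5`) are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

final class PcmHasher {
    /** Builds an ffmpeg command that writes raw 16-bit little-endian PCM to stdout. */
    static List<String> ffmpegCommand(String mp3Path) {
        // -vn drops any video stream (e.g. embedded cover art); "-" means stdout.
        return List.of("ffmpeg", "-v", "error", "-i", mp3Path,
                "-vn", "-f", "s16le", "-acodec", "pcm_s16le", "-");
    }

    /** Reads the stream to its end and returns the MD5 digest as lowercase hex. */
    static String md5Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Assumption: ffmpeg is installed; decodes mp3Path and hashes the PCM output. */
    static String pcmMd5(String mp3Path) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(ffmpegCommand(mp3Path))
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String hash;
        try (InputStream pcm = process.getInputStream()) {
            hash = md5Hex(pcm);
        }
        if (process.waitFor() != 0) {
            throw new IOException("ffmpeg failed for " + mp3Path);
        }
        return hash;
    }
}
```

Two files whose `PcmHasher.pcmMd5(...)` values match decode to the same samples, regardless of their tags. Decoding every file is slower than hashing the encoded bytes, so this is probably best reserved for cases where the tag-stripping approach is not enough.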
I don't have any experience with the MP3 and ID3 tag formats, but a quick look at Wikipedia reveals the following:
ID3v1
If the file ends with an ID3v1 tag (the last 128 bytes start with "TAG"), read the whole MP3 file but skip those last 128 bytes.
ID3v2
The header format is pretty simple. If the file starts with an ID3v2 header (the bytes "ID3"), read the tag size stored in the header and skip that many bytes plus the 10-byte header itself.
Once you have the "raw" file, compare the contents byte by byte or by hash.
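The steps above could look roughly like this in Java. This is a minimal sketch, not a complete tag parser: it assumes the whole file fits in memory, handles only an ID3v2 tag at the start and an ID3v1 tag at the end, and ignores other metadata such as APEv2 or Lyrics3 blocks. The names `Id3Stripper`, `audioRange`, and `audioMd5` are made up for illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Id3Stripper {
    /** Returns {start, end}: the half-open byte range left after skipping ID3 tags. */
    static int[] audioRange(byte[] data) {
        int start = 0;
        int end = data.length;
        // ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4-byte synchsafe size.
        if (data.length >= 10 && data[0] == 'I' && data[1] == 'D' && data[2] == '3') {
            // Synchsafe integer: only the low 7 bits of each byte carry data.
            int size = ((data[6] & 0x7F) << 21) | ((data[7] & 0x7F) << 14)
                     | ((data[8] & 0x7F) << 7) | (data[9] & 0x7F);
            int total = 10 + size;          // the size field excludes the 10-byte header
            if ((data[5] & 0x10) != 0) {
                total += 10;                // ID3v2.4 footer flag adds a 10-byte footer
            }
            start = Math.min(total, data.length);
        }
        // ID3v1 tag: exactly 128 bytes at the end of the file, starting with "TAG".
        if (end - start >= 128 && data[end - 128] == 'T'
                && data[end - 127] == 'A' && data[end - 126] == 'G') {
            end -= 128;
        }
        return new int[] {start, end};
    }

    /** MD5 (lowercase hex) of the bytes that remain after the tags are skipped. */
    static String audioMd5(byte[] data) {
        int[] range = audioRange(data);
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        md.update(data, range[0], range[1] - range[0]);
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

A caller might load each file with `java.nio.file.Files.readAllBytes(path)`, group the files by `Id3Stripper.audioMd5(bytes)`, and treat any group with more than one entry as duplicates. For large collections a streaming variant that seeks past the tags would avoid holding whole files in memory.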