WAV to MIDI conversion
I'm new to this field, but I need to perform a WAV-to-MIDI conversion in Java.
What exactly are the steps involved in WAV-to-MIDI conversion?
I have a very rough idea of what is needed: sample the WAV file, filter it, use an FFT for spectral analysis, extract features, and then write the extracted features to MIDI.
However, I cannot find solid sources or papers that explain how to do all of that.
Can someone give me some pointers on how and where to start?
Are there any open-source APIs available for this WAV-to-MIDI conversion process?
Thanks in advance.
Comments (5)
It's a more involved process than you might imagine.
This research problem is often referred to as music transcription: the act of converting a low-level representation of music (e.g., waveform) into a higher-level representation such as MIDI or even sheet music.
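To make the "higher-level representation" end concrete: once you have a list of detected notes, writing them out as MIDI is the easy part. Here is a minimal sketch using the standard `javax.sound.midi` package. The class name, the `{pitch, startTick, durationTicks}` array layout, and the fixed velocity of 90 are my own assumptions for illustration, not part of any transcription library.

```java
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MidiEvent;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.Sequence;
import javax.sound.midi.ShortMessage;
import javax.sound.midi.Track;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

final class MidiWriterSketch {
    /**
     * Encodes notes as a single-track (type 0) Standard MIDI File.
     * Each row of notes is {midiPitch, startTick, durationTicks} (assumed layout).
     */
    static byte[] toMidiBytes(int[][] notes, int ticksPerQuarter)
            throws InvalidMidiDataException, IOException {
        Sequence sequence = new Sequence(Sequence.PPQ, ticksPerQuarter);
        Track track = sequence.createTrack();
        for (int[] note : notes) {
            int pitch = note[0];
            long start = note[1];
            long end = start + note[2];
            // Channel 0, fixed velocity 90 (arbitrary choice for the sketch).
            track.add(new MidiEvent(
                    new ShortMessage(ShortMessage.NOTE_ON, 0, pitch, 90), start));
            track.add(new MidiEvent(
                    new ShortMessage(ShortMessage.NOTE_OFF, 0, pitch, 0), end));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        MidiSystem.write(sequence, 0, out);
        return out.toByteArray();
    }
}
```

Calling `toMidiBytes(new int[][] {{60, 0, 480}, {64, 480, 480}}, 480)` should give you the bytes of a file with two quarter notes (C4, then E4) that you can save with a `.mid` extension. Everything hard about transcription is in producing that note list in the first place.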
The sophistication of your solution will depend upon the complexity of your input data. Tons of research papers address music transcription only on monophonic piano or drums... because they are easy to transcribe. (Relatively.) Violin is harder. Voice is even harder. Violin plus voice plus piano is much harder. A symphony is nearly impossible. You get the picture.
The basic elements of music transcription involve any of the following overlapping areas:
Search for papers on "music transcription" on Google Scholar or from the ISMIR proceedings: http://www.ismir.net. If you are more interested in one of the above subtopics, I can point you further. Good luck.
EDIT: That being said, there are existing solutions that we can all find on the web. Feel free to try them. But as you do, evaluate them with a critical eye and ear. What types of audio signals would cause transcription to fail?
EDIT 2: Ah, you are only doing this for piano. Okay, this is doable. Music transcription has advanced to the point where it can transcribe monophonic piano pretty well. A Rachmaninov concerto will still pose problems.
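To give a feel for the monophonic case, here is a rough sketch of one common approach: estimate the pitch of a short frame by plain autocorrelation, then map the frequency to the nearest MIDI note number (A4 = 440 Hz = note 69). This is a toy under stated assumptions, not how any particular transcription system works. The class and method names and the search range are made up for the example, and it ignores onsets, note durations, and polyphony entirely.

```java
final class MonoPitchSketch {
    /**
     * Estimates the fundamental frequency of one frame by picking the lag
     * with the largest raw autocorrelation inside [minHz, maxHz].
     * Returns -1 if no positive correlation is found (e.g. silence).
     */
    static double estimateFrequency(double[] frame, double sampleRate,
                                    double minHz, double maxHz) {
        int minLag = Math.max(1, (int) Math.floor(sampleRate / maxHz));
        int maxLag = Math.min((int) Math.ceil(sampleRate / minHz), frame.length - 1);
        int bestLag = -1;
        double best = 0.0;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double sum = 0.0;
            // Unnormalized sum: fewer terms at larger lags, which mildly
            // favours shorter periods over their multiples.
            for (int i = 0; i + lag < frame.length; i++) {
                sum += frame[i] * frame[i + lag];
            }
            if (sum > best) {
                best = sum;
                bestLag = lag;
            }
        }
        return bestLag > 0 ? sampleRate / bestLag : -1.0;
    }

    /** Nearest MIDI note number for a frequency, with A4 = 440 Hz = 69. */
    static int frequencyToMidiNote(double hz) {
        return (int) Math.round(69 + 12 * (Math.log(hz / 440.0) / Math.log(2)));
    }
}
```

On a clean single tone this tends to land on the right note, because the integer lag only needs to be within half a semitone. Real piano recordings have strong overtones, decaying envelopes, and overlapping notes, which is where simple estimators like this start making octave errors.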
Our recommendations depend upon your end goal. You state "need to perform... in Java." So it sounds like you just want something to work regardless of how it gets you there. In that case, I agree 100% with others: use something that exists.
That's actually an interesting question; all of the MIR libraries I know are typically C/C++/Python/Matlab. But not Java. The EchoNest has a Java API, but I don't think it does note-level transcription. http://developer.echonest.com. (Edit: It does note-level transcription. The returned data includes pitch, timbre, beat, tatum, and more. But I find polyphony is still a problem.)
Oh, Marsyas is Java-based. Cool. I thought it was just C++. http://marsyas.info/ I recommend this. It's developed by George Tzanetakis, a professor in MIR. It does signal-level analysis and should be a good option.
Now, if this is for a fun learning experience, I think you can use the sound manipulation utilities in Java to experiment with the WAV signal and see what comes out.
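If you go the experimenting route, the first step is just getting the samples out of the file. Below is a small sketch using the standard `javax.sound.sampled` package. It assumes 16-bit signed PCM (the most common WAV flavour) and averages the channels down to mono; the class and method names are my own, and other encodings would need converting first.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;

final class WavSamplesSketch {
    /** Decodes 16-bit signed PCM into doubles in [-1, 1), averaging channels to mono. */
    static double[] readMono(AudioInputStream in) throws IOException {
        AudioFormat format = in.getFormat();
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                || format.getSampleSizeInBits() != 16) {
            throw new IllegalArgumentException("sketch handles 16-bit signed PCM only");
        }
        int channels = format.getChannels();
        int frameSize = 2 * channels;
        boolean bigEndian = format.isBigEndian();

        // Read everything into memory; the buffer is a whole number of frames.
        ByteArrayOutputStream pcm = new ByteArrayOutputStream();
        byte[] buffer = new byte[frameSize * 1024];
        int n;
        while ((n = in.read(buffer)) > 0) {
            pcm.write(buffer, 0, n);
        }
        byte[] bytes = pcm.toByteArray();

        double[] mono = new double[bytes.length / frameSize];
        for (int f = 0; f < mono.length; f++) {
            double sum = 0.0;
            for (int c = 0; c < channels; c++) {
                int i = f * frameSize + 2 * c;
                int lo = bigEndian ? bytes[i + 1] : bytes[i];
                int hi = bigEndian ? bytes[i] : bytes[i + 1];
                short sample = (short) ((hi << 8) | (lo & 0xFF));
                sum += sample / 32768.0;
            }
            mono[f] = sum / channels;
        }
        return mono;
    }

    /** Convenience overload for a file on disk. */
    static double[] readMono(File wav) throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            return readMono(in);
        }
    }
}
```

From there you can slice the array into short frames and try out spectral or pitch analysis on each one to see what the signal actually looks like.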
EDIT: This page describes MIR software better than I can: The Tools We Use
For Matlab, you may be interested in the MIR Toolbox
Here is a nice page of common datasets: MIR Datasets
This is a very big undertaking for someone new to the field, unless you mean that you are already familiar with signal analysis and feature detection in general and want to look more specifically into automatic transcription.
There is no API for WAV to MIDI conversion. Vamp is a framework for feature extraction plugins, but to do automatic transcription you would need to use all the functionality of the existing plugins, plus implement functionality that exists in none of them yet.
Browse through the descriptions of the plugins on the Vamp download page. Any descriptions you do not understand are topics you should start researching if you want to do this.
If you don't need to automate this task (as you would, for example, for a website where people upload MP3s and get MIDI files back), then you should consider using a tool like Melodyne, which is already quite good at doing this. As Steve noted, this is a very difficult task to accomplish, and even the best algorithms and solutions available at the moment are not 100% reliable.
So if you are just doing studio work and need to do a few conversions, it will probably save you a bit of time (and a lot of headaches) to use a tool already designed for this task.
This is a field that is still very much under development, but some (experimental) algorithms are available.
You can install Sonic Annotator and use a few Vamp plugins.
For example:
Dolphin, sorry to be brusque, but you have completely underestimated the problem. What you want to achieve, a full piano sound transcription capturing all the parameters that were used while playing, would need an enormous amount of research alongside people who have worked in the field for many years. Even a group of PhDs in signal processing would have to invest a lot of work to come close to what you mean. Music transcription has needed decades of work to become even halfway reliable. I'd suggest you pick a different problem that you can manage better than this one.