Audio analysis for sheet music

Posted on 2024-11-10 10:35:43

I'm currently working on a program that analyses a WAV file of a solo musician playing an instrument and detects the notes within it. To do this it performs an FFT and then looks at the data produced. The goal is to (at some point) produce the sheet music by writing a MIDI file.

I just wanted to get a few opinions on what might be difficult about it, whether anyone has tried it before, and maybe a few things it would be good to research. At the moment my biggest struggle is that not all notes are purely one frequency, and I cannot yet detect chords, just single notes. There also has to be a pause between the notes I am detecting so that I know for sure one has ended and the next has started. Any comments on this would also be very welcome!

This is the code I use when a new frame comes in from the signal. It looks for the frequency that is most dominant in the sample:

    // Get the frequency (in Hz) that corresponds to each FFT bin
    double[] frequencyVectorDoubleArray = Accord.Audio.Tools.GetFrequencyVector(waveSignal.Length, waveSignal.SampleRate);

    powerSpectrumDoubleArray[0] = 0; // zero the DC component

    // Column 0 holds the bin frequency, column 1 holds the power in that bin
    double[,] frequencyPowerDoubleArray = new double[powerSpectrumDoubleArray.Length, 2];

    for (int i = 0; i < powerSpectrumDoubleArray.Length; i++)
    {
        // Bins at or below 15 Hz are left as zero
        if (frequencyVectorDoubleArray[i] > 15.00)
        {
            frequencyPowerDoubleArray[i, 0] = frequencyVectorDoubleArray[i];
            frequencyPowerDoubleArray[i, 1] = powerSpectrumDoubleArray[i];
        }
    }

    // Find the frequency with the highest power in a frame of frequency-domain data
    // But I want to filter out stuff
    pulsePowerDouble = lowestPowerAcceptedDouble; // only bins above this threshold can win
    int frequencyIndexAtPulseInt = 0;
    int oldFrequencyIndexAtPulse = 0;
    // Iterate over every bin (one row per bin)
    for (int j = 0; j < frequencyPowerDoubleArray.GetLength(0); j++)
    {
        if (frequencyPowerDoubleArray[j, 1] > pulsePowerDouble)
        {
            // Remember the previous best before replacing it
            oldPulsePowerDouble = pulsePowerDouble;
            pulsePowerDouble = frequencyPowerDoubleArray[j, 1];

            oldFrequencyIndexAtPulse = frequencyIndexAtPulseInt;
            frequencyIndexAtPulseInt = j;
        }
    }
    foundFreq = frequencyPowerDoubleArray[frequencyIndexAtPulseInt, 0];
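
Since the stated goal is a MIDI file, a detected frequency eventually has to be mapped to a note number. The sketch below is in Python rather than the C# used above, and it is only an illustration: it assumes equal temperament with A4 = 440 Hz (MIDI note 69) and the naming convention in which note 60 is C4. The function names `frequency_to_midi` and `midi_to_name` are made up for this example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_midi(freq_hz, a4_hz=440.0):
    """Return (nearest MIDI note number, deviation from it in cents)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # 12 semitones per octave, with A4 fixed at MIDI note 69 (assumed tuning).
    exact = 69 + 12 * math.log2(freq_hz / a4_hz)
    note = int(round(exact))
    cents = (exact - note) * 100
    return note, cents


def midi_to_name(note):
    """Name a MIDI note, assuming the convention where note 60 is C4."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


if __name__ == "__main__":
    note, cents = frequency_to_midi(261.63)
    print(note, midi_to_name(note), round(cents, 2))
```

The cents value shows how far the estimate is from the nearest note, which could help reject frames where the detected frequency falls between two notes.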

Comments (3)

永言不败 2024-11-17 10:35:43

1) There is a lot (several decades' worth) of research literature on frequency estimation and pitch estimation (which are two different subjects).

2) Peak FFT frequency is not the same as the musical pitch. Some solo musical instruments can produce well over a dozen frequency peaks for just one note, let alone a chord, with none of the largest peaks anywhere near the musical pitch. For some common instruments, the peaks might not even be mathematically exact harmonics.

3) Using the peak bin of a short unwindowed FFT isn't a great frequency estimator.

4) Note onset detection might require some sophisticated pattern matching, depending on the instrument.
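
As an illustration of point 3, one common way to improve on the raw peak bin is to apply a window before the transform and then interpolate around the peak. The Python sketch below is a rough example under several assumptions, not a recommended estimator: it uses a Hann window, a direct O(n²) DFT so that it needs no libraries (a real program would use an FFT), and parabolic interpolation on the log magnitudes of the peak bin and its two neighbours. All function names are hypothetical.

```python
import cmath
import math


def hann_window(n):
    """Hann window of length n (periodic form)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrum(samples):
    """Magnitudes of DFT bins 0..n/2, computed by direct summation."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        mags.append(abs(acc))
    return mags


def estimate_peak_frequency(samples, sample_rate):
    """Windowed peak-bin estimate refined by parabolic interpolation."""
    n = len(samples)
    windowed = [s * w for s, w in zip(samples, hann_window(n))]
    mags = magnitude_spectrum(windowed)
    # Strongest bin, excluding the two edge bins so both neighbours exist.
    k = max(range(1, len(mags) - 1), key=lambda i: mags[i])
    # Fit a parabola through the log magnitudes of the peak and its neighbours.
    a, b, c = (math.log(mags[j] + 1e-12) for j in (k - 1, k, k + 1))
    denom = a - 2 * b + c
    offset = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return (k + offset) * sample_rate / n


if __name__ == "__main__":
    sample_rate, n = 8000, 256
    tone = [math.sin(2 * math.pi * 445.0 * i / sample_rate) for i in range(n)]
    print(estimate_peak_frequency(tone, sample_rate))
```

With these example settings the bins are 31.25 Hz apart, so a raw peak-bin estimate can be off by up to about 15 Hz, which is close to a semitone in this range. This only refines the frequency of the strongest peak; as point 2 says, that peak is not necessarily the musical pitch.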

故人爱我别走 2024-11-17 10:35:43

You don't want to focus on the highest frequency, but rather the lowest. Every note from any musical instrument is full of harmonics. Expect to hear the fundamental and every octave above it, plus all the second and third harmonics.

Harmonics are what make a trumpet sound different from a trombone when they are both playing the same note.
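
One way to look for the fundamental rather than the strongest partial is to work in the time domain. The Python sketch below uses plain autocorrelation, a different technique from the FFT peak search in the question, and it is only a minimal example: the function name, the `fmin`/`fmax` search range, and the test signal are all assumptions made for illustration, and real recordings usually need something more robust.

```python
import math


def autocorrelation_pitch(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental as the lag with the largest autocorrelation."""
    n = len(samples)
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    if max_lag < min_lag:
        raise ValueError("signal too short for the requested pitch range")
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised sum: longer lags have fewer terms, which favours
        # the shortest repeating period over its multiples.
        value = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


if __name__ == "__main__":
    sample_rate, f0 = 8000, 200.0
    # Synthetic note whose second harmonic is much stronger than its fundamental.
    note = [
        0.2 * math.sin(2 * math.pi * f0 * i / sample_rate)
        + 1.0 * math.sin(2 * math.pi * 2 * f0 * i / sample_rate)
        + 0.6 * math.sin(2 * math.pi * 3 * f0 * i / sample_rate)
        for i in range(800)
    ]
    print(autocorrelation_pitch(note, sample_rate))
```

In the synthetic note the 400 Hz partial carries most of the energy, so a strongest-bin search would be drawn to it, while the waveform as a whole still repeats at the 200 Hz fundamental. The resolution here is limited to whole-sample lags, so higher pitches are estimated more coarsely.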

吻安 2024-11-17 10:35:43

Unfortunately, this is an extremely hard problem; some of the reasons have already been given. I would start with a literature search (Google Scholar, for instance) for "musical note identification".

If this isn't a spare-time project, beware: I have seen master's theses founder on this particular shoal without getting any useful results.
