Plotting the pitch (frequency) of a sound

Posted 2024-10-12 10:52:49


I want to plot the pitch of a sound on a graph.

Currently I can plot the amplitude. The graph below was created from the data returned by getUnscaledAmplitude():

[Image: amplitude graph]

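AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(new BufferedInputStream(new FileInputStream(file)));
byte[] bytes = new byte[(int) audioInputStream.getFrameLength() * audioInputStream.getFormat().getFrameSize()];
// A single read() may return fewer bytes than requested, so keep reading until the buffer is full.
int offset = 0;
while (offset < bytes.length) {
    int count = audioInputStream.read(bytes, offset, bytes.length - offset);
    if (count < 0) break; // end of stream
    offset += count;
}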

// Get amplitude values for each audio channel in an array.
graphData = type.getUnscaledAmplitude(bytes, 1);


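// Converts 16-bit signed little-endian PCM bytes into one int array per channel.
// (Despite the parameter name, each sample occupies two bytes.)
public int[][] getUnscaledAmplitude(byte[] eightBitByteArray, int nbChannels)
{
    int[][] toReturn = new int[nbChannels][eightBitByteArray.length / (2 * nbChannels)];
    int index = 0;

    // Stop before a trailing partial frame so index never runs past the array size.
    for (int audioByte = 0; audioByte + 2 * nbChannels <= eightBitByteArray.length;)
    {
        for (int channel = 0; channel < nbChannels; channel++)
        {
            // Do the byte to sample conversion: low byte first, then high byte.
            int low = (int) eightBitByteArray[audioByte];
            audioByte++;
            int high = (int) eightBitByteArray[audioByte];
            audioByte++;
            // The high byte keeps its sign, so the result is a signed 16-bit value.
            int sample = (high << 8) + (low & 0x00ff);

            toReturn[channel][index] = sample;
        }
        index++;
    }

    return toReturn;
}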

But I need to show the audio's pitch, not its amplitude. A Fast Fourier Transform (FFT) appears to give the pitch, but it seems to need more variables than the raw bytes I have, and it is very complex and mathematical.

Is there a way I can do this?


如日中天 2024-10-19 10:52:49

Frequency (an objective metric) is not the same as pitch (a subjective quantity). In general, pitch detection is a very tricky problem.

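Assuming you just want to graph the frequency content (the spectrum) for now, you have little choice but to use the FFT, as it is the standard way to obtain the frequency content of time-domain data. (Well, there are other methods, such as the discrete cosine transform, but they are just as tricky to implement and trickier to interpret.)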

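If you're struggling to implement the FFT, note that it is really just an efficient algorithm for computing the discrete Fourier transform (DFT); see http://en.wikipedia.org/wiki/Discrete_Fourier_transform. The basic DFT algorithm is much simpler (just two nested loops), but it runs much more slowly (O(N^2) rather than O(N log N)).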
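To make the "two nested loops" point concrete, here is a minimal sketch of the basic O(N^2) DFT in Java. It assumes the samples have already been converted to double (for example from the int values returned by getUnscaledAmplitude()); the class and method names are hypothetical, and it returns only the magnitude of each bin.

```java
/** Naive O(N^2) discrete Fourier transform of a real-valued signal (illustrative sketch). */
public class NaiveDft {

    /**
     * Returns |X[k]| for k = 0..N-1, where
     * X[k] = sum over n of x[n] * e^(-2*pi*i*k*n/N).
     */
    public static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n];
        for (int k = 0; k < n; k++) {          // outer loop: one output bin per k
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++) {      // inner loop: sum over every input sample
                double angle = -2.0 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);       // sqrt(re^2 + im^2)
        }
        return mag;
    }

    public static void main(String[] args) {
        // A sine wave completing exactly 3 cycles in 64 samples should peak at bin 3.
        int n = 64;
        double[] x = new double[n];
        for (int t = 0; t < n; t++) {
            x[t] = Math.sin(2.0 * Math.PI * 3 * t / n);
        }
        double[] mag = magnitudes(x);
        int peak = 0;
        for (int k = 1; k < n / 2; k++) {
            if (mag[k] > mag[peak]) peak = k;
        }
        System.out.println("Peak bin: " + peak);
    }
}
```

This is fine for a few thousand samples; for longer blocks the FFT gives the same result far faster.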

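If you want to do anything more complex than simply graphing the frequency content (such as pitch detection, or windowing as others have suggested), I'm afraid you'll have to learn what the maths means.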
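As a taste of why pitch detection is harder than graphing a spectrum, here is a deliberately naive sketch of one common starting point, autocorrelation (a technique this answer does not mention): it picks the lag at which the signal best matches a shifted copy of itself within an assumed pitch range. The class name and the minHz/maxHz bounds are hypothetical; on real recordings this approach is prone to octave errors and needs further refinement.

```java
/** Naive autocorrelation pitch estimate (illustrative sketch only). */
public class AutocorrelationPitch {

    /**
     * Estimates the fundamental frequency of x in Hz by finding the lag,
     * restricted to the range implied by minHz..maxHz, that maximises
     * r(lag) = sum over t of x[t] * x[t + lag].
     * Returns NaN if the signal is too short for the requested range.
     */
    public static double estimate(double[] x, double sampleRate, double minHz, double maxHz) {
        int minLag = (int) Math.floor(sampleRate / maxHz);   // shortest period considered
        int maxLag = Math.min((int) Math.ceil(sampleRate / minHz), x.length - 1);
        int bestLag = -1;
        double best = Double.NEGATIVE_INFINITY;
        for (int lag = Math.max(minLag, 1); lag <= maxLag; lag++) {
            double r = 0.0;
            for (int t = 0; t + lag < x.length; t++) {
                r += x[t] * x[t + lag];                       // similarity at this shift
            }
            if (r > best) {
                best = r;
                bestLag = lag;
            }
        }
        return bestLag > 0 ? sampleRate / bestLag : Double.NaN;
    }
}
```

For a clean 441 Hz sine sampled at 44.1 kHz this finds a lag of 100 samples, i.e. 441 Hz; a real voice or instrument with strong harmonics can easily fool it.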


没有你我更好 2024-10-19 10:52:49


The Fast Fourier Transform doesn't need anything more than the input bytes you have (the sample rate, which the AudioFormat of your stream reports, is only needed to convert output indices into Hz). Don't be scared off by the Wikipedia article. An FFT algorithm takes your input signal (with the common FFT algorithms the number of samples must be a power of 2, e.g. 256, 512, 1024) and returns a vector of complex numbers of the same size. Because your input is real, not complex (the imaginary part is set to zero), the returned vector will be conjugate-symmetric, so only its first half contains useful data. Since you do not care about the phase, you can simply take the magnitude of each complex number, which is sqrt(a^2+b^2). Taking the absolute value of a complex number may also work; in some languages it is equivalent to the previous expression.

There are Java implementations of FFT available, e.g.: http://www.cs.princeton.edu/introcs/97data/FFT.java.html

Pseudocode will look something like this:

Complex in[1024];
Complex out[1024];
copy 1024 samples of your signal into in (imaginary parts set to 0)
FFT(in, out)
for every element a+bi of out, compute its magnitude sqrt(a^2+b^2)
to find the frequency with the highest power, scan for the maximum magnitude in the first 512 points of out
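One way the pseudocode above might look as runnable Java is sketched below. It is not the Princeton FFT.java linked above: it uses its own minimal recursive radix-2 FFT, assumes mono 16-bit samples such as those returned by getUnscaledAmplitude(bytes, 1), and takes the sample rate as a parameter. Unlike the pseudocode, it skips bin 0 (the DC offset) when looking for the peak. The class and method names are hypothetical.

```java
/** Sketch: dominant frequency of a block of samples via a radix-2 FFT. */
public class DominantFrequency {

    /** Recursive radix-2 Cooley-Tukey FFT, in place on (re, im). Length must be a power of 2. */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 1) return;
        if (Integer.bitCount(n) != 1) throw new IllegalArgumentException("length must be a power of 2");
        int half = n / 2;
        double[] evenRe = new double[half], evenIm = new double[half];
        double[] oddRe = new double[half], oddIm = new double[half];
        for (int i = 0; i < half; i++) {               // split into even and odd indices
            evenRe[i] = re[2 * i];     evenIm[i] = im[2 * i];
            oddRe[i] = re[2 * i + 1];  oddIm[i] = im[2 * i + 1];
        }
        fft(evenRe, evenIm);
        fft(oddRe, oddIm);
        for (int k = 0; k < half; k++) {
            double angle = -2.0 * Math.PI * k / n;
            double c = Math.cos(angle), s = Math.sin(angle);
            // Multiply the odd half by the twiddle factor e^(-2*pi*i*k/n).
            double tRe = c * oddRe[k] - s * oddIm[k];
            double tIm = c * oddIm[k] + s * oddRe[k];
            re[k] = evenRe[k] + tRe;         im[k] = evenIm[k] + tIm;
            re[k + half] = evenRe[k] - tRe;  im[k + half] = evenIm[k] - tIm;
        }
    }

    /** Returns the frequency in Hz of the strongest bin among bins 1..n/2-1. */
    public static double dominantFrequency(int[] samples, int n, double sampleRate) {
        double[] re = new double[n];   // "copy your signal into in": real part = samples
        double[] im = new double[n];   // imaginary part stays 0
        for (int i = 0; i < n && i < samples.length; i++) {
            re[i] = samples[i];        // zero-padded if fewer than n samples are available
        }
        fft(re, im);
        int peak = 1;
        double peakMag = -1.0;
        for (int k = 1; k < n / 2; k++) {
            double mag = Math.sqrt(re[k] * re[k] + im[k] * im[k]);   // sqrt(a^2 + b^2)
            if (mag > peakMag) {
                peakMag = mag;
                peak = k;
            }
        }
        return peak * sampleRate / n;  // bin k is centred on k * sampleRate / n Hz
    }
}
```

With the question's code this might be called as `DominantFrequency.dominantFrequency(graphData[0], 1024, audioInputStream.getFormat().getSampleRate())`; calling it on successive 1024-sample blocks gives one frequency value per block to plot over time.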

The output will contain entries for frequencies between zero and half your sampling frequency.
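To make that mapping concrete: with sampling rate $f_s$ and an $N$-point FFT, bin $k$ corresponds to the frequency below. The 44.1 kHz rate in the worked example is an assumed value, not something stated in the question.

```latex
f_k = \frac{k\, f_s}{N}, \qquad k = 0, 1, \dots, \frac{N}{2}

\text{Example with } f_s = 44100\ \text{Hz},\ N = 1024:\quad
\Delta f = \frac{f_s}{N} = \frac{44100}{1024} \approx 43.07\ \text{Hz},
\qquad f_{N/2} = \frac{512 \cdot 44100}{1024} = 22050\ \text{Hz} = \frac{f_s}{2}
```

So a 1024-point FFT at 44.1 kHz cannot separate frequencies closer than about 43 Hz; a longer FFT improves frequency resolution at the cost of time resolution.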

Since the FFT treats its input as one period of a repeating signal, you may want to apply a window to your input signal to reduce spectral leakage. But don't worry about this at first.
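When you do get to windowing, one common choice is the Hann window (one option among several). The sketch below uses its symmetric form, multiplying each sample by w[n] = 0.5 * (1 - cos(2*pi*n / (N - 1))) before the FFT, which tapers the block to zero at both ends so the implied repetition has no abrupt jump. The class name is hypothetical.

```java
/** Sketch: apply a (symmetric) Hann window to a block of samples before taking its FFT. */
public class HannWindow {

    /** Returns a new array with x[n] * 0.5 * (1 - cos(2*pi*n / (N - 1))). */
    public static double[] apply(double[] x) {
        int n = x.length;
        double[] out = new double[n];
        if (n == 1) {              // a single sample: the formula would divide by zero
            out[0] = x[0];
            return out;
        }
        for (int i = 0; i < n; i++) {
            double w = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
            out[i] = x[i] * w;     // taper towards zero at both ends of the block
        }
        return out;
    }
}
```

The windowed block is then fed to the FFT exactly as before; the peak becomes slightly wider but leaks much less energy into distant bins.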

You can find more information on the web, e.g.: FFT for beginners

Also, as Oli notes, when multiple frequencies are present, the perceived pitch is a more complex phenomenon.
