I have been working on an Android project for a while that displays the fundamental frequency of an input signal (to act as a tuner). I have successfully implemented the AudioRecord class and am getting data from it. However, I am having a hard time performing an FFT on this data to get the fundamental frequency of the input signal. I have been looking at the post here, and am using FFT in Java and the Complex class that goes with it.
I have successfully used the FFT function found in FFT in Java, but I am not sure whether I am obtaining the correct results. For the magnitude of the FFT (sqrt[re*re + im*im]) I am getting values that start high, around 15000 Hz, and then slowly diminish to about 300 Hz. That doesn't seem right.
Also, as far as the raw data from the mic goes, the data seems fine, except that the first 50 values or so are always the number 3, unless I hit the tuning button again while still in the application and then I only get about 15. Is that normal?
Here is a bit of my code.
First of all, I convert the short data (obtained from the microphone) to a double using the following code which is from the post I have been looking at. This snippet of code I do not completely understand, but I think it works.
//Conversion from short to double
double[] micBufferData = new double[bufferSizeInBytes];//size may need to change
final int bytesPerSample = 2; // As it is 16bit PCM
final double amplification = 1.0; // choose a number as you like
for (int index = 0, floatIndex = 0; index < bufferSizeInBytes - bytesPerSample + 1; index += bytesPerSample, floatIndex++) {
    double sample = 0;
    for (int b = 0; b < bytesPerSample; b++) {
        int v = audioData[index + b];
        if (b < bytesPerSample - 1 || bytesPerSample == 1) {
            v &= 0xFF;
        }
        sample += v << (b * 8);
    }
    double sample32 = amplification * (sample / 32768.0);
    micBufferData[floatIndex] = sample32;
}
The code then continues as follows:
//Create Complex array for use in FFT
Complex[] fftTempArray = new Complex[bufferSizeInBytes];
for (int i = 0; i < bufferSizeInBytes; i++) {
    fftTempArray[i] = new Complex(micBufferData[i], 0);
}

//Obtain array of FFT data
final Complex[] fftArray = FFT.fft(fftTempArray);
final Complex[] fftInverse = FFT.ifft(fftTempArray);

//Create an array of magnitude of fftArray
double[] magnitude = new double[fftArray.length];
for (int i = 0; i < fftArray.length; i++) {
    magnitude[i] = fftArray[i].abs();
}

fft.setTextColor(Color.GREEN);
fft.setText("fftArray is " + fftArray[500] + " and fftTempArray is " + fftTempArray[500] + " and fftInverse is " + fftInverse[500] + " and audioData is " + audioData[500] + " and magnitude is " + magnitude[1] + ", " + magnitude[500] + ", " + magnitude[1000] + " Good job!");
for (int i = 2; i < samples; i++) {
    fft.append(" " + magnitude[i] + " Hz");
}
That last bit is just to check what values I am getting (and to keep me sane!). In the post referred to above, it talks about needing the sampling frequency and gives this code:
private double ComputeFrequency(int arrayIndex) {
    return ((1.0 * sampleRate) / (1.0 * fftOutWindowSize)) * arrayIndex;
}
How do I implement this code? I don't really understand where fftOutWindowSize and arrayIndex come from.
Any help is greatly appreciated!
Dustin
Comments (3)
I have recently been working on a project that requires almost the same thing. You probably don't need help anymore, but I will give my thoughts anyway in case someone needs this in the future.
"double[] micBufferData = new double[bufferSizeInBytes];"
I think the size of micBufferData should be "bufferSizeInBytes / 2", since every sample takes two bytes and the size of micBufferData should be the number of samples.
To use the code you gave at the end, you should first find the peak index in the sample array. I used a double array as input instead of Complex, so in my case it is something like:
double maxVal = -1;
int maxIndex = -1;
2*j is the real part and 2*j+1 is the imaginary part.
maxIndex is the index of the peak magnitude you want (more detail here); use it as the input to the ComputeFrequency function. The return value is the frequency of the sample array you want. Hopefully this helps someone.
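For anyone following along, the peak search described in this comment might look roughly like the sketch below. It assumes the FFT output is a double array with interleaved real and imaginary parts (2*j real, 2*j+1 imaginary), searches only the first half of the bins because the spectrum of a real signal is mirrored, and skips bin 0 (the DC offset), which is my own choice rather than something stated above. The class name PeakBin and the parameter names are made up for illustration; computeFrequency just restates the formula quoted in the question.

```java
// Hypothetical helper: class, method and parameter names are illustrative only.
final class PeakBin {

    // Returns the index of the bin with the largest squared magnitude,
    // or -1 if there are no bins to search.
    // Assumes interleaved layout: fft[2*j] = real part, fft[2*j+1] = imaginary part.
    static int findPeakIndex(double[] fft) {
        int bins = fft.length / 2;   // number of complex bins in the array
        double maxVal = -1;
        int maxIndex = -1;
        // Start at 1 to skip the DC bin; stop at bins / 2 because the
        // upper half mirrors the lower half for real-valued input.
        for (int j = 1; j < bins / 2; j++) {
            double re = fft[2 * j];
            double im = fft[2 * j + 1];
            double v = re * re + im * im;   // squared magnitude is enough for comparison
            if (v > maxVal) {
                maxVal = v;
                maxIndex = j;
            }
        }
        return maxIndex;
    }

    // Converts a bin index to a frequency in Hz.
    // fftSize is the number of samples that went into the FFT.
    static double computeFrequency(int arrayIndex, int sampleRate, int fftSize) {
        return ((double) sampleRate / fftSize) * arrayIndex;
    }
}
```

With a 1024-point FFT at 44100 Hz, each bin is about 43 Hz wide, so a peak at index 10 would be reported as roughly 430.7 Hz.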
You should pick an FFT window size depending on your time versus frequency resolution requirements, and not just use the audio buffer size when creating your FFT temp array.
The array index is your int i, as used in your magnitude[i] print statement.
The fundamental pitch frequency for music is often different from the frequency of the FFT magnitude peak, so you may want to research some pitch estimation algorithms.
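As one illustration of what a pitch estimation algorithm can look like, here is a minimal time-domain autocorrelation sketch. It is only one of several approaches and not necessarily the one this comment has in mind. The class name AutocorrelationPitch and the minHz/maxHz search bounds are assumptions made for the example.

```java
// Hypothetical sketch of autocorrelation-based pitch estimation.
final class AutocorrelationPitch {

    // Estimates the pitch in Hz by finding the lag, within the range implied by
    // [minHz, maxHz], at which the signal correlates best with a shifted copy
    // of itself. Returns -1 if the range cannot be searched.
    static double estimate(double[] samples, int sampleRate, double minHz, double maxHz) {
        int minLag = (int) Math.floor(sampleRate / maxHz);  // shortest period to test
        int maxLag = (int) Math.ceil(sampleRate / minHz);   // longest period to test
        maxLag = Math.min(maxLag, samples.length - 1);
        if (minLag < 1 || minLag > maxLag) {
            return -1;
        }
        double best = Double.NEGATIVE_INFINITY;
        int bestLag = -1;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double sum = 0;
            for (int i = 0; i + lag < samples.length; i++) {
                sum += samples[i] * samples[i + lag];
            }
            // Divide by the number of terms so longer lags are not penalised
            // just for having fewer overlapping samples.
            sum /= (samples.length - lag);
            if (sum > best) {
                best = sum;
                bestLag = lag;
            }
        }
        return (double) sampleRate / bestLag;
    }
}
```

A plain autocorrelation like this can lock onto a multiple of the true period (an octave error) if the search range is wide, so keeping minHz and maxHz close to the instrument's range helps.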
I suspect that the strange results you're getting are because you might need to unpack the FFT. How this is done will depend on the library that you're using (see here for docs on how it's packed in GSL, for example). The packing may mean that the real and imaginary components are not in the positions in the array that you expect.
For your other questions about window size and resolution, if you're creating a tuner then I'd suggest trying a window size of about 20 ms (e.g. 1024 samples at 44.1 kHz). For a tuner you need quite high resolution, so you could try zero-padding by a factor of 8 or 16, which will give you a frequency resolution (bin spacing) of about 3-6 Hz.
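To make the zero-padding arithmetic concrete, here is a small sketch. The class name ZeroPad and its method names are hypothetical. The padded array would then be handed to whatever FFT routine is in use.

```java
// Hypothetical helper illustrating zero-padding and the resulting bin spacing.
final class ZeroPad {

    // Copies the frame into a new array padFactor times as long;
    // the extra elements stay at 0.0 because Java zero-initialises arrays.
    static double[] pad(double[] frame, int padFactor) {
        double[] out = new double[frame.length * padFactor];
        System.arraycopy(frame, 0, out, 0, frame.length);
        return out;
    }

    // Spacing between adjacent FFT bins, in Hz, after padding.
    static double binSpacingHz(int sampleRate, int frameSize, int padFactor) {
        return (double) sampleRate / ((long) frameSize * padFactor);
    }
}
```

For 1024 samples at 44100 Hz, the unpadded spacing is about 43.1 Hz; padding by 8 gives about 5.4 Hz and padding by 16 gives about 2.7 Hz, which matches the 3-6 Hz range mentioned above. Note that zero-padding interpolates the spectrum between the original bins; it does not add information that a longer recording would.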