Timing in C# real-time audio analysis

Posted 2024-08-01 18:22:28, 219 words, 11 views, 0 comments

I'm trying to determine the "beats per minute" from real-time audio in C#. It is not music that I'm detecting, though, just a constant tapping sound. My problem is determining the time between those taps so I can work out the "taps per minute". I have tried using the WaveIn.cs class out there, but I don't really understand how it samples. I'm not getting a set number of samples per second to analyze. I guess I really just don't know how to read in an exact number of samples per second so that I know the time between two samples.

Any help to get me in the right direction would be greatly appreciated.

Comments (3)

他不在意 2024-08-08 18:22:28

I'm not sure which WaveIn.cs class you're using, but code that records audio usually works in one of two ways: A) you tell the code to start recording, at some later point you tell it to stop, and you get back an array (usually of type short[]) containing the data recorded during that period; or B) you tell the code to start recording with a given buffer size, and each time a buffer is filled the code calls back a method you've defined, passing a reference to the filled buffer, and this continues until you tell it to stop recording.

Let's assume that your recording format is 16 bits (i.e. 2 bytes) per sample, 44,100 samples per second, and mono (1 channel). In case (A), let's say you start recording and then stop exactly 10 seconds later. You will end up with a short[] array that is 441,000 (44,100 x 10) elements long. I don't know what algorithm you're using to detect "taps", but let's say you detect taps in this array at element 0, element 22,050, element 44,100, element 66,150 and so on. This means you're finding a tap every 0.5 seconds (because 22,050 is half of 44,100 samples per second), which means you have 2 taps per second and thus 120 BPM.
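The arithmetic in case (A) can be sketched in C# as follows. This is only an illustration: `TapTempo` and `BpmFromTapIndices` are made-up names, not part of WaveIn.cs, and the tap positions are assumed to come from whatever detection method you already have.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of WaveIn.cs or any library.
public static class TapTempo
{
    // Converts tap positions (sample indices) into taps per minute.
    // The span from the first tap to the last is divided by the number
    // of intervals, so the result averages over every gap between taps.
    public static double BpmFromTapIndices(IReadOnlyList<long> tapIndices, int sampleRate)
    {
        if (tapIndices == null || tapIndices.Count < 2)
            throw new ArgumentException("At least two taps are needed.", nameof(tapIndices));
        if (sampleRate <= 0)
            throw new ArgumentOutOfRangeException(nameof(sampleRate));

        long spanInSamples = tapIndices[tapIndices.Count - 1] - tapIndices[0];
        if (spanInSamples <= 0)
            throw new ArgumentException("Tap indices must be increasing.", nameof(tapIndices));

        int intervals = tapIndices.Count - 1;
        double secondsPerTap = (double)spanInSamples / sampleRate / intervals;
        return 60.0 / secondsPerTap;
    }
}
```

With the numbers above, `TapTempo.BpmFromTapIndices(new long[] { 0, 22050, 44100, 66150 }, 44100)` gives 120.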

In case (B), let's say you start recording with a fixed buffer size of 44,100 samples (i.e. 1 second). As each buffer comes in, you find taps at element 0 and at element 22,050. By the same logic as above, you'll calculate 120 BPM.
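For case (B), the point worth spelling out is that time comes from counting samples, not from when the callback happens to fire. A minimal sketch, assuming mono audio and a callback that hands you one filled buffer at a time; `SampleClock` is a hypothetical helper, not something WaveIn.cs provides:

```csharp
using System;

// Hypothetical helper: tracks how many samples have arrived in earlier
// buffers so that a position inside the current buffer can be turned
// into a time since recording started.
public sealed class SampleClock
{
    private readonly int _sampleRate;
    private long _samplesBeforeCurrentBuffer;

    public SampleClock(int sampleRate)
    {
        if (sampleRate <= 0)
            throw new ArgumentOutOfRangeException(nameof(sampleRate));
        _sampleRate = sampleRate;
    }

    // Seconds since the start of recording for the sample at the given
    // offset inside the buffer currently being analyzed.
    public double SecondsAt(int offsetInBuffer)
    {
        return (double)(_samplesBeforeCurrentBuffer + offsetInBuffer) / _sampleRate;
    }

    // Call once after each buffer has been analyzed.
    public void Advance(int samplesInBuffer)
    {
        _samplesBeforeCurrentBuffer += samplesInBuffer;
    }
}
```

In the callback you would call `SecondsAt` for each tap found in the buffer and then `Advance` with the number of samples the buffer held. If the callback delivers raw bytes rather than a short[], the sample count for 16-bit mono is the byte count divided by 2. Buffers do not need to be the same size for this to work.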

Hope this helps. With beat detection in general, it's best to record for a relatively long time and count the beats across a large array of data. Trying to estimate the "instantaneous" tempo is more difficult and more prone to error, just as estimating the pitch of a recording is harder to do in real time than with a recording of a full note.

甜妞爱困 2024-08-08 18:22:28

I think you might be confusing samples with "taps."

A sample is a number representing the height of the sound wave at a given moment in time. A typical wave file might be sampled 44,100 times a second, so if you have two channels for stereo, you have 88,200 sixteen-bit numbers (samples) per second.

If you take all of these numbers and graph them, you will get something like this:

[Image: plot of an audio waveform (source: vbaccelerator.com)]

What you are looking for is the peak in that waveform.

That is the tap.
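One simple way to find such peaks is a threshold with a hold-off period, sketched below. This is an assumption on my part rather than anything the poster described: the `TapDetector` name, the threshold and the hold-off length are all made up and would need tuning for real input. Note that it reports the first sample that crosses the threshold, not the exact top of the peak.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of WaveIn.cs or any library.
public static class TapDetector
{
    // Returns the indices where the absolute sample value first reaches
    // the threshold. After each hit, further hits are ignored for
    // holdoffSamples samples so one tap is not counted several times.
    public static List<int> FindTaps(short[] samples, int threshold, int holdoffSamples)
    {
        if (samples == null)
            throw new ArgumentNullException(nameof(samples));

        var taps = new List<int>();
        int lastTap = -1; // -1 means no tap found yet

        for (int i = 0; i < samples.Length; i++)
        {
            // Widen to int first: Math.Abs(short.MinValue) would overflow as a short.
            int magnitude = Math.Abs((int)samples[i]);
            bool outsideHoldoff = lastTap < 0 || i - lastTap >= holdoffSamples;

            if (magnitude >= threshold && outsideHoldoff)
            {
                taps.Add(i);
                lastTap = i;
            }
        }
        return taps;
    }
}
```

For 44,100 samples per second, a hold-off of a few thousand samples (roughly 50 to 100 ms) is a plausible starting point for hand taps, but that figure is a guess.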

極樂鬼 2024-08-08 18:22:28

Assuming we're talking about the same WaveIn.cs, the constructor of WaveLib.WaveInRecorder takes a WaveLib.WaveFormat object as a parameter. This allows you to set the audio format, i.e. sample rate, bit depth, etc. Just scan the audio samples for peaks (or however you're detecting "taps") and record the average distance in samples between peaks.

Since you know the sample rate of the audio stream (e.g. 44,100 samples/second), take your average peak distance (in samples), multiply by 1/(sample rate) to get the time (in seconds) between taps, divide by 60 to get the time (in minutes) between taps, and invert to get the taps/minute.
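Written out as a formula, with $\bar{d}$ for the average peak distance in samples and $f_s$ for the sample rate in samples per second (both symbols are just my notation):

$$
\text{taps per minute} = \left( \frac{\bar{d}}{f_s} \cdot \frac{1}{60} \right)^{-1} = \frac{60\, f_s}{\bar{d}}
$$

For example, with $f_s = 44100$ and $\bar{d} = 22050$, this gives $60 \cdot 44100 / 22050 = 120$ taps per minute.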

Hope that helps
