What is the simplest and fastest method for audio activity detection?

Posted 2024-09-07 23:48:50

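I am given an array of 320 int16 elements representing 20 ms of audio (16-bit LPCM). I am looking for the simplest, very fast method to decide whether the array contains active audio (such as speech or music) rather than noise or silence. The decision does not need to be very accurate, but it must be very fast.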

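My first idea was to sum the squares or absolute values of all elements and compare the sum with a threshold, but that is very slow on my system, even though it is O(n).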
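For reference, the baseline described above might look like the following minimal sketch. The function name `frame_is_active` and the per-sample `threshold` are assumptions made for illustration, not part of the question; Python is used only to show the logic, not as a statement about speed on any particular target.

```python
def frame_is_active(frame, threshold):
    """Return True if the mean absolute amplitude of `frame` exceeds `threshold`.

    `frame` is a sequence of int16 sample values; `threshold` is a
    hypothetical per-sample level that has to be tuned by hand.
    """
    # Summing absolute values needs no multiplications, and for 320 int16
    # samples the sum stays below 2**31 (320 * 32768 = 10,485,760).
    total = sum(abs(sample) for sample in frame)
    # Compare against threshold * N instead of dividing the sum by N.
    return total > threshold * len(frame)
```

Comparing the raw sum with `threshold * len(frame)` avoids a division per frame; the same structure works with squares instead of absolute values.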


几味少女 2024-09-14 23:48:50

You're not going to get much faster than a sum-of-squares approach.

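One optimization that you may not be doing yet is to use a running total. That is, at each time step, instead of summing the squares of the last n samples, keep a running total and update it with the square of the most recent sample. To keep the running total from growing without bound, add an exponential decay. In pseudocode: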

decay_constant=0.999;  // Some suitable value smaller than 1
total=0;
for t=1,...
    // Exponential decay
    total=total*decay_constant;

    // Add in the square of the latest sample
    total+=current_sample*current_sample;

    if total>threshold
        // do something
    end
end

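Of course, you'll have to tune the decay constant and threshold to suit your application. If this isn't fast enough to run in real time, you have a seriously underpowered DSP...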
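The running total with exponential decay can be sketched in Python roughly as follows. The class name `EnergyTracker` and the default `threshold` are assumptions chosen for illustration; only the decay value 0.999 comes from the pseudocode.

```python
class EnergyTracker:
    """Leaky running sum of squared samples (hypothetical helper)."""

    def __init__(self, decay=0.999, threshold=1.0e9):
        self.decay = decay          # must be smaller than 1
        self.threshold = threshold  # assumed value; tune per application
        self.total = 0.0

    def update(self, sample):
        """Feed one sample; return True while the decayed energy is above threshold."""
        # Exponential decay of the old total, then add the newest squared sample.
        self.total = self.total * self.decay + sample * sample
        return self.total > self.threshold
```

For a signal with constant power P the total settles at P / (1 - decay), so with a decay of 0.999 the threshold has to be about 1000 times the per-sample power you want to trigger on.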


甜妞爱困 2024-09-14 23:48:50

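You could try calculating two simple "statistics". The first is spread (max - min): silence will have a very low spread. The second is variety: divide the range of possible values into, say, 16 brackets (value ranges) and, as you go through the elements, determine which bracket each element falls into. Noise will have similar counts in all brackets, whereas music or speech should favor some of them while neglecting others.

This should be possible in a single pass through the array, and it needs no complicated arithmetic, just some additions and comparisons of values.

Also consider some approximation, for example taking only every fourth value, which reduces the number of checked elements to 80. For an audio signal, this should be okay.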
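A minimal single-pass sketch of the two statistics, assuming 16 equal-width brackets over the full int16 range and a stride of 4. The function name `spread_and_brackets` is made up for illustration, and the answer does not say how to turn the bracket counts into a decision, so that part is left to the caller.

```python
def spread_and_brackets(frame, step=4):
    """Return (spread, counts) computed from every `step`-th sample.

    `frame` must be a non-empty sequence of int16 values. `counts` has
    16 entries, one per bracket of width 4096.
    """
    lo, hi = 32767, -32768
    counts = [0] * 16
    for i in range(0, len(frame), step):
        s = frame[i]
        if s < lo:
            lo = s
        if s > hi:
            hi = s
        # Shift the sample into 0..65535, then keep the top 4 bits (0..15).
        counts[(s + 32768) >> 12] += 1
    return hi - lo, counts
```

A frame could then be treated as silent when `spread` is below some hand-tuned limit, with the shape of `counts` inspected only for the remaining frames.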


嘴硬脾气大 2024-09-14 23:48:50

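I did something like this a while back. After some experimentation I arrived at a solution that worked sufficiently well in my case.

I used the rate of change in the cube of the running average over about 120 ms. When there is silence (that is, only noise), the expression should hover around zero. As soon as the rate starts increasing over a couple of runs, you probably have some action going on.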


rate = cur_avg^3 - prev_avg^3

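I used a cube because the square just wasn't aggressive enough. If the cube is too slow for you, try using the square and a bit shift instead. Hope this helps.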
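One way this could look, as a sketch under stated assumptions: the running average is taken here over the mean absolute amplitude of the last six 20 ms frames (about 120 ms), which is a guess, since the answer does not say what exactly is averaged. The class name `CubeRateDetector` is hypothetical.

```python
from collections import deque


class CubeRateDetector:
    """Tracks rate = cur_avg^3 - prev_avg^3 across frames (hypothetical helper)."""

    def __init__(self, frames_in_window=6):
        # Six 20 ms frames give a window of roughly 120 ms.
        self.levels = deque(maxlen=frames_in_window)
        self.prev_avg = 0.0

    def update(self, frame):
        """Feed one non-empty frame of int16 samples and return the current rate."""
        level = sum(abs(s) for s in frame) / len(frame)
        self.levels.append(level)
        cur_avg = sum(self.levels) / len(self.levels)
        rate = cur_avg ** 3 - self.prev_avg ** 3
        self.prev_avg = cur_avg
        return rate
```

The caller would watch for the returned rate staying positive over a couple of consecutive frames before declaring activity; the very first call returns a non-zero rate because the previous average starts at zero.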
