Peak-finding algorithm for Python/SciPy
I can write something myself by finding zero-crossings of the first derivative or something, but it seems like a common-enough function to be included in standard libraries. Anyone know of one?
My particular application is a 2D array, but usually it would be used for finding peaks in FFTs, etc.
Specifically, in these kinds of problems, there are multiple strong peaks, and then lots of smaller "peaks" that are just caused by noise that should be ignored. These are just examples; not my actual data:
1-dimensional peaks:
2-dimensional peaks:
The peak-finding algorithm would find the location of these peaks (not just their values), and ideally would find the true inter-sample peak, not just the index with maximum value, probably using quadratic interpolation or something.
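For illustration only, the quadratic-interpolation idea could be sketched as below. The helper name `parabolic_peak` and the test signal are assumptions made for this example, not part of any library: a parabola is fitted through the highest sample and its two neighbours, and the vertex gives the inter-sample peak.

```python
import numpy as np

def parabolic_peak(y, i):
    """Refine the peak at integer index i by fitting a parabola through
    y[i-1], y[i], y[i+1]. Returns (fractional_index, interpolated_height).
    Hypothetical helper; i must not be the first or last index."""
    a, b, c = y[i - 1], y[i], y[i + 1]
    denom = a - 2.0 * b + c
    if denom == 0:  # three collinear samples: no curvature to fit
        return float(i), float(b)
    offset = 0.5 * (a - c) / denom          # vertex position relative to i
    height = b - 0.25 * (a - c) * offset    # parabola value at the vertex
    return i + offset, height

# Made-up signal: a parabola whose true maximum (4.3) lies between samples.
x = np.arange(10)
y = -(x - 4.3) ** 2 + 7.0

i = int(np.argmax(y))            # coarse peak: index of the largest sample
pos, height = parabolic_peak(y, i)
print(i, pos, height)
```

For a signal that is only approximately parabolic near its maximum (a windowed FFT peak, say), the result is an estimate rather than the exact location.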
Typically you only care about a few strong peaks, so they'd either be chosen because they're above a certain threshold, or because they're the first n peaks of an ordered list, ranked by amplitude.
As I said, I know how to write something like this myself. I'm just asking if there's a pre-existing function or package that's known to work well.
Update:
I translated a MATLAB script and it works decently for the 1-D case, but could be better.
Updated update:
sixtenbe created a better version for the 1-D case.
Comments (10)
The function `scipy.signal.find_peaks`, as its name suggests, is useful for this. But it's important to understand its parameters `width`, `threshold`, `distance` and, above all, `prominence` well to get a good peak extraction.

According to my tests and the documentation, the concept of prominence is "the useful concept" for keeping the good peaks and discarding the noisy peaks.

What is (topographic) prominence? It is "the minimum height necessary to descend to get from the summit to any higher terrain", as it can be seen here:

The idea is:

Test:

I used a (noisy) frequency-varying sinusoid on purpose because it shows many difficulties. We can see that the `width` parameter is not very useful here: if you set the minimum `width` too high, it won't be able to track very close peaks in the high-frequency part. If you set `width` too low, you get many unwanted peaks in the left part of the signal. Same problem with `distance`. `threshold` only compares with the direct neighbours, which is not useful here. `prominence` is the one that gives the best solution. Note that you can combine many of these parameters!

Code:
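The answer's own code did not survive on this page. As a stand-in, here is a minimal sketch of the same kind of test. The chirp signal, the noise level and the prominence value of 1 are assumptions chosen for illustration. Only `scipy.signal.find_peaks` and its `prominence` argument come from the answer.

```python
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)

# Assumed test signal: a sinusoid whose frequency rises over time, plus noise.
t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * (2 + 18 * t) * t) + 0.15 * rng.normal(size=t.size)

# No constraints: every local maximum is reported, including noise wiggles.
peaks_all, _ = find_peaks(x)

# Require each peak to rise at least 1.0 above its surrounding terrain.
peaks_prom, props = find_peaks(x, prominence=1.0)

print(len(peaks_all), "raw local maxima")
print(len(peaks_prom), "peaks with prominence >= 1")
print(props["prominences"][:5])
```

The returned `props` dictionary also carries `left_bases` and `right_bases`, which helps when checking why a particular peak was kept or dropped.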
I'm looking at a similar problem, and I've found some of the best references come from chemistry (from peak finding in mass-spec data). For a good, thorough review of peak-finding algorithms, read this. This is one of the best and clearest reviews of peak-finding techniques that I've run across. (Wavelets are the best for finding peaks of this sort in noisy data.)
It looks like your peaks are clearly defined and aren't hidden in the noise. That being the case, I'd recommend using smoothed Savitzky-Golay derivatives to find the peaks. (If you just differentiate the data above, you'll have a mess of false positives.) This is a very effective technique and is pretty easy to implement (you do need a matrix class with basic operations). If you simply find the zero crossings of the first S-G derivative, I think you'll be happy.
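A rough sketch of that approach, assuming SciPy's `scipy.signal.savgol_filter` is acceptable in place of a hand-written S-G implementation. The signal, window length, polynomial order and the 0.3 height cut-off are all made-up values for the example.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(1)

# Assumed data: two Gaussian peaks (at x = 3 and x = 7) with a little noise.
x = np.linspace(0.0, 10.0, 1001)
y = (np.exp(-(x - 3.0) ** 2 / 0.5)
     + 0.7 * np.exp(-(x - 7.0) ** 2 / 0.5)
     + 0.02 * rng.normal(size=x.size))

# Smoothed signal and smoothed first derivative from the same S-G fit.
smooth = savgol_filter(y, window_length=51, polyorder=3)
dy = savgol_filter(y, window_length=51, polyorder=3, deriv=1)

# A maximum is where the derivative goes from positive to non-positive.
crossings = np.where((dy[:-1] > 0) & (dy[1:] <= 0))[0]

# The flat baseline also produces crossings, so keep only tall ones.
peaks = crossings[smooth[crossings] > 0.3]
print(x[peaks])
```

The height cut-off matters: without it, the derivative of the noisy baseline crosses zero many times.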
There is a function in SciPy named `scipy.signal.find_peaks_cwt` which sounds like it is suitable for your needs; however, I don't have experience with it, so I cannot recommend it: http://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks_cwt.html
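For reference, a minimal usage sketch. The test signal and the `widths` range are assumptions for the example. `widths` should roughly bracket the expected peak widths in samples, and the function may also report extra low-amplitude peaks in noisy regions.

```python
import numpy as np
from scipy.signal import find_peaks_cwt

rng = np.random.default_rng(2)

# Assumed data: two Gaussian bumps centred on samples 300 and 700.
n = np.arange(1000)
y = (np.exp(-(n - 300) ** 2 / (2 * 20.0 ** 2))
     + 0.8 * np.exp(-(n - 700) ** 2 / (2 * 20.0 ** 2))
     + 0.02 * rng.normal(size=n.size))

# Wavelet widths (in samples) to scan; chosen to bracket the bump width.
peak_idx = np.asarray(find_peaks_cwt(y, widths=np.arange(10, 40)))
print(peak_idx)
```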
For those not sure about which peak-finding algorithms to use in Python, here is a rapid overview of the alternatives: https://github.com/MonsieurV/py-findpeaks

Wanting an equivalent to the MATLAB `findpeaks` function myself, I've found that the `detect_peaks` function from Marcos Duarte is a good catch.

Pretty easy to use:

Which will give you:
To detect both positive and negative peaks, PeakDetect is helpful.
Detecting peaks in a spectrum reliably has been studied quite a bit, for example in all the work on sinusoidal modelling for music/audio signals in the 1980s. Look for "Sinusoidal Modeling" in the literature.
If your signals are as clean as the example, a simple "give me something with an amplitude higher than N neighbours" should work reasonably well. If you have noisy signals, a simple but effective way is to look at your peaks over time and track them: you then detect spectral lines instead of spectral peaks. In other words, you compute the FFT on a sliding window of your signal to get a set of spectra over time (also called a spectrogram). You then look at how the spectral peaks evolve over time (i.e. across consecutive windows).
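One way to sketch the tracking idea, under stated assumptions: the test tones (440 Hz and 1000 Hz), the window settings, the "10 times the frame median" height rule and the 80 % persistence cut-off are all invented for the example. Peaks are picked per frame, and only frequency bins that show a peak in most frames are kept as spectral lines.

```python
import numpy as np
from scipy.signal import spectrogram, find_peaks

rng = np.random.default_rng(3)

# Assumed signal: two steady tones buried in white noise.
fs = 8000
t = np.arange(0, 2.0, 1 / fs)
sig = (np.sin(2 * np.pi * 440 * t)
       + 0.5 * np.sin(2 * np.pi * 1000 * t)
       + 0.3 * rng.normal(size=t.size))

# Sliding-window FFT: one spectrum per column of Sxx.
f, frames, Sxx = spectrogram(sig, fs=fs, window="hann",
                             nperseg=512, noverlap=256)

# Mark the spectral peaks of every frame.
hits = np.zeros(Sxx.shape, dtype=bool)
for k in range(Sxx.shape[1]):
    col = Sxx[:, k]
    idx, _ = find_peaks(col, height=10 * np.median(col))
    hits[idx, k] = True

# A spectral line is a bin that peaks in (almost) every frame;
# noise peaks turn up in random bins and do not persist.
persistence = hits.mean(axis=1)
line_freqs = f[persistence > 0.8]
print(line_freqs)
```

This version only handles tones that stay in one bin. Tracking lines that drift in frequency needs frame-to-frame matching of nearby peaks.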
There are standard statistical functions and methods for finding outliers in data, which is probably what you need in the first case. Using derivatives would solve your second. I'm not sure of a method that handles both continuous functions and sampled data, however.
I do not think that what you are looking for is provided by SciPy. I would write the code myself, in this situation.
The spline interpolation and smoothing from scipy.interpolate are quite nice and might be quite helpful in fitting peaks and then finding the location of their maximum.
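A small sketch of the spline route, with assumed data. It uses a quartic spline (`k=4`) because its derivative is then cubic, and `roots()` on a `UnivariateSpline` is only available for cubic splines. `s=0` makes the spline interpolate; a positive `s` would smooth noisy data instead.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Assumed data: a coarsely sampled bump whose true maximum (4.37)
# falls between sample points.
x = np.linspace(0.0, 10.0, 41)
y = np.exp(-(x - 4.37) ** 2 / 2.0)

spline = UnivariateSpline(x, y, k=4, s=0)

# Stationary points are the roots of the first derivative.
stationary = spline.derivative().roots()

# Keep the maxima (negative second derivative) and take the highest one.
curvature = spline.derivative(n=2)(stationary)
maxima = stationary[curvature < 0]
best = maxima[np.argmax(spline(maxima))]
print(best, spline(best))
```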
First things first, the definition of "peak" is vague without further specification. For example, for the following series, would you call 5-4-5 one peak or two?
1-2-1-2-1-1-5-4-5-1-1-5-1
In this case, you'll need at least two thresholds: 1) a high threshold that a value must exceed to register as a peak; and 2) a low threshold, so that extreme values separated by values below it count as two separate peaks.
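A minimal sketch of the two-threshold idea applied to the series above. The function name `decluster_peaks` and the threshold values are hypothetical, chosen only to show how the low threshold decides whether 5-4-5 counts once or twice.

```python
def decluster_peaks(values, high, low):
    """Return the index of the maximum of each cluster.

    A cluster opens when a value exceeds `high` and closes when a value
    falls below `low`. Ties keep the first maximum. Hypothetical helper.
    """
    peaks = []
    best = None  # index of the current cluster's maximum, None if outside
    for i, v in enumerate(values):
        if best is None:
            if v > high:
                best = i          # open a new cluster
        elif v < low:
            peaks.append(best)    # close the cluster, record its maximum
            best = None
        elif v > values[best]:
            best = i              # new maximum inside the open cluster
    if best is not None:          # series ended inside a cluster
        peaks.append(best)
    return peaks

series = [1, 2, 1, 2, 1, 1, 5, 4, 5, 1, 1, 5, 1]

# Low threshold well below 4: the dip to 4 does not split 5-4-5.
print(decluster_peaks(series, high=4.5, low=2))    # [6, 11]

# Low threshold above 4: the dip splits 5-4-5 into two peaks.
print(decluster_peaks(series, high=4.5, low=4.5))  # [6, 8, 11]
```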
Peak detection is a well-studied topic in Extreme Value Theory literature, also known as "declustering of extreme values". Its typical applications include identifying hazard events based on continuous readings of environmental variables e.g. analysing wind speed to detect storm events.
As mentioned at the bottom of this page, there is no universal definition of a peak. Therefore, a universal algorithm that finds peaks cannot work without bringing in additional assumptions (conditions, parameters, etc.). This page provides some of the most stripped-down suggestions. All the literature listed in the answers above does the same in a more or less roundabout manner, so feel free to take your pick.
In any case, it is your duty to narrow down the properties a feature needs to have in order to be classified as a peak, based on your experience and the properties of the spectra (curves) in question (noise, sampling, bandwidths, etc.).