I'm having trouble getting scipy.interpolate.UnivariateSpline to use any smoothing when interpolating. Based on the function's page as well as some previous posts, I believe it should provide smoothing with the s parameter.
Here is my code:
# Imports
import numpy as np
import scipy.interpolate  # "import scipy" alone does not load the interpolate subpackage on older SciPy
import matplotlib.pyplot as plt
# Set up and plot actual data
x = [0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542, 9879.9717114947653, 13738.60563208926, 15113.277958924193]
y = [0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966, 6056.3852496014106, 7895.2332350173638, 9154.2956175610598]
plt.plot(x, y, "o", label="Actual")
# Plot estimates using splines with a range of degrees
xi = np.arange(0, 15100, 20)  # evaluation grid, the same for every k
for k in range(1, 4):
    mySpline = scipy.interpolate.UnivariateSpline(x=x, y=y, k=k, s=2)
    yi = mySpline(xi)
    plt.plot(xi, yi, label="Predicted k=%d" % k)
# Show the plot
plt.grid(True)
plt.xticks(rotation=45)
plt.legend(loc="lower right")
plt.show()
Here is the result: [plot not preserved: the actual data points with the k=1, 2 and 3 spline predictions]
I have tried this with a range of s values (0.01, 0.1, 1, 2, 5, 50), as well as explicit weights, set either to the same value (1.0) or randomized. I still can't get any smoothing, and the number of knots is always the same as the number of data points. In particular, I'm looking for outliers like the 4th point (7990.4664106277542, 5851.6866463790966) to be smoothed over.
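The knot-count observation can be checked directly. This is a minimal diagnostic sketch, assuming k=3 and adding one much larger s value (5e7, a hypothetical choice on the order of the spread of y) to the values tried above; `get_knots()` and `get_residual()` are existing `UnivariateSpline` methods.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Data from the question
x = np.array([0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542,
              9879.9717114947653, 13738.60563208926, 15113.277958924193])
y = np.array([0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966,
              6056.3852496014106, 7895.2332350173638, 9154.2956175610598])

# s values tried in the question, plus one HYPOTHETICAL value on the scale of y**2
s_values = [0.01, 0.1, 1, 2, 5, 50, 5e7]

knot_counts = {}
for s in s_values:
    spl = UnivariateSpline(x, y, k=3, s=s)
    # get_knots() returns the distinct knots, including both end points
    knot_counts[s] = len(spl.get_knots())
    print(f"s={s:g}: {knot_counts[s]} knots, residual={spl.get_residual():.3g}")
```

Because s bounds the sum of squared residuals, and y is measured in thousands, values such as 2 or 50 leave the fit essentially no room to deviate from the data.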
Is it because I don't have enough data? If so, is there a similar spline function or clustering technique I can apply to achieve smoothing with so few data points?
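Short answer: you need to choose the value for s more carefully. The documentation for UnivariateSpline states that the number of knots is increased until the smoothing condition sum((w[i] * (y[i] - spl(x[i])))**2) <= s is satisfied, and that the default s = len(w) should be a good value if 1/w[i] is an estimate of the standard deviation of y[i].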
From this one can deduce that "reasonable" values for smoothing, if you don't pass in explicit weights, are around s = m * v, where m is the number of data points and v the variance of the data. In this case, s_good ~ 5e7.
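The rule of thumb s = m * v can be evaluated for the question's data in a few lines. This sketch assumes the population variance (`np.var` with its default `ddof=0`) and k=3; the answer does not specify either.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.array([0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542,
              9879.9717114947653, 13738.60563208926, 15113.277958924193])
y = np.array([0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966,
              6056.3852496014106, 7895.2332350173638, 9154.2956175610598])

m = len(y)        # number of data points
v = np.var(y)     # population variance of the y values (ddof=0)
s_good = m * v    # rule of thumb: s ~ m * v
print(f"s_good = {s_good:.3g}")

spl = UnivariateSpline(x, y, k=3, s=s_good)
print("knots:", spl.get_knots())
```

On seven nearly linear points, an s this large is already met by a plain least-squares cubic, so the spline ends up with no interior knots (only the two end points are reported). That is heavy smoothing; smaller values between this and the question's tiny ones may be worth trying.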
EDIT: sensible values for s of course also depend on the noise level in the data. The docs seem to recommend choosing s in the range (m - sqrt(2*m)) * std**2 <= s <= (m + sqrt(2*m)) * std**2, where std is the standard deviation associated with the "noise" you want to smooth over.
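The noise-based range can be turned into concrete numbers once a noise level is assumed. In this sketch `noise_std = 300.0` is purely hypothetical (the question gives no noise estimate), and s is set to the middle of the range, m * std**2. Passing w=1/std with s in (m - sqrt(2*m), m + sqrt(2*m)) should be the equivalent weighted formulation.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.array([0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542,
              9879.9717114947653, 13738.60563208926, 15113.277958924193])
y = np.array([0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966,
              6056.3852496014106, 7895.2332350173638, 9154.2956175610598])

noise_std = 300.0  # HYPOTHETICAL noise level in y; not given in the question
m = len(y)

# Recommended range for s when the weights are all 1
s_low = (m - np.sqrt(2 * m)) * noise_std**2
s_high = (m + np.sqrt(2 * m)) * noise_std**2
s_mid = m * noise_std**2  # middle of the range
print(f"suggested s range: {s_low:.3g} .. {s_high:.3g}, using s = {s_mid:.3g}")

spl = UnivariateSpline(x, y, k=3, s=s_mid)
print("knots:", spl.get_knots())
print("sum of squared residuals:", spl.get_residual())
```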
@Zhenya's answer of manually setting knots in between data points was too rough to deliver good results on noisy data without being selective about how the technique is applied. However, inspired by his/her suggestion, I have had success with Mean-Shift clustering from the scikit-learn package. It determines the number of clusters automatically and seems to do a fairly good smoothing job (very smooth, in fact).
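The answer does not show its code, so here is a minimal sketch of the idea under stated assumptions. Instead of scikit-learn's `sklearn.cluster.MeanShift`, it uses a small hand-written flat-kernel mean shift in NumPy to stay dependency-free. The helper `mean_shift_centers` and the bandwidth of 1000 (in the same units as x and y) are hypothetical choices; the cluster centers then replace the raw points as spline input.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.array([0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542,
              9879.9717114947653, 13738.60563208926, 15113.277958924193])
y = np.array([0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966,
              6056.3852496014106, 7895.2332350173638, 9154.2956175610598])

def mean_shift_centers(points, bandwidth, max_iter=300, tol=1e-6):
    """HYPOTHETICAL helper: flat-kernel mean shift.

    Every mode starts at a data point and is repeatedly moved to the mean of
    all data points within `bandwidth` of it. Modes that converge closer than
    bandwidth / 2 to each other are merged into one cluster center."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(max_iter):
        # distances between current modes (rows) and original points (columns)
        dist = np.linalg.norm(modes[:, None, :] - points[None, :, :], axis=2)
        inside = (dist <= bandwidth).astype(float)
        new_modes = inside @ points / inside.sum(axis=1, keepdims=True)
        shift = np.max(np.abs(new_modes - modes))
        modes = new_modes
        if shift < tol:
            break
    centers = []
    for mode in modes:
        if all(np.linalg.norm(mode - c) >= bandwidth / 2 for c in centers):
            centers.append(mode)
    return np.array(centers)

bandwidth = 1000.0  # HYPOTHETICAL: picked by hand for this data
centers = mean_shift_centers(np.column_stack([x, y]), bandwidth)
centers = centers[np.argsort(centers[:, 0])]  # spline fitting needs increasing x
print("cluster centers:\n", centers)

# Interpolate through the cluster centers instead of the raw points
spl = UnivariateSpline(centers[:, 0], centers[:, 1], k=3, s=0)
xi = np.arange(0, 15100, 20)
yi = spl(xi)
```

With this bandwidth, only the two close points near x=8000 fall into one cluster and are replaced by their mean, which removes the steep jump between them. Because x and y are clustered together, rescaling them to comparable ranges may matter; scikit-learn's `estimate_bandwidth` is one way to avoid picking the bandwidth by hand.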
While I'm not aware of any library that will do it for you off-hand, I'd try a more DIY approach: I'd start by making a spline with knots in between the raw data points, in both x and y. In your particular example, a single knot in between the 4th and 5th points should do the trick, since it would remove the huge derivative at around x=8000.
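One way to try the single-knot idea is SciPy's `LSQUnivariateSpline`, which fits a least-squares spline on interior knots you supply. This is a sketch under assumptions the answer does not state: the knot is placed only in x (halfway between the 4th and 5th points), and the spline is cubic (k=3).

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

x = np.array([0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542,
              9879.9717114947653, 13738.60563208926, 15113.277958924193])
y = np.array([0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966,
              6056.3852496014106, 7895.2332350173638, 9154.2956175610598])

# ASSUMPTION: one interior knot halfway between the 4th and 5th points (x[3], x[4])
t = [(x[3] + x[4]) / 2]

# Least-squares cubic spline on this fixed knot vector. With 7 points and only
# len(t) + k + 1 = 5 coefficients it cannot pass through every point, so it smooths.
spl = LSQUnivariateSpline(x, y, t, k=3)
print("knots:", spl.get_knots())
print("sum of squared residuals:", spl.get_residual())

xi = np.arange(0, 15100, 20)
yi = spl(xi)
```

Interior knots must satisfy the Schoenberg-Whitney conditions relative to the data, which a single knot between two data points does here; placing many knots between closely spaced points can violate them.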
I had trouble getting BigChef's answer running; here is a variation that works on Python 3.6: