I am aware of the earlier posts on this topic. However, this time I would like to settle on an actual implementation in Python.
My only problem is that the elbow point seems to change between different runs of my code. Observe the two plots shown in this post. While they look visually similar, the value of the elbow point changed significantly. Both curves were generated from an average of 20 different runs, and even then there is a significant shift in the value of the elbow point. What precautions can I take to make sure that the value falls within a certain bound?
My attempt is shown below:
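import collections

def elbowPoint(points):
    # Discrete second derivative at every interior index i
    secondDerivative = collections.defaultdict(lambda: 0)
    for i in range(1, len(points) - 1):
        secondDerivative[i] = points[i+1] + points[i-1] - 2*points[i]
    # Key (index into points) with the largest second derivative.
    # dict.values() has no .index() in Python 3, so take the max over keys.
    elbow_point = max(secondDerivative, key=secondDerivative.get)
    return elbow_point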
points = [0.80881476685027154, 0.79457906121371058, 0.78071124401504677, 0.77110686192601441, 0.76062373158581287, 0.75174963969985187, 0.74356408965979193, 0.73577573557299236, 0.72782434749305047, 0.71952590556748364, 0.71417942487824781, 0.7076502559300516, 0.70089375208028415, 0.69393584640497064, 0.68550490458450741, 0.68494440529025913, 0.67920157634796108, 0.67280267176628761]
max_point = elbowPoint(points)
Comments (1)
It sounds like your actual concern is how to smooth your data, since it contains noise. In that case, perhaps you should fit a curve to the data first and then find the elbow of the fitted curve.
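To make that concrete, here is a minimal sketch of "fit first, then look for the elbow". It assumes a low-degree polynomial is a reasonable model for the curve, which the data does not guarantee, and it uses NumPy's `polyfit`. The function name `elbow_of_fitted_curve` and the `degree` parameter are my own choices for illustration, not something from the question. The elbow criterion is the same one used in the question (largest second derivative at an interior index), only evaluated on the fitted curve instead of the raw points.

```python
import numpy as np

def elbow_of_fitted_curve(points, degree=4):
    """Fit a polynomial to the points, then return the interior index
    where the fitted curve's second derivative is largest."""
    y = np.asarray(points, dtype=float)
    x = np.arange(len(y), dtype=float)
    coeffs = np.polyfit(x, y, degree)                # least-squares polynomial fit
    second = np.polyval(np.polyder(coeffs, 2), x)    # analytic 2nd derivative at each x
    # Skip the two endpoints, as the code in the question does
    return int(np.argmax(second[1:-1])) + 1

# Data copied from the question
points = [0.80881476685027154, 0.79457906121371058, 0.78071124401504677,
          0.77110686192601441, 0.76062373158581287, 0.75174963969985187,
          0.74356408965979193, 0.73577573557299236, 0.72782434749305047,
          0.71952590556748364, 0.71417942487824781, 0.7076502559300516,
          0.70089375208028415, 0.69393584640497064, 0.68550490458450741,
          0.68494440529025913, 0.67920157634796108, 0.67280267176628761]

print(elbow_of_fitted_curve(points))
```

One caveat on the degree: a cubic has a linear second derivative, so its maximum always sits at one end of the range and says little about an elbow. That is why the sketch defaults to degree 4. Even then the maximum can still land on the first or last interior index, so treat the result as a diagnostic rather than a definitive answer.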
Whether this works depends on the source of the noise and on whether the noise matters for your application. By the way, you may want to check how sensitive your fit is to the data by seeing how it changes (or, hopefully, doesn't) when a point is omitted from the fit. Obviously, with a high enough polynomial degree you will always get a good fit to a specific set of data, but you are presumably interested in the general case.
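The omit-one-point check could be sketched roughly as follows. As before, this assumes a polynomial fit via NumPy; `fitted_elbow`, `leave_one_out_elbows` and `degree` are hypothetical names introduced only for this example. Each fit leaves out one point, and the elbow is then located on the full set of x positions so that the results are comparable.

```python
import numpy as np

def fitted_elbow(x_fit, y_fit, x_eval, degree=4):
    """Fit on (x_fit, y_fit); return the interior index of x_eval where
    the fitted curve's second derivative is largest."""
    coeffs = np.polyfit(x_fit, y_fit, degree)
    second = np.polyval(np.polyder(coeffs, 2), x_eval)
    return int(np.argmax(second[1:-1])) + 1

def leave_one_out_elbows(points, degree=4):
    """Refit once per omitted point and collect the resulting elbow indices."""
    y = np.asarray(points, dtype=float)
    x = np.arange(len(y), dtype=float)
    elbows = []
    for skip in range(len(y)):
        keep = np.arange(len(y)) != skip     # boolean mask dropping one point
        elbows.append(fitted_elbow(x[keep], y[keep], x, degree))
    return elbows
```

Calling `leave_one_out_elbows(points)` on the list from the question returns one elbow index per omitted point. If `max(elbows) - min(elbows)` is large, the elbow depends heavily on individual points, and a different model or more averaging is probably needed before the value can be trusted.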
I have no idea whether this approach is acceptable; intuitively, though, I'd think that sensitivity to small errors is bad. Ultimately, by fitting a curve you are saying that the underlying process is, in the ideal case, modelled by the curve, and that any deviation from the curve is error or noise.