Reliably finding the elbow point of a curve?

Posted on 2024-11-19 11:25:17

I am aware of the existence of this, and this on this topic. However, this time I would like to settle on an actual implementation in Python.

My only problem is that the elbow point seems to change between different runs of my code. Observe the two plots shown in this post. While they look visually similar, the value of the elbow point changed significantly. Both curves were generated from an average of 20 different runs. Even then, there is a significant shift in the value of the elbow point. What precautions can I take to make sure that the value falls within a certain bound?

My attempt is shown below:

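import collections

def elbowPoint(points):
  # Discrete second derivative at each interior point i (1 .. len(points) - 2)
  secondDerivative = collections.defaultdict(lambda: 0)
  for i in range(1, len(points) - 1):
    secondDerivative[i] = points[i+1] + points[i-1] - 2*points[i]

  # dict.values() has no .index() in Python 3, so convert to a list first
  values = list(secondDerivative.values())
  max_index = values.index(max(values))
  # values[0] belongs to i = 1, so shift by one to get an index into points
  elbow_point = max_index + 1
  return elbow_point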

points = [0.80881476685027154, 0.79457906121371058, 0.78071124401504677, 0.77110686192601441, 0.76062373158581287, 0.75174963969985187, 0.74356408965979193, 0.73577573557299236, 0.72782434749305047, 0.71952590556748364, 0.71417942487824781, 0.7076502559300516, 0.70089375208028415, 0.69393584640497064, 0.68550490458450741, 0.68494440529025913, 0.67920157634796108, 0.67280267176628761]
max_point = elbowPoint(points)  

[Plot 1: image not available]
[Plot 2: image not available]

Comments (1)

浅语花开 2024-11-26 11:25:17

It sounds like your actual concern is how to smooth your data, since it contains noise. In that case, perhaps you should fit a curve to the data first, then find the elbow of the fitted curve?
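
As a rough illustration of "fit first, then find the elbow", here is a minimal sketch in plain Python. Everything in it is an assumption made for the example, not something from the question: the helper names (polyfit, polyval, elbow_by_chord_distance, fitted_elbow), the choice of a low-degree polynomial as the fitted curve, and the use of "point farthest from the straight line joining the endpoints" as the elbow criterion instead of the second derivative.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c[0..degree] such that y ~ sum(c[k] * x**k).
    """
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def polyval(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def elbow_by_chord_distance(values):
    """Index of the point farthest from the line joining the two endpoints."""
    last = len(values) - 1
    dy = values[-1] - values[0]

    def dist(i):
        # Perpendicular distance to the chord, up to a constant factor.
        return abs(dy * i - last * (values[i] - values[0]))

    return max(range(len(values)), key=dist)


def fitted_elbow(points, degree=3):
    """Fit a polynomial to the points, then locate the elbow of the fit."""
    n = len(points)
    # Rescale x to [0, 1] to keep the normal equations reasonably conditioned.
    xs = [i / (n - 1) for i in range(n)]
    coeffs = polyfit(xs, points, degree)
    fitted = [polyval(coeffs, x) for x in xs]
    return elbow_by_chord_distance(fitted)
```

Calling fitted_elbow(points) on the list from the question returns an index into points. Because the elbow is taken from the smooth fitted values rather than the raw ones, a small perturbation of a single point should move it less than it moves the raw second difference, although how much less depends on the degree chosen.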

Whether this works depends on the source of the noise, and on whether the noise matters for your application. By the way, you may want to check how sensitive your fit is to your data by seeing how it changes (or, hopefully, doesn't) when a point is omitted from the fit. (Obviously, with a high enough polynomial degree you will always get a good fit to a specific set of data, but you are presumably interested in the general case.)
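
The omit-one-point check can be sketched as follows. The names leave_one_out_elbows and second_difference_elbow are hypothetical helpers written for this example; elbow_fn stands for whatever elbow detector you settle on (for instance, a fit followed by an elbow search), and the second-difference detector is included only so the sketch runs on its own.

```python
def second_difference_elbow(points):
    """Index of the interior point with the largest discrete second derivative."""
    return max(
        range(1, len(points) - 1),
        key=lambda i: points[i + 1] + points[i - 1] - 2 * points[i],
    )


def leave_one_out_elbows(points, elbow_fn):
    """Recompute the elbow with each point omitted in turn.

    Results are mapped back to the ORIGINAL indexing so they can be
    compared directly with each other and with the full-data elbow.
    """
    results = []
    for omitted in range(len(points)):
        kept = [i for i in range(len(points)) if i != omitted]
        reduced = [points[i] for i in kept]
        results.append(kept[elbow_fn(reduced)])
    return results


if __name__ == "__main__":
    demo = [10, 6, 3, 2.5, 2, 1.5, 1]
    elbows = leave_one_out_elbows(demo, second_difference_elbow)
    # A narrow spread of indices suggests the detector is stable on this data.
    print(min(elbows), max(elbows))
```

Note that dropping a point leaves the remaining points unevenly spaced, which the second-difference detector ignores, so treat the spread as a rough stability indicator rather than an exact measure.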

I have no idea whether this approach is acceptable; intuitively, though, I'd think that sensitivity to small errors is bad. Ultimately, by fitting a curve you are saying that the underlying process is, in the ideal case, modelled by the curve, and that any deviation from the curve is error/noise.
