Curve fitted with a custom function does not match the histogram

Posted 2025-02-13 22:32:09


I am trying to fit a curve to a histogram, but the resulting curve is flat even though the histogram was not. How can I fit the curve correctly?

My current code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd
import scipy.optimize as optimization

x = np.random.normal(1e-10, 1e-7, size=10000)

def func(x, a, b, c):
    return a * (np.exp(-b * (x - c)**2))

bins = 25
logbins = np.logspace(np.log10(1.0E-10), np.log10(1E-07), bins)

bin_heights, bin_borders, _ = plt.hist(x, bins=logbins, edgecolor='black', color='b')
bin_centers = bin_borders[:-1] + np.diff(bin_borders) / 2

x0 = np.array([0.0, 0.0, 0.0])

popt, cov = optimization.curve_fit(func, bin_centers, bin_heights, x0)
a, b, c = popt

popt, pcov = curve_fit(func, bin_centers, bin_heights, p0=[a, b, c])

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 1000)
plt.plot(x_interval_for_fit, func(x_interval_for_fit, *popt), label='Fit', color='r')

plt.xscale('log')
plt.show()

The result:

[Screenshot: the fitted curve is flat while the histogram increases]


Comments (1)

他夏了夏天 2025-02-20 22:32:10


You are getting bad results because the function you are using to fit the histogram doesn't match the shape of the histogram at all. By using a simple second-order interpolation function, the results are a lot better (though you might say not ideal):

def func(x, a, b, c):
    return a * x**2 + b * x + c  # Simple 2nd-order polynomial

Using it with your code (you can remove the two optimisation steps and do that only once), I got the following result:

[Figure: second-order interpolation fit]
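For reference, here is a minimal consolidated sketch of that polynomial fit, assuming the same x and logarithmic bins as in the question and a single curve_fit call (the default initial guess is fine here because the model is linear in its parameters):

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = np.random.normal(1e-10, 1e-7, size=10000)

def func(x, a, b, c):
    return a * x**2 + b * x + c  # simple 2nd-order polynomial

logbins = np.logspace(np.log10(1e-10), np.log10(1e-7), 25)
bin_heights, bin_borders, _ = plt.hist(x, bins=logbins, edgecolor='black', color='b')
bin_centers = bin_borders[:-1] + np.diff(bin_borders) / 2

popt, pcov = curve_fit(func, bin_centers, bin_heights)  # one optimisation step is enough

x_fit = np.linspace(bin_borders[0], bin_borders[-1], 1000)
plt.plot(x_fit, func(x_fit, *popt), label='Fit', color='r')
plt.xscale('log')
plt.legend()
plt.show()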

One thing that may have been unintentional in your code is that, even though you generated a normal distribution, you decided to bin it in a surprising way: you only consider what lies on one side of the distribution (you start from 1e-10, and your distribution is centred at 1e-10), and since the bin size grows logarithmically to the right, you end up with more points in the larger bins. Also, you are ignoring more than half of your points (those that are smaller than 1e-10; check hist's documentation). You can verify this with np.sum(bin_heights): the total count is less than half of len(x).
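As a quick sanity check (a sketch assuming the x, bin_heights and logbins from the code above), you can count how many samples never make it into the histogram:

print(len(x))                     # 10000 samples generated
print(int(np.sum(bin_heights)))   # total histogram counts, well under half of len(x)
print(int(np.sum(x < 1e-10)))     # samples below the first bin edge, roughly half of them
print(int(np.sum(x > 1e-7)))      # samples above the last bin edge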

If all of the above was unintentional and you really want to recover the original gaussian that generated the random numbers (but with counts instead of density), you should fit directly on the unmodified histogram. One problem, though: since you are using a very small standard deviation, the problem is very badly conditioned (in the sense that the Jacobian has widely varying values depending on which direction you move in parameter space). You can get a feeling for how hard it is for the optimiser to work out the influence of each parameter by calculating the error manually and seeing how it changes (or doesn't) as you vary the parameters. Since you are doing a least-squares optimisation, your cost function looks like this:

f = lambda a, b, c: sum((bin_heights - func(bin_centers, a, b, c))**2)

If you play with this function a bit around the zero initial condition, to see what the optimiser is seeing, you'll notice why convergence is difficult:

f(0, 0, 0)   # 8_407_566
f(0, 0, 1)   # 8_407_566, i.e. no change
f(0, 1, 0)   # 8_407_566, i.e. no change
f(1, 0, 0)   # 8_387_591, clearly an improvement

This is not a mathematically rigorous argument (that comes from the ill-conditioning of the Jacobian when a and b are close to 0), but it gives a good intuition of what is going wrong.
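To make that intuition a bit more concrete, a small finite-difference sketch (assuming the f defined above) shows that, at (0, 0, 0), the cost does not react to b or c at all:

eps = 1e-6
df_da = (f(eps, 0, 0) - f(0, 0, 0)) / eps  # large and negative: moving a helps
df_db = (f(0, eps, 0) - f(0, 0, 0)) / eps  # exactly 0: with a = 0 the model is identically 0, whatever b is
df_dc = (f(0, 0, eps) - f(0, 0, 0)) / eps  # exactly 0, for the same reason
print(df_da, df_db, df_dc)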

Thus, you need a good "first guess". Of course, in this problem, since you know that you are trying to fit a gaussian, you could use the maximum count, the mean and the standard deviation as initial parameters (in your case, b is roughly the inverse of the square of the standard deviation; for this parametrisation, b = 1/(2*sigma**2)). The code below shows a set of "less tuned" initial parameters that also work:

def func(x, a, b, c):  # Original gaussian model
    return a * np.exp(-b * (x - c)**2)

bins = 25  # Let hist choose linear bins over the full data range
bin_heights, bin_borders, _ = plt.hist(x, bins=bins, edgecolor='black', color='b')
bin_centers = bin_borders[:-1] + np.diff(bin_borders) / 2

x0 = np.array([600.0, 1.0e7, 0.0])  # "less tuned" but workable initial guess

popt, cov = curve_fit(func, bin_centers, bin_heights, x0)

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 1000)

plt.plot(x_interval_for_fit, func(x_interval_for_fit, *x0), label='Initial guess', color='y')
plt.plot(x_interval_for_fit, func(x_interval_for_fit, *popt), label='Fit', color='r')
plt.legend()  # display the 'Initial guess' and 'Fit' labels
plt.show()

And the result, with the curve created by the initial guess (in yellow) and the result of the interpolation (in red):
[Figure: optimisation with good initial conditions]
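For completeness, the data-driven initial guess mentioned above could be sketched like this (assuming func, bin_centers and bin_heights from the code above; the width parameter follows from b = 1/(2*sigma**2) for this parametrisation):

a0 = bin_heights.max()              # peak height of the histogram
c0 = np.mean(x)                     # centre of the gaussian
b0 = 1.0 / (2.0 * np.std(x)**2)     # width: a*exp(-b*(x - c)**2) has sigma = 1/sqrt(2*b)
popt, pcov = curve_fit(func, bin_centers, bin_heights, p0=[a0, b0, c0])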
