Sigmoidal regression with scipy, numpy, python, etc.
I have two variables (x and y) that have a somewhat sigmoidal relationship with each other, and I need to find some sort of prediction equation that will enable me to predict the value of y, given any value of x. My prediction equation needs to show the somewhat sigmoidal relationship between the two variables. Therefore, I cannot settle for a linear regression equation that produces a line. I need to see the gradual, curvilinear change in slope that occurs at both the right and left of the graph of the two variables.
I started using numpy.polyfit after googling curvilinear regression and python, but that gave me the awful results you can see if you run the code below. Can anyone show me how to re-write the code below to get the type of sigmoidal regression equation that I want?
If you run the code below, you can see that it gives a downward-facing parabola, which is not what the relationship between my variables should look like. Instead, there should be more of a sigmoidal relationship between my two variables, but with a tight fit to the data that I am using in the code below. The data in the code below are means from a large-sample research study, so they pack more statistical power than their five data points might suggest. I do not have the actual data from the large-sample research study, but I do have the means below and their standard deviations (which I am not showing). I would prefer to just plot a simple function with the mean data listed below, but the code could get more complex if complexity would offer substantial improvements.
How can I change my code to show a best fit of a sigmoidal function, preferably using scipy, numpy, and python? Here is the current version of my code, which needs to be fixed:
import numpy as np
import matplotlib.pyplot as plt
# Create numpy data arrays
x = np.array([821,576,473,377,326])
y = np.array([255,235,208,166,157])
# Use polyfit and poly1d to create the regression equation
z = np.polyfit(x, y, 3)
p = np.poly1d(z)
xp = np.linspace(100, 1600, 1500)
pxp=p(xp)
# Plot the results
plt.plot(x, y, '.', xp, pxp, '-')
plt.ylim(140,310)
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()
EDIT BELOW: (Re-framed the question)
Your response, and its speed, are very impressive. Thank you, unutbu.
But, in order to produce more valid results, I need to re-frame my data values. This means re-casting x values as a percentage of the max x value, while re-casting y values as a percentage of the x-values in the original data. I tried to do this with your code, and came up with the following:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
# Create numpy data arrays
'''
# Comment out original data
#x = np.array([821,576,473,377,326])
#y = np.array([255,235,208,166,157])
'''
# Re-calculate x values as a percentage of the first (maximum)
# original x value above
x = np.array([1.000,0.702,0.576,0.459,0.397])
# Recalculate y values as a percentage of their respective x values
# from original data above
y = np.array([0.311,0.408,0.440,0.440,0.482])
def sigmoid(p,x):
    x0,y0,c,k=p
    y = c / (1 + np.exp(-k*(x-x0))) + y0
    return y

def residuals(p,x,y):
    return y - sigmoid(p,x)
p_guess=(600,200,100,0.01)
(p,
cov,
infodict,
mesg,
ier)=scipy.optimize.leastsq(residuals,p_guess,args=(x,y),full_output=1,warning=True)
'''
# comment out original xp to allow for better scaling of
# new values
#xp = np.linspace(100, 1600, 1500)
'''
xp = np.linspace(0, 1.1, 1100)
pxp=sigmoid(p,xp)
x0,y0,c,k=p
print('''\
x0 = {x0}
y0 = {y0}
c = {c}
k = {k}
'''.format(x0=x0,y0=y0,c=c,k=k))
# Plot the results
plt.plot(x, y, '.', xp, pxp, '-')
plt.ylim(0,1)
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()
Can you show me how to fix this revised code?
NOTE: By re-casting the data, I have essentially rotated the 2d (x,y) sigmoid about the z-axis by 180 degrees. Also, the 1.000 is not really a maximum of the x values. Instead, 1.000 is a mean of the range of values from different test participants in a maximum test condition.
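For readers who want to check the re-cast values, the following is a minimal sketch that reproduces them from the original means. The names x_raw and y_raw are introduced here for illustration only; the arithmetic simply follows the description above (x divided by the largest x, y divided by its own x).

```python
import numpy as np

# Original means from the question
x_raw = np.array([821, 576, 473, 377, 326], dtype=float)
y_raw = np.array([255, 235, 208, 166, 157], dtype=float)

# x expressed as a fraction of the largest original x value
x = x_raw / x_raw.max()

# y expressed as a fraction of its own original x value
y = y_raw / x_raw

print(np.round(x, 3))
print(np.round(y, 3))
```

Note that both re-cast arrays now lie between 0 and 1, while the starting guess p_guess=(600,200,100,0.01) in the code above is still on the scale of the original data.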
SECOND EDIT BELOW:
Thank you, unutbu. I carefully read through your code and looked up aspects of it in the scipy documentation. Since your name seems to pop up as a writer of the scipy documentation, I am hoping you can answer the following questions:
1.) Does leastsq() call residuals(), which then returns the difference between the input y-vector and the y-vector returned by the sigmoid() function? If so, how does it account for the difference in the lengths of the input y-vector and the y-vector returned by the sigmoid() function?
2.) It looks like I can call leastsq() for any math equation, as long as I access that math equation through a residuals function, which in turn calls the math function. Is this true?
3.) Also, I notice that p_guess has the same number of elements as p. Does this mean that the four elements of p_guess correspond in order, respectively, with the values returned by x0,y0,c, and k?
4.) Is the p that is sent as an argument to the residuals() and sigmoid() functions the same p that will be output by leastsq(), and is the leastsq() function using that p internally before returning it?
5.) Can p and p_guess have any number of elements, depending on the complexity of the equation being used as a model, as long as the number of elements in p is equal to the number of elements in p_guess?
Comments (4)
Using scipy.optimize.leastsq:
[code listing, resulting plot, and fitted sigmoid parameters not preserved]
Note that for newer versions of scipy (e.g. 0.9) there is also the scipy.optimize.curve_fit function (http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html#scipy-optimize-curve-fit), which is easier to use than leastsq. A relevant discussion of fitting sigmoids using curve_fit can be found here.
Edit: A resize function was added so that the raw data could be rescaled and shifted to fit any desired bounding box.
DISCLAIMER: I am not a writer of scipy documentation. I am just a user, and a novice at that. Much of what I know about leastsq comes from reading this tutorial, written by Travis Oliphant.
1.) Yes! Exactly. The lengths are the same: sigmoid(p,x) returns an array with the same shape as x, and therefore the same length as y.
One of the wonderful things about Numpy is that it allows you to write "vector" equations that operate on entire arrays. The equation y = c / (1 + np.exp(-k*(x-x0))) + y0 might look like it works on floats (indeed it would), but if you make x a numpy array, and c, k, x0, y0 floats, then the equation defines y to be a numpy array of the same shape as x. So sigmoid(p,x) returns a numpy array. There is a more complete explanation of how this works in the numpybook (required reading for serious users of numpy).
2.) True. leastsq attempts to minimize the sum of the squares of the residuals (differences). It searches the parameter-space (all possible values of p) looking for the p which minimizes that sum of squares. The x and y sent to residuals are your raw data values. They are fixed. They don't change. It's the p (the parameters in the sigmoid function) that leastsq varies in order to minimize that sum.
3.) Exactly so! Like Newton's method, leastsq needs an initial guess for p. You supply it as p_guess. When you see scipy.optimize.leastsq(residuals,p_guess,args=(x,y)), you can think that, as a first pass of the leastsq algorithm (really the Levenberg-Marquardt algorithm), leastsq calls residuals(p_guess,x,y). Notice the visual similarity between (residuals,p_guess,args=(x,y)) and residuals(p_guess,x,y). It may help you remember the order and meaning of the arguments to leastsq.
residuals, like sigmoid, returns a numpy array. The values in the array are squared, and then summed. This is the number to beat. p_guess is then varied as leastsq looks for a set of values which minimizes the sum of the squares of residuals(p_guess,x,y).
4.) Well, not exactly. As you know by now, p_guess is varied as leastsq searches for the p value that minimizes the sum of the squares of residuals(p,x,y). The p (er, p_guess) that is sent to leastsq has the same shape as the p that is returned by leastsq. Obviously the values should be different unless you are a hell of a guesser :)
5.) Yes. I haven't stress-tested leastsq for very large numbers of parameters, but it is a thrillingly powerful tool.
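The leastsq workflow discussed in this answer can be sketched roughly as follows. This is a minimal illustration and not the answer's original listing: it reuses the sigmoid and residuals definitions, the original data, and the starting guess (600,200,100,0.01) that appear in the question, and it leaves out plotting. The warning keyword seen in the question's call is omitted because current SciPy releases no longer accept it. The names sse_start and sse_fit are introduced here for illustration.

```python
import numpy as np
import scipy.optimize

def sigmoid(p, x):
    # Four-parameter sigmoid: midpoint x0, offset y0, height c, steepness k
    x0, y0, c, k = p
    return c / (1 + np.exp(-k * (x - x0))) + y0

def residuals(p, x, y):
    # One residual per data point; x and y stay fixed, only p is varied
    return y - sigmoid(p, x)

# Original means from the question
x = np.array([821, 576, 473, 377, 326], dtype=float)
y = np.array([255, 235, 208, 166, 157], dtype=float)

# Initial guess, in the order (x0, y0, c, k)
p_guess = (600, 200, 100, 0.01)

# sigmoid(p, x) is evaluated element-wise, so the shapes match
print(residuals(p_guess, x, y).shape, x.shape)

p, cov, infodict, mesg, ier = scipy.optimize.leastsq(
    residuals, p_guess, args=(x, y), full_output=True)

# Sum of squared residuals before and after the fit
sse_start = np.sum(residuals(p_guess, x, y) ** 2)
sse_fit = np.sum(residuals(p, x, y) ** 2)

print("x0, y0, c, k =", p)
print("sum of squares: start = %g, fit = %g" % (sse_start, sse_fit))
```

With only five data points and four parameters the fit has a single degree of freedom, so the fitted values are best treated as a description of these means and not as tightly constrained estimates.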
As pointed out by @unutbu above, scipy now provides scipy.optimize.curve_fit, which has a less complicated call. If someone wants a quick version of how the same process would look in those terms, I present a minimal example below:
[code listing not preserved]
The result of this is shown in the next figure:
[figure not preserved]
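A curve_fit call of the kind described here might look like the sketch below. It is not this answer's original listing. The data are synthetic, generated from a known sigmoid so the result can be checked, and true_params, xdata, ydata and the starting guess p0 are all assumptions made for illustration. Note that curve_fit expects the model as f(x, *params), so the independent variable comes first and no separate residuals function is needed.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, x0, y0, c, k):
    # Same four-parameter sigmoid as in the question, with x as first argument
    return c / (1 + np.exp(-k * (x - x0))) + y0

# Synthetic, noise-free data from assumed "true" parameters (illustration only)
true_params = (5.0, 1.0, 3.0, 1.2)
xdata = np.linspace(0, 10, 25)
ydata = sigmoid(xdata, *true_params)

# Starting guess in the order (x0, y0, c, k), chosen near the true values
p0 = (4.0, 0.5, 2.5, 1.0)

# popt holds the fitted parameters, pcov their estimated covariance
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0=p0)

print("x0, y0, c, k =", popt)
```

On real measurements the starting guess matters more than it does here; a rough reading of the midpoint, floor, and height of the curve from a plot is usually a reasonable place to start.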
I don't think you're going to get good results with a polynomial fit of any degree, since every non-constant polynomial goes to plus or minus infinity for sufficiently large and small X, whereas a sigmoid curve asymptotically approaches some finite value in each direction.
I'm not a Python programmer, so I don't know if numpy has a more general curve fitting routine. If you have to roll your own, perhaps this article on Logistic regression will give you some ideas.
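The point about asymptotic behaviour can be illustrated with a short sketch. It fits the same cubic as the question's polyfit call and compares it, far outside the data range, with a sigmoid whose parameters are illustrative and not fitted (they are simply the starting guess quoted in the question). scipy.special.expit is used as a numerically stable form of 1 / (1 + exp(-t)); the names far, poly_far and sig_far are introduced here.

```python
import numpy as np
from scipy.special import expit  # expit(t) = 1 / (1 + exp(-t))

# Original means from the question
x = np.array([821, 576, 473, 377, 326], dtype=float)
y = np.array([255, 235, 208, 166, 157], dtype=float)

# Cubic polynomial fit, as in the question's polyfit call
cubic = np.poly1d(np.polyfit(x, y, 3))

def sigmoid(x, x0, y0, c, k):
    return c * expit(k * (x - x0)) + y0

# Illustrative (not fitted) sigmoid parameters
x0, y0, c, k = 600.0, 200.0, 100.0, 0.01

# Points far to the left and right of the measured range
far = np.array([-1e6, -1e4, 1e4, 1e6])
poly_far = cubic(far)
sig_far = sigmoid(far, x0, y0, c, k)

for xi, pv, sv in zip(far, poly_far, sig_far):
    print("x = %9.0f   cubic = %14.6g   sigmoid = %8.3f" % (xi, pv, sv))
```

The sigmoid values stay between y0 and y0 + c however far x moves, while the magnitude of the cubic keeps growing.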
For logistic regression in Python, scikits-learn exposes high-performance fitting code:
http://scikit-learn.sourceforge.net/modules/linear_model.html#logistic-regression