The intercept is half of the actual value in logistic regression
For a scientific study, I need to run a traditional logistic regression using Python and scikit-learn. After fitting my regression model with "penalty='none'", I get the correct coefficients, but the intercept is half of the real value. My code is mostly as follows:
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_excel("data.xlsx")
train, test = train_test_split(df, train_size=0.8, random_state=42)
# drop the index column left over from the Excel export
train = train.drop(["Unnamed: 0"], axis=1)
test = test.drop(["Unnamed: 0"], axis=1)
x_train = train.drop(["GRUP"], axis=1)
x_train = sm.add_constant(x_train)  # prepends a 'const' column of 1's
y_train = train["GRUP"]
x_test = test.drop(["GRUP"], axis=1)
x_test = sm.add_constant(x_test)
y_test = test["GRUP"]
model = sm.Logit(y_train, x_train).fit()
model.summary()
log = LogisticRegression(penalty="none")
log.fit(x_train, y_train)
log.intercept_
With statsmodels I get the intercept (constant) "28.7140", but with scikit-learn I get "14.35698738". The other coefficients are the same. I verified it in SPSS, and the first one is the correct value. I don't want to use statsmodels just for logistic regression. Could you please help?
PS: Without the intercept, the model works fine.
1 Answer
The issue here is that, in the code you posted, you add a constant term (a column of 1's) to x_train with x_train = sm.add_constant(x_train). Then you pass that same x_train object to sklearn's LogisticRegression(), whose fit_intercept= parameter defaults to True. At that stage you end up with two constant terms, which causes the discrepancy in your estimated coefficients: with no penalty, only the sum of the fitted intercept and the coefficient on the duplicated 'const' column is identified, and the solver splits that sum evenly between the two identical terms, which is why you see exactly half of the true value. So, you should either turn off fit_intercept= in the sklearn code, or leave fit_intercept=True but use an x_train array without the added constant term.
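For reference, here is a minimal sketch of both fixes, reusing the x_train and y_train from the question (note: on newer scikit-learn releases the string penalty="none" has been replaced by penalty=None, so adjust to your installed version):

# Option 1: keep the 'const' column and suppress sklearn's own intercept
log = LogisticRegression(penalty="none", fit_intercept=False)
log.fit(x_train, y_train)
intercept = log.coef_[0][0]  # sm.add_constant() prepends 'const' as the first column

# Option 2: let sklearn fit the intercept and drop the added 'const' column
log = LogisticRegression(penalty="none")
log.fit(x_train.drop(columns="const"), y_train)
intercept = log.intercept_[0]

Either way, the reported intercept should now match the statsmodels/SPSS value (28.7140). You can also confirm the split in your original fit: log.intercept_[0] plus the fitted coefficient on 'const' should add up to that same number.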