Opposite "signs" of coefficients for two logistic regressions
I am trying to build an xG model using Distance (from goal) as a feature; the target variable is a dummy variable indicating whether the shot resulted in a goal or not. So I am trying to fit a simple logistic regression. I tried to replicate a model where the fitting was done with the statsmodels package, which resulted in a positive coefficient of 0.16 and an intercept of -0.5.
When I fitted the line using scikit-learn, the coefficient was -0.16. The same happened with the intercept, which was around 0.5. So somehow the coefficients have "flipped".
Dataset example:
Goal X Y C Distance Angle
1 12 41 9.0 13.891814 0.474451
0 15 52 2.0 15.803560 0.453823
0 19 33 17.0 22.805811 0.280597
0 25 30 20.0 29.292704 0.223680
0 10 39 11.0 12.703248 0.479051
scikit-learn code:
feature_cols = ['Distance']
X = shots_model[feature_cols] # Features
y = shots_model['Goal'] # Target
y = y.astype('category')
m1 = LogisticRegression()
m1.fit(X_train, y_train)
statsmodels code:
test_model = smf.glm(formula="Goal ~ " + model, data=shots_model,
family=sm.families.Binomial()).fit()
print(test_model.summary())
b=test_model.params
I am probably missing something simple, as I am pretty new to Machine Learning, and this has been puzzling me for some time now. Please help.
2 Answers
I am not sure what your outputs are. However, what you can do now is to test your model on new test data. The predictions obtained are fractional values (between 0 and 1) which denote the probability of the shot resulting in a goal. Then round these values to obtain the discrete values 1 or 0. After that, you can use a confusion matrix or the accuracy_score function to test the accuracy of your models. For more detailed code, you can refer to this article: https://www.geeksforgeeks.org/logistic-regression-using-statsmodels/
I think if you can get corresponding binary outcomes from your two models, and the accuracy of the two models is close, then you do not need to worry much about the flipped coefficient. Basically, my idea is that if you can get accurate predictions (1 or 0) from both methods, then everything is fine. Hope my answer is helpful to you!
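A minimal sketch of that evaluation step, assuming m1 is the fitted sklearn model from the question and that X_test and y_test are a held-out split of shots_model (neither split is shown in the original post):
# Sketch only: assumes m1 (a fitted LogisticRegression) and a held-out
# X_test/y_test split of shots_model, which the question does not show.
from sklearn.metrics import accuracy_score, confusion_matrix

proba = m1.predict_proba(X_test)[:, 1]   # fractional values in (0, 1): P(goal)
preds = (proba >= 0.5).astype(int)       # round to discrete 0/1 at a 0.5 threshold

print(confusion_matrix(y_test, preds))
print(accuracy_score(y_test, preds))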
The algorithms for logistic regression are different in statsmodels and sklearn. Sklearn uses an L2 penalty by default, which means that the loss function has a quadratic term for the coefficients to drive them as close to zero as possible.
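To see the effect of that default penalty, here is a small sketch on synthetic stand-in data (the real shots_model is not available, so the data and coefficient values are illustrative only):
# Sketch on synthetic stand-in data: compare sklearn's default L2-penalized
# fit with an unpenalized one. The true coefficient here is -0.16 by design.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(5, 35, size=(500, 1))               # stand-in for Distance
p_goal = 1 / (1 + np.exp(-(2.0 - 0.16 * X[:, 0])))  # P(goal) falls with distance
goal = rng.binomial(1, p_goal)

penalized = LogisticRegression().fit(X, goal)                # default: penalty='l2', C=1.0
unpenalized = LogisticRegression(penalty=None).fit(X, goal)  # penalty='none' in older sklearn

print(penalized.coef_, unpenalized.coef_)  # penalized estimate is shrunk (slightly) toward 0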
Regarding why your coefficients have flipped: this can happen when you invert the encoding of your target variable.
For example, if the statsmodels model was trained with 0 being a miss and 1 being a goal, while the sklearn model was trained with 0 being a goal and 1 being a miss.
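That explanation is easy to verify: fitting the same unpenalized model on the complement of the target negates both the coefficient and the intercept. A self-contained sketch with synthetic data:
# Sketch: flipping the 0/1 encoding of the target negates the fitted
# coefficient and intercept of a logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(5, 35, size=(500, 1))
goal = rng.binomial(1, 1 / (1 + np.exp(-(2.0 - 0.16 * X[:, 0]))))

m_goal = LogisticRegression(penalty=None).fit(X, goal)      # 1 = goal, 0 = miss
m_miss = LogisticRegression(penalty=None).fit(X, 1 - goal)  # 1 = miss, 0 = goal

print(m_goal.coef_, m_goal.intercept_)
print(m_miss.coef_, m_miss.intercept_)  # same magnitudes, opposite signs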
To be honest, it's hard to tell from the info you have given us. However, here are a couple of things to keep in mind regarding the code you posted:
- m1 is being trained with objects that do not exist in the code you posted (X_train and y_train are not declared).
- Check that there is only one column in X (X_train or X_test).
- Check what the target looks like after y.astype('category').
Basically, make sure that y, y_train and y_test are encoded the same way for both models, and set m1 = LogisticRegression(penalty='none').
In my opinion, it makes sense that the coefficient for Distance is negative, because scoring a goal should become less likely the farther away you are.
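Putting these checks together, a sketch of a side-by-side fit on one frame; shots_model here is synthetic stand-in data with the question's column names, and penalty=None is the current spelling of penalty='none':
# Sketch: with the same 0/1 encoding of Goal and no penalty, statsmodels'
# GLM and sklearn should agree on the coefficients up to optimizer tolerance.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
shots_model = pd.DataFrame({"Distance": rng.uniform(5, 35, size=500)})
shots_model["Goal"] = rng.binomial(
    1, 1 / (1 + np.exp(-(2.0 - 0.16 * shots_model["Distance"])))
)

glm = smf.glm("Goal ~ Distance", data=shots_model,
              family=sm.families.Binomial()).fit()
skl = LogisticRegression(penalty=None).fit(
    shots_model[["Distance"]], shots_model["Goal"])

print(glm.params)                  # Intercept, Distance
print(skl.intercept_, skl.coef_)   # should closely match the GLM above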