Getting predictions from an OLS fit in statsmodels

Posted on 2025-02-04 15:48:38


I am trying to get in-sample predictions from an OLS fit, as below:

import numpy as np
import pandas as pd
import statsmodels.api as sm

macrodata = sm.datasets.macrodata.load_pandas().data
macrodata.index = pd.period_range('1959Q1', '2009Q3', freq='Q')
mod = sm.OLS(macrodata['realgdp'], sm.add_constant(macrodata[['realdpi', 'realinv', 'tbilrate', 'unemp']])).fit()
mod.get_prediction(sm.add_constant(macrodata[['realdpi', 'realinv', 'tbilrate', 'unemp']])).summary_frame(0.95).head()

This works fine. But if I change the order of the regressors passed to mod.get_prediction, I get different estimates:

mod.get_prediction(sm.add_constant(macrodata[['tbilrate', 'unemp', 'realdpi', 'realinv']])).summary_frame(0.95).head()

This is surprising. Can't mod.get_prediction identify the regressors based on column names?


长发绾君心 2025-02-11 15:48:38

As noted in the comments, sm.OLS converts your data frame into an array for fitting. The same happens at prediction time, so it expects the predictors in the same column order that was used for fitting and does not match them by name.

If you would like the column names to be used, you can use the formula interface; see the documentation for more details. Below I apply it to your example:

import statsmodels.api as sm
import statsmodels.formula.api as smf

macrodata = sm.datasets.macrodata.load_pandas().data
mod = smf.ols(formula='realgdp ~ realdpi + realinv + tbilrate + unemp', data=macrodata)
res = mod.fit()

In the order provided:

res.get_prediction(macrodata[['realdpi', 'realinv', 'tbilrate', 'unemp']]).summary_frame(0.95).head()

          mean    mean_se  mean_ci_lower  mean_ci_upper  obs_ci_lower  obs_ci_upper
0  2716.423418  14.608110    2715.506229    2717.340607   2710.782460   2722.064376
1  2802.820840  13.714821    2801.959737    2803.681943   2797.188729   2808.452951
2  2781.041564  12.615903    2780.249458    2781.833670   2775.419588   2786.663539
3  2786.894138  12.387428    2786.116377    2787.671899   2781.274166   2792.514110
4  2848.982580  13.394688    2848.141577    2849.823583   2843.353507   2854.611653

The results are the same if we flip the columns:

res.get_prediction(macrodata[['tbilrate', 'unemp', 'realdpi', 'realinv']]).summary_frame(0.95).head()

          mean    mean_se  mean_ci_lower  mean_ci_upper  obs_ci_lower  obs_ci_upper
0  2716.423418  14.608110    2715.506229    2717.340607   2710.782460   2722.064376
1  2802.820840  13.714821    2801.959737    2803.681943   2797.188729   2808.452951
2  2781.041564  12.615903    2780.249458    2781.833670   2775.419588   2786.663539
3  2786.894138  12.387428    2786.116377    2787.671899   2781.274166   2792.514110
4  2848.982580  13.394688    2848.141577    2849.823583   2843.353507   2854.611653
