Getting standard errors from a regression using rpy2

Posted 2024-12-27 02:16:14

I am using rpy2 for regressions. The returned object is a list that includes the coefficients, residuals, fitted values, rank of the fitted model, etc.

However, I can't find the standard errors (or the R^2) in the fit object. When I run lm directly in R, the standard errors are displayed by the summary command, but I can't access them directly in the model's data frame.

How can I extract this information using rpy2?

Sample Python code:

# numpy.random replaces the old scipy.random alias, which newer SciPy releases no longer provide
from numpy import array, ndarray, random
from rpy2 import robjects
from rpy2.robjects.packages import importr

def test_regress():
    stats=importr('stats')
    x=random.uniform(0,1,100).reshape([100,1])
    y=1+x+random.uniform(0,1,100).reshape([100,1])
    x_in_r=create_r_matrix(x, x.shape[1])
    y_in_r=create_r_matrix(y, y.shape[1])
    formula=robjects.Formula('y~x')
    env = formula.environment
    env['x']=x_in_r
    env['y']=y_in_r
    fit=stats.lm(formula)
    # Positional access into the lm object (0-based in Python):
    # 0 = coefficients, 1 = residuals, 4 = fitted.values
    coeffs=array(fit[0])
    resids=array(fit[1])
    fitted_vals=array(fit[4])
    return(coeffs, resids, fitted_vals)

def create_r_matrix(py_array, ncols):
    # numpy arrays (and numpy.matrix, a subclass of ndarray) become nested lists first
    if isinstance(py_array, ndarray):
        py_array=py_array.tolist()
    r_vector=robjects.FloatVector(flatten_list(py_array))
    r_matrix=robjects.r['matrix'](r_vector, ncol=ncols)
    return r_matrix

def flatten_list(source):
    # Flatten one level of nesting: [[a], [b]] -> [a, b]
    return([item for sublist in source for item in sublist])

test_regress()

最偏执的依靠 2025-01-03 02:16:14

So this seems to work for me:

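# Reuses the imports and create_r_matrix() from the question.
def test_regress():
    base=importr('base')  # needed for base.summary below
    stats=importr('stats')
    x=random.uniform(0,1,100).reshape([100,1])
    y=1+x+random.uniform(0,1,100).reshape([100,1])
    x_in_r=create_r_matrix(x, x.shape[1])
    y_in_r=create_r_matrix(y, y.shape[1])
    formula=robjects.Formula('y~x')
    env = formula.environment
    env['x']=x_in_r
    env['y']=y_in_r
    fit=stats.lm(formula)
    coeffs=array(fit[0])
    resids=array(fit[1])
    fitted_vals=array(fit[4])
    modsum = base.summary(fit)
    # Component 7 (0-based) of the summary object is r.squared
    rsquared = array(modsum[7])
    # The coefficient table is stored column by column; with two coefficients,
    # flat elements 2 and 3 are the "Std. Error" column
    se = array(modsum.rx2('coefficients')[2:4])
    return(coeffs, resids, fitted_vals, rsquared, se)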

Although, as I said, this is literally my first foray into RPy2, so there may be a better way to do that. But this version appears to output arrays containing the R-squared value along with the standard errors.

You can use print(modsum.names) to see the names of the components of the R object (kind of like names(modsum) in R) and then .rx and .rx2 are the equivalent of [ and [[ in R.
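
The slice [2:4] works because R stores a matrix column by column, and for a model with an intercept and one predictor the coefficient table from summary() has 2 rows and 4 columns (Estimate, Std. Error, t value, Pr(>|t|)). A minimal pure-Python sketch of that layout may help when the model has more coefficients. The numbers are made up, and std_error_slice and flatten_column_major are hypothetical helper names, not part of rpy2:

```python
# Illustration only: mimics R's column-major storage with plain Python lists.
# The numbers are made up; the helpers below are hypothetical, not rpy2 API.

COLUMNS = ["Estimate", "Std. Error", "t value", "Pr(>|t|)"]

def std_error_slice(n_coef):
    """Flat-index slice covering the 'Std. Error' column of an n_coef x 4 table."""
    col = COLUMNS.index("Std. Error")  # second column, index 1
    return slice(col * n_coef, (col + 1) * n_coef)

def flatten_column_major(rows):
    """Flatten a list of rows the way R stores a matrix: one column after another."""
    return [row[j] for j in range(len(rows[0])) for row in rows]

# Two coefficients (intercept and slope), four columns each.
table = [
    [1.50, 0.06, 25.0, 0.001],  # (Intercept)
    [0.98, 0.10, 9.8, 0.002],   # x
]
flat = flatten_column_major(table)
se = flat[std_error_slice(2)]   # same positions as [2:4] in the answer
print(se)
```

Under this layout, a model with three coefficients would need the slice [3:6] rather than [2:4], so a hard-coded [2:4] is only right for the one-predictor case.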

夕色琉璃 2025-01-03 02:16:14

@joran: Pretty good. I'd say that it is pretty much the way to do it.

from rpy2 import robjects 
from rpy2.robjects.packages import importr

base = importr('base')
stats = importr('stats') # import only once !

def test_regress():
    x = base.matrix(stats.runif(100), nrow = 100)
    y = (x.ro + base.matrix(stats.runif(100), nrow = 100)).ro + 1 # not so nice
    formula = robjects.Formula('y~x')
    env = formula.environment
    env['x'] = x
    env['y'] = y
    fit = stats.lm(formula)
    coefs = stats.coef(fit)
    resids = stats.residuals(fit)    
    fitted_vals = stats.fitted(fit)
    modsum = base.summary(fit)
    rsquared = modsum.rx2('r.squared')
    se = modsum.rx2('coefficients')[2:4]
    return (coefs, resids, fitted_vals, rsquared, se)
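
As a sanity check on whatever comes back through rpy2, the quantities that summary() reports for a one-predictor model can be recomputed by hand. The sketch below uses the textbook closed-form formulas for simple linear regression in plain Python. The function name simple_ols and the sample data are illustrative assumptions, not part of rpy2 or R:

```python
import math

def simple_ols(x, y):
    """Closed-form fit of y = intercept + slope * x (hypothetical helper).

    Returns the coefficients, their standard errors, and R^2.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual and total sums of squares
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)

    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients)
    sigma2 = rss / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))

    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": se_intercept,
        "se_slope": se_slope,
        "r_squared": 1.0 - rss / tss,
    }

# Made-up data, for illustration only
result = simple_ols([0, 1, 2, 3], [1, 3, 4, 7])
print(result)
```

Feeding the same x and y to both this function and the rpy2 version should give matching standard errors and R^2 up to floating-point rounding, which is a quick way to confirm that the right elements of the summary object were extracted.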
