S3 and order of classes

Posted on 2024-11-16 22:42:21

I've always had trouble understanding the documentation on how S3 methods are called, and this time it's come back to bite me.

I'll apologize up front for asking more than one question, but they are all closely related. Deep in the heart of a complex set of functions, I create a lot of glmnet fits, in particular logistic ones. The glmnet documentation specifies that its return value has both classes "glmnet" and (for logistic regression) "lognet", and it lists them in this order.

However, looking at the end of the implementation of glmnet, right after the call to (the internal function) lognet, which sets the class of fit to "lognet", I see this line of code just before the return (of the variable fit):

class(fit) = c(class(fit), "glmnet")

From this, I would conclude that the order of the classes is in fact "lognet", "glmnet".
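That reading can be checked without glmnet at all. The following is a minimal sketch that applies the same assignment to a toy object; the empty list is only a placeholder standing in for whatever the internal lognet call returns.

```r
# Toy stand-in for the object returned by the internal lognet() call;
# the empty list is a placeholder, not real glmnet output.
fit <- structure(list(), class = "lognet")

# Same statement as in the glmnet source quoted above:
# the existing class vector comes first, "glmnet" is appended after it.
class(fit) = c(class(fit), "glmnet")

class(fit)                             # "lognet" first, then "glmnet"
inherits(fit, "glmnet")                # membership test, ignores order
inherits(fit, "glmnet", which = TRUE)  # position of "glmnet" in the class vector
```

Note that inherits() without which = TRUE cannot distinguish the two orders, so code that only tests membership would not notice a reversed class vector.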

Unfortunately, the fit I had showed the opposite order (the one the documentation suggests):

> class(myfit)
[1] "glmnet" "lognet"

The problem with this is the way S3 methods are dispatched for it, in particular predict. Here's the code for predict.lognet:

function (object, newx, s = NULL, type = c("link", "response", 
    "coefficients", "class", "nonzero"), exact = FALSE, offset, 
    ...) 
{
    type = match.arg(type)
    nfit = NextMethod("predict") #<- supposed to call predict.glmnet, I think
    switch(type, response = {
        pp = exp(-nfit)
        1/(1 + pp)
    }, class = ifelse(nfit > 0, 2, 1), nfit)
}

I've added a comment to explain my reasoning. Now when I call predict on this myfit with a new data matrix mydata and type="response", like this:

predict(myfit, newx=mydata, type="response")

, I do not get the predicted probabilities that the documentation promises, but the linear combinations, which is exactly what calling predict.glmnet directly would return.

I've tried reversing the order of the classes, like so:

orgclass<-class(myfit)
class(myfit)<-rev(orgclass)

Then I made the predict call again and, lo and behold, it works! I do get the probabilities.

So, here come some questions:

  1. Am I right in 'having learned' that S3 methods are dispatched in order of appearance of the classes?
  2. Am I right in assuming the code in glmnet would cause the wrong order for correct dispatching of predict?
  3. In my code there is nothing that manipulates classes explicitly/visibly, to my knowledge. What could cause the order to change?

For completeness' sake: here's some sample code to play around with (as I'm doing myself now):

library(glmnet)
y<-factor(sample(2, 100, replace=TRUE))
xs<-matrix(runif(100), ncol=1)
colnames(xs)<-"x"
myfit<-glmnet(xs, y, family="binomial")
mydata<-matrix(runif(10), ncol=1)
colnames(mydata)<-"x"
class(myfit)
predict(myfit, newx=mydata, type="response")
class(myfit)<-rev(class(myfit))
class(myfit)
predict(myfit, newx=mydata, type="response")
class(myfit)<-rev(class(myfit))#set it back
class(myfit)

Depending on the data generated, the difference is more or less obvious (in my true dataset I noticed negative values among the so-called probabilities, which is how I picked up the problem), but you should indeed see a difference.

Thanks for any input.

Edit:

I just found out the horrible truth: either order worked in glmnet 1.5.2 (which is present on the server where I ran the actual code, resulting in the fit with the class order reversed), but the code from 1.6 requires the order to be "lognet", "glmnet". I have yet to check what happens in 1.7.
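A quick way to see which version is actually in use on a given machine is sketched below. It assumes nothing beyond base R; the requireNamespace guard is only there so the snippet also runs where glmnet is not installed.

```r
# Report the installed glmnet version, if the package is available at all.
if (requireNamespace("glmnet", quietly = TRUE)) {
  v <- utils::packageVersion("glmnet")
  message("glmnet version: ", as.character(v))
  # Versions compare component by component, not as plain strings.
  message("older than 1.6: ", v < "1.6")
} else {
  message("glmnet is not installed in this library")
}

# The comparison itself works on any version object:
old_version <- package_version("1.5.2")
old_version < "1.6"
```

Running this on both the server and the local machine would show whether the two fits came from different releases.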

Thanks to @Aaron for reminding me of the basics of informatics (besides 'if all else fails, restart': 'check your versions'; I had mistakenly assumed that a package by the gods of statistical learning would be protected from this type of error), and to @Gavin for confirming my reconstruction of how S3 works.

Comments (1)

冷心人i 2024-11-23 22:42:21

Yes, dispatch follows the order in which the classes are listed in the class attribute. In the simple, everyday case, the first stated class is the one tried first by method dispatch, and only if no method is found for that class (or NextMethod is called) will dispatch move on to the second class or, failing that, search for a default method.
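A minimal sketch of that lookup order, using a made-up generic describe and made-up classes first, second and unknown (none of these exist in glmnet or base R):

```r
# Hypothetical generic and classes, used only to illustrate dispatch order.
describe <- function(x, ...) UseMethod("describe")
describe.first   <- function(x, ...) "method for class 'first'"
describe.second  <- function(x, ...) "method for class 'second'"
describe.default <- function(x, ...) "default method"

a <- structure(list(), class = c("first", "second"))
b <- structure(list(), class = c("second", "first"))
d <- structure(list(), class = c("unknown", "second"))
e <- structure(list(), class = "unknown")

describe(a)  # uses describe.first
describe(b)  # uses describe.second
describe(d)  # no describe.unknown, so dispatch falls through to describe.second
describe(e)  # no class-specific method at all, so describe.default is used
```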

No, I do not think you are right that the order of the classes is wrong in the code. The documentation appears to be wrong. The intent is clearly to call predict.lognet() first, use the workhorse predict.glmnet() to do the basic computations for all types of lasso/elastic net models fitted by glmnet, and finally do some post-processing of those general predictions. That predict.glmnet() is not exported from the glmnet NAMESPACE, whilst the other methods are, is perhaps also telling.
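The layering described above can be sketched with toy code. The generic score and the classes logistic and base are hypothetical stand-ins for predict, lognet and glmnet; this is not the package's implementation, only the same NextMethod pattern.

```r
# Hypothetical generic and classes mimicking the predict.lognet / predict.glmnet
# layering; none of this is glmnet code.
score <- function(object, ...) UseMethod("score")

# "Workhorse" method: returns values on the linear (link) scale.
score.base <- function(object, ...) object$eta

# Specialised method: delegates to the next class in the vector,
# then maps the linear scale to probabilities with the inverse logit.
score.logistic <- function(object, ...) {
  eta <- NextMethod("score")
  1 / (1 + exp(-eta))
}

obj <- structure(list(eta = c(-2, 0, 2)), class = c("logistic", "base"))
p_ok <- score(obj)    # score.logistic runs first, then score.base

class(obj) <- rev(class(obj))
p_raw <- score(obj)   # score.base runs first; score.logistic is never reached

p_ok    # values strictly between 0 and 1
p_raw   # the untouched linear values, including a negative one
```

With the specialised class listed second, the post-processing step is silently skipped, which would match the negative "probabilities" reported in the question.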

I'm not sure why you think the output from this:

predict(myfit, newx=mydata, type="response")

is wrong? I get a matrix of 10 rows and 21 columns, with the columns relating to the intercept-only model prediction plus predictions at 20 values of lambda at which model coefficients along the lasso/elastic net path have been computed. These do not seem to be linear combinations and are on the response scale, as you requested.

The order of the classes is not changing. I think you are misunderstanding how the code is supposed to work. There is a bug in the documentation, as the ordering is stated wrong there. But the code is working as I think it should.
