R 中是否有具有超过 3 个参数预测变量的 loess 实现或达到类似效果的技巧?
召集所有局部回归和/或R专家!
我遇到了 R 中标准 loess 函数的限制,希望您能提供一些建议。当前的实现仅支持 1-4 个预测器。让我列出我们的应用场景,以说明为什么一旦我们想要使用全局拟合参数协变量,这很容易成为一个问题。
本质上,我们有一个空间扭曲s(x,y)覆盖在许多测量z上:
z_i = s(x_i,y_i) + v_{g_i}
这些测量z可以按每个组g具有相同的基础未失真测量值v。每个测量的组成员资格 g_i 是已知的,但组的基本未失真测量值 v_g 未知,应通过(全局,而不是局部)回归来确定。
我们需要估计二维空间趋势 s(x,y),然后将其删除。在我们的应用程序中,假设在最简单的情况下有 20 组,每组至少 35 个测量值。测量值是随机放置的。以第一组为参考,因此有 19 个未知偏移。
下面的玩具数据代码(具有一维空间趋势x)适用于两个或三个偏移组。
不幸的是,对于四个或更多偏移组,loess 调用失败,并显示错误消息
Error in simpleLoess(y, x, w, span, degree, parametric, drop.square,
normalize, :
only 1-4 predictors are allowed"
我尝试覆盖限制并得到
k>d2MAX in ehg136. Need to recompile with increased dimensions.
How easy will that be to do?我在任何地方都找不到 d2MAX 的定义,而且这似乎可能是硬编码的 - 该错误显然是由 loessf.f 中的第 #1359 行触发的
if(k .gt. 15) call ehg182(105)
或者,有人知道吗可以应用于此处的具有全局(参数)偏移组的局部回归的实现?
或者有更好的方法来处理这个问题吗?我尝试使用相关结构lme,但这似乎慢得多。
任何意见将不胜感激!
非常感谢,
大卫
###
#
# loess with parametric offsets - toy data demo
#
x<-seq(0,9,.1);
x.N<-length(x);
o<-c(0.4,-0.8,1.2#,-0.2 # works for three but not four
); # these are the (unknown) offsets
o.N<-length(o);
f<-sapply(seq(o.N),
function(n){
ifelse((seq(x.N)<= n *x.N/(o.N+1) &
seq(x.N)> (n-1)*x.N/(o.N+1)),
1,0);
});
f<-f[sample(NROW(f)),];
y<-sin(x)+rnorm(length(x),0,.1)+f%*%o;
s.fs<-sapply(seq(NCOL(f)),function(i){paste('f',i,sep='')});
s<-paste(c('y~x',s.fs),collapse='+');
d<-data.frame(x,y,f)
names(d)<-c('x','y',s.fs);
l<-loess(formula(s),parametric=s.fs,drop.square=s.fs,normalize=F,data=d,
span=0.4);
yp<-predict(l,newdata=d);
plot(x,y,pch='+',ylim=c(-3,3),col='red'); # input data
points(x,yp,pch='o',col='blue'); # fit of that
d0<-d; d0$f1<-d0$f2<-d0$f3<-0;
yp0<-predict(l,newdata=d0);
points(x,y-f%*%o); # spatial distortion
lines(x,yp0,pch='+'); # estimate of that
op<-sapply(seq(NCOL(f)),function(i){(yp-yp0)[!!f[,i]][1]});
cat("Demo offsets:",o,"\n");
cat("Estimated offsets:",format(op,digits=1),"\n");
Calling all experts on local regression and/or R!
I have run into a limitation of the standard loess function in R and hope you have some advice. The current implementation supports only 1-4 predictors. Let me set out our application scenario to show why this can easily become a problem as soon as we want to employ globally fit parametric covariables.
Essentially, we have a spatial distortion s(x,y) overlaid over a number of measurements z:
z_i = s(x_i,y_i) + v_{g_i}
These measurements z can be grouped by the same underlying undistorted measurement value v for each group g. The group membership g_i is known for each measurement, but the underlying undistorted measurement values v_g for the groups are not known and should be determined by (global, not local) regression.
We need to estimate the two-dimensional spatial trend s(x,y), which we then want to remove. In our application, say there are 20 groups of at least 35 measurements each, in the most simple scenario. The measurements are randomly placed. Taking the first group as reference, there are thus 19 unknown offsets.
The below code for toy data (with a spatial trend in one dimension x) works for two or three offset groups.
Unfortunately, the loess call fails for four or more offset groups with the error message
Error in simpleLoess(y, x, w, span, degree, parametric, drop.square,
normalize, :
only 1-4 predictors are allowed"
I tried overriding the restriction and got
k>d2MAX in ehg136. Need to recompile with increased dimensions.
How easy would that be to do? I cannot find a definition of d2MAX anywhere, and it seems this might be hardcoded -- the error is apparently triggered by line #1359 in loessf.f
if(k .gt. 15) call ehg182(105)
Alternatively, does anyone know of an implementation of local regression with global (parametric) offset groups that could be applied here?
Or is there a better way of dealing with this? I tried lme with correlation structures but that seems to be much, much slower.
Any comments would be greatly appreciated!
Many thanks,
David
###
#
# loess with parametric offsets - toy data demo
#
x<-seq(0,9,.1);
x.N<-length(x);
o<-c(0.4,-0.8,1.2#,-0.2 # works for three but not four
); # these are the (unknown) offsets
o.N<-length(o);
f<-sapply(seq(o.N),
function(n){
ifelse((seq(x.N)<= n *x.N/(o.N+1) &
seq(x.N)> (n-1)*x.N/(o.N+1)),
1,0);
});
f<-f[sample(NROW(f)),];
y<-sin(x)+rnorm(length(x),0,.1)+f%*%o;
s.fs<-sapply(seq(NCOL(f)),function(i){paste('f',i,sep='')});
s<-paste(c('y~x',s.fs),collapse='+');
d<-data.frame(x,y,f)
names(d)<-c('x','y',s.fs);
l<-loess(formula(s),parametric=s.fs,drop.square=s.fs,normalize=F,data=d,
span=0.4);
yp<-predict(l,newdata=d);
plot(x,y,pch='+',ylim=c(-3,3),col='red'); # input data
points(x,yp,pch='o',col='blue'); # fit of that
d0<-d; d0$f1<-d0$f2<-d0$f3<-0;
yp0<-predict(l,newdata=d0);
points(x,y-f%*%o); # spatial distortion
lines(x,yp0,pch='+'); # estimate of that
op<-sapply(seq(NCOL(f)),function(i){(yp-yp0)[!!f[,i]][1]});
cat("Demo offsets:",o,"\n");
cat("Estimated offsets:",format(op,digits=1),"\n");
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
为什么不使用附加模型呢?包 mgcv 将处理这种模型,如果我理解你的问题,那就好了。我可能有这个错误,但是您显示的代码与 x ~ y 相关,但您的问题提到了 z ~ s(x, y) + g。下面我展示的
gam()
是通过x
和y
中的空间平滑建模的响应z
,其中g
被参数化估计,g
作为一个因子存储在数据框中:或者我误解了你想要的东西?如果您想发布一小段数据,我可以使用 mgcv 给出一个正确的示例...?
Why don't you use an additive model for this? Package mgcv will handle this sort of model, if I understand your Question, just fine. I might have this wrong, but the code you show is relating x ~ y, but your Question mentions z ~ s(x, y) + g. What I show below for
gam()
is for responsez
modelled by a spatial smooth inx
andy
withg
being estimated parametrically, withg
stored as a factor in the data frame:Or have I misunderstood what you wanted? If you want to post a small snippet of data I can give a proper example using mgcv...?