How to apply spline() to a large data frame
I am a newbie to R and I am trying to apply smooth.spline() to a large data frame. I've looked at the related threads ("Apply a list of n functions to each row of a dataframe", "How to apply a spline basis matrix", ...). Here is my data frame and what I've tried so far:
> dim(mUnique)
[1] 4565 9
> str(mUnique)
'data.frame': 4565 obs. of 9 variables:
$ Group.1: Factor w/ 4565 levels "mal_mito_1","mal_mito_2",..: 1 2 3 4 5 6 7 8 9 10 ...
$ h0 : num 0.18 -0.025 0.212 0.015 0.12 ...
$ h6 : num -0.04 -0.305 -0.188 -0.185 -0.09 ...
$ h12 : num -0.86 -1.1 -1.01 -1.04 -0.91 ...
$ h18 : num -0.73 -1.215 -1.222 -0.355 -0.65 ...
$ h24 : num 0.04 0.025 -0.143 0.295 0.09 ...
$ h30 : num -0.14 1.275 0.732 -0.015 -0.27 ...
$ h36 : num 1.44 1.795 1.627 0.385 0.91 ...
$ h42 : num 1.49 1.385 1.397 0.305 1.12 ...
> head(mUnique)
ID h0 h6 h12 h18 h24 h30 h36 h42
1 mal_mito_1 0.1800 -0.0400 -0.8600 -0.7300 0.0400 -0.1400 1.4400 1.4900
2 mal_mito_2 -0.0250 -0.3050 -1.1050 -1.2150 0.0250 1.2750 1.7950 1.3850
3 mal_mito_3 0.2125 -0.1875 -1.0075 -1.2225 -0.1425 0.7325 1.6275 1.3975
4 mal_rna_10_rRNA 0.0150 -0.1850 -1.0450 -0.3550 0.2950 -0.0150 0.3850 0.3050
5 mal_rna_11_rRNA 0.1200 -0.0900 -0.9100 -0.6500 0.0900 -0.2700 0.9100 1.1200
6 mal_rna_14_rRNA 0.0200 -0.0200 -0.8400 -0.6600 0.1700 -0.0900 0.6200 0.0800
I can apply smooth.spline() to each row independently, and so far it looks good with spline() (I want 48 points; I'll figure out later how to do that with smooth.spline() and spar):
> time <- c(0,6,12,18,24,30,36,42)
> plot(time, mUnique[1, 2:9])
> smooth <- smooth.spline(time, mUnique[1, 2:9])
> lines(smooth, col="blue")
> splin <-spline(time, mUnique[1, 2:9], n=48)
> lines(splin, col="blue")
My question is, I suppose, a basic one: how do I apply smooth.spline() or spline() to the whole data frame and get back a 4565 x 49 matrix holding the coordinates of each knot of the smoothed spline? I don't really care about plotting the data.
I tried:
> smooth <- smooth.spline(time, mUnique[, 2:9]|factor(ID))
Now I don't know what to do. Is this a matter of writing loops?
Thank you in advance
Answers (2)
Is this what you're looking for?
It should give you the matrix you described.
Extra explanation:
I drop the first column (mUnique[-1]). This is the list way of doing it; you can also do mUnique[,-1], which is the matrix equivalent. Both work for data frames. Then I tell apply() to apply the function over the rows, which is the first margin (MARGIN = 1).
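As a small illustration of those two points, here is a sketch using a data frame built from the first three rows printed by head(mUnique) in the question (the name dat is an assumption). Both forms of indexing drop the ID column, and apply(..., 1, f) calls f once per row, storing each row's result as a column:

```r
# Three rows copied from head(mUnique) in the question; `dat` is a hypothetical name
dat <- data.frame(
  ID  = c("mal_mito_1", "mal_mito_2", "mal_mito_3"),
  h0  = c(0.1800, -0.0250, 0.2125),
  h6  = c(-0.0400, -0.3050, -0.1875),
  h12 = c(-0.8600, -1.1050, -1.0075),
  h18 = c(-0.7300, -1.2150, -1.2225),
  h24 = c(0.0400, 0.0250, -0.1425),
  h30 = c(-0.1400, 1.2750, 0.7325),
  h36 = c(1.4400, 1.7950, 1.6275),
  h42 = c(1.4900, 1.3850, 1.3975)
)

# List-style indexing (dat[-1]) and matrix-style indexing (dat[, -1])
# both drop the ID column and return the same data frame
identical(dat[-1], dat[, -1])   # TRUE

# MARGIN = 1 makes apply() call the function once per row; range() returns
# 2 values per row, and apply() stores each row's result as a column
row_ranges <- apply(dat[-1], 1, range)
dim(row_ranges)                 # 2 3
```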
The function I define is a two-liner: it fits a smoothing spline of x against time, then predicts that spline at 49 evenly spaced new time points (seq(min(time),max(time),length.out=49)) and takes the y values of that prediction.
The x in this function definition is the argument that is passed. In this case it represents one row, handed over by apply().
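A minimal sketch of a per-row function of this kind, reconstructed from the description rather than copied from the answer; the name smooth_row is hypothetical, and the sample row is mal_mito_1 from head(mUnique):

```r
time <- c(0, 6, 12, 18, 24, 30, 36, 42)

# Hypothetical name for the per-row function: x is one row of measurements
smooth_row <- function(x) {
  fit <- smooth.spline(time, x)                               # smoothing spline of x on time
  predict(fit, seq(min(time), max(time), length.out = 49))$y  # y values at 49 new time points
}

# Row 1 of head(mUnique) in the question (mal_mito_1)
row1 <- c(0.18, -0.04, -0.86, -0.73, 0.04, -0.14, 1.44, 1.49)
y49 <- smooth_row(row1)
length(y49)   # 49
```

Inside apply(), each row arrives as a named numeric vector, which smooth.spline() accepts as y in the same way.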
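Finally, I transpose the matrix (t()) to get it in the format you requested: apply() returns the 49 values for each row as a column, so the untransposed result has one column per row of mUnique. The code runs perfectly with the following test case: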
Make sure you define time before running my code.
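Putting the steps together, here is a sketch of the whole pipeline on the six rows printed by head(mUnique) in the question. It is a reconstruction from the explanation above, not the answer's own code or test case; on the full 4565-row mUnique the same call should return a 4565 x 49 matrix.

```r
time <- c(0, 6, 12, 18, 24, 30, 36, 42)

# Six-row stand-in for mUnique: the rows printed by head(mUnique)
mUnique <- data.frame(
  ID  = c("mal_mito_1", "mal_mito_2", "mal_mito_3",
          "mal_rna_10_rRNA", "mal_rna_11_rRNA", "mal_rna_14_rRNA"),
  h0  = c(0.1800, -0.0250, 0.2125, 0.0150, 0.1200, 0.0200),
  h6  = c(-0.0400, -0.3050, -0.1875, -0.1850, -0.0900, -0.0200),
  h12 = c(-0.8600, -1.1050, -1.0075, -1.0450, -0.9100, -0.8400),
  h18 = c(-0.7300, -1.2150, -1.2225, -0.3550, -0.6500, -0.6600),
  h24 = c(0.0400, 0.0250, -0.1425, 0.2950, 0.0900, 0.1700),
  h30 = c(-0.1400, 1.2750, 0.7325, -0.0150, -0.2700, -0.0900),
  h36 = c(1.4400, 1.7950, 1.6275, 0.3850, 0.9100, 0.6200),
  h42 = c(1.4900, 1.3850, 1.3975, 0.3050, 1.1200, 0.0800),
  stringsAsFactors = FALSE
)

# apply() passes each row (without the ID column) to the function and stores
# the 49 fitted values as a column; t() turns the 49 x 6 result into 6 x 49
res <- t(apply(mUnique[-1], 1, function(x) {
  fit <- smooth.spline(time, x)
  predict(fit, seq(min(time), max(time), length.out = 49))$y
}))
rownames(res) <- mUnique$ID   # optional: label each row with its ID
dim(res)                      # 6 49
```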
Using your snippet of data in object dat, we can do what (I think) you want. First we write a little wrapper function that fits a smoothing spline via smooth.spline() and then predicts the response from this spline at a set of n locations. You ask for n = 48, so we'll use that as the default. Here is one such wrapper function:
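A sketch of such a wrapper, reconstructed from the description: the name SSpline comes from the answer, while the argument order (x, y, n) and passing extra arguments through ... to smooth.spline() are assumptions.

```r
# Fit a smoothing spline of y on x, then predict it at n evenly spaced
# locations spanning the range of x; extra arguments go to smooth.spline()
SSpline <- function(x, y, n = 48, ...) {
  mod <- smooth.spline(x, y, ...)
  new_x <- seq(from = min(x), to = max(x), length.out = n)
  predict(mod, x = new_x)   # list with components $x and $y, each of length n
}
```

For example, SSpline(time, unlist(dat[1, -1]))$y would return the 48 predicted values for the first row. Extra arguments such as spar are forwarded to smooth.spline(), which is one way to try the smoothing control mentioned in the question.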
We check this works for the first row of your data:
which gives:
That seems to work, so now we can apply the function over the set of data, keeping only the $y component of the object returned by SSpline(). For that we use apply():
Now res2 contains 48 rows and 6 columns; the 6 columns correspond to the rows of dat used here. If you want it the other way round, just transpose res2: t(res2).
We can see what has been done via a simple matplot() call:
which produces:
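A sketch of the apply(), transpose and matplot() steps described above, again on the six rows from head(mUnique) and with a reconstructed SSpline() whose signature is an assumption. The extra argument x = time is forwarded by apply() to the anonymous function:

```r
time <- c(0, 6, 12, 18, 24, 30, 36, 42)

# The six rows printed by head(mUnique) in the question
dat <- data.frame(
  ID  = c("mal_mito_1", "mal_mito_2", "mal_mito_3",
          "mal_rna_10_rRNA", "mal_rna_11_rRNA", "mal_rna_14_rRNA"),
  h0  = c(0.1800, -0.0250, 0.2125, 0.0150, 0.1200, 0.0200),
  h6  = c(-0.0400, -0.3050, -0.1875, -0.1850, -0.0900, -0.0200),
  h12 = c(-0.8600, -1.1050, -1.0075, -1.0450, -0.9100, -0.8400),
  h18 = c(-0.7300, -1.2150, -1.2225, -0.3550, -0.6500, -0.6600),
  h24 = c(0.0400, 0.0250, -0.1425, 0.2950, 0.0900, 0.1700),
  h30 = c(-0.1400, 1.2750, 0.7325, -0.0150, -0.2700, -0.0900),
  h36 = c(1.4400, 1.7950, 1.6275, 0.3850, 0.9100, 0.6200),
  h42 = c(1.4900, 1.3850, 1.3975, 0.3050, 1.1200, 0.0800)
)

# Reconstructed wrapper (hypothetical signature): fit a smoothing spline of
# y on x and predict it at n evenly spaced points over the range of x
SSpline <- function(x, y, n = 48, ...) {
  mod <- smooth.spline(x, y, ...)
  predict(mod, x = seq(min(x), max(x), length.out = n))
}

# One call per row of dat (MARGIN = 1), keeping only the $y component
res2 <- apply(dat[, -1], 1, function(y, x, ...) SSpline(x, y, ...)$y, x = time)
dim(res2)      # 48 6: one column per row of dat
dim(t(res2))   # 6 48: one row per row of dat

# When run as a script, draw on a null device instead of writing Rplots.pdf
if (!interactive()) pdf(NULL)
matplot(x = seq(min(time), max(time), length.out = 48), y = res2, type = "l",
        xlab = "time", ylab = "smoothed value")
```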