如何将 spline() 应用于大型数据框

发布于 2024-10-27 19:29:12 字数 2065 浏览 2 评论 0原文

我是 R 的新手,我正在尝试将 smooth.spline() 应用于大型数据帧。我查看了相关线程(“将 n 个函数的列表应用于数据帧的每一行”,“如何应用样条基础矩阵”,...)。这是我的数据框以及到目前为止我尝试过的:

> dim(mUnique)  
[1] 4565    9  
> str(mUnique)  
'data.frame':   4565 obs. of  9 variables:  
 $ Group.1: Factor w/ 4565 levels "mal_mito_1","mal_mito_2",..: 1 2 3 4 5 6 7 8 9 10 ...  
 $ h0     : num  0.18 -0.025 0.212 0.015 0.12 ...  
 $ h6     : num  -0.04 -0.305 -0.188 -0.185 -0.09 ...  
 $ h12    : num  -0.86 -1.1 -1.01 -1.04 -0.91 ...  
 $ h18    : num  -0.73 -1.215 -1.222 -0.355 -0.65 ...  
 $ h24    : num  0.04 0.025 -0.143 0.295 0.09 ...  
 $ h30    : num  -0.14 1.275 0.732 -0.015 -0.27 ...  
 $ h36    : num  1.44 1.795 1.627 0.385 0.91 ...  
 $ h42    : num  1.49 1.385 1.397 0.305 1.12 ...  

> head(mUnique)  
          ID      h0      h6     h12     h18     h24     h30    h36    h42  
1      mal_mito_1  0.1800 -0.0400 -0.8600 -0.7300  0.0400 -0.1400 1.4400 1.4900  
2      mal_mito_2 -0.0250 -0.3050 -1.1050 -1.2150  0.0250  1.2750 1.7950 1.3850  
3      mal_mito_3  0.2125 -0.1875 -1.0075 -1.2225 -0.1425  0.7325 1.6275 1.3975  
4 mal_rna_10_rRNA  0.0150 -0.1850 -1.0450 -0.3550  0.2950 -0.0150 0.3850 0.3050  
5 mal_rna_11_rRNA  0.1200 -0.0900 -0.9100 -0.6500  0.0900 -0.2700 0.9100 1.1200  
6 mal_rna_14_rRNA  0.0200 -0.0200 -0.8400 -0.6600  0.1700 -0.0900 0.6200 0.0800 

我可以在每一行上独立应用 smooth.spline ,并且到目前为止 spline() 看起来不错(我想要48 分。稍后我会弄清楚如何使用 smoooth.spline spar):

> time <- c(0,6,12,18,24,30,36,42)  
> plot(time, mUnique[1, 2:9])  
> smooth <- smooth.spline(time, mUnique[1, 2:9])  
> lines(smooth, col="blue")  
> splin <-spline(time, mUnique[1, 2:9], n=48)  
> lines(splin, col="blue")  

我想这是基本问题,但如何应用 smooth .spline()spline() 到整个数据帧,并返回一个矩阵 4565 * 49,其中我有平滑样条曲线每个结的坐标?我真的不关心绘制这些数据。

我尝试过:

> smooth <- smooth.spline(time, mUnique[, 2:9]|factor(ID))

现在,不知道该怎么办。这是制作循环的问题吗?

先感谢您

I am a newbie to R and I am trying to apply smooth.spline() to a large dataframe. I've looked at the related threads ("Apply a list of n functions to each row of a dataframe,", "How to apply a spline basis matrix",...). Here is my dataframe and what I've tried so far:

> dim(mUnique)  
[1] 4565    9  
> str(mUnique)  
'data.frame':   4565 obs. of  9 variables:  
 $ Group.1: Factor w/ 4565 levels "mal_mito_1","mal_mito_2",..: 1 2 3 4 5 6 7 8 9 10 ...  
 $ h0     : num  0.18 -0.025 0.212 0.015 0.12 ...  
 $ h6     : num  -0.04 -0.305 -0.188 -0.185 -0.09 ...  
 $ h12    : num  -0.86 -1.1 -1.01 -1.04 -0.91 ...  
 $ h18    : num  -0.73 -1.215 -1.222 -0.355 -0.65 ...  
 $ h24    : num  0.04 0.025 -0.143 0.295 0.09 ...  
 $ h30    : num  -0.14 1.275 0.732 -0.015 -0.27 ...  
 $ h36    : num  1.44 1.795 1.627 0.385 0.91 ...  
 $ h42    : num  1.49 1.385 1.397 0.305 1.12 ...  

> head(mUnique)  
          ID      h0      h6     h12     h18     h24     h30    h36    h42  
1      mal_mito_1  0.1800 -0.0400 -0.8600 -0.7300  0.0400 -0.1400 1.4400 1.4900  
2      mal_mito_2 -0.0250 -0.3050 -1.1050 -1.2150  0.0250  1.2750 1.7950 1.3850  
3      mal_mito_3  0.2125 -0.1875 -1.0075 -1.2225 -0.1425  0.7325 1.6275 1.3975  
4 mal_rna_10_rRNA  0.0150 -0.1850 -1.0450 -0.3550  0.2950 -0.0150 0.3850 0.3050  
5 mal_rna_11_rRNA  0.1200 -0.0900 -0.9100 -0.6500  0.0900 -0.2700 0.9100 1.1200  
6 mal_rna_14_rRNA  0.0200 -0.0200 -0.8400 -0.6600  0.1700 -0.0900 0.6200 0.0800 

I can apply smooth.spline on each row independently and it looks good with spline() so far (I want 48 points. I'll figure out later how to do it with smoooth.spline spar):

> time <- c(0,6,12,18,24,30,36,42)  
> plot(time, mUnique[1, 2:9])  
> smooth <- smooth.spline(time, mUnique[1, 2:9])  
> lines(smooth, col="blue")  
> splin <-spline(time, mUnique[1, 2:9], n=48)  
> lines(splin, col="blue")  

My question is I suppose basic, but how to I apply smooth.spline() or spline() to the whole dataframe, and get back a matrix 4565 * 49 where I have the coordinates for each knots of the smoothed spline? I don't really care about plotting that data.

I tried:

> smooth <- smooth.spline(time, mUnique[, 2:9]|factor(ID))

Now, don't know what to do. Is that a matter of making loops?

Thank you in advance

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

邮友 2024-11-03 19:29:12

这是您要找的吗?

time <- c(0,6,12,18,24,30,36,42)

t(
  apply(mUnique[-1],1,
    function(x){
      tmp <- smooth.spline(time,x)
      predict(tmp,seq(min(time),max(time),length.out=49))$y
    }
  )
)

它应该为您提供您所描述的矩阵。

额外说明:

我删除了第一列 (mUnique[-1])。这是列表方式,您也可以使用mUnique[,-1],这相当于矩阵。两者都适用于数据框。

然后我告诉 apply 将函数应用到行上,这是第一个边距。

我定义的函数

function(x){
        tmp <- smooth.spline(time,x)
        predict(tmp,seq(min(time),max(time),length.out=49))$y
    }

是一个两线函数:

  • 我计算平滑样条线
  • 我计算 49 个点的规则序列的预测 (seq(min(time),max(time),length.out=49)< /code>),并获取该预测的 y 值。

此函数定义中的 x 是传递的参数。在本例中,它代表 apply 函数传递的一行。

最后,我转置矩阵 (t) 以使其达到您请求的格式。

该代码与以下测试用例完美运行:

mUnique <- read.table(textConnection("
         ID      h0      h6     h12     h18     h24     h30    h36    h42
     mal_mito_1  0.1800 -0.0400 -0.8600 -0.7300  0.0400 -0.1400 1.4400 1.4900
     mal_mito_2 -0.0250 -0.3050 -1.1050 -1.2150  0.0250  1.2750 1.7950 1.3850
     mal_mito_3  0.2125 -0.1875 -1.0075 -1.2225 -0.1425  0.7325 1.6275 1.3975
mal_rna_10_rRNA  0.0150 -0.1850 -1.0450 -0.3550  0.2950 -0.0150 0.3850 0.3050
mal_rna_11_rRNA  0.1200 -0.0900 -0.9100 -0.6500  0.0900 -0.2700 0.9100 1.1200
mal_rna_14_rRNA  0.0200 -0.0200 -0.8400 -0.6600  0.1700 -0.0900 0.6200 0.0800 ")
,header=T)

time <- c(0,6,12,18,24,30,36,42)

确保在运行我的代码之前定义时间...

Is this what you're looking for?

time <- c(0,6,12,18,24,30,36,42)

t(
  apply(mUnique[-1],1,
    function(x){
      tmp <- smooth.spline(time,x)
      predict(tmp,seq(min(time),max(time),length.out=49))$y
    }
  )
)

It should give you the matrix as you described.

Extra explanation :

I drop the first column (mUnique[-1]). This is the list way of doing it, you can also do mUnique[,-1], which is the matrix equivalent. Both work for dataframes.

Then I tell apply to apply the function over the rows, which is the first margin.

The function I define,

function(x){
        tmp <- smooth.spline(time,x)
        predict(tmp,seq(min(time),max(time),length.out=49))$y
    }

is a two-liner :

  • I calculate the smooth spline
  • I calculate the predictions for a regular sequence of 49 points (seq(min(time),max(time),length.out=49)), and take the y values of that prediction.

The x in this function definition is the argument that is passed. In this case it represents one row that is passed by the apply function.

Finally, I transpose the matrix (t) to get it in the format you requested.

The code runs perfectly with the following testcase :

mUnique <- read.table(textConnection("
         ID      h0      h6     h12     h18     h24     h30    h36    h42
     mal_mito_1  0.1800 -0.0400 -0.8600 -0.7300  0.0400 -0.1400 1.4400 1.4900
     mal_mito_2 -0.0250 -0.3050 -1.1050 -1.2150  0.0250  1.2750 1.7950 1.3850
     mal_mito_3  0.2125 -0.1875 -1.0075 -1.2225 -0.1425  0.7325 1.6275 1.3975
mal_rna_10_rRNA  0.0150 -0.1850 -1.0450 -0.3550  0.2950 -0.0150 0.3850 0.3050
mal_rna_11_rRNA  0.1200 -0.0900 -0.9100 -0.6500  0.0900 -0.2700 0.9100 1.1200
mal_rna_14_rRNA  0.0200 -0.0200 -0.8400 -0.6600  0.1700 -0.0900 0.6200 0.0800 ")
,header=T)

time <- c(0,6,12,18,24,30,36,42)

Make sure you define time before running my code...

丶视觉 2024-11-03 19:29:12

使用对象 dat 中的数据片段,我们可以做(我认为)您想要的事情。首先,我们编写一个通过 smooth.spline() 拟合平滑样条线的小包装函数,然后预测该样条线对一组 n 位置的响应。您要求 n = 48,因此我们将使用它作为默认值。

这是一个这样的包装函数:

SSpline <- function(x, y, n = 48, ...) {
    ## fit the spline to x, and y
    mod <- smooth.spline(x, y, ...)
    ## predict from mod for n points over range of x
    pred.dat <- seq(from = min(x), to = max(x), length.out = n)
    ## predict
    preds <- predict(mod, x = pred.dat)
    ## return
    preds
}

我们检查这是否适用于数据的第一行:

> res <- SSpline(time, dat[1, 2:9])
> res
$x
 [1]  0.000000  0.893617  1.787234  2.680851  3.574468  4.468085  5.361702
 [8]  6.255319  7.148936  8.042553  8.936170  9.829787 10.723404 11.617021
[15] 12.510638 13.404255 14.297872 15.191489 16.085106 16.978723 17.872340
[22] 18.765957 19.659574 20.553191 21.446809 22.340426 23.234043 24.127660
[29] 25.021277 25.914894 26.808511 27.702128 28.595745 29.489362 30.382979
[36] 31.276596 32.170213 33.063830 33.957447 34.851064 35.744681 36.638298
[43] 37.531915 38.425532 39.319149 40.212766 41.106383 42.000000

$y
 [1]  0.052349585  0.001126837 -0.049851737 -0.100341294 -0.150096991
 [6] -0.198873984 -0.246427429 -0.292510695 -0.336721159 -0.378381377
[11] -0.416785932 -0.451229405 -0.481006377 -0.505411429 -0.523759816
[16] -0.535714043 -0.541224748 -0.540251293 -0.532753040 -0.518689349
[21] -0.498019582 -0.470750611 -0.437182514 -0.397727107 -0.352796426
[26] -0.302802508 -0.248157388 -0.189272880 -0.126447574 -0.059682959
[31]  0.011067616  0.085850805  0.164713260  0.247701633  0.334851537
[36]  0.425833795  0.519879613  0.616194020  0.713982047  0.812448724
[41]  0.910799082  1.008296769  1.104781306  1.200419068  1.295380186
[46]  1.389834788  1.483953003  1.577904960

> plot(time, dat[1, 2:9])
> lines(res, col = "blue")

它给出:

plot offitting spline

这似乎有效,所以现在我们可以将函数应用于数据集,仅保留 SSpline() 返回的对象的 $y 组件。为此,我们使用 apply()

> res2 <- apply(dat[, 2:9], 1,
+               function(y, x, ...) { SSpline(x, y, ...)$y },
+               x = time)
> head(res2)
                1           2           3           4           5           6
[1,]  0.052349585 -0.02500000  0.21250000 -0.06117869 -0.02153366 -0.02295792
[2,]  0.001126837 -0.04293509  0.17175460 -0.10994988 -0.06538250 -0.06191095
[3,] -0.049851737 -0.06407856  0.12846458 -0.15838412 -0.10899505 -0.10074427
[4,] -0.100341294 -0.09168227  0.08005550 -0.20614476 -0.15213426 -0.13933920
[5,] -0.150096991 -0.12899810  0.02395291 -0.25289514 -0.19456304 -0.17757705
[6,] -0.198873984 -0.17927793 -0.04241763 -0.29829862 -0.23604434 -0.21533911

现在 res2 包含 48 行和 6 列,这 6 列指的是此处使用的 dat 的每一行。如果您想要相反,只需转置 res2t(res2)

我们可以通过一个简单的 matplot() 调用看到已经完成的操作:

> matplot(x = seq(min(time), max(time), length = 48), 
+         y = res2, type = "l")

它会生成:

fitted splines

Using your snippet of data in object dat, we can do what (I think) you want. First we write a little wrapper function that fits a smoothing spline via smooth.spline(), and then predicts the response from this spline for a set of n locations. You ask for n = 48 so we'll use that as the default.

Here is one such wrapper function:

SSpline <- function(x, y, n = 48, ...) {
    ## fit the spline to x, and y
    mod <- smooth.spline(x, y, ...)
    ## predict from mod for n points over range of x
    pred.dat <- seq(from = min(x), to = max(x), length.out = n)
    ## predict
    preds <- predict(mod, x = pred.dat)
    ## return
    preds
}

We check this works for the first row of your data:

> res <- SSpline(time, dat[1, 2:9])
> res
$x
 [1]  0.000000  0.893617  1.787234  2.680851  3.574468  4.468085  5.361702
 [8]  6.255319  7.148936  8.042553  8.936170  9.829787 10.723404 11.617021
[15] 12.510638 13.404255 14.297872 15.191489 16.085106 16.978723 17.872340
[22] 18.765957 19.659574 20.553191 21.446809 22.340426 23.234043 24.127660
[29] 25.021277 25.914894 26.808511 27.702128 28.595745 29.489362 30.382979
[36] 31.276596 32.170213 33.063830 33.957447 34.851064 35.744681 36.638298
[43] 37.531915 38.425532 39.319149 40.212766 41.106383 42.000000

$y
 [1]  0.052349585  0.001126837 -0.049851737 -0.100341294 -0.150096991
 [6] -0.198873984 -0.246427429 -0.292510695 -0.336721159 -0.378381377
[11] -0.416785932 -0.451229405 -0.481006377 -0.505411429 -0.523759816
[16] -0.535714043 -0.541224748 -0.540251293 -0.532753040 -0.518689349
[21] -0.498019582 -0.470750611 -0.437182514 -0.397727107 -0.352796426
[26] -0.302802508 -0.248157388 -0.189272880 -0.126447574 -0.059682959
[31]  0.011067616  0.085850805  0.164713260  0.247701633  0.334851537
[36]  0.425833795  0.519879613  0.616194020  0.713982047  0.812448724
[41]  0.910799082  1.008296769  1.104781306  1.200419068  1.295380186
[46]  1.389834788  1.483953003  1.577904960

> plot(time, dat[1, 2:9])
> lines(res, col = "blue")

which gives:

plot of fitted spline

That seems to work, so now we can apply the function over the set of data, keep only the $y component of the object returned by SSpline(). For that we use apply():

> res2 <- apply(dat[, 2:9], 1,
+               function(y, x, ...) { SSpline(x, y, ...)$y },
+               x = time)
> head(res2)
                1           2           3           4           5           6
[1,]  0.052349585 -0.02500000  0.21250000 -0.06117869 -0.02153366 -0.02295792
[2,]  0.001126837 -0.04293509  0.17175460 -0.10994988 -0.06538250 -0.06191095
[3,] -0.049851737 -0.06407856  0.12846458 -0.15838412 -0.10899505 -0.10074427
[4,] -0.100341294 -0.09168227  0.08005550 -0.20614476 -0.15213426 -0.13933920
[5,] -0.150096991 -0.12899810  0.02395291 -0.25289514 -0.19456304 -0.17757705
[6,] -0.198873984 -0.17927793 -0.04241763 -0.29829862 -0.23604434 -0.21533911

Now res2 contains 48 rows and 6 columns, the 6 columns refer to each row of dat used here. If you want it the other way round, just transpose res2: t(res2).

We can see what has been done via a simple matplot() call:

> matplot(x = seq(min(time), max(time), length = 48), 
+         y = res2, type = "l")

which produces:

fitted splines

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文