For loop inside a for loop?

Posted on 2024-11-27 05:39:33

I have two dataframes:

df1<- as.data.frame(matrix(1:15, ncol=5))
df2<- as.data.frame(matrix(30:44,ncol=5))

Using these two dataframes, I want to calculate z-scores. The formula is:

z = (X - μ) / σ

where μ is the mean and σ is the standard deviation.

df1 contains all the X values, and each row of df2 contains the values used to calculate the mean and the sd for the corresponding row of df1.
I wrote a loop that calculates the z-score for each value in the first column of df1. My question is: how can I calculate the z-score for the whole dataframe?

test <- list()
for (i in seq_len(nrow(df1))) {
  ## mean and sd are taken from row i of df2
  row_vals <- unlist(df2[i, ])
  zscore <- (df1[i, 1] - mean(row_vals)) / sd(row_vals)
  test[[i]] <- matrix(zscore)
}

Thank you all!

Comments (1)

物价感观 2024-12-04 05:39:33

[I think you have the row/cols backwards here. z-scores are usually applied to variables, which R would expect to be in columns. What I write below follows the usual convention. Change accordingly if you really want to standardise by rows.]

sweep() is your general purpose friend. We compute the means and standard deviations and then sweep (subtract in this case) them out of the data frame df1:

## compute column means and sd
mns <- colMeans(df2)     ## rowMeans if by rows
sds <- apply(df2, 2, sd) ## 2 -> 1 if by rows

## Subtract the respective mean from each column
df3 <- sweep(df1, 2, mns, "-")  ## 2 -> 1 if by rows
## Divide by the respective sd
df3 <- sweep(df3, 2, sds, "/")  ## 2 -> 1 if by rows

which gives:

R> df3
   V1  V2  V3  V4  V5
1 -30 -30 -30 -30 -30
2 -29 -29 -29 -29 -29
3 -28 -28 -28 -28 -28

We can check this has worked by doing the computations for the first column of df3 in a vectorised fashion:

R> (df1[,1] - mean(df2[,1])) / sd(df2[,1])
[1] -30 -29 -28
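
The same check can be extended from the first column to every column. The following is only a sketch: it rebuilds the objects from the question so that it runs on its own, and the name `manual` is introduced here purely for illustration.

```r
df1 <- as.data.frame(matrix(1:15, ncol = 5))
df2 <- as.data.frame(matrix(30:44, ncol = 5))

## column statistics taken from df2, as in the answer
mns <- colMeans(df2)
sds <- apply(df2, 2, sd)
df3 <- sweep(sweep(df1, 2, mns, "-"), 2, sds, "/")

## recompute every column directly, pairing each column of df1
## with the matching mean and sd ('manual' is an illustrative name)
manual <- as.data.frame(Map(function(x, m, s) (x - m) / s, df1, mns, sds))

## compare the values only, ignoring row and column names
all.equal(as.matrix(df3), as.matrix(manual), check.attributes = FALSE)
```

If the two results agree, `all.equal()` returns `TRUE`; otherwise it returns a description of the differences.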

For this particular situation, you can also use the scale() function and supply your own center and scale arguments: the respective means and standard deviations.

R> scale(df1, center = mns, scale = sds)
      V1  V2  V3  V4  V5
[1,] -30 -30 -30 -30 -30
[2,] -29 -29 -29 -29 -29
[3,] -28 -28 -28 -28 -28
attr(,"scaled:center")
V1 V2 V3 V4 V5 
31 34 37 40 43 
attr(,"scaled:scale")
V1 V2 V3 V4 V5 
 1  1  1  1  1
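
If you really do want to standardise by rows, as the question describes, the "2 -> 1" changes noted in the comments above might look like the sketch below. The names `row_mns`, `row_sds`, `z_rows` and `z_alt` are illustrative and do not appear in the original post.

```r
df1 <- as.data.frame(matrix(1:15, ncol = 5))
df2 <- as.data.frame(matrix(30:44, ncol = 5))

## one mean and one sd per row of df2
row_mns <- rowMeans(df2)
row_sds <- apply(df2, 1, sd)

## MARGIN = 1 sweeps the statistics along rows instead of columns
z_rows <- sweep(df1, 1, row_mns, "-")
z_rows <- sweep(z_rows, 1, row_sds, "/")

## scale() always works on columns, so transpose, scale, transpose back
z_alt <- t(scale(t(df1), center = row_mns, scale = row_sds))
```

The first column of `z_rows` should match what the loop in the question computes for each `i`, and `z_alt` holds the same values as a matrix rather than a data frame.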