Multiply various subsets of a data frame by different vectors

Posted on 2024-11-27 07:28:05

I would like to multiply several columns in my data frame by a vector of values. The specific vector of values changes depending on the value in another column.

--EDIT--

What if I make the data set more complicated, i.e., more than 2 conditions and the conditions are randomly shuffled around the data set?

Here is an example of my data set:

df=data.frame(
  Treatment=(rep(LETTERS[1:4],each=2)),
  Species=rep(1:4,each=2),
  Value1=c(0,0,1,3,4,2,0,0),
  Value2=c(0,0,3,4,2,1,4,5),
  Value3=c(0,2,4,5,2,1,4,5),
  Condition=c("A","B","A","C","B","A","B","C")
  )

Which looks like:

 Treatment Species Value1 Value2 Value3 Condition
     A       1      0      0      0         A
     A       1      0      0      2         B 
     B       2      1      3      4         A
     B       2      3      4      5         C
     C       3      4      2      2         B
     C       3      2      1      1         A
     D       4      0      4      4         B
     D       4      0      5      5         C

If Condition=="A", I would like to multiply columns 3-5 by the vector c(1,2,3). If Condition=="B", I would like to multiply columns 3-5 by the vector c(4,5,6). If Condition=="C", I would like to multiply columns 3-5 by the vector c(0,1,0). The resulting data frame would therefore look like this:

 Treatment Species Value1 Value2 Value3 Condition
     A       1      0      0      0         A
     A       1      0      0     12         B 
     B       2      1      6     12         A
     B       2      0      4      0         C
     C       3     16     10     12         B
     C       3      2      2      3         A
     D       4      0     20     24         B
     D       4      0      5      0         C

I have tried subsetting the data frame and multiplying by the vector:

t(t(subset(df[,3:5],df[,6]=="A")) * c(1,2,3))

But I can't return the subsetted data frame to the original. Is there any way to perform this operation without subsetting the data frame, so that other columns (e.g., Treatment, Species) are preserved?


Comments (4)

策马西风 2024-12-04 07:28:06

Edited to reflect some notes from the comments

Assuming that Condition is a factor, you could do this:

#Modified to reflect OP's edit - the same solution works just fine
m <- matrix(c(1:6,0,1,0),3,3,byrow = TRUE)
df[,3:5] <- with(df,df[,3:5] * m[Condition,])

which makes use of fairly quick vectorized multiplication. And obviously, wrapping this in with isn't strictly necessary, it's just what popped out of my brain. Also note the subsetting comment below by Backlin.
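
Note that m[Condition,] only works by position when Condition is a factor. Since R 4.0.0, data.frame() keeps strings as character vectors by default, and a character index needs row names to match against. A minimal sketch of a variant that labels the rows of m and looks them up by name (the dimnames and the as.character() call are additions to the snippet above, not part of the original answer):

```r
# Rebuild the example data from the question.
df <- data.frame(
  Treatment = rep(LETTERS[1:4], each = 2),
  Species   = rep(1:4, each = 2),
  Value1    = c(0, 0, 1, 3, 4, 2, 0, 0),
  Value2    = c(0, 0, 3, 4, 2, 1, 4, 5),
  Value3    = c(0, 2, 4, 5, 2, 1, 4, 5),
  Condition = c("A", "B", "A", "C", "B", "A", "B", "C")
)

# One row of multipliers per condition, labelled with the condition name.
m <- matrix(c(1, 2, 3,
              4, 5, 6,
              0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), NULL))

# as.character() forces a lookup by row name, whether Condition is
# stored as a factor or as a character vector.
df[, 3:5] <- df[, 3:5] * m[as.character(df$Condition), ]
df
```

Looking rows up by name also means the order of the rows in m no longer has to match the order of the factor levels.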

More globally, remember that every subsetting you can do with subset you can also do with [, and crucially, [ supports assignment via [<-. So if you want to alter a portion of a data frame or matrix, you can always use this type of idiom:

df[rowCondition,colCondition] <- <replacement values>

assuming of course that <replacement values> is the same dimension as your subset of df. It may work otherwise, but you will run afoul of R's recycling rules and R may kick back a warning.
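
As a rough sketch of that idiom applied to the question, the t(t(...) * vector) expression from the original attempt can be assigned straight back into the matching rows. The multipliers list and the loop are illustrative additions, not part of the answer above:

```r
# Rebuild the example data from the question.
df <- data.frame(
  Treatment = rep(LETTERS[1:4], each = 2),
  Species   = rep(1:4, each = 2),
  Value1    = c(0, 0, 1, 3, 4, 2, 0, 0),
  Value2    = c(0, 0, 3, 4, 2, 1, 4, 5),
  Value3    = c(0, 2, 4, 5, 2, 1, 4, 5),
  Condition = c("A", "B", "A", "C", "B", "A", "B", "C")
)

# Hypothetical lookup: one multiplier vector per condition.
multipliers <- list(A = c(1, 2, 3), B = c(4, 5, 6), C = c(0, 1, 0))

for (cond in names(multipliers)) {
  rows <- df$Condition == cond
  # t() turns the selected block into a matrix with the value columns
  # as rows, so the multiplier vector recycles down each original row;
  # the outer t() restores the shape before assigning it back.
  df[rows, 3:5] <- t(t(df[rows, 3:5]) * multipliers[[cond]])
}
df
```

Because every row matches exactly one condition, each row is multiplied once, and the Treatment and Species columns are never touched.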

独闯女儿国 2024-12-04 07:28:06
df[3:5] <- df[3:5] * t(sapply(df$Condition, function(x) if(x=="B") 4:6 else 1:3))

Or by vector multiplication

df[3:5] <- df[3:5] * (3*(df$Condition == "B") %*% matrix(1, 1, 3)
                      + matrix(1:3, nrow(df), 3, byrow=TRUE))
缱倦旧时光 2024-12-04 07:28:05

Here's a fairly general solution that you should be able to adapt to fit your needs.

Note the first argument in the outer call is a logical vector and the second is numeric, so before multiplication TRUE and FALSE are converted to 1 and 0, respectively. We can add the outer results because the conditions are non-overlapping and the FALSE elements will be zero.

multiples <-
  outer(df$Condition=="A",c(1,2,3)) +
  outer(df$Condition=="B",c(4,5,6)) +
  outer(df$Condition=="C",c(0,1,0))

df[,3:5] <- df[,3:5] * multiples
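
To see what each outer term contributes, here is a small sketch on a four-row condition vector. The names cond, multipliers and a_part are made up for illustration, and the Reduce call is just one possible way to avoid writing out one outer line per condition:

```r
cond <- c("A", "B", "A", "C")
multipliers <- list(A = c(1, 2, 3), B = c(4, 5, 6), C = c(0, 1, 0))

# A single term: rows where cond == "A" hold c(1, 2, 3), all other rows are zero.
a_part <- outer(cond == "A", multipliers$A)

# Sum one such term per condition. Each row matches exactly one
# condition, so the sum is simply that condition's multiplier vector.
multiples <- Reduce(`+`, lapply(names(multipliers), function(k) {
  outer(cond == k, multipliers[[k]])
}))
multiples
```

The resulting matrix has one row per element of cond and one column per multiplier, which is the shape needed to multiply against the three value columns.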
神也荒唐 2024-12-04 07:28:05

Here's a non-vectorized, but easy to understand solution:

 replaceFunction <- function(v){
   m <- as.numeric(v[3:5])
   if (v[6]=="A")
     out <- m * c(1,2,3)
   else if (v[6]=="B")
     out <- m * c(4,5,6)
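   else if (v[6]=="C")
     out <- m * c(0,1,0)   # third condition from the edited question
   else
     out <- m              # any other condition: leave the values unchanged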
   return(out)
 }

 g <- apply(df, 1, replaceFunction)
 df[3:5] <- t(g)
 df
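
The as.numeric() call in replaceFunction matters because apply() first converts the data frame to a matrix, and a matrix holding both numbers and text becomes a character matrix. A tiny sketch on a made-up two-row data frame (small_df and row_types are illustrative names):

```r
small_df <- data.frame(Value1 = c(0, 3), Condition = c("A", "C"))

# Each row reaches the function as a character vector, so the numeric
# column has to be converted back before doing arithmetic on it.
row_types <- apply(small_df, 1, function(v) class(v))
row_types
```

This coercion is also one reason the row-wise approach tends to be slower than the vectorized answers on large data frames.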