Using array results as multipliers for the original data frame

Posted on 2024-12-22 02:52:51

For a given data frame, I would like to multiply a column of the data frame by values taken from an array. The data frame consists of rows, each containing a name, a numerical value and two factor values:

name credit gender group
n1 10 m A
n2 20 f B
n3 30 m A
n4 40 m B
n5 50 f C

This data frame can be generated using the commands:

name    <- c('n1','n2','n3','n4','n5')
credit  <- c(10,20,30,40,50)
gender  <- c('m','f','m','m','f')
group   <- c('A','B','A','B','C')
DF      <- data.frame(cbind(name,credit,gender,group))
# cbind() first binds the vectors into a character matrix, which data.frame() then converts
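A quick check, not part of the original question, shows what `cbind()` does to the column types. It re-creates the vectors above so that it runs on its own:

```r
# Re-create the question's vectors and data frame as posted
name    <- c('n1','n2','n3','n4','n5')
credit  <- c(10,20,30,40,50)
gender  <- c('m','f','m','m','f')
group   <- c('A','B','A','B','C')
DF      <- data.frame(cbind(name, credit, gender, group))

# cbind() on mixed vectors builds a matrix of a single type, here character
typeof(cbind(name, credit))   # "character"

# so credit is no longer numeric in DF (character in R >= 4.0, factor before that)
sapply(DF, class)
is.numeric(DF$credit)         # FALSE
```

This is why the answers below either convert `credit` or build the data frame without `cbind()`.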

Additionally, we have a matrix derived from the data frame (in more complex cases this will be an array). This matrix contains the total credit of all contracts that fall into each category (characterized by m/f and A/B/C):

   m  f
A 40 NA
B 40 20
C NA 50
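The question does not say how this matrix is computed. Assuming it is the per-category sum of `credit`, one base-R sketch uses `tapply()` over both factors. Adding a third factor to the list would produce a three-dimensional array in the same way. The data frame is built here without `cbind()` so that `credit` stays numeric, and the `levels` argument only fixes the column order to m, f:

```r
# Sample data, built without cbind() so credit stays numeric
DF <- data.frame(
  name   = c('n1','n2','n3','n4','n5'),
  credit = c(10, 20, 30, 40, 50),
  gender = c('m','f','m','m','f'),
  group  = c('A','B','A','B','C'),
  stringsAsFactors = FALSE
)

# Sum credit within every group x gender cell; combinations with no rows become NA
derived <- tapply(DF$credit,
                  list(DF$group, factor(DF$gender, levels = c("m", "f"))),
                  sum)
print(derived)
```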

The goal is to multiply each value in DF$credit by the value the matrix assigns to that row's category, e.g. the value 10 in the first row of DF would be multiplied by 40 (the category defined by m and A).

The result would look like:

name credit gender group result
n1 10 m A 400
n2 20 f B 400
n3 30 m A 1200
n4 40 m B 1600
n5 50 f C 2500

If possible, I would like to do this with base R, but I am open to any helpful solution that works nicely.

白况 2024-12-29 02:52:51

You can construct a set of indices into derived (your derived matrix) by building an index matrix from DF$group and DF$gender. The as.character calls are there because DF$group and DF$gender are factors, whereas I want character indices that match the row and column names of derived.

> idx <- matrix(c(as.character(DF$group), as.character(DF$gender)), ncol = 2)
> idx
     [,1] [,2]
[1,] "A"  "m"
[2,] "B"  "f"
[3,] "A"  "m"
[4,] "B"  "m"
[5,] "C"  "f"
> DF$result <- DF$credit * derived[idx]

Note that with that last line, if you generate DF with the code above, your numeric columns turn out as factors (i.e. DF$credit is a factor; in R 4.0 and later it is a character column instead). In that case you need to convert it first: as.numeric(as.character(DF$credit)) * derived[idx]. Calling as.numeric() directly on a factor returns its internal level codes, not the credit values. However, I imagine that in your actual data DF$credit is numeric rather than a factor.
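A self-contained sketch of this approach follows. The values of `derived` are typed in from the question, `cbind()` is used in place of `matrix(..., ncol = 2)` to build the same two-column character index, and `credit_f` is a hypothetical factor used only to show the conversion pitfall. The same character-matrix indexing works for a k-dimensional array with dimnames, given a k-column index matrix.

```r
# Question's data, built without cbind() so credit stays numeric
DF <- data.frame(
  name   = c('n1','n2','n3','n4','n5'),
  credit = c(10, 20, 30, 40, 50),
  gender = c('m','f','m','m','f'),
  group  = c('A','B','A','B','C'),
  stringsAsFactors = FALSE
)

# derived matrix typed in from the question: rows = group, columns = gender
# (matrix() fills column by column: m = 40, 40, NA; f = NA, 20, 50)
derived <- matrix(c(40, 40, NA, NA, 20, 50), nrow = 3,
                  dimnames = list(c("A", "B", "C"), c("m", "f")))

# Two-column character index: row i names the (group, gender) cell for DF row i
idx <- cbind(as.character(DF$group), as.character(DF$gender))
DF$result <- DF$credit * derived[idx]
DF

# Factor pitfall: as.numeric() on a factor gives level codes, not the labels
credit_f <- factor(c("10", "20", "30", "40", "50"))  # hypothetical factor column
as.numeric(credit_f)                # 1 2 3 4 5
as.numeric(as.character(credit_f))  # 10 20 30 40 50
```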

尛丟丟 2024-12-29 02:52:51

When you create the data.frame object, don't use cbind: it's not necessary, and it coerces every column to character, so credit ends up as a factor (or, in R 4.0 and later, a character column) instead of a number.

Just use DF <- data.frame(name, credit, gender, group)

Then run a for loop that goes through each row in your data.frame object.

n <- length(DF$credit)
result <- rep(0, n)
for (i in seq_len(n)) {
  # credit of row i times the total credit of all rows in the same gender/group category
  result[i] <- DF$credit[i] * sum(DF$credit[DF$gender == DF$gender[i] & DF$group == DF$group[i]])
}

Replace your data.frame object with this new one that includes your results.

DF <- data.frame(name, credit, gender, group, result)
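The loop recomputes each category sum row by row. A vectorized base-R alternative, not from the original answer, is `ave()`, which applies a function within groups and returns the result aligned with the original rows:

```r
DF <- data.frame(
  name   = c('n1','n2','n3','n4','n5'),
  credit = c(10, 20, 30, 40, 50),
  gender = c('m','f','m','m','f'),
  group  = c('A','B','A','B','C'),
  stringsAsFactors = FALSE
)

# For every row, ave() returns sum(credit) over all rows sharing its gender and group,
# which is the same per-row category total the loop computes
DF$result <- DF$credit * ave(DF$credit, DF$gender, DF$group, FUN = sum)
DF
```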

九命猫 2024-12-29 02:52:51

I recommend the plyr package, but you can do this using the base by function (here m is your derived matrix):

> by(DF, DF['name'], function (row) row$credit * m[as.character(row$group), as.character(row$gender)])
name: n1
[1] 400
--------------------------------------------------------------------- 
name: n2
[1] 400
--------------------------------------------------------------------- 
name: n3
[1] 1200
--------------------------------------------------------------------- 
name: n4
[1] 1600
--------------------------------------------------------------------- 
name: n5
[1] 2500

plyr can give you the result as a data frame, which is nice:

> library(plyr)
> ddply(DF, .(name), function (row) row$credit * m[as.character(row$group), as.character(row$gender)])
  name   V1
1   n1  400
2   n2  400
3   n3 1200
4   n4 1600
5   n5 2500
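If you prefer to stay in base R but still want a data frame like the ddply output, one option (a sketch, not from the original answer) is to return a one-row data frame from each by() piece and stack the pieces with `do.call(rbind, ...)`. As above, `m` is the derived matrix, typed in here from the question:

```r
DF <- data.frame(
  name   = c('n1','n2','n3','n4','n5'),
  credit = c(10, 20, 30, 40, 50),
  gender = c('m','f','m','m','f'),
  group  = c('A','B','A','B','C'),
  stringsAsFactors = FALSE
)
m <- matrix(c(40, 40, NA, NA, 20, 50), nrow = 3,
            dimnames = list(c("A", "B", "C"), c("m", "f")))

# Each by() piece is the single row for one name; return it as a one-row data frame
pieces <- by(DF, DF["name"], function(row)
  data.frame(name = row$name,
             V1   = row$credit * m[as.character(row$group), as.character(row$gender)],
             stringsAsFactors = FALSE))

# Stack the pieces into one data frame, similar in shape to the ddply result
res <- do.call(rbind, pieces)
res
```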