从绝对数到二级数据中的比例(R!SAC?plyr?)
我将数据嵌套在各个级别中:
L1 L2 x1 x2 x3 x4
A This 20 14 12 15
A That 11 NA 8 16
A Bat Na 22 13 9
B This 10 9 11 6
B That 3 3 1 NA
B Bat 4 10 2 8
现在我想要一些简单的东西 - 我觉得我上个月就能够做到这一点。但我脑子里缺少了一些东西:我想要百分比(忽略 NA),L1 中每个变量的总和为 100
L1 L2 x1 x2 x3 x4
A This 65% 39% 36% 38%
A That 35% 0% 24% 40%
A Bat 0% 61% 40% 22%
我可以得到我需要的总数
cast(L1~variable, data=melt(d, na.rm=T),sum)
但我想应该可以编写一个函数来给我我想要的东西想? 我用cast和plyr尝试了各种方法……但看来圣诞节已经给我脆弱的大脑带来了很多啤酒。
任何帮助将不胜感激 - 任何不投反对票的行为也将不胜感激。
谢谢
这是我的数据:
d <- structure(list(level1 = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L,
6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 10L, 10L, 10L, 10L,
11L, 11L, 11L, 11L, 11L, 11L, 9L, 9L, 9L, 9L, 9L, 12L, 12L, 12L,
12L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L,
15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L,
17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L,
19L, 19L), .Label = c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s"), class = "factor"),
level2 = structure(c(6L, 2L, 1L, 3L, 5L, 6L, 1L, 3L, 5L,
6L, 1L, 3L, 5L, 6L, 5L, 6L, 1L, 3L, 5L, 4L, 6L, 2L, 1L, 3L,
5L, 6L, 1L, 5L, 4L, 6L, 2L, 1L, 3L, 5L, 6L, 1L, 3L, 5L, 6L,
2L, 1L, 3L, 5L, 4L, 6L, 2L, 1L, 3L, 5L, 6L, 1L, 3L, 5L, 6L,
2L, 1L, 3L, 5L, 4L, 6L, 2L, 1L, 3L, 5L, 4L, 6L, 2L, 1L, 3L,
5L, 6L, 2L, 1L, 3L, 5L, 4L, 6L, 2L, 1L, 3L, 5L, 6L, 2L, 1L,
3L, 5L, 4L, 6L, 2L, 1L, 3L, 5L, 4L), .Label = c("This", "That",
"Phat", "Bat", "Man", "Hat"), class = "factor"), X2002 = c(28L,
9L, 17L, 8L, 95L, 18L, NA, NA, 36L, 40L, 15L, 10L, 71L, NA,
14L, 25L, 18L, NA, 56L, 5L, 29L, 5L, 13L, 8L, 65L, 23L, 8L,
34L, NA, 14L, 5L, 5L, NA, 51L, 18L, NA, 5L, 56L, 30L, 8L,
9L, 11L, 77L, 5L, 53L, 12L, 16L, 13L, 114L, 30L, 8L, NA,
52L, 38L, NA, 12L, 5L, 87L, 5L, 35L, NA, 10L, 6L, 92L, 10L,
41L, NA, 22L, 8L, 115L, 27L, 6L, 9L, NA, 47L, 9L, 29L, 6L,
11L, NA, 56L, 38L, 7L, 10L, NA, 93L, 6L, 22L, 9L, 9L, NA,
59L, 5L), X2003 = c(32L, NA, 16L, 9L, 76L, 10L, NA, 5L, 24L,
22L, 12L, 9L, 63L, 12L, 9L, 36L, 9L, 6L, 83L, 5L, 35L, NA,
12L, 8L, 82L, 19L, 5L, 53L, 5L, 10L, NA, 7L, NA, 35L, 15L,
6L, 6L, 40L, 30L, NA, 10L, 8L, 85L, 9L, 46L, NA, 14L, 9L,
106L, 24L, 6L, 7L, 56L, 33L, NA, 12L, 9L, 106L, NA, 37L,
7L, 11L, 8L, 79L, 5L, 54L, 5L, 10L, 6L, 100L, 25L, 9L, 5L,
6L, 49L, NA, 31L, NA, 13L, 10L, 79L, 46L, NA, 14L, NA, 82L,
5L, 21L, 7L, 11L, NA, 69L, NA), X2004 = c(35L, 6L, 13L, 8L,
82L, 12L, 5L, NA, 35L, 34L, 5L, 6L, 75L, 9L, 9L, 40L, 13L,
9L, 70L, NA, 41L, NA, 17L, 10L, 83L, 10L, 6L, 40L, NA, 18L,
NA, 6L, NA, 34L, 10L, NA, NA, 45L, 38L, 6L, 11L, NA, 74L,
NA, 45L, 5L, 12L, 9L, 131L, 34L, NA, NA, 64L, 28L, 5L, NA,
NA, 93L, NA, 32L, NA, 9L, 11L, 99L, NA, 40L, NA, 18L, 8L,
104L, 14L, NA, 13L, 6L, 67L, NA, 23L, NA, 6L, 8L, 85L, 49L,
NA, 19L, 7L, 102L, NA, 28L, 5L, 7L, 7L, 74L, NA), X2005 = c(36L,
NA, 20L, 10L, 93L, 22L, NA, NA, 35L, 38L, 13L, 9L, 99L, NA,
14L, 48L, 17L, 7L, 70L, NA, 35L, NA, 13L, 9L, 103L, 16L,
5L, 49L, NA, 12L, NA, 5L, 8L, 51L, 15L, 7L, 5L, 45L, 40L,
NA, 12L, 5L, 102L, NA, 40L, NA, 21L, 16L, 141L, 25L, 9L,
10L, 70L, 41L, NA, 10L, NA, 111L, NA, 37L, NA, 10L, 9L, 124L,
NA, 37L, NA, 12L, 12L, 124L, 32L, NA, 16L, 6L, 45L, NA, 33L,
NA, 8L, NA, 101L, 51L, NA, 19L, 5L, 117L, NA, 17L, NA, 11L,
5L, 73L, NA), X2006 = c(38L, NA, 22L, 13L, 103L, 15L, NA,
7L, 44L, 39L, 11L, 6L, 95L, NA, 15L, 53L, 16L, 9L, 89L, NA,
41L, NA, 12L, 13L, 87L, 30L, 6L, 43L, NA, 14L, NA, 6L, 5L,
50L, 19L, 5L, NA, 63L, 23L, NA, 6L, NA, 75L, NA, 38L, NA,
12L, 19L, 142L, 32L, 7L, 7L, 64L, 49L, NA, 13L, 12L, 114L,
NA, 48L, NA, 23L, 5L, 136L, NA, 52L, NA, 15L, 16L, 127L,
24L, NA, 6L, NA, 57L, NA, 32L, NA, NA, 13L, 96L, 20L, NA,
10L, 21L, 102L, NA, 31L, NA, 5L, 12L, 93L, NA)), .Names = c("level1",
"level2", "X2002", "X2003", "X2004", "X2005", "X2006"), row.names = c(NA,
-93L), class = "data.frame")
I have data nested in to levels:
L1 L2 x1 x2 x3 x4
A This 20 14 12 15
A That 11 NA 8 16
A Bat Na 22 13 9
B This 10 9 11 6
B That 3 3 1 NA
B Bat 4 10 2 8
Now I want something simply - and I feel I have been able to do this just last month. But something has gone missing in my head: I want percentages (ignoring NA), summing to 100 for each variable in L1
L1 L2 x1 x2 x3 x4
A This 65% 39% 36% 38%
A That 35% 0% 24% 40%
A Bat 0% 61% 40% 22%
I can get the totals I need with
cast(L1~variable, data=melt(d, na.rm=T),sum)
But I guess it should be possible to cook up a function that gives me what I want?
I tried various approaches with cast and plyr... But it seams xmas has already brought to many beers to my frail brain.
Any help will be appreciated - as will any refrain from a downvote.
Thanx
this is my data:
d <- structure(list(level1 = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L,
6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 10L, 10L, 10L, 10L,
11L, 11L, 11L, 11L, 11L, 11L, 9L, 9L, 9L, 9L, 9L, 12L, 12L, 12L,
12L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L,
15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L,
17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L,
19L, 19L), .Label = c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s"), class = "factor"),
level2 = structure(c(6L, 2L, 1L, 3L, 5L, 6L, 1L, 3L, 5L,
6L, 1L, 3L, 5L, 6L, 5L, 6L, 1L, 3L, 5L, 4L, 6L, 2L, 1L, 3L,
5L, 6L, 1L, 5L, 4L, 6L, 2L, 1L, 3L, 5L, 6L, 1L, 3L, 5L, 6L,
2L, 1L, 3L, 5L, 4L, 6L, 2L, 1L, 3L, 5L, 6L, 1L, 3L, 5L, 6L,
2L, 1L, 3L, 5L, 4L, 6L, 2L, 1L, 3L, 5L, 4L, 6L, 2L, 1L, 3L,
5L, 6L, 2L, 1L, 3L, 5L, 4L, 6L, 2L, 1L, 3L, 5L, 6L, 2L, 1L,
3L, 5L, 4L, 6L, 2L, 1L, 3L, 5L, 4L), .Label = c("This", "That",
"Phat", "Bat", "Man", "Hat"), class = "factor"), X2002 = c(28L,
9L, 17L, 8L, 95L, 18L, NA, NA, 36L, 40L, 15L, 10L, 71L, NA,
14L, 25L, 18L, NA, 56L, 5L, 29L, 5L, 13L, 8L, 65L, 23L, 8L,
34L, NA, 14L, 5L, 5L, NA, 51L, 18L, NA, 5L, 56L, 30L, 8L,
9L, 11L, 77L, 5L, 53L, 12L, 16L, 13L, 114L, 30L, 8L, NA,
52L, 38L, NA, 12L, 5L, 87L, 5L, 35L, NA, 10L, 6L, 92L, 10L,
41L, NA, 22L, 8L, 115L, 27L, 6L, 9L, NA, 47L, 9L, 29L, 6L,
11L, NA, 56L, 38L, 7L, 10L, NA, 93L, 6L, 22L, 9L, 9L, NA,
59L, 5L), X2003 = c(32L, NA, 16L, 9L, 76L, 10L, NA, 5L, 24L,
22L, 12L, 9L, 63L, 12L, 9L, 36L, 9L, 6L, 83L, 5L, 35L, NA,
12L, 8L, 82L, 19L, 5L, 53L, 5L, 10L, NA, 7L, NA, 35L, 15L,
6L, 6L, 40L, 30L, NA, 10L, 8L, 85L, 9L, 46L, NA, 14L, 9L,
106L, 24L, 6L, 7L, 56L, 33L, NA, 12L, 9L, 106L, NA, 37L,
7L, 11L, 8L, 79L, 5L, 54L, 5L, 10L, 6L, 100L, 25L, 9L, 5L,
6L, 49L, NA, 31L, NA, 13L, 10L, 79L, 46L, NA, 14L, NA, 82L,
5L, 21L, 7L, 11L, NA, 69L, NA), X2004 = c(35L, 6L, 13L, 8L,
82L, 12L, 5L, NA, 35L, 34L, 5L, 6L, 75L, 9L, 9L, 40L, 13L,
9L, 70L, NA, 41L, NA, 17L, 10L, 83L, 10L, 6L, 40L, NA, 18L,
NA, 6L, NA, 34L, 10L, NA, NA, 45L, 38L, 6L, 11L, NA, 74L,
NA, 45L, 5L, 12L, 9L, 131L, 34L, NA, NA, 64L, 28L, 5L, NA,
NA, 93L, NA, 32L, NA, 9L, 11L, 99L, NA, 40L, NA, 18L, 8L,
104L, 14L, NA, 13L, 6L, 67L, NA, 23L, NA, 6L, 8L, 85L, 49L,
NA, 19L, 7L, 102L, NA, 28L, 5L, 7L, 7L, 74L, NA), X2005 = c(36L,
NA, 20L, 10L, 93L, 22L, NA, NA, 35L, 38L, 13L, 9L, 99L, NA,
14L, 48L, 17L, 7L, 70L, NA, 35L, NA, 13L, 9L, 103L, 16L,
5L, 49L, NA, 12L, NA, 5L, 8L, 51L, 15L, 7L, 5L, 45L, 40L,
NA, 12L, 5L, 102L, NA, 40L, NA, 21L, 16L, 141L, 25L, 9L,
10L, 70L, 41L, NA, 10L, NA, 111L, NA, 37L, NA, 10L, 9L, 124L,
NA, 37L, NA, 12L, 12L, 124L, 32L, NA, 16L, 6L, 45L, NA, 33L,
NA, 8L, NA, 101L, 51L, NA, 19L, 5L, 117L, NA, 17L, NA, 11L,
5L, 73L, NA), X2006 = c(38L, NA, 22L, 13L, 103L, 15L, NA,
7L, 44L, 39L, 11L, 6L, 95L, NA, 15L, 53L, 16L, 9L, 89L, NA,
41L, NA, 12L, 13L, 87L, 30L, 6L, 43L, NA, 14L, NA, 6L, 5L,
50L, 19L, 5L, NA, 63L, 23L, NA, 6L, NA, 75L, NA, 38L, NA,
12L, 19L, 142L, 32L, 7L, 7L, 64L, 49L, NA, 13L, 12L, 114L,
NA, 48L, NA, 23L, 5L, 136L, NA, 52L, NA, 15L, 16L, 127L,
24L, NA, 6L, NA, 57L, NA, 32L, NA, NA, 13L, 96L, 20L, NA,
10L, 21L, 102L, NA, 31L, NA, 5L, 12L, 93L, NA)), .Names = c("level1",
"level2", "X2002", "X2003", "X2004", "X2005", "X2006"), row.names = c(NA,
-93L), class = "data.frame")
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我认为这应该可以解决问题:
如果您希望所有内容都在一个数据框中,则可以对其运行
do.call(rbind,...)
。This should do the trick I think:
You can run a
do.call(rbind,...)
on that if you want everything in one data frame.据我了解这个问题,你有总数,使用:
...并且你想将它们转换为百分比。 (请注意,您在问题文本中将第一列称为“L1”,但数据结构将列称为“level1”。)
从总计到百分比比您想象的更简单。
As I understand the question, you have the totals, using:
... and you want to convert them to percentages. (Note that you called the first column "L1" in your question text, but the data structure calls the column "level1".)
Going from totals to percentages is more straightforward than you think.