Group by in R: headers, decimal places, and showing sample size?
Here's my dataframe "fulldays" (The column header Plastic refers to the second column of numbers, the first column is just a numbered list of the rows that R puts in that I didn't know how to remove):
Plastic Age Ones Zeros Nonzeros CellsCounted AllDaysAvail
1 2 10 2 5 5 10 TRUE
2 57 8 4 2 8 10 TRUE
3 3 9 2 4 6 10 TRUE
4 81 9 3 1 9 10 TRUE
5 131 20 8 1 9 10 TRUE
6 5 8 5 5 5 10 TRUE
7 26 10 4 4 6 10 TRUE
8 76 12 2 6 4 10 TRUE
9 9 9 8 2 8 10 TRUE
10 36 14 2 5 5 10 TRUE
11 64 12 3 4 6 10 TRUE
12 74 22 5 4 6 10 TRUE
13 10 10 1 4 6 10 TRUE
14 21 9 7 3 7 10 TRUE
15 16 9 5 3 7 10 TRUE
17 18 8 4 3 7 10 TRUE
18 23 22 6 4 6 10 TRUE
19 106 11 2 1 9 10 TRUE
20 113 9 1 4 6 10 TRUE
21 24 11 2 5 5 10 TRUE
22 29 9 3 2 8 10 TRUE
23 85 9 6 4 6 10 TRUE
24 403 19 1 6 4 10 TRUE
25 25 19 1 2 8 10 TRUE
26 27 10 3 3 7 10 TRUE
27 121 7 7 3 7 10 TRUE
29 35 12 1 4 6 10 TRUE
30 39 18 2 6 4 10 TRUE
31 37 8 5 1 9 10 TRUE
32 63 7 8 2 8 10 TRUE
33 122 11 3 2 8 10 TRUE
34 148 9 4 4 6 10 TRUE
37 42 13 2 3 7 10 TRUE
38 144 12 0 9 1 10 TRUE
39 43 12 1 2 8 10 TRUE
40 47 20 6 4 6 10 TRUE
41 90 12 2 5 5 10 TRUE
42 119 12 2 4 6 10 TRUE
43 138 7 7 3 7 10 TRUE
44 56 4 7 3 7 10 TRUE
45 58 12 2 5 5 10 TRUE
46 60 22 3 4 6 10 TRUE
47 71 9 2 5 5 10 TRUE
48 288 18 0 10 0 10 TRUE
49 66 22 1 5 5 10 TRUE
50 67 9 0 8 2 10 TRUE
51 149 12 0 5 5 10 TRUE
52 70 14 5 4 6 10 TRUE
53 72 12 1 4 6 10 TRUE
54 78 12 0 4 6 10 TRUE
59 79 12 4 3 7 10 TRUE
60 83 11 4 4 6 10 TRUE
61 87 8 6 4 6 10 TRUE
63 92 11 1 4 6 10 TRUE
64 96 8 0 5 5 10 TRUE
65 125 7 7 3 7 10 TRUE
66 98 9 3 4 6 10 TRUE
67 107 6 2 3 7 10 TRUE
68 102 11 5 3 7 10 TRUE
69 103 10 0 1 9 10 TRUE
72 108 12 3 3 7 10 TRUE
73 153 12 4 3 7 10 TRUE
74 109 12 3 4 6 10 TRUE
75 118 10 4 5 5 10 TRUE
77 133 12 0 4 6 10 TRUE
79 157 8 0 10 0 10 TRUE
81 318 14 2 5 5 10 TRUE
I have this code:
library(dplyr)  # provides %>%, group_by() and summarize()

new_data <- fulldays %>%
  group_by(Age) %>%
  summarize(OnesMean = mean(Ones), ZerosMean = mean(Zeros), NonZeroMean = mean(Nonzeros))
This is the output "new_data" (again, Age starts on the second column, not the first):
Age OnesMean ZerosMean NonZeroMean
<int> <dbl> <dbl> <dbl>
1 4 7 3 7
2 6 2 3 7
3 7 7.25 2.75 7.25
4 8 3.43 4.29 5.71
5 9 3.67 3.67 6.33
6 10 2.33 3.67 6.33
7 11 2.83 3.17 6.83
8 12 1.75 4.31 5.69
9 13 2 3 7
10 14 3 4.67 5.33
11 18 1 8 2
12 19 1 4 6
13 20 7 2.5 7.5
14 22 3.75 4.25 5.75
I have three questions:
- Why is group_by creating a tibble and not a dataframe?
- Why, when I click on "new_data" as an object in the "Data" section, does it display the values with so many decimal places (see image below)?
- How can I add/bind a column to new_data that shows the sample size for each age? In other words, I want to know how many individuals contribute to the mean score for each column (ideally I would add this column between "Age" and "OnesMean").

[1]: https://i.sstatic.net/724G9.png
Thank you so much, and please let me know if there are any questions!
Comments (1)
- `tidyverse` works with `tibble`s, which are still `data.frame`s with a couple of differences. So when you use `dplyr` functions such as `group_by()` on a `data.frame`, the result comes back as a `tibble`, which is still a `data.frame`.
- The `mean` function doesn't round its result. If you want the values rounded, you need to ask R to do that, for example with `OnesMean = round(mean(Ones), 2)`.
- Add `n()` as one of the arguments of `summarise()`. That is, `summarise(OnesMean = mean(Ones), <...>, n())`.
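The three points above can be put together in one sketch. The small `fulldays` data frame below is made up for illustration and is not the asker's data; naming the count column `n` and listing it first inside `summarise()` is just one way to place it between `Age` and `OnesMean`. The `as.data.frame()` step is optional and only matters if a plain data frame is specifically wanted.

```r
library(dplyr)

# Made-up stand-in for `fulldays` (illustrative values only)
fulldays <- data.frame(
  Age      = c(8, 8, 8, 9, 9, 10),
  Ones     = c(4, 5, 1, 2, 3, 2),
  Zeros    = c(2, 5, 4, 4, 3, 5),
  Nonzeros = c(8, 5, 6, 6, 7, 5)
)

new_data <- fulldays %>%
  group_by(Age) %>%
  summarise(
    n           = n(),                      # rows per Age; listed first so it follows Age
    OnesMean    = round(mean(Ones), 2),     # rounding is explicit, mean() does not round
    ZerosMean   = round(mean(Zeros), 2),
    NonZeroMean = round(mean(Nonzeros), 2)
  )

# A tibble still inherits from data.frame
is_df <- inherits(new_data, "data.frame")

# Optional: drop the tibble classes and keep a plain data.frame
plain_df <- as.data.frame(new_data)

print(new_data)
print(class(plain_df))
```

Printing a tibble in the console abbreviates numbers to about three significant digits, while the object itself keeps the full values, which is likely why the viewer shows more decimals than the console unless `round()` is applied.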