ggplot2 code error: object not found in the dataset
I am a beginner learning R, and today I tried to generate a plot with the following code:
> dailyActivity_merged_2 %>%
+ group_by(ActivityDate) %>%
+ select(Actlevl == "High") %>%
+ summarise(average_distance = mean(TotalDistance)) %>%
+ ggplot() + geom_col(mapping= aes(x=ActivityDate, y=average_distance, fill = average_distance)) + scale_fill_gradient(low = "yellow", high = "red") +
+ theme(axis.text.x = element_text(angle = 90)) +
+ labs(title="Average Distance vs. Time")
The code returned the following message, even though I am sure the column I want to use in the dataset is named "Actlevl". I do not understand why it keeps saying the object is not found.
Error in select(Actlevl == "High") : object 'Actlevl' not found
Did I do something wrong? Maybe I should not use select() to choose data values?
I am trying to select the rows with "High" in the column Actlevl.
Thank you so much for your help.
Subset data example:
> dput(dailyActivity_merged_2[1:35,c(1:5)])
structure(list(Id = c(1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1624580081, 1624580081, 1624580081,
1624580081), Actlevl = c("High", "High", "High", "High", "High",
"High", "High", "High", "High", "High", "High", "High", "High",
"High", "High", "High", "High", "High", "High", "High", "High",
"High", "High", "High", "High", "High", "High", "High", "High",
"High", "High", "Low", "Low", "Low", "Low"), ActivityDate = c("4/12/2016",
"4/13/2016", "4/14/2016", "4/15/2016", "4/16/2016", "4/17/2016",
"4/18/2016", "4/19/2016", "4/20/2016", "4/21/2016", "4/22/2016",
"4/23/2016", "4/24/2016", "4/25/2016", "4/26/2016", "4/27/2016",
"4/28/2016", "4/29/2016", "4/30/2016", "5/1/2016", "5/2/2016",
"5/3/2016", "5/4/2016", "5/5/2016", "5/6/2016", "5/7/2016", "5/8/2016",
"5/9/2016", "5/10/2016", "5/11/2016", "5/12/2016", "4/12/2016",
"4/13/2016", "4/14/2016", "4/15/2016"), TotalSteps = c(13162,
10735, 10460, 9762, 12669, 9705, 13019, 15506, 10544, 9819, 12764,
14371, 10039, 15355, 13755, 18134, 13154, 11181, 14673, 10602,
14727, 15103, 11100, 14070, 12159, 11992, 10060, 12022, 12207,
12770, 0, 8163, 7007, 9107, 1510), TotalDistance = c(8.5, 6.96999979,
6.739999771, 6.28000021, 8.159999847, 6.480000019, 8.590000153,
9.880000114, 6.679999828, 6.340000153, 8.130000114, 9.039999962,
6.409999847, 9.800000191, 8.789999962, 12.21000004, 8.529999733,
7.150000095, 9.25, 6.809999943, 9.710000038, 9.659999847, 7.150000095,
8.899999619, 8.029999733, 7.710000038, 6.579999924, 7.71999979,
7.769999981, 8.130000114, 0, 5.309999943, 4.550000191, 5.920000076,
0.9800000191)), row.names = c(NA, -35L), class = c("tbl_df",
"tbl", "data.frame"))
I tried to write the ggplot2 code as above, but it keeps throwing an error.
Comments (1)
There are two issues that I can spot:
1. You're using select() instead of filter(). select() picks columns; filter() picks rows that match a certain requirement.
2. When you use summarise(), you lose all previous columns that are not listed in group_by().
This is my attempt at fixing the issue. It works but it's a bit verbose, using right_join() and filter()ing again in order to recover the lost columns. Can anyone make this better?
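A minimal sketch of how both points might be addressed at once. It is not the answerer's code, only one possible simplification: filtering before grouping means no column has to be recovered afterwards, so the right_join() step is not needed here. The small `activity` tibble is a hypothetical stand-in built from a few rows of the dput() output in the question, and the as.Date() conversion is an added assumption so that the bars sort chronologically rather than alphabetically.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for dailyActivity_merged_2 (a few rows from the question's dput())
activity <- tibble(
  Id            = c(1503960366, 1503960366, 1503960366, 1624580081, 1624580081),
  Actlevl       = c("High", "High", "High", "Low", "Low"),
  ActivityDate  = c("4/12/2016", "4/13/2016", "4/14/2016", "4/12/2016", "4/13/2016"),
  TotalDistance = c(8.5, 6.96999979, 6.739999771, 5.309999943, 4.550000191)
)

high_daily <- activity %>%
  # filter() keeps rows that match a condition; select() would only pick columns
  filter(Actlevl == "High") %>%
  # Assumption: parse the character dates so the x axis is ordered by time
  mutate(ActivityDate = as.Date(ActivityDate, format = "%m/%d/%Y")) %>%
  group_by(ActivityDate) %>%
  # After summarise(), only ActivityDate and average_distance remain
  summarise(average_distance = mean(TotalDistance), .groups = "drop")

p <- ggplot(high_daily) +
  geom_col(aes(x = ActivityDate, y = average_distance, fill = average_distance)) +
  scale_fill_gradient(low = "yellow", high = "red") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title = "Average Distance vs. Time")

print(p)
```

Because the "High" rows are selected before summarise() runs, the Actlevl column is no longer needed once the averages are computed. With the real data, replacing `activity` with dailyActivity_merged_2 should be the only change required, assuming its columns match the dput() sample.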