ggplot2 code error: object not found in the dataset

Posted on 2025-01-18 07:33:51


I'm a beginner learning R, and today I tried to generate a plot using the following code:

```r
dailyActivity_merged_2 %>%
    group_by(ActivityDate) %>%
    select(Actlevl == "High") %>%
    summarise(average_distance = mean(TotalDistance)) %>%
    ggplot() +
    geom_col(mapping = aes(x = ActivityDate, y = average_distance, fill = average_distance)) +
    scale_fill_gradient(low = "yellow", high = "red") +
    theme(axis.text.x = element_text(angle = 90)) +
    labs(title = "Average Distance vs. Time")
```

The code fails with the following message, even though I am very sure the column I want to use in the dataset is named "Actlevl". I don't understand why it keeps saying the object is not found:

```
Error in select(Actlevl == "High") : object 'Actlevl' not found
```

Did I do something wrong? Maybe I shouldn't use select() to pick data values?
I am trying to keep only the rows that have "High" in the Actlevl column.

Thank you so much for your help.

The dataset looks like this:
[Image: screenshot of the dataset]

Subset data example:

```r
dput(dailyActivity_merged_2[1:35,c(1:5)])
structure(list(Id = c(1503960366, 1503960366, 1503960366, 1503960366, 
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 
1503960366, 1503960366, 1503960366, 1624580081, 1624580081, 1624580081, 
1624580081), Actlevl = c("High", "High", "High", "High", "High", 
"High", "High", "High", "High", "High", "High", "High", "High", 
"High", "High", "High", "High", "High", "High", "High", "High", 
"High", "High", "High", "High", "High", "High", "High", "High", 
"High", "High", "Low", "Low", "Low", "Low"), ActivityDate = c("4/12/2016", 
"4/13/2016", "4/14/2016", "4/15/2016", "4/16/2016", "4/17/2016", 
"4/18/2016", "4/19/2016", "4/20/2016", "4/21/2016", "4/22/2016", 
"4/23/2016", "4/24/2016", "4/25/2016", "4/26/2016", "4/27/2016", 
"4/28/2016", "4/29/2016", "4/30/2016", "5/1/2016", "5/2/2016", 
"5/3/2016", "5/4/2016", "5/5/2016", "5/6/2016", "5/7/2016", "5/8/2016", 
"5/9/2016", "5/10/2016", "5/11/2016", "5/12/2016", "4/12/2016", 
"4/13/2016", "4/14/2016", "4/15/2016"), TotalSteps = c(13162, 
10735, 10460, 9762, 12669, 9705, 13019, 15506, 10544, 9819, 12764, 
14371, 10039, 15355, 13755, 18134, 13154, 11181, 14673, 10602, 
14727, 15103, 11100, 14070, 12159, 11992, 10060, 12022, 12207, 
12770, 0, 8163, 7007, 9107, 1510), TotalDistance = c(8.5, 6.96999979, 
6.739999771, 6.28000021, 8.159999847, 6.480000019, 8.590000153, 
9.880000114, 6.679999828, 6.340000153, 8.130000114, 9.039999962, 
6.409999847, 9.800000191, 8.789999962, 12.21000004, 8.529999733, 
7.150000095, 9.25, 6.809999943, 9.710000038, 9.659999847, 7.150000095, 
8.899999619, 8.029999733, 7.710000038, 6.579999924, 7.71999979, 
7.769999981, 8.130000114, 0, 5.309999943, 4.550000191, 5.920000076, 
0.9800000191)), row.names = c(NA, -35L), class = c("tbl_df", 
"tbl", "data.frame"))
```

I wrote the ggplot2 code above, but it keeps throwing an error.


Answer by 箜明, 2025-01-25 07:33:51


There are two issues that I can spot:

1. You're using select() instead of filter(). select() picks columns; filter() keeps the rows that match a condition.
2. When you use summarise(), you lose all previous columns that are not listed in group_by().
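A minimal sketch of both points, using a hypothetical three-row tibble that only mimics the question's column names. As far as I understand tidyselect, select() treats an expression such as `Actlevl == "High"` as something to evaluate outside the data rather than against the column values, so R looks for a standalone object called `Actlevl` and reports that it is not found. filter() evaluates the condition inside the data frame, which is why it works.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the question
activity <- tibble(
  Id = c(1, 1, 2),
  Actlevl = c("High", "High", "Low"),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalDistance = c(8.5, 7.0, 5.3)
)

# select() keeps columns: all 3 rows remain, but only the Actlevl column
only_level_col <- select(activity, Actlevl)

# filter() keeps rows: only the two "High" rows remain, with every column
high_rows <- filter(activity, Actlevl == "High")

# summarise() returns one row per group containing only the grouping
# column plus the new summary column; Id, Actlevl and TotalDistance are gone
per_date <- high_rows %>%
  group_by(ActivityDate) %>%
  summarise(average_distance = mean(TotalDistance))

print(names(only_level_col))
print(nrow(high_rows))
print(names(per_date))
```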

This is my attempt at fixing the issue. It works, but it's a bit verbose: it uses right_join() and then filters again in order to recover the lost columns. Can anyone make this better?

```r
library(dplyr)    # %>%, group_by(), filter(), summarise(), right_join()
library(ggplot2)

dailyActivity_merged_2 %>%
  group_by(ActivityDate) %>%
  filter(Actlevl == "High") %>%                               # keep rows, not columns
  summarise(average_distance = mean(TotalDistance)) %>%       # one row per date
  right_join(dailyActivity_merged_2, by = "ActivityDate") %>% # bring the other columns back
  filter(Actlevl == "High") %>%
  ggplot() +
  geom_col(mapping = aes(x = ActivityDate, y = average_distance, fill = average_distance)) +
  scale_fill_gradient(low = "yellow", high = "red") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title = "Average Distance vs. Time")
```
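Since the plot only uses ActivityDate and average_distance, and both of those columns survive summarise(), the right_join() step may not be needed at all. There is also a possible side effect of the join: after it, every "High" row for a date carries the same average, and geom_col() stacks bars that share an x value, so a date with several "High" users would get a bar taller than its average. A possibly simpler sketch, filtering before grouping, is below. The small tibble at the top is a hypothetical stand-in built from a few rounded rows of the question's dput() sample; replace it with the real data frame.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for dailyActivity_merged_2 (a few rounded rows from
# the question's dput() sample); use the real data frame instead.
dailyActivity_merged_2 <- tibble(
  Id = c(1503960366, 1503960366, 1503960366, 1624580081, 1624580081),
  Actlevl = c("High", "High", "High", "Low", "Low"),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/14/2016", "4/12/2016", "4/13/2016"),
  TotalDistance = c(8.5, 6.97, 6.74, 5.31, 4.55)
)

# Keep the "High" rows first, then compute one average per date.
# The result has exactly the two columns the plot needs, so no join is required.
high_by_date <- dailyActivity_merged_2 %>%
  filter(Actlevl == "High") %>%
  group_by(ActivityDate) %>%
  summarise(average_distance = mean(TotalDistance))

p <- ggplot(high_by_date) +
  geom_col(aes(x = ActivityDate, y = average_distance, fill = average_distance)) +
  scale_fill_gradient(low = "yellow", high = "red") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title = "Average Distance vs. Time")
print(p)
```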

Output:
[Image: the resulting bar chart]
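One more thing that may be worth checking: ActivityDate is stored as character strings in M/D/YYYY form, so ggplot2 treats it as a discrete axis and orders the bars by string sorting, which puts "5/10/2016" before "5/2/2016". A sketch that parses the strings into Date objects with base R's as.Date() (lubridate::mdy() would be an alternative) so the x axis becomes chronological. The four daily values are taken, rounded, from the question's dput() sample and stand in for the summarised data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical daily averages keyed by the question's M/D/YYYY strings
high_by_date <- tibble(
  ActivityDate = c("4/30/2016", "5/1/2016", "5/2/2016", "5/10/2016"),
  average_distance = c(9.25, 6.81, 9.71, 7.77)
)

# As plain strings, "5/10/2016" sorts before "5/2/2016"
print(sort(high_by_date$ActivityDate))

# Parse month/day/year strings into Date objects so the axis is chronological
high_by_date <- high_by_date %>%
  mutate(ActivityDate = as.Date(ActivityDate, format = "%m/%d/%Y"))

p <- ggplot(high_by_date, aes(x = ActivityDate, y = average_distance, fill = average_distance)) +
  geom_col() +
  scale_fill_gradient(low = "yellow", high = "red") +
  labs(title = "Average Distance vs. Time")
print(p)
```

With a Date column, ggplot2 uses a continuous date scale, so the rotated axis-label theme from the original code is often no longer necessary.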