Merge rows that share the same observation in a chosen column
I'm cleaning a data set, and after removing duplicates I would like to merge the rows that share the same observation in a specific column (e.g. an ID column).
I am looking to merge/aggregate so that only one row per chosen observation is left (here: one row per ID).
If possible, the aggregated row would sum up all observations except the one used for merging (ID).
Here is a hypothetical setup:
set.seed(18)
dat <- data.frame(ID=c(1,2,1,2,2,3),value=c(5,5,7,8,3,2),location=c("NY","LA","NY","LA","LA","LA"))
dat
I would like to know how to obtain:
set.seed(9)
dat1 <- data.frame(id=c(1,2,3),value=c(5+7,5+8+3,2),location=c("NY","LA","LA"))
dat1
which aggregates by ID, sums the "value" observations and keeps the corresponding location.
I would also like to know whether it's possible to group the data frame by location, so as to obtain:
set.seed(6)
dat2 <- data.frame(location=c("NY","LA"),value=c(5+7,5+8+3+2),meanvalue=c(mean(5+7),mean(5+8+3+2)))
dat2
I did not include ID in this table because it does not matter here: it can be summed or dropped, since it will not be used in any further computation.
I know that my output for meanvalue is wrong: I am looking to get the mean of all rows sharing the same location (i.e. one mean for LA and one for NY). I would appreciate it if you could also correct me on this one.
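For reference, here is the arithmetic behind the intended meanvalue. mean(5+7) returns 12 because it is the mean of a single, already-summed number. The per-location mean needs each group's individual values: NY has 2 rows (5 + 7 = 12, mean 6) and LA has 4 rows (5 + 8 + 3 + 2 = 18, mean 4.5). The sketch below checks this in base R with tapply, which is a swapped-in alternative and not the tidyverse approach used in the answer.

```r
# Example data from the question
dat <- data.frame(
  ID       = c(1, 2, 1, 2, 2, 3),
  value    = c(5, 5, 7, 8, 3, 2),
  location = c("NY", "LA", "NY", "LA", "LA", "LA")
)

# mean(5+7) averages the single number 12; the group mean needs the raw values
mean(c(5, 7))        # NY: 6
mean(c(5, 8, 3, 2))  # LA: 4.5

# The same per-location sums and means computed from the data
# (tapply returns one named entry per location, names sorted: "LA", "NY")
sums  <- tapply(dat$value, dat$location, sum)
means <- tapply(dat$value, dat$location, mean)
sums
means
```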
Thank you for your help!
I see that you included set.seed but did not see any sampling or randomized procedures (unless I missed something). One approach with tidyverse is the following. Let me know if this is what you had in mind.
For the first part, use group_by to sum value based on ID and location.
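A minimal sketch of the first part, assuming the dplyr package (part of the tidyverse) is installed. Grouping on location as well as ID keeps the location column in the result; this relies on each ID mapping to exactly one location, as it does in the example data.

```r
library(dplyr)

# Example data from the question
dat <- data.frame(
  ID       = c(1, 2, 1, 2, 2, 3),
  value    = c(5, 5, 7, 8, 3, 2),
  location = c("NY", "LA", "NY", "LA", "LA", "LA")
)

# One row per ID: sum `value` within each (ID, location) group.
# .groups = "drop" returns an ungrouped result instead of one still grouped by ID.
dat1 <- dat %>%
  group_by(ID, location) %>%
  summarise(value = sum(value), .groups = "drop")

dat1
```

The result has one row per ID: ID 1 in NY with 12, ID 2 in LA with 16, and ID 3 in LA with 2, matching the dat1 the question asks for (apart from the column order and the ID/id capitalization).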
For the second part, if you group_by location, you can then use sum and mean with summarise.
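A sketch of the second part, again assuming dplyr. One detail worth knowing: summarise evaluates its arguments in order, so in summarise(value = sum(value), meanvalue = mean(value)) the second expression sees the already-summed value and returns the sum again, which is the same mistake as mean(5+7). The sketch avoids that by dividing the group sum by the group size from n(); computing meanvalue = mean(value) before overwriting value would work too.

```r
library(dplyr)

# Example data from the question
dat <- data.frame(
  ID       = c(1, 2, 1, 2, 2, 3),
  value    = c(5, 5, 7, 8, 3, 2),
  location = c("NY", "LA", "NY", "LA", "LA", "LA")
)

# One row per location; ID is simply left out of the summary.
# After `value = sum(value)`, `value` holds the group sum, and n() is the
# number of rows in the group, so value / n() is the per-location mean.
dat2 <- dat %>%
  group_by(location) %>%
  summarise(
    value     = sum(value),
    meanvalue = value / n(),
    .groups   = "drop"
  )

dat2
```

For the example data this gives LA with value 18 and meanvalue 4.5, and NY with value 12 and meanvalue 6 (groups are sorted alphabetically by location).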