Subsetting a data.frame in long format
I suspect there is a very simple answer to this, but here goes.
I have data in long format, like this:
d <- data.frame(numbers = rnorm(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin", "Lisa",
                         "Eve", "David", "Tom", "Kristin", "Lisa"))
How do I get a new dataframe only with rows for names that occur in both 2008 and 2009? (i.e. with only David, Kristin, Lisa and Tom).
Thanks in advance
Simple way:
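A minimal sketch of one simple base R approach, not necessarily the code this answer originally contained: take the names seen in 2008, intersect them with the names seen in 2009, and keep the matching rows. The object names `both` and `result` are illustrative, and `d` is rebuilt here without `cbind()` so that `numbers` stays numeric.

```r
set.seed(1)  # only so the example is reproducible
d <- data.frame(numbers = rnorm(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin", "Lisa",
                         "Eve", "David", "Tom", "Kristin", "Lisa"),
                stringsAsFactors = FALSE)

# Names that appear in both years
both <- intersect(d$name[d$year == 2008], d$name[d$year == 2009])

# Keep only the rows belonging to those names
result <- d[d$name %in% both, ]
```

With the example data, `result` holds the eight rows for David, Kristin, Lisa and Tom; john and Eve are dropped.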
One approach is to use the reshape package to create a data.frame with years in columns and names in rows. You could then use complete.cases to extract the rows of interest.
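The two steps in this answer might look like the sketch below. It assumes the reshape package is installed and that each name has at most one record per year; `wide`, `both` and `result` are illustrative names, and `d` is rebuilt without `cbind()` so that `numbers` stays numeric.

```r
library(reshape)  # assumption: the reshape package is installed

set.seed(1)  # only so the example is reproducible
d <- data.frame(numbers = rnorm(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin", "Lisa",
                         "Eve", "David", "Tom", "Kristin", "Lisa"),
                stringsAsFactors = FALSE)

# One row per name, one column per year; a missing year becomes NA
wide <- cast(d, name ~ year, value = "numbers")

# Names whose row has no NA, i.e. a value in every year column
both <- as.character(wide$name[complete.cases(wide)])

# Back to long format, keeping only those names
result <- d[d$name %in% both, ]
```

If the wide table itself is what you need, `wide[complete.cases(wide), ]` gives it directly.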
If there is only one record per person per year, just count up the number of times each person appears in the dataset:
Then look for everyone who appeared twice:
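A sketch of the counting idea in base R, assuming (as the answer does) that no one has more than one record in a given year. The names `counts`, `twice` and `result` are illustrative.

```r
set.seed(1)  # only so the example is reproducible
d <- data.frame(numbers = rnorm(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin", "Lisa",
                         "Eve", "David", "Tom", "Kristin", "Lisa"),
                stringsAsFactors = FALSE)

# How many rows each person has
counts <- table(d$name)

# People with exactly two rows (one per year, under the stated assumption)
twice <- names(counts)[counts == 2]

result <- d[d$name %in% twice, ]
```

This breaks down if someone has two records in the same year, because they would be counted twice without appearing in both years.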
Here's another solution that uses just base R and doesn't make any assumptions about the number of records a person has per year:
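One way to write such a solution is sketched below; it is an illustration, not necessarily the answerer's original code. For each name it checks whether every required year occurs at least once, so repeated records within a year do no harm. The helper `names_in_all_years()` is hypothetical, and the extra john row is added only to show that duplicates are handled.

```r
set.seed(1)  # only so the example is reproducible
d <- data.frame(numbers = rnorm(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin", "Lisa",
                         "Eve", "David", "Tom", "Kristin", "Lisa"),
                stringsAsFactors = FALSE)

# Hypothetical helper: names that have at least one record in every year
names_in_all_years <- function(df, years) {
  has_all <- tapply(df$year, df$name, function(y) all(years %in% y))
  names(which(has_all))
}

# A second 2008 record for john: two rows, but still only one year
d2 <- rbind(d, data.frame(numbers = 0, year = 2008, name = "john",
                          stringsAsFactors = FALSE))

keep <- names_in_all_years(d2, c(2008, 2009))
result <- d2[d2$name %in% keep, ]
```

Here john is still excluded despite having two rows, which is the case the counting approach would get wrong.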