Subset a data.frame in long format

Posted 2024-08-04 19:25:37

I guess there will be a very simple answer to this. But here goes.

My data is in long format, like this:

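# Pass the columns straight to data.frame(); wrapping them in cbind() first
# builds a character matrix, so numbers and year would become character.
d <- data.frame(numbers = rnorm(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin", "Lisa",
                         "Eve", "David", "Tom", "Kristin", "Lisa"))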

How do I get a new data frame containing only the rows for names that occur in both 2008 and 2009 (i.e., only David, Kristin, Lisa and Tom)?

Thanks in advance


Comments (4)

榕城若虚 2024-08-11 19:25:37

Simple way:

subset(
    d,
    name %in% intersect(name[year==2008], name[year==2009])
)
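A self-contained run of the subset() approach above, as a sketch. It rebuilds d with data.frame() directly. As an assumption made only for reproducibility, it uses numbers = seq_len(10) in place of rnorm(10) so the output is deterministic. The helper names in_2008, in_2009 and both are illustrative.

```r
# Hypothetical deterministic stand-in for the question's data:
# numbers = 1..10 instead of rnorm(10) so the result is reproducible.
d <- data.frame(numbers = seq_len(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin", "Lisa",
                         "Eve", "David", "Tom", "Kristin", "Lisa"),
                stringsAsFactors = FALSE)

# Names recorded in 2008, and names recorded in 2009
in_2008 <- d$name[d$year == 2008]
in_2009 <- d$name[d$year == 2009]

# Keep only the rows whose name appears in both sets
both <- subset(d, name %in% intersect(in_2008, in_2009))
print(both)
```

Only the "john" (2008 only) and "Eve" (2009 only) rows are dropped. The eight rows for David, Tom, Kristin and Lisa remain.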
爱的故事 2024-08-11 19:25:37

One approach is to use the reshape package to create a data.frame with years in columns and names in rows:

library(reshape)
cast(d, name ~ year, value = "numbers")

You could then use complete.cases() to extract the rows of interest, i.e. the names that have a value (no NA) in both year columns.
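The sketch below shows the complete.cases() step end to end. It swaps in base R's stats::reshape() for reshape::cast(), so it runs without installing an extra package. The deterministic numbers = seq_len(10) column and the names wide, keep and result are assumptions for illustration.

```r
# Deterministic stand-in data (assumption: numbers = 1..10 instead of rnorm(10))
d <- data.frame(numbers = seq_len(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin", "Lisa",
                         "Eve", "David", "Tom", "Kristin", "Lisa"),
                stringsAsFactors = FALSE)

# Base R reshape() used in place of reshape::cast():
# one row per name, with columns numbers.2008 and numbers.2009.
wide <- reshape(d, idvar = "name", timevar = "year",
                v.names = "numbers", direction = "wide")

# A name missing from a year gets NA in that year's column,
# so complete.cases() is TRUE only for names seen in both years.
keep <- wide$name[complete.cases(wide)]

# Go back to the original long-format rows for those names
result <- d[d$name %in% keep, ]
print(result)
```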

听闻余生 2024-08-11 19:25:37

If each person has at most one record per year, just count the number of times each person appears in the dataset:

counts <- as.data.frame(table(name = d$name))

Then look for everyone who appeared twice:

subset(counts, Freq == 2)
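The counting approach above returns names rather than rows of d. The sketch below finishes that step. It also shows a variant that is an addition, not part of the original answer: it counts distinct years per name with table(name, year) instead of counting records, so two rows in the same year are not mistaken for presence in both years. Deterministic numbers and the variable names are assumptions for illustration.

```r
d <- data.frame(numbers = seq_len(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin", "Lisa",
                         "Eve", "David", "Tom", "Kristin", "Lisa"),
                stringsAsFactors = FALSE)

# Counting approach from the answer (assumes at most one record per name per year)
counts <- as.data.frame(table(name = d$name))
twice <- as.character(subset(counts, Freq == 2)$name)
result_counts <- d[d$name %in% twice, ]

# Variant: count distinct years per name instead of records
tab <- table(d$name, d$year)                      # name x year contingency table
in_all_years <- rownames(tab)[rowSums(tab > 0) == ncol(tab)]
result_years <- d[d$name %in% in_all_years, ]
```

On this data both give the same eight rows. They differ only when a name has several records in a single year.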
仙女 2024-08-11 19:25:37

Here's another solution that uses just base R and doesn't make any assumptions about the number of records a person has per year:

# data.frame() without cbind(), so numbers and year stay numeric
d <- data.frame(numbers = rnorm(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin",
                         "Lisa", "Eve", "David", "Tom", "Kristin",
                         "Lisa"))
# split data into 2 data.frames (1 for each year)
by.year <- split(d, d$year, drop = TRUE)

# find the names that appear in both years
keep <- intersect(by.year[['2008']]$name, by.year[['2009']]$name)
# Or, if you had several years, use Reduce as a more general solution:
keep <- Reduce(intersect, lapply(by.year, '[[', 'name'))

# show the rows of the original dataset only if their $name field
# is in our 'keep' vector
d[d$name %in% keep,]
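To illustrate the Reduce() generalization, here is a sketch on a hypothetical three-year data set (the 2010 rows and deterministic numbers are invented for illustration, not taken from the question). Reduce() applies intersect() pairwise across the per-year name vectors, leaving only the names present in every year.

```r
# Hypothetical three-year data (not from the question)
d3 <- data.frame(numbers = seq_len(12),
                 year = rep(c(2008, 2009, 2010), each = 4),
                 name = c("David", "Tom", "Lisa", "john",
                          "David", "Tom", "Lisa", "Eve",
                          "David", "Lisa", "Eve", "john"),
                 stringsAsFactors = FALSE)

# One data.frame per year
by.year <- split(d3, d3$year, drop = TRUE)

# intersect(intersect(names_2008, names_2009), names_2010)
keep <- Reduce(intersect, lapply(by.year, `[[`, "name"))

# Rows whose name occurs in all three years
result <- d3[d3$name %in% keep, ]
print(result)
```

Here Tom is missing from 2010 and Eve and john each miss a year, so only David and Lisa survive.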