Comparing 2 data sets in R

Posted on 2024-12-05 01:53:13

I have two data sets extracted from a data set called babies2009 (3 vectors: count, name, gender).

One is girls2009, containing all the girls, and the other is boys2009, containing all the boys.
I want to find out which names are shared between boys and girls.

I tried this:

common.names = (boys2009$name %in% girls2009$name)

When I try

babies2009[common.names, ] [1:10, ]

all I get is girl names, not the common names.

I have confirmed that the two data sets do contain boys and girls respectively by looking at the first 10 rows of each:

boys2009 [1:10,]
girls2009 [1:10,]

How else can I compare the two data sets and determine which values they share?
Thanks,

Comments (3)

小嗷兮 2024-12-12 01:53:13

common.names = (boys2009$name %in% girls2009$name) gives you a logical vector of length length(boys2009$name), with one entry per row of boys2009. So when you use it to select from the much longer data.frame babies2009, as in babies2009[common.names, ] [1:10, ], the TRUE/FALSE values no longer line up with the rows and you wind up with nonsense.

Solution: use that logical vector on the proper data.frame!

# Small made-up data sets with the same layout as in the question
boys2009 <- data.frame(name = c("Billy", "Bob"), data = runif(2), gender = "M", stringsAsFactors = FALSE)
girls2009 <- data.frame(name = c("Billy", "Mae", "Sue"), data = runif(3), gender = "F", stringsAsFactors = FALSE)
babies2009 <- rbind(boys2009, girls2009)

# TRUE for each boys' name that also appears among the girls' names
common.names <- (boys2009$name %in% girls2009$name)

> boys2009[common.names, ]$name
[1] "Billy"
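To make the length mismatch concrete, here is a minimal sketch on made-up data (the count values are invented for illustration and are not from the question). It also shows one possible way to pull the shared names out of the combined babies2009 frame, by testing each of its own rows against the set of shared names; the variable names shared and shared.rows are hypothetical.

```r
# Toy data (invented values), mirroring the three columns in the question.
boys2009 <- data.frame(name = c("Billy", "Bob"), count = c(10, 20),
                       gender = "M", stringsAsFactors = FALSE)
girls2009 <- data.frame(name = c("Billy", "Mae", "Sue"), count = c(3, 30, 40),
                        gender = "F", stringsAsFactors = FALSE)
babies2009 <- rbind(boys2009, girls2009)

# The index has one entry per row of boys2009 ...
common.names <- boys2009$name %in% girls2009$name
length(common.names)
# ... but babies2009 has more rows than that, so the index does not align.
nrow(babies2009)

# Build the index from babies2009 itself so it matches row for row.
shared <- intersect(boys2009$name, girls2009$name)
shared.rows <- babies2009[babies2009$name %in% shared, ]
shared.rows
```

Because the index is computed from babies2009$name, it always has exactly nrow(babies2009) entries, whatever order the rows are in.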

怪我鬧 2024-12-12 01:53:13

Since you asked for similar names rather than exact matches, you should consider agrep, which does approximate matching:

sapply(boys2009$name, agrep, girls2009$name, max.distance = 0.1)

You can adjust the max.distance argument to suit your needs.
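As a rough illustration of the agrep approach, the sketch below uses two short made-up name vectors (boys.names and girls.names are hypothetical stand-ins for boys2009$name and girls2009$name). It passes value = TRUE so that the matched names are returned instead of their positions, and uses lapply so that the result is always a list with one element per boys' name.

```r
# Made-up name vectors standing in for boys2009$name and girls2009$name.
boys.names <- c("Billy", "Bob")
girls.names <- c("Billy", "Billie", "Mae")

# For each boys' name, collect the girls' names that approximately match it.
# value = TRUE returns the matching strings rather than their indices.
near.matches <- lapply(
  setNames(boys.names, boys.names),
  function(nm) agrep(nm, girls.names, max.distance = 0.1, value = TRUE)
)
near.matches
```

Note that agrep looks for an approximate match of the pattern within each string, much like grep, so very short names can match inside longer ones; it is worth inspecting the results before relying on them.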

寂寞陪衬 2024-12-12 01:53:13

How about using set functions:

list(
    `only boys` = setdiff(boys2009$name, girls2009$name),
    `common` = intersect(boys2009$name, girls2009$name),
    `only girls` = setdiff(girls2009$name, boys2009$name)
)
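If the counts for the shared names are also of interest, one possible follow-up is an inner join with base R's merge, sketched here on made-up data (the count values are invented, and the gender column is left out for brevity). The suffixes argument labels which data set each count column came from.

```r
# Toy data (invented values); only the name and count columns are used here.
boys2009 <- data.frame(name = c("Billy", "Bob"), count = c(10, 20),
                       stringsAsFactors = FALSE)
girls2009 <- data.frame(name = c("Billy", "Mae", "Sue"), count = c(3, 30, 40),
                        stringsAsFactors = FALSE)

# merge() keeps only names present in both data frames by default,
# and the suffixes distinguish the two count columns.
both <- merge(boys2009, girls2009, by = "name", suffixes = c(".boys", ".girls"))
both
```

The name column of the result holds the same names that intersect(boys2009$name, girls2009$name) returns, provided each name appears at most once per data set.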