Comparing 2 data sets in R
I have 2 data sets extracted from a data set called babies2009 (3 columns: count, name, gender).
One is girls2009, containing all the girls, and the other is boys2009, containing all the boys.
I want to find out which names are shared between boys and girls.
I tried this:
common.names = (boys2009$name %in% girls2009$name)
When I try
babies2009[common.names, ] [1:10, ]
all I get is girl names, not the common names.
I have confirmed that the two data sets do contain boys and girls respectively by looking at the first 10 rows of each:
boys2009 [1:10,]
girls2009 [1:10,]
How else can I compare the 2 data sets and determine which values they share?
Thanks,
Comments (3)
common.names = (boys2009$name %in% girls2009$name)
gives you a logical vector of length length(boys2009$name). So when you use it to select from the much longer data.frame babies2009, as in
babies2009[common.names, ] [1:10, ]
you wind up with nonsense, because the vector does not line up with the rows of babies2009. Solution: use that logical vector on the proper data.frame, boys2009.
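To make that concrete, here is a minimal sketch. The two small data frames are made-up stand-ins for the real boys2009 and girls2009 (the names and counts are hypothetical), and shared.boys is just an illustrative variable name.

```r
# Toy stand-ins for the real data; all values are hypothetical
boys2009 <- data.frame(count = c(120, 80, 15),
                       name = c("Liam", "Jordan", "Riley"),
                       gender = "M", stringsAsFactors = FALSE)
girls2009 <- data.frame(count = c(150, 30, 60),
                        name = c("Emma", "Jordan", "Riley"),
                        gender = "F", stringsAsFactors = FALSE)

# One TRUE/FALSE per row of boys2009
common.names <- boys2009$name %in% girls2009$name

# Index the data frame the vector was built from
shared.boys <- boys2009[common.names, ]
shared.boys$name
```

Because common.names has one element per row of boys2009, indexing boys2009 with it keeps exactly the boys' rows whose name also appears among the girls.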
Since you asked for similar names and did not specify exact matches, you should consider
agrep
which does approximate (fuzzy) string matching. You can adjust the max.distance argument to suit your needs.
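A rough sketch of what that could look like, assuming you loop over the boys' names and fuzzy-match each one against the girls' names. The name vectors are made up, and max.distance = 0.2 is only an illustrative setting, not a recommendation.

```r
# Hypothetical name vectors standing in for boys2009$name and girls2009$name
boy.names  <- c("Jordan", "Liam")
girl.names <- c("Jordyn", "Emma", "Jordan", "Riley")

# For each boy name, return the girl names that approximately match it.
# agrep() takes a single pattern, hence the lapply() over boy names.
similar <- lapply(boy.names, function(nm) {
  agrep(nm, girl.names, max.distance = 0.2, value = TRUE)
})
names(similar) <- boy.names
similar
```

Note that agrep matches approximately against substrings, so a short pattern can match inside longer names; on the real data you may need to tighten max.distance or filter the results afterwards.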
How about using set functions:
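A minimal sketch of the set-function idea, assuming base R's intersect and setdiff are what is meant. The data frames are toy stand-ins with hypothetical values, and babies2009 is rebuilt here by stacking them.

```r
# Toy stand-ins for the real data; all values are hypothetical
boys2009 <- data.frame(count = c(120, 80, 15),
                       name = c("Liam", "Jordan", "Riley"),
                       gender = "M", stringsAsFactors = FALSE)
girls2009 <- data.frame(count = c(150, 30, 60),
                        name = c("Emma", "Jordan", "Riley"),
                        gender = "F", stringsAsFactors = FALSE)
babies2009 <- rbind(boys2009, girls2009)

# Names that occur in both data sets (exact matches, duplicates dropped)
shared <- intersect(boys2009$name, girls2009$name)

# Names that occur only among the boys
boys.only <- setdiff(boys2009$name, girls2009$name)

# Rows of the full data set, both genders, for the shared names
both <- babies2009[babies2009$name %in% shared, ]
both
```

intersect works on the name vectors directly, so there is no logical index to misalign; the %in% step at the end is only needed if you want the full rows back.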