Adding columns to an inner join (data.table)
I am adding columns to an inner join and do not understand the result.
Consider tables A and B:
A <- data.table(id=c(1,2,3), x_val = c("x1", "x2", "x3"))
# id x_val
# 1: 1 x1
# 2: 2 x2
# 3: 3 x3
B <- data.table(id=c(1,2,4), y_val = c("y1", "y2", "y3"))
# id y_val
# 1: 1 y1
# 2: 2 y2
# 3: 4 y3
Now consider these joins. The first two make complete sense.
A[B, on=.(id)]
# rows=3 This join is what I expect. The last row of B is included, but it has no match in A, so x_val is NA.
# id x_val y_val
# <num> <char> <char>
# 1: 1 x1 y1
# 2: 2 x2 y2
# 3: 4 <NA> y3
#
A[B, on=.(id), nomatch=NULL]
# rows=2 To remove the unmatched row, use nomatch=NULL (i.e. an inner join)
# id x_val y_val
# <num> <char> <char>
# 1: 1 x1 y1
# 2: 2 x2 y2
Now the surprise. Done with rows, now focus on columns.
In each of the cases below the columns and labels are what I expect, but the number of rows is not. I expect 2 rows in each case.
What am I missing?
A[B, .(A.id = A$id), on=.(id), nomatch=NULL]
# A.id
# <num>
# 1: 1
# 2: 2
# 3: 3
A[B, .(B$id), on=.(id), nomatch=NULL]
# V1
# <num>
# 1: 1
# 2: 2
# 3: 4
A[B, .(A.id = A$id, B.id= B$id, A.x_val = A$x_val, B$y_val), on=.(id), nomatch=NULL]
# A.id B.id A.x_val V4
# <num> <num> <char> <char>
# 1: 1 1 x1 y1
# 2: 2 2 x2 y2
# 3: 3 4 x3 y3
Comments (1)
To select columns while merging two data.tables (like you are doing), you should not use the dollar symbol. Instead, prefix the names of the columns in A and B (in the merge A[B]) with x. and i. respectively.
What you're missing is that, in your examples, you are selecting the columns from the original datasets (which have 3 rows each) and not from the (inner) joined dataset, which has 2 rows.
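The answer's own code did not survive on this page, so the following is only a sketch of what the x. / i. prefixes might look like for the tables in the question. It assumes a data.table version recent enough to support nomatch=NULL. The output column names (A.id, B.id, A.x_val, B.y_val) and the variable names res and wrong are chosen here for illustration.

```r
library(data.table)

A <- data.table(id = c(1, 2, 3), x_val = c("x1", "x2", "x3"))
B <- data.table(id = c(1, 2, 4), y_val = c("y1", "y2", "y3"))

# Inside j of A[B, ...], x.<col> refers to columns of A (the "x" table)
# and i.<col> refers to columns of B (the "i" table), both evaluated on
# the joined rows only.
res <- A[B,
         .(A.id = x.id, B.id = i.id, A.x_val = x.x_val, B.y_val = i.y_val),
         on = .(id), nomatch = NULL]
print(res)   # only the two matching ids (1 and 2) remain

# For contrast: A$id is looked up in the original table A, outside the join,
# so all three of its values come back regardless of nomatch.
wrong <- A[B, .(A.id = A$id), on = .(id), nomatch = NULL]
print(nrow(wrong))   # 3, as in the question
```

With the prefixes, j is evaluated against the joined rows, so the inner join returns the 2 rows the question expects.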