将列添加到内部联接（data.table）

发布于 2025-01-22 20:04:41 字数 1669 浏览 0 评论 0原文

我将列添加到内部联接中，而不理解结果。

考虑表A，b：

a＆lt; - data.table（id = c（1,2,3），x_val = c（“ x1”，“ x2”，“ x3”）））

    id x_val  
# 1:  1    x1  
# 2:  2    x2 
# 3:  3    x3

< <代码> b＆lt; - data.table（id = c（1,2,4），y_val = c（“ y1”，“ y2”，“ y3”））

#    id y_val
# 1:  1    y1
# 2:  2    y2
# 3:  4    y3

现在考虑这些加入，前两个结合完全感觉。

a [b，on =。（id）]

# rows=3 This join is what I expect.  The last row of B is included, but col A no has match. 

#       id  x_val  y_val
#    <num> <char> <char>
# 1:     1     x1     y1
# 2:     2     x2     y2
# 3:     4   <NA>     y3
#

a [b，on =。（id），nomatch = null]

#   rows=2 To remove the unmatching row use nomatch=NULL (ie inner join)

#       id  x_val  y_val
#    <num> <char> <char>
# 1:     1     x1     y1
# 2:     2     x2     y2

现在惊喜。
使用行完成，现在专注于列：
在下面的每个列和标签的情况下，行的数量不是。我希望每种情况下有2行。

我想念什么？

a [b，。（a.id = a $ id），on =。（id），nomatch = null]

#     A.id
#    <num>
# 1:     1
# 2:     2
# 3:     3
A[B, .(B$id), on=.(id), nomatch=NULL]#
#       V1
#    <num>
# 1:     1
# 2:     2
# 3:     4
A[B, .(A.id = A$id, B.id= B$id,  A.x_val = A$x_val, B$y_val), on=.(id), nomatch=NULL]
#     A.id  B.id A.x_val     V4
#    <num> <num>  <char> <char>
# 1:     1     1      x1     y1
# 2:     2     2      x2     y2
# 3:     3     4      x3     y3

原文

I am adding columns to an inner join and do not understanding the result.

Consider tables A,B:

A <- data.table(id=c(1,2,3), x_val = c("x1", "x2", "x3"))

    id x_val  
# 1:  1    x1  
# 2:  2    x2 
# 3:  3    x3

B <- data.table(id=c(1,2,4), y_val = c("y1", "y2", "y3"))

#    id y_val
# 1:  1    y1
# 2:  2    y2
# 3:  4    y3

Now consider these joins, first two make complete sense.

A[B, on=.(id)]

# rows=3 This join is what I expect.  The last row of B is included, but col A no has match. 

#       id  x_val  y_val
#    <num> <char> <char>
# 1:     1     x1     y1
# 2:     2     x2     y2
# 3:     4   <NA>     y3
#

A[B, on=.(id), nomatch=NULL]

#   rows=2 To remove the unmatching row use nomatch=NULL (ie inner join)

#       id  x_val  y_val
#    <num> <char> <char>
# 1:     1     x1     y1
# 2:     2     x2     y2

Now the surprise.
Done with rows, now focus on columns:
In the cases below each of the columns and labels are expected, but the number of rows is not. I expect 2 rows in each case.

What am I missing?

A[B, .(A.id = A$id), on=.(id), nomatch=NULL]

#     A.id
#    <num>
# 1:     1
# 2:     2
# 3:     3
A[B, .(B$id), on=.(id), nomatch=NULL]#
#       V1
#    <num>
# 1:     1
# 2:     2
# 3:     4
A[B, .(A.id = A$id, B.id= B$id,  A.x_val = A$x_val, B$y_val), on=.(id), nomatch=NULL]
#     A.id  B.id A.x_val     V4
#    <num> <num>  <char> <char>
# 1:     1     1      x1     y1
# 2:     2     2      x2     y2
# 3:     3     4      x3     y3

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

终陌 2025-01-29 20:04:42

要在合并两个data.table s的同时选择列（就像您在这样做一样），您不应使用美元符号。您可以通过x。和i。分别（见下文）。您缺少的是，在示例中，您选择了原始数据集中的列（具有3行），而不是（内部）加入的数据集中的列，该数据集具有2行。

A[B, .(A.id = x.id), on=.(id), nomatch=NULL]  # prefixing id with x. select id from A
#     A.id
# 1:     1
# 2:     2
A[B, .(i.id), on=.(id), nomatch=NULL]         # prefixing id with i. select id from B
#     i.id
# 1:     1
# 2:     2
A[B, .(A.id = x.id, B.id= i.id,  A.x_val = x.x_val, i.y_val), on=.(id), nomatch=NULL]
#     A.id  B.id A.x_val i.y_val
# 1:     1     1      x1      y1
# 2:     2     2      x2      y2

To select columns while merging two data.tables (like you are doing), you should not use a dollar symbol. You can prefix the names of the columns in A and B (in the merge of the A[B]) by x. and i. respectively (see below).
What you're missing is that, in your examples, you are selecting the columns in the original dataset (which has 3 rows) and not in the (inner)joined dataset which has 2 rows.

A[B, .(A.id = x.id), on=.(id), nomatch=NULL]  # prefixing id with x. select id from A
#     A.id
# 1:     1
# 2:     2
A[B, .(i.id), on=.(id), nomatch=NULL]         # prefixing id with i. select id from B
#     i.id
# 1:     1
# 2:     2
A[B, .(A.id = x.id, B.id= i.id,  A.x_val = x.x_val, i.y_val), on=.(id), nomatch=NULL]
#     A.id  B.id A.x_val i.y_val
# 1:     1     1      x1      y1
# 2:     2     2      x2      y2

回复收藏 0 原文

~没有更多了~