将列添加到内部联接(data.table)

发布于 2025-01-22 20:04:41 字数 1669 浏览 0 评论 0原文

我将列添加到内部联接中,而不理解结果。

考虑表A,b:

a< - data.table(id = c(1,2,3),x_val = c(“ x1”,“ x2”,“ x3”)))

    id x_val  
# 1:  1    x1  
# 2:  2    x2 
# 3:  3    x3

< <代码> b&lt; - data.table(id = c(1,2,4),y_val = c(“ y1”,“ y2”,“ y3”))

#    id y_val
# 1:  1    y1
# 2:  2    y2
# 3:  4    y3

现在考虑这些加入,前两个结合完全感觉。

a [b,on =。(id)]

# rows=3 This join is what I expect.  The last row of B is included, but col A no has match. 

#       id  x_val  y_val
#    <num> <char> <char>
# 1:     1     x1     y1
# 2:     2     x2     y2
# 3:     4   <NA>     y3
#


a [b,on =。(id),nomatch = null]

#   rows=2 To remove the unmatching row use nomatch=NULL (ie inner join)

#       id  x_val  y_val
#    <num> <char> <char>
# 1:     1     x1     y1
# 2:     2     x2     y2

现在惊喜
使用行完成,现在专注于
在下面的每个和标签的情况下,行的数量不是。我希望每种情况下有2行。

我想念什么?

a [b,。(a.id = a $ id),on =。(id),nomatch = null]

#     A.id
#    <num>
# 1:     1
# 2:     2
# 3:     3
A[B, .(B$id), on=.(id), nomatch=NULL]#
#       V1
#    <num>
# 1:     1
# 2:     2
# 3:     4
A[B, .(A.id = A$id, B.id= B$id,  A.x_val = A$x_val, B$y_val), on=.(id), nomatch=NULL]
#     A.id  B.id A.x_val     V4
#    <num> <num>  <char> <char>
# 1:     1     1      x1     y1
# 2:     2     2      x2     y2
# 3:     3     4      x3     y3

I am adding columns to an inner join and do not understanding the result.

Consider tables A,B:

A <- data.table(id=c(1,2,3), x_val = c("x1", "x2", "x3"))

    id x_val  
# 1:  1    x1  
# 2:  2    x2 
# 3:  3    x3

B <- data.table(id=c(1,2,4), y_val = c("y1", "y2", "y3"))

#    id y_val
# 1:  1    y1
# 2:  2    y2
# 3:  4    y3

Now consider these joins, first two make complete sense.

A[B, on=.(id)]

# rows=3 This join is what I expect.  The last row of B is included, but col A no has match. 

#       id  x_val  y_val
#    <num> <char> <char>
# 1:     1     x1     y1
# 2:     2     x2     y2
# 3:     4   <NA>     y3
#


A[B, on=.(id), nomatch=NULL]

#   rows=2 To remove the unmatching row use nomatch=NULL (ie inner join)

#       id  x_val  y_val
#    <num> <char> <char>
# 1:     1     x1     y1
# 2:     2     x2     y2

Now the surprise.
Done with rows, now focus on columns:
In the cases below each of the columns and labels are expected, but the number of rows is not. I expect 2 rows in each case.

What am I missing?

A[B, .(A.id = A$id), on=.(id), nomatch=NULL]

#     A.id
#    <num>
# 1:     1
# 2:     2
# 3:     3
A[B, .(B$id), on=.(id), nomatch=NULL]#
#       V1
#    <num>
# 1:     1
# 2:     2
# 3:     4
A[B, .(A.id = A$id, B.id= B$id,  A.x_val = A$x_val, B$y_val), on=.(id), nomatch=NULL]
#     A.id  B.id A.x_val     V4
#    <num> <num>  <char> <char>
# 1:     1     1      x1     y1
# 2:     2     2      x2     y2
# 3:     3     4      x3     y3

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

终陌 2025-01-29 20:04:42

要在合并两个data.table s的同时选择列(就像您在这样做一样),您不应使用美元符号。您可以通过x。i。分别(见下文)。
您缺少的是,在示例中,您选择了原始数据集中的列(具有3行),而不是(内部)加入的数据集中的列,该数据集具有2行。

A[B, .(A.id = x.id), on=.(id), nomatch=NULL]  # prefixing id with x. select id from A
#     A.id
# 1:     1
# 2:     2
A[B, .(i.id), on=.(id), nomatch=NULL]         # prefixing id with i. select id from B
#     i.id
# 1:     1
# 2:     2
A[B, .(A.id = x.id, B.id= i.id,  A.x_val = x.x_val, i.y_val), on=.(id), nomatch=NULL]
#     A.id  B.id A.x_val i.y_val
# 1:     1     1      x1      y1
# 2:     2     2      x2      y2

To select columns while merging two data.tables (like you are doing), you should not use a dollar symbol. You can prefix the names of the columns in A and B (in the merge of the A[B]) by x. and i. respectively (see below).
What you're missing is that, in your examples, you are selecting the columns in the original dataset (which has 3 rows) and not in the (inner)joined dataset which has 2 rows.

A[B, .(A.id = x.id), on=.(id), nomatch=NULL]  # prefixing id with x. select id from A
#     A.id
# 1:     1
# 2:     2
A[B, .(i.id), on=.(id), nomatch=NULL]         # prefixing id with i. select id from B
#     i.id
# 1:     1
# 2:     2
A[B, .(A.id = x.id, B.id= i.id,  A.x_val = x.x_val, i.y_val), on=.(id), nomatch=NULL]
#     A.id  B.id A.x_val i.y_val
# 1:     1     1      x1      y1
# 2:     2     2      x2      y2
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文