Subqueries in Apache Pig

Posted on 2024-12-25 14:16:08

I'm looking for some help writing a subquery in Apache Pig. For example, I have the following two relations:

A
sam 12 grad maths
sony 13 postgrad english

B
maths {(4.5,sam),(4,david)}
english {(4.2,peter),(3.9,rob)}

I want to join the two relations by subject, i.e. A by A.$3 and B by B.$0, and write a query that gives this output:

sam 12 grad maths 4.5
sony 13 postgrad english 

Basically, it should find the matching subject in B and then look up the name inside that subject's bag.

Comments (1)

影子是时光的心 2025-01-01 14:16:08

The way I would approach this is to flatten the B relation, then do a left outer join onto A.

First, to flatten the relation out:

C = FOREACH B GENERATE $0 AS Bsubject, FLATTEN($1) AS (Bscore, Bname);

This translates your data into:

maths, 4.5, sam
maths, 4, david
english, 4.2, peter
english, 3.9, rob

Now, you can just do a JOIN to bring the data together (I'm giving the fields names to keep my head straight: Aname, Agrade, Alevel, Asubject for A, and Bsubject, Bscore, Bname for C):

J = JOIN A BY (Aname, Asubject), C BY (Bname, Bsubject);
J2 = FOREACH J GENERATE Aname, Agrade, Alevel, Asubject, Bscore;

dump J2 will output only sam, 12, grad, maths, 4.5.

But there is a problem. It looks like you want a NULL score when a row in A has no match in C, as with sony, whose name does not appear in the english bag. This is a job for a LEFT OUTER join, and luckily, Pig can do outer joins. Modify the above code as follows:

J = JOIN A BY (Aname, Asubject) LEFT OUTER, C BY (Bname, Bsubject);
J2 = FOREACH J GENERATE Aname, Agrade, Alevel, Asubject, Bscore;

dump J2 now outputs the following, which I think is what you want:

sam, 12, grad, maths, 4.5
sony, 13, postgrad, english, 
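
For reference, here is a minimal end-to-end sketch of the steps above, with the schemas spelled out. The file names (a.txt, b.txt), the tab-delimited layout, and all field names are assumptions for illustration; none of them come from the original question, so adjust them to match your data.

```pig
-- Hypothetical input files and field names; the original post does not give them.
-- PigStorage splits on tabs by default, so both files are assumed tab-delimited.
A = LOAD 'a.txt' USING PigStorage()
    AS (Aname:chararray, Agrade:int, Alevel:chararray, Asubject:chararray);

-- Each row of B is a subject plus a bag of (score, name) tuples.
B = LOAD 'b.txt' USING PigStorage()
    AS (Bsubject:chararray, scores:bag{t:tuple(Bscore:double, Bname:chararray)});

-- Turn each (score, name) tuple in the bag into its own row.
C = FOREACH B GENERATE Bsubject, FLATTEN(scores) AS (Bscore, Bname);

-- Keep every row of A; Bscore is null when no (name, subject) pair matches.
J = JOIN A BY (Aname, Asubject) LEFT OUTER, C BY (Bname, Bsubject);

-- The A:: and C:: prefixes make explicit which relation each field came from.
J2 = FOREACH J GENERATE A::Aname, A::Agrade, A::Alevel, A::Asubject, C::Bscore;

DUMP J2;
```

Joining on both name and subject is what replaces the "subquery": the flatten exposes the names inside each bag, so a plain two-key join can do the lookup.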