Subquery in Apache Pig
Looking for some help writing a subquery in Apache Pig. For example, I have the two relations below:
A
sam 12 grad maths
sony 13 postgrad english
B
maths {(4.5,sam),(4,david)}
english {(4.2,peter),(3.9,rob)}
I want to join the two relations by subject, i.e. A by A.$3 and B by B.$0, and write a query that gives the following output:
sam 12 grad maths 4.5
sony 13 postgrad english
Basically, it should find the matching subject in B and then look for the name inside that subject's bag.
The way I would approach this is to flatten the B relation, then do a left outer join onto A. First, flatten B so that each (score, name) tuple in its bag becomes a row of its own.
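A minimal Pig Latin sketch of this flattening step, assuming B is stored as a tab-separated text file named b.txt (a hypothetical path) and that its fields are called subject, scores, score and name (names chosen here for illustration; the question gives none). FLATTEN on a bag emits one output row per tuple in the bag, repeating the other generated fields on each row.

```pig
-- Hypothetical input b.txt, one record per line, fields separated by a tab:
--   maths<TAB>{(4.5,sam),(4,david)}
-- PigStorage can parse the bag literal because the schema declares a bag.
B = LOAD 'b.txt' USING PigStorage('\t')
    AS (subject:chararray,
        scores:bag{t:tuple(score:double, name:chararray)});

-- FLATTEN un-nests the bag: one row per (score, name) tuple,
-- with the subject repeated on each row.
B_flat = FOREACH B GENERATE subject,
         FLATTEN(scores) AS (score:double, name:chararray);

DUMP B_flat;
```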
This translates your data into one row per (subject, score, name):
maths 4.5 sam
maths 4 david
english 4.2 peter
english 3.9 rob
Now you can just do a JOIN on subject and name to bring the data together (I'm renaming fields to keep my head straight).
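A minimal sketch of that join, assuming hypothetical tab-separated files a.txt and b.txt and illustrative field names; the meaning of A's second and third columns is not stated in the question, so num and level are guesses. Joining on both subject and name is what implements "find the matching subject, then look for the name inside it".

```pig
-- Hypothetical tab-separated inputs; field names are illustrative only.
A = LOAD 'a.txt' USING PigStorage('\t')
    AS (name:chararray, num:int, level:chararray, subject:chararray);
B = LOAD 'b.txt' USING PigStorage('\t')
    AS (subject:chararray,
        scores:bag{t:tuple(score:double, name:chararray)});

-- Flatten B and rename its fields so they are easy to tell apart from A's.
B_flat = FOREACH B GENERATE subject AS b_subject,
         FLATTEN(scores) AS (score:double, b_name:chararray);

-- Inner join on (subject, name): a row survives only if the
-- student's name appears in the bag for the same subject.
J = JOIN A BY (subject, name), B_flat BY (b_subject, b_name);

-- Keep A's four columns plus the matched score.
J2 = FOREACH J GENERATE A::name, A::num, A::level, A::subject,
                        B_flat::score;

DUMP J2;
```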
dump J2 will output only sam, 12, grad, maths, 4.5.
But there is a problem. It looks like you want a NULL value when an item in list A does not show up in list B. This is a job for a LEFT OUTER join, and luckily, Pig can do outer joins. Change the JOIN above to a LEFT OUTER join, keeping A on the left side. With that change, dump J2 will output the following, which is what I think you want:
sam 12 grad maths 4.5
sony 13 postgrad english
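A minimal sketch of the whole script with the LEFT OUTER join, assuming hypothetical tab-separated files a.txt and b.txt and illustrative field names (num and level are guesses for A's unnamed columns). The only change from an inner join is the LEFT OUTER keyword on the JOIN. Pig supports outer joins only when the side that may be filled with nulls has a schema, which is why B is loaded with an explicit AS clause.

```pig
-- Hypothetical tab-separated inputs; field names are illustrative only.
A = LOAD 'a.txt' USING PigStorage('\t')
    AS (name:chararray, num:int, level:chararray, subject:chararray);
B = LOAD 'b.txt' USING PigStorage('\t')
    AS (subject:chararray,
        scores:bag{t:tuple(score:double, name:chararray)});

-- One row per (subject, score, name), renamed to avoid confusion with A.
B_flat = FOREACH B GENERATE subject AS b_subject,
         FLATTEN(scores) AS (score:double, b_name:chararray);

-- LEFT OUTER keeps every row of A; when no (subject, name) match exists
-- in B_flat, the B_flat fields come back null.
J = JOIN A BY (subject, name) LEFT OUTER, B_flat BY (b_subject, b_name);

J2 = FOREACH J GENERATE A::name, A::num, A::level, A::subject,
                        B_flat::score;

DUMP J2;
```

For sony there is no (english, sony) row in the flattened B, so the score is null, which DUMP prints as an empty trailing field. Row order in the dump is not guaranteed.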