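Sum of salaries in Apache Pig
The available files are emp1.csv and dept.csv.
Column names: emp: empno, name, sal, did, branch, dno; dept: deptno, name, loc
Retrieve the total salary paid to employees who work in "CHICAGO".
The emp table looks like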
1010,jack,45000,CSE,10
1011,nick,70000,ECE,20
1012,mike,60000,ECE,30
1013,james,25000,CSE,20
The dept table is
10,ACCOUNTING,DALLAS
20,OPERATIONS,CHICAGO
30,SALES,BOSTON
I tried:
grunt> emp_data = load 'student/emp1.csv' using PigStorage(',') as (empno: int, empname: chararray, sal: int, did: chararray, branch: chararray, dno: int);
grunt> emp_dept = load 'student/dept.csv' using PigStorage(',') as (deptno: int, name: chararray, loc: chararray);
grunt> joined = join emp_data by dno, emp_dept by deptno;
grunt> emp_loc = filter joined by loc matches 'CHICAGO';
grunt> total_sal = foreach emp_loc generate sum(sal);
The two tables join fine, but after the last line, instead of showing the answer, I get:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve sum using import: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
The expected result is 95000.
Comments (1)
First, you need to remove did: chararray from the emp_data schema, since that doesn't seem to be part of your data.
Regarding the error, capitalization matters for built-in functions. I've found it best to always capitalize all Pig keywords and functions.
For the "explicit cast" error ... SUM (and other aggregate functions) takes a bag, and you are only passing an int, thus "none of them fit" (the method signatures of the SUM function).
To get a bag, you need to GROUP. You can also improve the JOIN performance by pre-filtering the data. After the join and group, you need to provide the full identifier of sal (seen from DESCRIBE X). See the Pig docs.
Same logic applies if you were to use SQL...
Example
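A minimal sketch of the steps described above, assuming the file paths from the question. The relation names chicago_depts, grouped, and total_sal and the output aliases are illustrative choices, not taken from the original post. It drops the did column, pre-filters the departments, joins, groups to obtain a bag, and then applies SUM to the fully qualified sal field.

```pig
-- Load employees; did is omitted because the sample rows have only five fields.
emp_data = LOAD 'student/emp1.csv' USING PigStorage(',')
    AS (empno:int, empname:chararray, sal:int, branch:chararray, dno:int);

emp_dept = LOAD 'student/dept.csv' USING PigStorage(',')
    AS (deptno:int, name:chararray, loc:chararray);

-- Pre-filter so that only CHICAGO departments take part in the join.
chicago_depts = FILTER emp_dept BY loc == 'CHICAGO';

joined = JOIN emp_data BY dno, chicago_depts BY deptno;

-- DESCRIBE prints the joined schema; after a join, each field carries its
-- source relation as a prefix, e.g. emp_data::sal and chicago_depts::loc.
DESCRIBE joined;

-- GROUP collects the joined rows into one bag per location, which is what SUM expects.
grouped = GROUP joined BY chicago_depts::loc;

-- Built-in function names are case-sensitive: SUM, not sum.
total_sal = FOREACH grouped GENERATE group AS loc, SUM(joined.emp_data::sal) AS total;

DUMP total_sal;
-- For the four sample employee rows in the question: (CHICAGO,95000)
```

Filtering emp_dept before the join rather than filtering the joined relation afterwards keeps the join input small, which is the pre-filtering improvement mentioned above.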
Yes, it is strange to filter and then group when only one entry remains after the filter, but it makes sense if you had more CHICAGO values...
Output