Sum of salaries in Apache Pig

Posted 2025-02-06 19:53:08

Create employee and dept tables for the available files emp1.csv and
dept.csv.

Column names:
Emp: empno, name, sal, did, branch, dno
Dept: deptno, name, loc

Retrieve the total salaries to be paid to the employees working in
'chicago'.

The emp table looked like this:

1010,jack,45000,CSE,10
1011,nick,70000,ECE,20
1012,mike,60000,ECE,30
1013,james,25000,CSE,20

and the dept table was:

10,ACCOUNTING,DALLAS
20,OPERATIONS,CHICAGO
30,SALES,BOSTON

I've joined both tables:

grunt> emp_data = load 'student/emp1.csv' using PigStorage(',') as (empno: int, empname: chararray, sal: int, did: chararray, branch: chararray, dno: int);
grunt> emp_dept = load 'student/dept.csv' using PigStorage(',') as (deptno: int, name: chararray, loc: chararray);
grunt> joined = join emp_data by dno, emp_dept by deptno;
grunt> emp_loc = joined by loc matches 'CHICAGO';
grunt> total_sal = foreach emp_loc generate sum(sal);

After the last line it shows an error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve sum using import: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

The answer should be 95000

绝情姑娘 2025-02-13 19:53:08

First, you need to remove did: chararray from the emp_data schema, since that doesn't seem to be part of your data.

Regarding the error, capitalization matters for built-in functions: Pig function names are case sensitive, so sum must be written SUM. I've found it best to always capitalize all Pig keywords and functions.

For the "explicit cast" error: SUM (and other aggregate functions) takes a bag, and you are only passing an int, thus "none of them fit" (the method signatures of the SUM function).
To get a bag, you need to GROUP. You can also improve the JOIN performance by pre-filtering the data. After the join and group, you need to provide the full identifier of sal (as shown by DESCRIBE X).
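
As a rough sketch of how to look up that full identifier, assuming the five-column emp.csv and the dept.csv from the question sit in the working directory and reusing the relation names E, D_CHI and X, DESCRIBE prints the schema of the joined relation. The schema in the comment is what I would expect to see, not captured output:

```pig
E = LOAD 'emp.csv' USING PigStorage(',') AS (empno: int, empname: chararray, sal: int, branch: chararray, dno: int);
D = LOAD 'dept.csv' USING PigStorage(',') AS (deptno: int, name: chararray, loc: chararray);
D_CHI = FILTER D BY loc == 'CHICAGO';
X = JOIN E BY dno, D_CHI BY deptno;

-- Print the schema of the joined relation.
DESCRIBE X;
-- Fields from each join input are prefixed with alias::, roughly:
-- X: {E::empno: int, E::empname: chararray, E::sal: int, E::branch: chararray, E::dno: int,
--     D_CHI::deptno: int, D_CHI::name: chararray, D_CHI::loc: chararray}
```

The prefixed name (E::sal here) is the one to project out of the bag inside SUM once the relation has been grouped.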

From the docs:

SUM

Computes the sum of the numeric values in a single-column bag. SUM requires a preceding GROUP ALL statement for global sums and a GROUP BY statement for group sums.

Same logic applies if you were to use SQL...
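
For the global-sum case the docs mention, a minimal sketch might look like the following. It assumes the same emp.csv and dept.csv as in the question (with did dropped from the schema); the relation names X_ALL and T are mine, chosen only for illustration.

```pig
E = LOAD 'emp.csv' USING PigStorage(',') AS (empno: int, empname: chararray, sal: int, branch: chararray, dno: int);
D = LOAD 'dept.csv' USING PigStorage(',') AS (deptno: int, name: chararray, loc: chararray);

-- Keep only the CHICAGO departments, then join employees onto them.
D_CHI = FILTER D BY loc == 'CHICAGO';
X = JOIN E BY dno, D_CHI BY deptno;

-- GROUP ... ALL collects every joined row into a single bag,
-- which is the input SUM expects for a global total.
X_ALL = GROUP X ALL;
T = FOREACH X_ALL GENERATE SUM(X.E::sal) AS total;
DUMP T;
```

With the sample rows, only nick (70000) and james (25000) belong to department 20, so this should print (95000).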

Example

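-- emp.csv has five fields per row, so did is dropped from the schema
E = LOAD 'emp.csv' USING PigStorage(',') AS (empno: int, empname: chararray, sal: int, branch: chararray, dno: int);
D = LOAD 'dept.csv' USING PigStorage(',') AS (deptno: int, name: chararray, loc: chararray);
-- filter before joining so only CHICAGO departments take part in the join
D_CHI = FILTER D BY loc == 'CHICAGO';

X = JOIN E BY dno, D_CHI BY deptno;

-- after a join, fields are disambiguated as alias::field
X_BY_LOC = GROUP X BY D_CHI::loc;
O = FOREACH X_BY_LOC GENERATE group, SUM(X.E::sal) AS total;
DUMP O;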

Yes, it is strange to filter and then group when only one entry remains after the filter, but it makes sense if you had more CHICAGO rows.

Output