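Selecting a distinct count with Pig Latin
I need help with this Pig script. I am only getting a single record. I am selecting 2 columns and doing a count(distinct) on another, while also using a WHERE clause to filter on a particular description (desc).
Here's the SQL I am trying to reproduce in Pig: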
/*
For example in sql:
select domain, count(distinct(segment)) as segment_cnt
from table
where desc='ABC123'
group by domain
order by segment_cnt desc;
*/
A = LOAD 'myoutputfile' USING PigStorage('\u0005')
AS (
domain:chararray,
segment:chararray,
desc:chararray
);
B = filter A by (desc=='ABC123');
C = foreach B generate domain, segment;
D = DISTINCT C;
E = group D all;
F = foreach E generate group, COUNT(D) as segment_cnt;
G = order F by segment_cnt DESC;
You could GROUP on each domain and then count the number of distinct elements in each group using nested FOREACH syntax:
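The script in the question returns a single record because GROUP D ALL puts every row into one group. Below is a minimal sketch of the per-domain approach, assuming the same \u0005-delimited input file. As a precaution, the third field is named description instead of desc (an assumption), because DESC is also a Pig keyword used in ORDER ... DESC.

```pig
-- Load the \u0005-delimited file; "description" replaces "desc" (assumed rename)
A = LOAD 'myoutputfile' USING PigStorage('\u0005')
    AS (domain:chararray, segment:chararray, description:chararray);

-- Equivalent of WHERE desc = 'ABC123'
B = FILTER A BY description == 'ABC123';

-- One group per domain (GROUP ... ALL would produce a single group)
C = GROUP B BY domain;

-- Nested FOREACH: de-duplicate the segments inside each group, then count them
D = FOREACH C {
    unique_segments = DISTINCT B.segment;
    GENERATE group AS domain, COUNT(unique_segments) AS segment_cnt;
};

-- Equivalent of ORDER BY segment_cnt DESC
E = ORDER D BY segment_cnt DESC;
DUMP E;
```

COUNT skips tuples whose first field is null, so a null segment is not counted. This matches how SQL's COUNT(DISTINCT segment) ignores NULLs.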
It is cleaner to define this as a macro:
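Here is a minimal sketch of such a macro, assuming Pig 0.9 or later, where DEFINE ... RETURNS macros are available. The macro name DISTINCT_COUNT matches the usage that follows. The parameter names rel and col and the internal aliases are illustrative.

```pig
-- DISTINCT_COUNT(rel, col): number of distinct non-null values of col in rel.
-- rel and col are hypothetical parameter names; dist_count is the returned alias.
DEFINE DISTINCT_COUNT(rel, col) RETURNS dist_count {
    -- Project the single column of interest
    projected = FOREACH $rel GENERATE $col;
    -- Remove duplicate values
    uniq = DISTINCT projected;
    -- Put every remaining row into one group so they can be counted together
    grouped = GROUP uniq ALL;
    $dist_count = FOREACH grouped GENERATE COUNT(uniq);
};
```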
Usage:
X = LOAD 'data' AS (x: int);
Y = DISTINCT_COUNT(X, x);
If you need to use it inside a FOREACH instead, the easiest way is something like: ...GENERATE COUNT(Distinct(x))...
Tested on Pig 0.12.
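The sketch below applies this to the question's data. It assumes that Distinct resolves to Pig's built-in org.apache.pig.builtin.Distinct function, which takes a bag and returns a bag of its distinct tuples, as the answer reports working on Pig 0.12. The relation names and the renamed description field (used in place of desc, a precaution because DESC is a Pig keyword) are illustrative.

```pig
A = LOAD 'myoutputfile' USING PigStorage('\u0005')
    AS (domain:chararray, segment:chararray, description:chararray);
B = FILTER A BY description == 'ABC123';
C = GROUP B BY domain;
-- Distinct(bag) de-duplicates the segment bag of each group; COUNT then counts it
D = FOREACH C GENERATE group AS domain, COUNT(Distinct(B.segment)) AS segment_cnt;
E = ORDER D BY segment_cnt DESC;
DUMP E;
```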
If you don't want to count per group at all, you can use this:
This gives you just a single number.
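Here is a minimal sketch, assuming an input with a single int field x. The file name 'data' and all aliases are illustrative. DISTINCT runs on a projection of the column, and GROUP ... ALL then collects every distinct value into one group, so COUNT yields one total.

```pig
-- Assumed input: one int field named x
X = LOAD 'data' AS (x:int);
-- Keep only the column of interest, then drop duplicate values
projected = FOREACH X GENERATE x;
uniq_x = DISTINCT projected;
-- A single group containing every distinct value
all_x = GROUP uniq_x ALL;
-- One output tuple holding the distinct count
distinct_total = FOREACH all_x GENERATE COUNT(uniq_x) AS distinct_cnt;
DUMP distinct_total;
```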