Select count distinct using Pig Latin

Posted on 2025-01-04 13:42:38

I need help with this Pig script: it returns only a single record. I am selecting two columns and doing a count(distinct) on one of them, with a where clause that filters on a particular description (desc).

Here is the SQL I am trying to express in Pig:

/*
    For example in sql:
    select domain, count(distinct(segment)) as segment_cnt
    from table
    where desc='ABC123'
    group by domain
    order by segment_cnt desc;
*/

A = LOAD 'myoutputfile' USING PigStorage('\u0005')
        AS (
            domain:chararray,
            segment:chararray,
            desc:chararray
        );
B = filter A by (desc=='ABC123');
C = foreach B generate domain, segment;
D = DISTINCT C;
E = group D all;
F = foreach E generate group, COUNT(D) as segment_cnt;
G = order F by segment_cnt DESC;


Comments (3)

与之呼应 2025-01-11 13:42:38

You could GROUP by domain and then count the distinct segments in each group using a nested FOREACH:

D = group C by domain;
E = foreach D { 
    unique_segments = DISTINCT C.segment;
    generate group, COUNT(unique_segments) as segment_cnt;
};
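
Combined with the load and filter steps from the question, the whole script might look like the sketch below. The path, delimiter, and schema are copied from the question, with one change that is an assumption on my part: the `desc` column is renamed to `description`, because `desc` is on Pig's reserved keyword list and may not parse as a field name in some versions. The `DUMP` is also added here just for inspection. The original attempt returned a single record because `GROUP D ALL` collects every row into one group, whereas grouping by `domain` yields one output row per domain.

```pig
-- Path, delimiter, and schema come from the question;
-- 'description' replaces 'desc' (assumed rename, desc is a reserved keyword).
A = LOAD 'myoutputfile' USING PigStorage('\u0005')
    AS (domain:chararray, segment:chararray, description:chararray);

-- WHERE desc = 'ABC123'
B = FILTER A BY description == 'ABC123';
C = FOREACH B GENERATE domain, segment;

-- GROUP BY domain, then count the distinct segments inside each group.
D = GROUP C BY domain;
E = FOREACH D {
    unique_segments = DISTINCT C.segment;
    GENERATE group AS domain, COUNT(unique_segments) AS segment_cnt;
};

-- ORDER BY segment_cnt DESC
F = ORDER E BY segment_cnt DESC;
DUMP F;
```
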
游魂 2025-01-11 13:42:38

A cleaner option is to define this as a macro:

DEFINE DISTINCT_COUNT(A, c) RETURNS dist {
  temp = FOREACH $A GENERATE $c;
  dist = DISTINCT temp;
  groupAll = GROUP dist ALL;
  $dist = FOREACH groupAll GENERATE COUNT(dist);
}

Usage:

X = LOAD 'data' AS (x: int);

Y = DISTINCT_COUNT(X, x);

If you need to use it inside a FOREACH instead, the easiest way is something like:

...GENERATE COUNT(Distinct(x))...

Tested on Pig 12.

_畞蕅 2025-01-11 13:42:38

If you don't want to group by any field, you can use this:

G = FOREACH (GROUP A ALL) {
    unique = DISTINCT A.field;
    GENERATE COUNT(unique) AS ct;
};

This returns a single number: the count of distinct values of field across the whole relation.
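
One detail worth knowing when counting this way: Pig's `COUNT` skips tuples whose first field is null, while `COUNT_STAR` includes them. A minimal sketch, assuming a hypothetical input file `data` with a single `field` column (neither comes from the thread):

```pig
-- 'data' and the one-column schema are placeholders for illustration.
A = LOAD 'data' AS (field:chararray);

G = FOREACH (GROUP A ALL) {
    unique = DISTINCT A.field;
    -- COUNT ignores a null value of field;
    -- COUNT_STAR counts it as one more distinct entry.
    GENERATE COUNT(unique) AS ct, COUNT_STAR(unique) AS ct_including_null;
};
DUMP G;
```

If the column never contains nulls, the two counts should be identical.
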
