Pig script function question

Posted on 2024-11-18 00:39:44 · 1025 characters · 5 views · 0 comments

As can be seen in the following Pig code, I am repeating the same set of statements for Attr1 and Attr2. Is there a way to extract them into a function? Code samples would really help.

Attr1ValidRecs = FILTER BaseRecs BY Attr1 IS NOT NULL;
Attr1ValidRecs_all = GROUP Attr1ValidRecs ALL;
Attr1Count = FOREACH Attr1ValidRecs_all GENERATE COUNT(Attr1ValidRecs);
Attr1CountStr = FOREACH Attr1Count GENERATE CONCAT('Recs with Attr1 not null : ',(chararray)$0);

Attr1BaseCross = CROSS BaseRecsCount,Attr1Count;
Attr1BaseRatio = FOREACH Attr1BaseCross GENERATE CONCAT('Ratio of Not Null Attr1 to Total Base Recs: ',(chararray)((double)$1/(double)$0));

Attr2ValidRecs = FILTER BaseRecs BY Attr2 IS NOT NULL;
Attr2ValidRecs_all = GROUP Attr2ValidRecs ALL;
Attr2Count = FOREACH Attr2ValidRecs_all GENERATE COUNT(Attr2ValidRecs);
Attr2CountStr = FOREACH Attr2Count GENERATE CONCAT('Recs with Attr2 not null : ',(chararray)$0);

Attr2BaseCross = CROSS BaseRecsCount,Attr2Count;
Attr2BaseRatio = FOREACH Attr2BaseCross GENERATE CONCAT('Ratio of Not Null Attr2 to Total Base Recs: ',(chararray)((double)$1/(double)$0));


Comments (1)

只想待在家 2024-11-25 00:39:44

Unfortunately, you can't replace multiple lines with a single reusable batch of Pig operations. This is something I wish I could do sometimes, so I sympathize.

What I've done in the past, when I have something I repeat over and over in the same script, is to generate the Pig Latin code from a Python script (or any other scripting language) with a for loop, substituting certain keywords. This still feels pretty dirty, though.
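
A minimal sketch of that generation approach, assuming the statements from the question are used as the template. The names `PIG_TEMPLATE` and `generate_pig` are hypothetical and only for illustration; `string.Template` is from the Python standard library. Pig's positional references (`$0`, `$1`) are escaped as `$$0`, `$$1` so they come through the substitution unchanged.

```python
from string import Template

# Pig Latin block from the question, with the attribute name replaced by ${attr}.
# "$$" is string.Template's escape for a literal "$", so "$$0" becomes Pig's "$0".
PIG_TEMPLATE = Template("""\
${attr}ValidRecs = FILTER BaseRecs BY ${attr} IS NOT NULL;
${attr}ValidRecs_all = GROUP ${attr}ValidRecs ALL;
${attr}Count = FOREACH ${attr}ValidRecs_all GENERATE COUNT(${attr}ValidRecs);
${attr}CountStr = FOREACH ${attr}Count GENERATE CONCAT('Recs with ${attr} not null : ',(chararray)$$0);

${attr}BaseCross = CROSS BaseRecsCount,${attr}Count;
${attr}BaseRatio = FOREACH ${attr}BaseCross GENERATE CONCAT('Ratio of Not Null ${attr} to Total Base Recs: ',(chararray)((double)$$1/(double)$$0));
""")


def generate_pig(attrs):
    """Return the Pig Latin statements for every attribute name in attrs."""
    return "\n".join(PIG_TEMPLATE.substitute(attr=a) for a in attrs)


if __name__ == "__main__":
    # Print the generated script; redirect it to a .pig file to run it.
    print(generate_pig(["Attr1", "Attr2"]))
```

The generated text still relies on `BaseRecs` and `BaseRecsCount` being defined earlier in the script, as in the question. As an aside, Pig 0.9 and later also provide macros (`DEFINE ... RETURNS`), which may make external generation unnecessary; that depends on the Pig version in use.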
