How to "update" a column using Pig Latin

Posted 2024-10-11 13:41:27

Imagine I have the following table available to me:

A: { x: int, y: int, z: int, ...99 other columns... }

I now want to transform this, such that z is set to NULL where x > y, with the resulting dataset to be stored as B.

I want to do this without having to explicitly list all the other columns, as that becomes a maintenance nightmare.

Is there a simple solution?

Comments (4)

陌若浮生 2024-10-18 13:41:27

This issue is tracked in this JIRA:
PIG-1693 There needs to be a way in foreach to indicate "and all the rest of the fields"

Currently I don't know of anything simpler than doing what you describe, or not loading z and instead adding a new z column alongside the star expression (*).
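A minimal sketch of the star-expression workaround, assuming a tab-separated input at the hypothetical path 'A' and a hypothetical five-field schema in which w and v stand in for the question's 99 other columns. Because * already carries a field named z, the recomputed value is given a different (hypothetical) name, z_updated, to avoid clashing with it; the original z is still present in B.

```pig
-- Hypothetical input path and schema; w and v stand in for the 99 other columns.
A = LOAD 'A' AS (x:int, y:int, z:int, w:int, v:chararray);
-- Keep every field with *, then append a copy of z that is null where x > y.
-- (int)null keeps both branches of the bincond typed as int.
B = FOREACH A GENERATE *, (x > y ? (int)null : z) AS z_updated;
STORE B INTO 'B';
```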

若有似无的小暗淡 2024-10-18 13:41:27

I was able to cut down some of the column bloat by nesting the columns in single-row bags and flattening them afterwards.

Still, it feels like a bit of a hack, so I'm also looking into Cascading to see whether it's a better fit for my scenario.

最好是你 2024-10-18 13:41:27

A feature to facilitate your scenario was added in Pig 0.9. The new project-range operator (..) allows you to express a range of fields by indicating the starting and/or ending field names as in this example:

result = FOREACH someInput GENERATE field1, field2, null as field3, field4 .. ;

In the example above field1/2/3/4 are actual field names. One of the fields is set to null while the other fields are kept intact.
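Applied to the question, the range can be combined with Pig's bincond operator (?:) so that z becomes null only on rows where x > y. This is a sketch assuming Pig 0.9 or later, the hypothetical input path 'A', and a hypothetical five-field schema in place of the real one; $3 .. projects everything from the fourth field to the end.

```pig
-- Hypothetical input path and schema; w and v stand in for the 99 other columns.
A = LOAD 'A' AS (x:int, y:int, z:int, w:int, v:chararray);
-- Rewrite z in place and copy every field after it with an open-ended range.
-- (int)null keeps both branches of the bincond typed as int.
B = FOREACH A GENERATE x, y, (x > y ? (int)null : z) AS z, $3 .. ;
STORE B INTO 'B';
```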

More details are in this "New Apache Pig 0.9 Features – Part 3" article: http://hortonworks.com/blog/new-apache-pig-0-9-features-part-3-additional-features/

To solve your specific problem, you probably want to do a FILTER and a UNION to combine the results.
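A sketch of the FILTER and UNION route, using the same hypothetical path and five-field schema. It assumes x and y are never null, since a row whose comparison evaluates to null may be dropped by both filters. Note also that UNION does not guarantee the order of the output rows.

```pig
-- Hypothetical input path and schema; w and v stand in for the 99 other columns.
A = LOAD 'A' AS (x:int, y:int, z:int, w:int, v:chararray);
-- Split the rows by the condition (assumes x and y are never null).
toNull = FILTER A BY x > y;
keep   = FILTER A BY x <= y;
-- Null out z on the matching rows; $3 .. carries the remaining fields.
nulled = FOREACH toNull GENERATE x, y, (int)null AS z, $3 .. ;
-- Both inputs share the schema (x, y, z, w, v), so the union keeps it.
B = UNION nulled, keep;
STORE B INTO 'B';
```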

别低头,皇冠会掉 2024-10-18 13:41:27

Of course you can select columns by column number, but that can easily become a nightmare if you change anything at all. I have found column names to be much more stable, and therefore I recommend the following solution:

Update mycol when it is between two known columns

You can use .. to indicate leading or trailing columns (or the columns in between). Here is how that works if you want to change the value of MyCol to updatedvalue:

aliasAfter = FOREACH aliasBefore GENERATE 
             .. colBeforeMyCol, updatedvalue, colAfterMyCol ..;
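For the question's relation, y is the column just before z, so only the name of the column after z is needed. In this sketch that column is the hypothetical w, and the path and schema are again hypothetical stand-ins for the real 102-column input.

```pig
-- Hypothetical input path and schema; w and v stand in for the 99 other columns.
A = LOAD 'A' AS (x:int, y:int, z:int, w:int, v:chararray);
-- .. y projects every field up to and including y; w .. projects w to the end.
B = FOREACH A GENERATE .. y, (x > y ? (int)null : z) AS z, w .. ;
STORE B INTO 'B';
```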