Loading files with PIG

Posted 2024-12-14 18:38:07


I am very new to PIG and I am having what feels like a very basic problem.
I have a line of code that reads:

A = load 'Sites/trial_clustering/shortdocs/*'
      AS (word1:chararray, word2:chararray, word3:chararray, word4:chararray);

where each file is basically a single line of 4 comma-separated words. However, PIG is not splitting the line into the 4 words. When I do dump A, I get: (Money, coins, loans, debt,,,)
I have tried googling, but I cannot seem to find what format my file needs to be in so that PIG will interpret it properly. Please help!


Comments (1)

美胚控场 2024-12-21 18:38:07


Your problem is that Pig's default loader, PigStorage, splits each line on tabs, not commas. As a result, the whole line "Money, coins, loans, debt" ends up in your first column, word1. When you dump the relation, the commas inside that single string give the illusion of multiple columns, but really the first column holds the entire line and the other three are null, which is why the output ends in three empty fields.

To fix this, tell PigStorage to split on commas:

A = LOAD '...' USING PigStorage(',') AS (...);
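
For reference, here is a minimal sketch of what the full script might look like when the fix is applied to the statement from the question. The path and schema are copied from the question. The alias B and the TRIM step are additions for illustration only: they assume the input really has a space after each comma, as the dumped output suggests, which would otherwise leave a leading space in word2 through word4.

```pig
-- Sketch only: path and schema come from the question.
-- PigStorage(',') makes the loader split each line on commas instead of tabs.
A = LOAD 'Sites/trial_clustering/shortdocs/*'
    USING PigStorage(',')
    AS (word1:chararray, word2:chararray, word3:chararray, word4:chararray);

-- Assumption: fields are separated by ", " (comma plus space) in the files,
-- so strip the surrounding whitespace from each field. B is a hypothetical alias.
B = FOREACH A GENERATE
        TRIM(word1) AS word1,
        TRIM(word2) AS word2,
        TRIM(word3) AS word3,
        TRIM(word4) AS word4;

-- DESCRIBE prints the schema; DUMP prints the tuples, one per input line.
DESCRIBE B;
DUMP B;
```

If the files contain no spaces after the commas, the FOREACH step can be dropped and A used directly.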