Replace with multiple conditions in Pig Latin

Posted 2025-01-18 06:43:31 · 407 words · 4 views · 0 comments

I have this set of data:

dump data;

This is a sample of the output (the dataset is almost a million rows long):

("0",60,0,1,"Fri")
("1",47,0,1,"Mon")
("1",23,1,0,"Tue")
("1",60,0,0,"Sat")
("1",50,1,1,"Fri")

I want to replace the day names (Sat, Fri, Mon, and so on) with the number of the day of the week. I know how to use REPLACE to change one value at a time, but I would have to repeat it several times to cover every day of the week:

data_day_of_week = FOREACH data GENERATE $0,$1,$2,$3,REPLACE($4, 'Mon', '1');

Is there any way to do this in a single statement?


Comments (4)

烈酒灼喉 2025-01-25 06:43:31

You could combine the Pig ToDate and ToString functions:

data_day_of_week = FOREACH data GENERATE $0,$1,$2,$3,
    ToString(ToDate($4, 'EEE'), 'e') as day_of_week;

The ToDate function converts the chararray day of the week to Pig's datetime type. ToString then converts that value to a format of your choosing.

According to the Java docs, a single e or c should give the numeric day of the week, where Monday is 1.

毁虫ゝ 2025-01-25 06:43:31

You should use a CASE WHEN THEN expression:

data_day_of_week = FOREACH data GENERATE
    CASE
         WHEN $4 == 'Mon' THEN '1'
         WHEN $4 == 'Tue' THEN '2'
         ...
         WHEN $4 == 'Sun' THEN '7'
    END AS day_number;

You should also name your fields so that you don't have to use $1, $2, etc. If the field at $4 is already named day_number, then aliasing the CASE result as day_number will "overwrite" the prior data.
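
As a rough sketch of the named-field suggestion above: the path, delimiter, and field names below are assumptions, not taken from the question, and the comparison assumes the day values carry no literal quote characters. The mapping follows Monday = 1 through Sunday = 7.

```pig
-- Hypothetical path, delimiter, and field names; adjust to your data.
data = LOAD 'data.csv' USING PigStorage(',')
       AS (flag:chararray, age:int, f2:int, f3:int, day:chararray);

-- Simple CASE form: compare one field against a list of values.
data_day_of_week = FOREACH data GENERATE flag, age, f2, f3,
    (CASE day
        WHEN 'Mon' THEN 1
        WHEN 'Tue' THEN 2
        WHEN 'Wed' THEN 3
        WHEN 'Thu' THEN 4
        WHEN 'Fri' THEN 5
        WHEN 'Sat' THEN 6
        WHEN 'Sun' THEN 7
    END) AS day_number;
```

Without an ELSE branch, any value that matches none of the WHEN clauses comes out as null, which makes unexpected inputs easy to spot.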

始于初秋 2025-01-25 06:43:31

You can use a JOIN against a mapping relation like this:

(Mon,1)
(Tue,2)
(Wed,3)
(Thu,4)
(Fri,5)
(Sat,6)
(Sun,7)

Join statement:

outer_left = join your_data by $4 left outer, day_mapping by day;
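
To make the join usable end to end, something like the following might work. The file names and field names are hypothetical; both relations are given a schema here so that fields can be referenced by name after the join.

```pig
-- Hypothetical paths and field names; adjust to your data.
your_data = LOAD 'data.csv' USING PigStorage(',')
            AS (flag:chararray, age:int, f2:int, f3:int, day:chararray);
day_mapping = LOAD 'day_mapping.csv' USING PigStorage(',')
              AS (day:chararray, day_number:int);

outer_left = JOIN your_data BY day LEFT OUTER, day_mapping BY day;

-- Keep the original columns and swap the day name for its number.
data_day_of_week = FOREACH outer_left GENERATE
    your_data::flag, your_data::age, your_data::f2, your_data::f3,
    day_mapping::day_number AS day_number;
```

Because this is a left outer join, rows whose day has no entry in the mapping are kept, with a null day_number.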

深空失忆 2025-01-25 06:43:31

saph_top came closest to answering my question. However, when I tested that answer it produced blank output, so I'm going to complement it:

'Mon' is not the same as "Mon". My values include the double quotes, so when I used CASE WHEN $4 == 'Mon' THEN '1' nothing matched and nothing was replaced, which left data_day_of_week blank.

To solve this, I just added the double quotes to each condition:

data_day_of_week = FOREACH data GENERATE
CASE
     WHEN $4 == '"Mon"' THEN '1'
     WHEN $4 == '"Tue"' THEN '2'
     ...
     WHEN $4 == '"Sun"' THEN '7'
END AS day_number;

After that, to rebuild the full rows, I added the remaining fields to the GENERATE clause:

data_day_of_week = FOREACH data GENERATE $0,$1,$2,$3,
    CASE
         WHEN $4 == '"Mon"' THEN '1'
         WHEN $4 == '"Tue"' THEN '2'
         ...
         WHEN $4 == '"Sun"' THEN '7'
    END AS day_number;

Now the output is complete:
dump data_day_of_week;

("0",60,0,1,5)
("1",47,0,1,1)
("1",23,1,0,2)
("1",60,0,0,6)
("1",50,1,1,5)
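
An alternative to embedding the quotes in every condition would be to strip them once up front. This is only a sketch: it assumes the fifth field holds literal double-quote characters, and data_clean and day are names introduced here for illustration.

```pig
-- Assumes $4 is a string that contains literal double quotes, e.g. "Mon".
-- REPLACE takes a regular expression; a bare double quote needs no escaping.
data_clean = FOREACH data GENERATE $0, $1, $2, $3,
    REPLACE((chararray)$4, '"', '') AS day;
```

After this step, unquoted comparisons such as day == 'Mon' should match as expected.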