Replace with multiple conditions in Pig Latin
I have this set of data:
dump data;
This is a sample of the output (the dataset is almost a million rows long):
("0",60,0,1,"Fri")
("1",47,0,1,"Mon")
("1",23,1,0,"Tue")
("1",60,0,0,"Sat")
("1",50,1,1,"Fri")
I want to replace the values Sat, Fri, Mon, etc. with day-of-week numbers. I know how to use REPLACE to change one value at a time, but I would have to repeat it for every day of the week:
data_day_of_week = FOREACH data GENERATE $0,$1,$2,$3,REPLACE($4, 'Mon', '1');
Is there any way to do this in only one statement?
You could combine the Pig ToDate and ToString functions.
The ToDate function converts the chararray day of the week into a Pig datetime. ToString then converts that datetime back into a string in whatever format you choose.
According to the Java docs, a single e or c in the format pattern should give the day of the week as a number, where Monday is 1.
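A minimal sketch of the ToDate/ToString combination. It assumes a hypothetical input path, that the double quotes shown in the dump are part of the stored values (they are stripped with REPLACE first), and that the JVM's default locale is English so the Joda-Time pattern 'EEE' can parse Mon, Tue, and so on. Pig's datetime type is backed by Joda-Time, where 'e' formats the numeric day of the week (Monday = 1, Sunday = 7). Parsing a bare day name relies on Joda-Time filling in the rest of the date, so it is worth checking the result on a few rows.

```pig
-- Hypothetical input path; the rows look like ("0",60,0,1,"Fri").
data = LOAD 'input.csv' USING PigStorage(',');

-- 1. REPLACE strips the literal double quotes: "Fri" -> Fri
-- 2. ToDate parses the abbreviated day name with the pattern 'EEE'
-- 3. ToString formats the result with 'e', the numeric day of the week (Monday = 1)
data_day_of_week = FOREACH data GENERATE $0, $1, $2, $3,
    ToString(ToDate(REPLACE((chararray)$4, '"', ''), 'EEE'), 'e') AS day_number;

DUMP data_day_of_week;
```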
You should use a CASE WHEN THEN statement.
You should also name your fields so that you don't have to use $1, $2, etc. If you name $4 day_number and then declare the result of the CASE statement AS day_number, it will "overwrite" your prior data.
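A sketch of the CASE approach, assuming Pig 0.12 or later (the release that added CASE), a hypothetical input path, hypothetical field names, and day values stored without the surrounding double quotes. If the quotes shown in the question's dump are part of the values, each WHEN value needs them too, for example '"Mon"'.

```pig
-- Hypothetical path and field names; naming the fields avoids $0..$4.
data = LOAD 'input.csv' USING PigStorage(',')
       AS (id, col1, col2, col3, day_number:chararray);

-- Declaring the CASE result AS day_number puts the number where the day name was.
data_day_of_week = FOREACH data GENERATE id, col1, col2, col3,
    (CASE day_number
        WHEN 'Mon' THEN '1'
        WHEN 'Tue' THEN '2'
        WHEN 'Wed' THEN '3'
        WHEN 'Thu' THEN '4'
        WHEN 'Fri' THEN '5'
        WHEN 'Sat' THEN '6'
        WHEN 'Sun' THEN '7'
        ELSE day_number   -- leave anything unexpected unchanged
    END) AS day_number;

DUMP data_day_of_week;
```

The ELSE branch keeps unrecognized values instead of turning them into nulls, which makes mismatches easy to spot in the output.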
You can JOIN against a reference relation that maps each day name to its number, like this:
Join statement:
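A sketch of the JOIN approach, assuming a hypothetical seven-line mapping file (day_map.csv) whose day names are written exactly as they appear in the data, quotes included, plus hypothetical paths and field names. Because it is an inner join, rows whose day is missing from the mapping are dropped, and the original row order is not preserved.

```pig
-- Hypothetical paths. day_map.csv has one line per day, written the same way
-- the days appear in the data (quotes included), for example:  "Mon",1
data    = LOAD 'input.csv' USING PigStorage(',')
          AS (id, col1, col2, col3, day:chararray);
day_map = LOAD 'day_map.csv' USING PigStorage(',')
          AS (day_name:chararray, day_number:int);

-- The mapping has only seven rows, so a replicated join keeps it in memory;
-- with USING 'replicated' the small relation has to be listed last.
joined = JOIN data BY day, day_map BY day_name USING 'replicated';

-- Keep the original columns and swap the day name for its number.
data_day_of_week = FOREACH joined GENERATE
    data::id, data::col1, data::col2, data::col3, day_map::day_number AS day_number;

DUMP data_day_of_week;
```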
saph_top's answer was the closest to what I needed; however, when I tested it the output was blank, so I'm going to complement his answer:
'Mon' is not the same as '"Mon"' (the values in my data include the double quotes), so when I used CASE WHEN $4 == 'Mon' THEN '1' it didn't replace anything, and data_day_of_week came out blank.
To solve this, I just added the double quotes to the condition:
After that, to rebuild the data, I added the following to the GENERATE clause:
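A possible reconstruction of the complete statement, not necessarily the poster's exact code. It assumes a hypothetical input path and that every day appears in the quoted "Ddd" form; since there is no ELSE branch, any other value becomes null.

```pig
-- Hypothetical input path; the rows look like ("0",60,0,1,"Fri").
data = LOAD 'input.csv' USING PigStorage(',');

-- The stored values carry their double quotes, so each condition compares
-- against a literal that includes them: '"Mon"' rather than 'Mon'.
data_day_of_week = FOREACH data GENERATE $0, $1, $2, $3,
    (CASE
        WHEN $4 == '"Mon"' THEN '1'
        WHEN $4 == '"Tue"' THEN '2'
        WHEN $4 == '"Wed"' THEN '3'
        WHEN $4 == '"Thu"' THEN '4'
        WHEN $4 == '"Fri"' THEN '5'
        WHEN $4 == '"Sat"' THEN '6'
        WHEN $4 == '"Sun"' THEN '7'
    END) AS day_number;

DUMP data_day_of_week;
```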
And now the output was complete:
dump data_day_of_week;