Does Hive have a string split function?
I am looking for a built-in string split function in Hive. For example, if the string is:
A|B|C|D|E
then I want a function like:
array<string> split(string input, char delimiter)
so that I get back:
[A,B,C,D,E]
Does such a built-in split function exist in Hive?
I can only see regexp_extract and regexp_replace. I would love to see indexOf() and split() string functions.
There does exist a split function based on regular expressions. It's not listed in the tutorial, but it is listed in the language manual on the wiki:
array<string> split(string str, string pat): splits str around pat (pat is a regular expression).
In your case, the delimiter "|" has a special meaning in a regular expression, so it should be written as "\\|".
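Hive's split takes its pattern as a regular expression, and Hive runs on the JVM, so Java's String.split can serve as a stand-in to see why the bare pipe misbehaves. This is a sketch under that assumption (Java 8 or later), not Hive's implementation; PipeSplitDemo and its methods are hypothetical names, and Hive's handling of edge cases such as trailing empty fields may differ.

```java
import java.util.Arrays;

// Sketch: how a regex-based split treats "|" (Java regex semantics assumed
// as a stand-in for Hive's split(str, pat)).
class PipeSplitDemo {

    // Unescaped "|" is an alternation of two empty patterns: it matches the
    // empty string at every position, so the input is cut between every
    // character (Java 8+ drops the leading empty piece, limit 0 drops the trailing one).
    static String[] splitUnescaped(String input) {
        return input.split("|");
    }

    // "\\|" in Java source is the regex \| , which matches a literal pipe.
    static String[] splitEscaped(String input) {
        return input.split("\\|");
    }

    public static void main(String[] args) {
        String input = "A|B|C|D|E";
        System.out.println(Arrays.toString(splitUnescaped(input))); // [A, |, B, |, C, |, D, |, E]
        System.out.println(Arrays.toString(splitEscaped(input)));   // [A, B, C, D, E]
    }
}
```

In HiveQL the corresponding call would be split('A|B|C|D|E', '\\|'), where the doubled backslash inside the Hive string literal becomes the single backslash the regular expression needs.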
Another interesting use case for split in Hive: suppose a column ipname in the table has the value "abc11.def.ghft.com" and you want to pull "abc11" out of it.
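A minimal sketch of that extraction, again using Java's String.split as an assumed stand-in for Hive's regex-based split: the dot is a regex metacharacter that matches any character, so it must be escaped before taking the first element. HostLabelDemo, firstLabel, and unescapedPartCount are hypothetical names.

```java
// Sketch: take the first dot-separated label of a hostname, mirroring a
// regex-based split followed by element [0]. Java regex semantics are
// assumed as a stand-in for Hive's split(str, pat).
class HostLabelDemo {

    // "\\." is the regex \. , which matches a literal dot.
    static String firstLabel(String host) {
        String[] parts = host.split("\\.");
        // An input made only of dots splits into an empty array.
        return parts.length > 0 ? parts[0] : "";
    }

    // Unescaped "." matches every character, so every piece is empty and
    // String.split (limit 0) drops all of them as trailing empty strings.
    static int unescapedPartCount(String host) {
        return host.split(".").length;
    }

    public static void main(String[] args) {
        System.out.println(firstLabel("abc11.def.ghft.com"));         // abc11
        System.out.println(unescapedPartCount("abc11.def.ghft.com")); // 0
    }
}
```

In Hive the same idea would presumably read split(ipname, '\\.')[0], since Hive arrays are indexed from 0.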
Just a clarification on the answer given by Bkkbrad.
I tried this suggestion and it did not work for me: splitting on "\\|" did not return the expected array.
However, writing the delimiter as a character class, "[|]", produced the desired result.
Including the metacharacter '|' inside the square brackets causes it to be interpreted literally, as intended, rather than as a metacharacter.
For elaboration of this behaviour of regexp, see: http://www.regular-expressions.info/charclass.html
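The character-class form can be checked with the same Java stand-in (an assumption about the regex engine, not a description of Hive's implementation): "[|]" and the escaped "\|" split identically. A plausible advantage of "[|]", not stated in the answer, is that it contains no backslash, so an extra layer of string-literal escaping cannot strip anything from it. LiteralPipeDemo and its methods are hypothetical names.

```java
import java.util.Arrays;

// Sketch: inside a character class the pipe loses its alternation meaning,
// so "[|]" matches a literal "|" just like the escaped form.
class LiteralPipeDemo {

    // Character class: no backslash needed.
    static String[] viaCharClass(String s) {
        return s.split("[|]");
    }

    // Backslash escape: the regex \| (written "\\|" in Java source).
    static String[] viaBackslash(String s) {
        return s.split("\\|");
    }

    public static void main(String[] args) {
        String input = "A|B|C|D|E";
        System.out.println(Arrays.toString(viaCharClass(input)));              // [A, B, C, D, E]
        System.out.println(Arrays.equals(viaCharClass(input), viaBackslash(input))); // true
    }
}
```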