Does Hive have a string split function?

Posted 2024-09-30 01:10:31

I am looking for a built-in string split function in Hive. For example, if the string is:

A|B|C|D|E

Then I want to have a function like:

array<string> split(string input, char delimiter)

So that I get back:

[A,B,C,D,E]

Does such a built-in split function exist in Hive?

I can only see regexp_extract and regexp_replace. I would love to see indexOf() and split() string functions.
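The function the question describes, splitting on one literal character, can be sketched in Java, the language Hive's built-in functions are written in. This is a minimal sketch, not a Hive API: the class name and the `splitOnChar` helper are hypothetical. `Pattern.quote` makes the delimiter literal, so `|` loses its regex meaning, and the limit of -1 keeps empty fields instead of dropping trailing ones.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the requested split(string input, char delimiter).
// It is NOT part of Hive's API; it only shows the intended semantics.
final class CharSplitSketch {

    // Splits input on every occurrence of a literal delimiter character.
    static String[] splitOnChar(String input, char delimiter) {
        // Pattern.quote wraps the delimiter in \Q...\E, so '|' is matched literally
        // instead of being treated as regex alternation.
        String literal = Pattern.quote(String.valueOf(delimiter));
        // A limit of -1 keeps empty fields, including trailing ones.
        return input.split(literal, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnChar("A|B|C|D|E", '|')));
        // prints [A, B, C, D, E]
        System.out.println(Arrays.toString(splitOnChar("A||C|", '|')));
        // prints [A, , C, ]
    }
}
```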


Comments (3)

岁吢 2024-10-07 01:10:31

There is a built-in split function based on regular expressions. It is not listed in the tutorial, but it is listed in the language manual on the wiki:

split(string str, string pat)
   Split str around pat (pat is a regular expression) 

In your case, the delimiter "|" has a special meaning in regular expressions, so it must be escaped as "\\|".
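A quick way to see why the escape matters is to run both patterns through Java's `String.split`. This sketch assumes that Hive's `split` hands the pattern to Java's regex engine with a limit of -1 (consistent with the trailing empty string in the output quoted in the last answer on this page); `hiveLikeSplit` is a hypothetical name.

```java
import java.util.Arrays;

// Compares an escaped and an unescaped '|' delimiter using Java's regex-based String.split.
// Assumption: Hive's split(str, pat) behaves like str.split(pat, -1).
final class PipeEscapeDemo {

    // Hypothetical stand-in for Hive's split.
    static String[] hiveLikeSplit(String str, String pat) {
        return str.split(pat, -1);
    }

    public static void main(String[] args) {
        String input = "A|B|C|D|E";

        // "\\|" in Java source is the regex \| : a literal pipe character.
        System.out.println(Arrays.toString(hiveLikeSplit(input, "\\|")));
        // prints [A, B, C, D, E]

        // A bare "|" is an alternation of two empty patterns, so it matches the empty
        // string at every position and the input is split between every character.
        System.out.println(Arrays.toString(hiveLikeSplit(input, "|")));
    }
}
```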

心是晴朗的。 2024-10-07 01:10:31

Another interesting use case for split in Hive: suppose a column ipname in the table holds the value "abc11.def.ghft.com" and you want to extract "abc11":

SELECT split(ipname,'[\.]')[0] FROM tablename;
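The same query can be mirrored in Java, again assuming Hive's `split` behaves like Java's `String.split(pat, -1)`. Inside a character class the dot is a literal character, so `[.]` (or `[\.]`) splits on dots, while an unbracketed `.` matches every character and leaves only empty strings. The class and the `firstLabel` helper are hypothetical names.

```java
import java.util.Arrays;

// Sketch of SELECT split(ipname, '[\.]')[0], assuming Hive's split acts like String.split(pat, -1).
final class HostLabelDemo {

    // Hypothetical helper: returns element 0 of the split, like the [0] index in the query.
    static String firstLabel(String ipname, String pat) {
        return ipname.split(pat, -1)[0];
    }

    public static void main(String[] args) {
        String ipname = "abc11.def.ghft.com";

        // Inside [...] the dot is literal, not "any character".
        System.out.println(firstLabel(ipname, "[.]"));   // prints abc11
        // The query's form [\.] is written "[\\.]" in Java source; the escaped dot is also literal.
        System.out.println(firstLabel(ipname, "[\\.]")); // prints abc11
        System.out.println(Arrays.toString(ipname.split("[.]", -1)));
        // prints [abc11, def, ghft, com]

        // Unbracketed, "." matches every character, so every piece is empty.
        System.out.println(firstLabel(ipname, ".").isEmpty()); // prints true
    }
}
```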

还不是爱你 2024-10-07 01:10:31

Just a clarification on the answer given by Bkkbrad.

I tried this suggestion and it did not work for me.

For example,

split('aa|bb','\\|')

produced:

["","a","a","|","b","b",""]

But,

split('aa|bb','[|]')

produced the desired result:

["aa","bb"]

Including the metacharacter '|' inside the square brackets causes it to be interpreted literally, as intended, rather than as a metacharacter.

For more on this regex behaviour, see: http://www.regular-expressions.info/charclass.html
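The reported output is what a bare `|` pattern produces (older Java versions also emit a leading empty string for the zero-width match at the start), which suggests, though the post does not confirm it, that the backslashes were consumed by an extra unescaping layer, such as a shell or client, before the pattern reached the regex engine. The sketch below simulates that with a hypothetical `dropBackslashes` helper: `\|` degrades into `|`, whereas `[|]` contains no backslash and still splits correctly. As before, it assumes Hive's `split` behaves like Java's `String.split(pat, -1)`.

```java
import java.util.Arrays;

// Compares a bracketed pipe with an escaped pipe, before and after a simulated extra
// unescaping layer. Assumption: Hive's split(str, pat) behaves like str.split(pat, -1).
final class CharClassDemo {

    // Hypothetical stand-in for Hive's split.
    static String[] hiveLikeSplit(String str, String pat) {
        return str.split(pat, -1);
    }

    // Hypothetical stand-in for a layer that strips backslash escapes.
    static String dropBackslashes(String pat) {
        return pat.replace("\\", "");
    }

    public static void main(String[] args) {
        String input = "aa|bb";
        String escaped = "\\|";   // regex \|
        String bracketed = "[|]"; // regex [|] : '|' is literal inside a character class

        System.out.println(Arrays.toString(hiveLikeSplit(input, escaped)));   // prints [aa, bb]
        System.out.println(Arrays.toString(hiveLikeSplit(input, bracketed))); // prints [aa, bb]

        // After the simulated unescaping, \| has become a bare |, which splits between characters.
        System.out.println(Arrays.toString(hiveLikeSplit(input, dropBackslashes(escaped))));
        // [|] is unaffected because it contains no backslash.
        System.out.println(Arrays.toString(hiveLikeSplit(input, dropBackslashes(bracketed)))); // prints [aa, bb]
    }
}
```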
