How to extract strings between characters in Hive
I have a Hive table with a column which includes a string with multiple topic names. I am looking to split out the first topic name (and if possible the second and third). The string can contain up to 8 topic names.
The format of the string is:
["T.Topic1", "T.Topic2", "T.Topic3", "S.Topic4", "S.Topic5"]
I have tried the following, but I wanted to know whether there is a better way that does not require stripping the leading " and the trailing " in a separate step, and whether it is possible to extract more than the first topic.
SELECT SUBSTR(split(l.Intent, '[\\,]')[0], 2) AS TOPIC_1
FROM Table l
Results:
"T.Topic1"
Thank you
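To see why the leading quote survives in the attempt above, here is a small Python sketch that mimics that query on the sample string outside Hive. It is an illustration, not Hive code: `hive_substr` is a hypothetical helper that imitates Hive's 1-based SUBSTR for a positive start position, and Python's `re.split` stands in for Hive's regex-based split.

```python
import re


def hive_substr(s: str, pos: int) -> str:
    """Hypothetical helper: mimic Hive SUBSTR(s, pos) for a positive, 1-based pos."""
    return s[pos - 1:]


intent = '["T.Topic1", "T.Topic2", "T.Topic3", "S.Topic4", "S.Topic5"]'

# split(l.Intent, '[\\,]'): the character class [\,] matches a comma,
# so the string is split on every comma.
parts = re.split(r'[\,]', intent)

# SUBSTR(..., 2) drops only the first character of element 0, i.e. the '['.
topic_1 = hive_substr(parts[0], 2)

print(topic_1)        # "T.Topic1"  (both quotes are still there)
print(repr(parts[1]))  # later elements also keep a leading space
```

Because the split happens on the comma alone, every element still carries its quotes (and all but the first a leading space), which is why a second clean-up step seemed necessary.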
You are really close to the solution.
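I suggest tackling it in two stages.
First, pull out the text inside the array:
regexp_extract(l.Intent, '^\\["(.*)"\\]', 1)
This returns everything between the opening [" and the closing "], i.e. T.Topic1", "T.Topic2", "T.Topic3", "S.Topic4", "S.Topic5 for the example above. Then
split(text, '", "')
splits that string on the quote-comma-space-quote separator into the array you want. Putting it together:
SELECT topics.array_of_topics[0] AS TOPIC_1
,topics.array_of_topics[1] AS TOPIC_2
,topics.array_of_topics[2] AS TOPIC_3
FROM (
  SELECT split(regexp_extract(l.Intent, '^\\["(.*)"\\]', 1), '", "') AS array_of_topics
  FROM Table l
) topics
Hive arrays are indexed from 0, so [0], [1] and [2] are the first, second and third topics; [3] to [7] give the rest when a row has up to 8 topics, and an index past the end of a row's array returns NULL.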
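If you want to check the regex and the separator before running the query, here is a minimal Python sketch that mirrors the two-stage logic on the sample string. It is not Hive code and makes some assumptions: `extract_topics` and `topic_at` are hypothetical helpers, Python's `re` stands in for Hive's Java regex engine (they agree for this simple pattern), and a plain `str.split` replaces Hive's regex `split`, which is equivalent here because '", "' contains no regex metacharacters.

```python
import re
from typing import List, Optional

# Same pattern as the Hive literal '^\\["(.*)"\\]' after Hive unescapes the backslashes.
ARRAY_PATTERN = re.compile(r'^\["(.*)"\]')


def extract_topics(intent: str) -> List[str]:
    """Stage 1: like regexp_extract(..., 1); stage 2: like split(..., '", "')."""
    match = ARRAY_PATTERN.search(intent)
    if match is None:
        # No match: this sketch returns an empty list (Hive's exact result may differ).
        return []
    return match.group(1).split('", "')


def topic_at(topics: List[str], index: int) -> Optional[str]:
    """Like topics.array_of_topics[index]: None (NULL) past the end of the array."""
    return topics[index] if 0 <= index < len(topics) else None


intent = '["T.Topic1", "T.Topic2", "T.Topic3", "S.Topic4", "S.Topic5"]'
topics = extract_topics(intent)

print(topic_at(topics, 0), topic_at(topics, 1), topic_at(topics, 2))
print(topic_at(topics, 7))  # None: this row has only five topics
```

The greedy (.*) is safe here because the pattern is anchored at both the opening [" and the final "], so it captures the whole inner text in one piece before the split.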