How to extract a string between characters in Hive

Posted on 2025-01-30 17:42:31

I have a Hive table with a column which includes a string with multiple topic names. I am looking to split out the first topic name (and if possible the second and third). The string can contain up to 8 topic names.

The format of the string is:

["T.Topic1", "T.Topic2", "T.Topic3", "S.Topic4", "S.Topic5"]

I have tried the following, but I would like to know whether there is a better way that does not require stripping the leading and trailing " characters in a separate step, and whether it is possible to extract more than the first topic.

SELECT SUBSTR(split(l.Intent, '[\\,]')[0], 2) AS TOPIC_1
FROM Table l

Results:

"T.Topic1"

Thank you


Comments (1)

病毒体 2025-02-06 17:42:31

You are really close to the solution.
I suggest tackling it in two stages:

  1. Remove the array brackets: regexp_extract(l.Intent, '^\\["(.*)"\\]') returns the text between the opening [" and the closing "].
  2. Split the string: split(text, '", "') splits that text into the array you want.

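Putting it together:

-- Sample row from the question; replace this CTE with the real table.
with l as (select '["T.Topic1", "T.Topic2", "T.Topic3", "S.Topic4", "S.Topic5"]' as Intent)
select
   split(
     regexp_extract(topics.Intent, '^\\["(.*)"\\]', 1)  -- capture group 1: text between [" and "]
     , '", "' ) as array_of_topics
from l as topics;  -- the CTE is aliased as topics, so the column is referenced as topics.Intent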

You can now access the individual topics by index: array_of_topics[0], array_of_topics[1], array_of_topics[2], and so on.
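
To return the first three topics as separate columns, which is what the question asks for, the same two steps can be wrapped in a second CTE. This is only a sketch under some assumptions: the names parsed, topic_1, topic_2 and topic_3 are made up for illustration, the one-row CTE l stands in for the real table, and every element is assumed to be separated by exactly `", "` (a comma followed by one space).

```sql
-- Sample row from the question; replace this CTE with the real table.
WITH l AS (
  SELECT '["T.Topic1", "T.Topic2", "T.Topic3", "S.Topic4", "S.Topic5"]' AS Intent
),
-- Hypothetical helper CTE: strip the surrounding [" and "], then split on ", "
parsed AS (
  SELECT split(regexp_extract(Intent, '^\\["(.*)"\\]', 1), '", "') AS topics
  FROM l
)
SELECT
  topics[0] AS topic_1,  -- first topic, without the surrounding quotes
  topics[1] AS topic_2,  -- second topic
  topics[2] AS topic_3   -- third topic
FROM parsed;
```

Because the quotes are consumed by the regular expression and the split delimiter, no extra SUBSTR call is needed. If the spacing after the commas is not consistent, a delimiter pattern such as '",\\s*"' may be more robust, since split treats its second argument as a regular expression. Rows with fewer than three topics should be checked separately: Hive is expected to return NULL for an index past the end of the array, but that is worth confirming on your version.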
