Snowflake regular expression

Posted 2025-01-19 07:35:59


I have this string in a Snowflake column:

[
{
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},
{
"entryListId": 3279,
"id": 7455,
"name": "Signal Capital Partners",
"type": 0
}
]

I need to get names in this format regardless of the number of company names: "SpecTra, Signal Capital Partners". In other words, I need to extract company names and concatenate them.

I have tried this:

regexp_replace(col, '"([^"]+)"|.', '\\1|')

and the regexp_substr() function, but did not get the desired output.

Can you please help me with this?
Thanks


Comments (2)

智商已欠费 2025-01-26 07:35:59


You can use

trim(regexp_replace(regexp_replace(col, '"name":\\s*"([^"]+)"|.', '\\1,'), ',+', ','), ',')

Details:

  • "name":\s*"([^"]+)"|. regex matches "name":, then zero or more whitespaces, and a ", and then captures into Group 1 any one or more chars other than " and then matches a " char, and replaces with Group 1 and a comma
  • The second regexp_replace shrinks all commas into a single occurrence of a comma, ,+ matches one or more commas (you may also use a more specific ,{2,} pattern here instead)
  • trim removes commas from start and end.
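The three steps can be sketched outside Snowflake as well; here is a Python approximation (re.sub standing in for regexp_replace, str.strip for trim) — a sketch of the same logic, not the Snowflake engine itself:

```python
import re

# One-line version of the JSON blob from the question.
col = ('[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},'
       '{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}]')

# Step 1: each "name": "<value>" match becomes '<value>,'; every other
# single character becomes just ','. The lambda avoids depending on how
# unmatched groups are substituted in a replacement string.
step1 = re.sub(r'"name":\s*"([^"]+)"|.',
               lambda m: (m.group(1) or '') + ',', col)

# Step 2: collapse each run of commas into a single comma.
step2 = re.sub(r',+', ',', step1)

# Step 3: trim leading/trailing commas.
names = step2.strip(',')
print(names)  # SpecTra,Signal Capital Partners
```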
乜一 2025-01-26 07:35:59


So, pushing your text blob into a CTE:

with data as (
    SELECT * FROM VALUES
    ('[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}]')
    t(str)
)

I cannot help but note it's JSON, so let's PARSE_JSON it and then FLATTEN it, and here are your "names":

select 
    d.*
    ,f.value:name::text as name
from data d
    ,table(flatten(input=>parse_json(d.str))) f

giving:

STR | NAME
[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}] | SpecTra
[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}] | Signal Capital Partners
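Since the blob is JSON, the PARSE_JSON + FLATTEN step maps naturally onto json.loads plus a loop; a minimal Python sketch of what FLATTEN yields per row:

```python
import json

str_col = ('[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},'
           '{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}]')

# parse_json(d.str) gives a JSON array; flatten emits one row per element,
# with the element available as f.value.
rows = json.loads(str_col)

# f.value:name::text for each flattened row.
names = [row["name"] for row in rows]
print(names)  # ['SpecTra', 'Signal Capital Partners']
```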

And thus to aggregate, using LISTAGG

select 
    listagg(f.value:name::text, ',') as names
from data d
    ,table(flatten(input=>parse_json(d.str))) f

gives:

NAMES
SpecTra,Signal Capital Partners
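LISTAGG with a ',' delimiter is essentially delimited string aggregation; continuing the json.loads sketch, the Python equivalent is a plain join:

```python
import json

str_col = ('[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},'
           '{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}]')

# listagg(f.value:name::text, ',') over the flattened elements.
names = ",".join(row["name"] for row in json.loads(str_col))
print(names)  # SpecTra,Signal Capital Partners
```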

Duplicate data:

You can add DISTINCT to the LISTAGG and have only the distinct values kept; it does come at a cost, which is why I point it out, and you didn't mention duplicate data.

with data as (
    SELECT * FROM VALUES
    ('[
    {
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},   
{
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},
{
"entryListId": 3279,
"id": 7455,
"name": "Signal Capital Partners",
"type": 0
}]')
    t(str)
)
select 
    listagg(distinct f.value:name::text, ',') as names
from data d
    ,table(flatten(input=>parse_json(d.str))) f;

gives:

NAMES
SpecTra,Signal Capital Partners
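LISTAGG(DISTINCT ...) drops repeated values before aggregating; in the Python sketch, an order-preserving dedupe via dict.fromkeys plays the same role (assuming first-seen order is acceptable):

```python
import json

str_col = ('[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},'
           '{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},'
           '{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}]')

# Flattened names, duplicates included.
all_names = [row["name"] for row in json.loads(str_col)]

# listagg(distinct ..., ','): dict.fromkeys keeps the first occurrence
# of each name while preserving order.
distinct_names = ",".join(dict.fromkeys(all_names))
print(distinct_names)  # SpecTra,Signal Capital Partners
```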

Whereas the regex solution does not handle this case:

with data as (
    SELECT * FROM VALUES
    ('[
    {
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},   
{
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},
{
"entryListId": 3279,
"id": 7455,
"name": "Signal Capital Partners",
"type": 0
}]')
    t(str)
)
select 
    trim(regexp_replace(regexp_replace(d.str, '"name":\\s*"([^"]+)"|.', '\\1,'), ',+', ','), ',') as regexp_replace
from data d

gives:

REGEXP_REPLACE
, , , SpecTra, , , , , , SpecTra, , , , , , Signal Capital Partners, ,
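The failure reproduces in the Python sketch of the regex approach too: with newlines in the blob, the dot (without dot-matches-newline) leaves the line breaks untouched, so stray comma runs survive the collapse-and-trim steps and the duplicate name is kept:

```python
import re

# Multiline blob with a duplicated SpecTra entry, as in the CTE above.
col = """[
{
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},
{
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},
{
"entryListId": 3279,
"id": 7455,
"name": "Signal Capital Partners",
"type": 0
}]"""

# Same three steps as the regex answer; '.' does not match '\n', so the
# newlines pass through and break up the comma runs.
step1 = re.sub(r'"name":\s*"([^"]+)"|.',
               lambda m: (m.group(1) or '') + ',', col)
result = re.sub(r',+', ',', step1).strip(',')
print(result)  # littered with newline-separated commas, SpecTra twice
```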