Snowflake regular expression

Posted 2025-01-19 07:35:59


I have this string in a Snowflake column:

[
{
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},
{
"entryListId": 3279,
"id": 7455,
"name": "Signal Capital Partners",
"type": 0
}
]

I need to get names in this format regardless of the number of company names: "SpecTra, Signal Capital Partners". In other words, I need to extract company names and concatenate them.

I have tried this:

regexp_replace(col, '"([^"]+)"|.', '\\1|')

and the regexp_substr() function, but did not get the desired output.

Can you please help me with this?
Thanks


Comments (2)

智商已欠费 2025-01-26 07:35:59


You can use

trim(regexp_replace(regexp_replace(col, '"name":\\s*"([^"]+)"|.', '\\1,'), ',+', ','), ',')

Details:

  • "name":\s*"([^"]+)"|. regex matches "name":, then zero or more whitespaces, and a ", and then captures into Group 1 any one or more chars other than " and then matches a " char, and replaces with Group 1 and a comma
  • The second regexp_replace shrinks all commas into a single occurrence of a comma, ,+ matches one or more commas (you may also use a more specific ,{2,} pattern here instead)
  • trim removes commas from start and end.
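The three steps can be sketched outside Snowflake as well; here is a Python approximation (re.sub standing in for regexp_replace, str.strip for trim) — a sketch of the same logic, not the Snowflake engine itself:

```python
import re

# One-line version of the JSON blob from the question.
col = ('[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},'
       '{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}]')

# Step 1: each "name": "<value>" match becomes '<value>,'; every other
# single character becomes just ','. The lambda avoids depending on how
# unmatched groups are substituted in a replacement string.
step1 = re.sub(r'"name":\s*"([^"]+)"|.',
               lambda m: (m.group(1) or '') + ',', col)

# Step 2: collapse each run of commas into a single comma.
step2 = re.sub(r',+', ',', step1)

# Step 3: trim leading/trailing commas.
names = step2.strip(',')
print(names)  # SpecTra,Signal Capital Partners
```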
乜一 2025-01-26 07:35:59


So, pushing your text blob into a CTE:

with data as (
    SELECT * FROM VALUES
    ('[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}]')
    t(str)
)

I cannot help but note it's JSON, so let's PARSE_JSON it and then FLATTEN it, and here are your "names":

select 
    d.*
    ,f.value:name::text as name
from data d
    ,table(flatten(input=>parse_json(d.str))) f

giving:

STR | NAME
[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}] | SpecTra
[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}] | Signal Capital Partners
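Since the blob is JSON, the PARSE_JSON + FLATTEN step maps naturally onto json.loads plus a loop; a minimal Python sketch of what FLATTEN yields per row:

```python
import json

str_col = ('[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},'
           '{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}]')

# parse_json(d.str) gives a JSON array; flatten emits one row per element,
# with the element available as f.value.
rows = json.loads(str_col)

# f.value:name::text for each flattened row.
names = [row["name"] for row in rows]
print(names)  # ['SpecTra', 'Signal Capital Partners']
```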

And thus to aggregate, using LISTAGG

select 
    listagg(f.value:name::text, ',') as names
from data d
    ,table(flatten(input=>parse_json(d.str))) f

gives:

NAMES
SpecTra,Signal Capital Partners
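LISTAGG with a ',' delimiter is essentially delimited string aggregation; continuing the json.loads sketch, the Python equivalent is a plain join:

```python
import json

str_col = ('[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},'
           '{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}]')

# listagg(f.value:name::text, ',') over the flattened elements.
names = ",".join(row["name"] for row in json.loads(str_col))
print(names)  # SpecTra,Signal Capital Partners
```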

Duplicate data:

You can add DISTINCT to the LISTAGG and have only the distinct values kept; it does come at a cost, which is why I point it out, and you didn't mention duplicate data.

with data as (
    SELECT * FROM VALUES
    ('[
    {
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},   
{
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},
{
"entryListId": 3279,
"id": 7455,
"name": "Signal Capital Partners",
"type": 0
}]')
    t(str)
)
select 
    listagg(distinct f.value:name::text, ',') as names
from data d
    ,table(flatten(input=>parse_json(d.str))) f;

gives:

NAMES
SpecTra,Signal Capital Partners
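LISTAGG(DISTINCT ...) drops repeated values before aggregating; in the Python sketch, an order-preserving dedupe via dict.fromkeys plays the same role (assuming first-seen order is acceptable):

```python
import json

str_col = ('[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},'
           '{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},'
           '{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}]')

# Flattened names, duplicates included.
all_names = [row["name"] for row in json.loads(str_col)]

# listagg(distinct ..., ','): dict.fromkeys keeps the first occurrence
# of each name while preserving order.
distinct_names = ",".join(dict.fromkeys(all_names))
print(distinct_names)  # SpecTra,Signal Capital Partners
```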

Whereas the regex solution does not handle this case:

with data as (
    SELECT * FROM VALUES
    ('[
    {
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},   
{
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},
{
"entryListId": 3279,
"id": 7455,
"name": "Signal Capital Partners",
"type": 0
}]')
    t(str)
)
select 
    trim(regexp_replace(regexp_replace(d.str, '"name":\\s*"([^"]+)"|.', '\\1,'), ',+', ','), ',') as regexp_replace
from data d

gives:

REGEXP_REPLACE
, , , SpecTra, , , , , , SpecTra, , , , , , Signal Capital Partners, ,
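The failure reproduces in the Python sketch of the regex approach too: with newlines in the blob, the dot (without dot-matches-newline) leaves the line breaks untouched, so stray comma runs survive the collapse-and-trim steps and the duplicate name is kept:

```python
import re

# Multiline blob with a duplicated SpecTra entry, as in the CTE above.
col = """[
{
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},
{
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},
{
"entryListId": 3279,
"id": 7455,
"name": "Signal Capital Partners",
"type": 0
}]"""

# Same three steps as the regex answer; '.' does not match '\n', so the
# newlines pass through and break up the comma runs.
step1 = re.sub(r'"name":\s*"([^"]+)"|.',
               lambda m: (m.group(1) or '') + ',', col)
result = re.sub(r',+', ',', step1).strip(',')
print(result)  # littered with newline-separated commas, SpecTra twice
```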