雪花正则表达式
我在 Snowflake 列中有这个字符串:
\[
{
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},
{
"entryListId": 3279,
"id": 7455,
"name": "Signal Capital Partners",
"type": 0
}
\]
无论公司名称有多少,我都需要获取这种格式的名称:“SpecTra,Signal Capital Partners”。换句话说,我需要提取公司名称并将它们连接起来。
我已经尝试过这个:
regexp_replace(col, '"(\[^"\]+)"|.', '\\1|')
并且 regexp_substr() 函数,但没有得到所需的输出
你能帮我解决这个问题吗? 谢谢
I have this string in Snowflake column:
\[
{
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},
{
"entryListId": 3279,
"id": 7455,
"name": "Signal Capital Partners",
"type": 0
}
\]
I need to get names in this format regardless of the number of company names: "SpecTra, Signal Capital Partners". In other words, I need to extract company names and concatenate them.
I have tried this :
regexp_replace(col, '"(\[^"\]+)"|.', '\\1|')
and
regexp_substr() function, but did not get the desired output
Can you please help me with this?
Thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您可以使用
详细信息:
"name":\s*"([^"]+)"|。
正则表达式匹配"name":
,然后是零个或多个空格,以及一个"
,然后将除"
之外的任何一个或多个字符捕获到第 1 组中,然后匹配"
> char,并替换为组 1 和 a逗号regexp_replace
将所有逗号缩小为单个逗号,,+
匹配一个或多个逗号(您也可以使用更具体的,{2 ,}
模式改为此处)trim
删除开头和结尾的逗号。You can use
Details:
"name":\s*"([^"]+)"|.
regex matches"name":
, then zero or more whitespaces, and a"
, and then captures into Group 1 any one or more chars other than"
and then matches a"
char, and replaces with Group 1 and a commaregexp_replace
shrinks all commas into a single occurrence of a comma,,+
matches one or more commas (you may also use a more specific,{2,}
pattern here instead)trim
removes commas from start and end.因此将文本 blob 推入 CTE。
我忍不住注意到它是 JSON,所以让 PARSE_JSON然后扁平化它,这是您“命名”
给出:
因此使用 LISTAGG
给出:
重复数据:
您可以将 DISTINCT 添加到 LISTAGG 中,并且只保留不同的值,但考虑到这是一个成本,我确实指出了这一点,并且您没有提到重复数据。
给出:
而该正则表达式解决方案不处理这种情况:
给出:
So pushing you text blob into a CTE.
I cannot help but note it's JSON, so lets PARSE_JSON that and then FLATTEN it, and here are you "names"
giving:
And thus to aggregate, using LISTAGG
gives:
Duplicate data:
you can add DISTINCT to the LISTAGG and have only the distinct values kept, but given it is a cost, I did point it out, and you didn't mention duplicate data.
gives:
Where-as that regex solution does not handle this case:
gives: