当前位置：文江博客话题详情

snowflake-schema snowflake-cloud-data-platform

在分离器之间提取单词

发布于 2025-02-13 23:46:00 字数 895 浏览 0 评论 0 原文

我的输入如下

我想要以下

我正在尝试

Sales External?HR?Purchase Department

使用Listagg，因为最终我想要在单独的列中

查询输出就像以下一样，这意味着它应该搜索第一次出现分隔符（在这种情况下“？”，它可以是任何东西，而不是常见的，例如“ - ”或“/”，因为分离器需要分开，然后提取短语在第一个分离器之前，创建一个列并放置值。然后，它应该寻找分隔符的第二次出现，然后提取单词并继续创建列，可以有多个分离器。

我尝试使用split_part尝试，但它在实际数据方案中无法维护序列，并且数据不正确。

我还尝试了RegexP_Instr，但无法使用特殊字符作为分离器。有什么想法吗？

原文

I have input like below

I want like below

I was trying with

Sales External?HR?Purchase Department

I did LISTAGG because finally i want in separate columns

Query Output would be like below,
meaning it should search for first occurrence of the separator (in this case "?", it can be anything but not common ones like "-" or "/" as the separator needs to be separate than sting value) and then extract the phrase before the first separator and create one column and put the value. Then it should look for second occurrence of the separator and then extract the word and keep creating columns, there can be multiple separators.

I tried with SPLIT_PART but it does not maintain the sequence in real data scenario and data does not come correct as per sequence.

I also tried with REGEXP_INSTR, but unable to use special characters as separators.
Any thought?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

初心未许 2025-02-20 23:46:00

regex提取器

SELECT 
  REGEXP_SUBSTR_ALL("Sales External?HR?Purchase Department", "(.*)\?")

href =“ https://docs.snowflake.com/en/sql-reference/functions/flatten.html” rel =“ nofollow noreferrer”>侧面扁平将您的阵列转化为行：

WITH MY_CTE AS (
  SELECT 
    REGEXP_SUBSTR_ALL("Sales External?HR?Purchase Department", "(.*)\?")
)
SELECT
  *
FROM
  LATERAL FLATTEN(INPUT => MY_CTE, MODE=> 'ARRAY')

更深的潜水：更多案例： https://dwgeek.com/snowflake-convert-array -to-lows-methods and-examples.html/

Regex Extract should work for you:

SELECT 
  REGEXP_SUBSTR_ALL("Sales External?HR?Purchase Department", "(.*)\?")

You can use LATERAL FLATTEN to convert your array into rows:

WITH MY_CTE AS (
  SELECT 
    REGEXP_SUBSTR_ALL("Sales External?HR?Purchase Department", "(.*)\?")
)
SELECT
  *
FROM
  LATERAL FLATTEN(INPUT => MY_CTE, MODE=> 'ARRAY')

Deeper dive into some more cases: https://dwgeek.com/snowflake-convert-array-to-rows-methods-and-examples.html/

回复收藏 0 原文

标点 2025-02-20 23:46:00

这是数据的简化版本。它使用带有array_agg的CTE来分组行。然后，它从数组更改为列。要添加更多列，您可以使用max（），min（）或any_value（）函数来通过聚合使它们通过。（请注意，使用Any_value（）将不允许使用结果集缓存的缓存结果，因为它被标记为非确定性。）

create or replace table T1 (EMPID int, ROLE string, ACCESS string, ACCESS_LVL string, ITERATION string);

insert into T1(EMPID, ROLE, ACCESS, ACCESS_LVL, ITERATION) values
(1234, 'Sales Rep', 'Specific', 'REGION', 'DEV'),
(1234, 'Purchase Rep', 'Specific', 'EVERY', 'PROD'),
(1234, 'HR', NULL, 'Dept', 'PROD'),
(4321, 'HR', 'Foo', 'Foo', 'Foo')
;

with X as
(
    select   EMPID
            ,array_agg(nvl(ROLE,''))       within group (order by ROLE) ARR_ROLE
            ,array_agg(nvl(ACCESS,''))     within group (order by ROLE) ARR_ACCESS
            ,array_agg(nvl(ACCESS_LVL,'')) within group (order by ROLE) ARR_ACCESS_LVL
            ,array_agg(nvl(ITERATION,''))  within group (order by ROLE) ARR_ITERATION
    from T1
    group by EMPID
)
select  EMPID
       ,ARR_ROLE[0]::string       as ROLE1
       ,ARR_ROLE[1]::string       as ROLE2
       ,ARR_ROLE[2]::string       as ROLE3
       ,ARR_ACCESS[0]::string     as ACCESS1
       ,ARR_ACCESS[1]::string     as ACCESS2
       ,ARR_ACCESS[2]::string     as ACCESS3
       ,ARR_ACCESS_LVL[0]::string as ACCESS_LVL1
       ,ARR_ACCESS_LVL[1]::string as ACCESS_LVL2
       ,ARR_ACCESS_LVL[2]::string as ACCESS_LVL3
       ,ARR_ITERATION[0]::string  as ITERATION1
       ,ARR_ITERATION[1]::string  as ITERATION2
       ,ARR_ITERATION[2]::string  as ITERATION3
from X
;

将行分类为阵列似乎没有什么特别的东西，以便将行分解为cool1，cole2，cole2，cool3，cool3，etc。是确定性的。我显示了简单地对角色的名称进行排序，但这可能是该组中的任何顺序。

Here's a simplified version of the data. It uses a CTE with array_agg to group the rows. It then changes from arrays to columns. To add more columns, you can use max(), min(), or any_value() functions to get them through the aggregation. (Note that use of any_value() will not allow use of cached results from the result set cache since it's flagged as nondeterministic.)

create or replace table T1 (EMPID int, ROLE string, ACCESS string, ACCESS_LVL string, ITERATION string);

insert into T1(EMPID, ROLE, ACCESS, ACCESS_LVL, ITERATION) values
(1234, 'Sales Rep', 'Specific', 'REGION', 'DEV'),
(1234, 'Purchase Rep', 'Specific', 'EVERY', 'PROD'),
(1234, 'HR', NULL, 'Dept', 'PROD'),
(4321, 'HR', 'Foo', 'Foo', 'Foo')
;

with X as
(
    select   EMPID
            ,array_agg(nvl(ROLE,''))       within group (order by ROLE) ARR_ROLE
            ,array_agg(nvl(ACCESS,''))     within group (order by ROLE) ARR_ACCESS
            ,array_agg(nvl(ACCESS_LVL,'')) within group (order by ROLE) ARR_ACCESS_LVL
            ,array_agg(nvl(ITERATION,''))  within group (order by ROLE) ARR_ITERATION
    from T1
    group by EMPID
)
select  EMPID
       ,ARR_ROLE[0]::string       as ROLE1
       ,ARR_ROLE[1]::string       as ROLE2
       ,ARR_ROLE[2]::string       as ROLE3
       ,ARR_ACCESS[0]::string     as ACCESS1
       ,ARR_ACCESS[1]::string     as ACCESS2
       ,ARR_ACCESS[2]::string     as ACCESS3
       ,ARR_ACCESS_LVL[0]::string as ACCESS_LVL1
       ,ARR_ACCESS_LVL[1]::string as ACCESS_LVL2
       ,ARR_ACCESS_LVL[2]::string as ACCESS_LVL3
       ,ARR_ITERATION[0]::string  as ITERATION1
       ,ARR_ITERATION[1]::string  as ITERATION2
       ,ARR_ITERATION[2]::string  as ITERATION3
from X
;

There's nothing particular that seems interesting to sort the rows into the array so that ROLE1, ROLE2, ROLE3, etc. are deterministic. I showed simply sorting on the name of the role, but it could be any order by within that group.

回复收藏 0 原文

太阳哥哥 2025-02-20 23:46:00

这是一个存储的POC，它将根据输入字符串和指定的定界符产生一个动态列的表结果。

如果您正在寻找一种基于值生成动态列名称的方法，我建议您在此处访问Felipe Hoffa的博客条目：
https://medium.com/medium.com/snowflake/snowflake/dynemic-pivots-pivots-pivots-pivots-pivots-po- snowflake-c76393987c in-sql-with-with-snowflake

create or replace procedure pivot_dyn_results(input string, delimiter string) 
returns table ()
language SQL 
AS
declare

max_count integer default 0;
lcount integer default 0;
rs resultset;
stmt1 string;
stmt2 string;

begin

-- Get number of delimiter separated values (assumes no leading or trailing delimiter)
select regexp_count(:input, '\\'||:delimiter, 1) into :max_count from dual;

-- Generate the initial row-based result set of parsed values
stmt1 := 'SELECT * from lateral split_to_table(?,?)';

-- Build dynamic query to produce the pivoted column based results
stmt2 := 'select * from (select * from table(result_scan(last_query_id(-1)))) pivot(max(value) for index in (';

-- initialize look counter for resulting columns
lcount := 1;
stmt2 := stmt2 || '\'' || lcount || '\'';

-- append pivot statement for each column to be represented
FOR l in 1 to max_count do
lcount := lcount + 1;
stmt2 := stmt2 || ',\'' || lcount || '\'';
END FOR;

-- close out the pivot statement
stmt2 := stmt2 || '))';

-- execute the 
EXECUTE IMMEDIATE :stmt1 using (input, delimiter);
rs := (EXECUTE IMMEDIATE :stmt2);

return table(rs);
end;

调用：
调用pivot_dyn_results（[string]，[DeLimiter]）;

call pivot_dyn_results('Sales External?HR?Billing?Purchase Department','?');

结果：

Here's a stored proc that will produce a table result with a dynamic set of columns based on the input string and specified delimiter.

If you are looking for a way to generate dynamic column names based on values, I recommend visiting Felipe Hoffa's blog entry here:
https://medium.com/snowflake/dynamic-pivots-in-sql-with-snowflake-c763933987c

create or replace procedure pivot_dyn_results(input string, delimiter string) 
returns table ()
language SQL 
AS
declare

max_count integer default 0;
lcount integer default 0;
rs resultset;
stmt1 string;
stmt2 string;

begin

-- Get number of delimiter separated values (assumes no leading or trailing delimiter)
select regexp_count(:input, '\\'||:delimiter, 1) into :max_count from dual;

-- Generate the initial row-based result set of parsed values
stmt1 := 'SELECT * from lateral split_to_table(?,?)';

-- Build dynamic query to produce the pivoted column based results
stmt2 := 'select * from (select * from table(result_scan(last_query_id(-1)))) pivot(max(value) for index in (';

-- initialize look counter for resulting columns
lcount := 1;
stmt2 := stmt2 || '\'' || lcount || '\'';

-- append pivot statement for each column to be represented
FOR l in 1 to max_count do
lcount := lcount + 1;
stmt2 := stmt2 || ',\'' || lcount || '\'';
END FOR;

-- close out the pivot statement
stmt2 := stmt2 || '))';

-- execute the 
EXECUTE IMMEDIATE :stmt1 using (input, delimiter);
rs := (EXECUTE IMMEDIATE :stmt2);

return table(rs);
end;

Invocation:
call pivot_dyn_results([string],[delimiter]);

call pivot_dyn_results('Sales External?HR?Billing?Purchase Department','?');

Results:

回复收藏 0 原文

~没有更多了~

关于作者

单身狗的梦

暂无简介

文章

27 人气

关注发私信

友情链接

文江博客

在分离器之间提取单词

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

关于作者

相关话题

热门标签

推荐作者

李珊平

Quxin

范无咎

github_ZOJ2N8YxBm

若言

南…巷孤猫

友情链接

在分离器之间提取单词

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

关于作者

相关话题

热门标签

推荐作者

李珊平

Quxin

范无咎

github_ZOJ2N8YxBm

若言

南…巷孤猫

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。