将嵌套阵列转换为雪花

发布于 2025-02-08 15:39:57 字数 524 浏览 2 评论 0原文

我有一个表列,上面有雪花数据库中的嵌套数组。我想以雪花SQL中所示的方式将嵌套数组转换为列。

表名称:Sensor_data

Rx列是数据类型变体的。如下所示,嵌套阵列不会总是为3。在某些情况下,有20,000个嵌套阵列,而其他情况则没有。

| ID |             RX              |
|----|-----------------------------|
| 1  |[[0, 15], [1, 50], [2, 34.2]]|
| 2  |[[0, 20], [1, 25]]           |
| 3  |[[0, 100], [1, 42], [2, 76]] |

我想从上表中实现这样的东西:

| ID |Col0 | Col1| Col2|
|----|-----|-----|-----|
| 1  |  15 |  50 | 34.2|
| 2  |  20 |  25 | NULL|
| 3  | 100 |  42 | 76  |

I have a table column with nested arrays in a Snowflake database. I want to convert the nested array into columns in the manner shown below in Snowflake SQL.

Table Name: SENSOR_DATA

The RX column is of data type VARIANT. The nested arrays will not always be 3 as shown below. There are cases where there are 20,000 nested arrays, and other cases where there are none.

| ID |             RX              |
|----|-----------------------------|
| 1  |[[0, 15], [1, 50], [2, 34.2]]|
| 2  |[[0, 20], [1, 25]]           |
| 3  |[[0, 100], [1, 42], [2, 76]] |

I want to achieve something like this from the table above:

| ID |Col0 | Col1| Col2|
|----|-----|-----|-----|
| 1  |  15 |  50 | 34.2|
| 2  |  20 |  25 | NULL|
| 3  | 100 |  42 | 76  |

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

荒人说梦 2025-02-15 15:39:57

让我们重新创建该表:

create table sensor_data as 
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]');

然后获取您拥有的不同列键:

select distinct r.value[0] from sensor_data, table(flatten(input=>rx)) r;
r.value [0]
0
1
2

给定我们希望这是动态的,下一步只是使用静态SQL检查它,它给了我们正确的答案:

 select id
    ,max(iff(r.value[0] = 0, r.value[1], null)) as col_0
    ,max(iff(r.value[0] = 1, r.value[1], null)) as col_1
    ,max(iff(r.value[0] = 2, r.value[1], null)) as col_2
 from sensor_data, table(flatten(input=>rx)) r
 group by 1
 order by 1;
IDcol_0col_1col_2
1155034.2
22025null
31004276

立即写入动态执行此操作:

declare
  sql string;
  c1 cursor for select distinct r.value[0] as key from sensor_data, table(flatten(input=>rx)) r;
begin
  sql := 'select id ';
  for record in c1 do
    sql := sql || ',max(iff(r.value[0] = '|| record.key::text ||', r.value[1], null)) as col_' || record.key::text;
  end for;
  sql := sql || ' from sensor_data, table(flatten(input=>rx)) r  group by 1  order by 1';
  return sql;
end;

这给了我们:

选择ID,max(iff(r.value [0] = 0,r.value 1 ,null))为col_0,max(iff(r.value [0] = 1,r.value 1 ,null))为col_1,max(iff(iff(r.value [0] = 2,r.value,r.value) 1 ,null)表(FLATTEN(输入=> rx))R组按1顺序为1

,当运行给我们预期的结果时。

因此,现在我们想运行:

declare
  sql string;
  res resultset;
  c1 cursor for select distinct r.value[0] as key from sensor_data, table(flatten(input=>rx)) r;
begin
  sql := 'select id ';
  for record in c1 do
    sql := sql || ',max(iff(r.value[0] = '|| record.key::text ||', r.value[1], null)) as col_' || record.key::text;
  end for;
  sql := sql || ' from sensor_data, table(flatten(input=>rx)) r  group by 1  order by 1';
  
  res := (execute immediate :sql);
  return table (res);
end;

给予:

IDCOL_0COL_1COL_2
1155034.2
22025NULL
31004276

基于手册的这些部分的代码:

与loops一起工作

雪花扫描/结果表.html“ rel =“ nofollow noreferrer”>与结果集一起使用

和上述顺序:

create or replace table sensor_data as 
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]')
             ,(4, '[[0,20],[30,50], [45, 100]]');
declare
  sql string;
  res resultset;
  c1 cursor for select distinct r.value[0] as key from sensor_data, table(flatten(input=>rx)) r order by key;
begin
  sql := 'select id ';
  for record in c1 do
    sql := sql || ',max(iff(r.value[0] = '|| record.key::text ||', r.value[1], null)) as col_' || record.key::text;
  end for;
  sql := sql || ' from sensor_data, table(flatten(input=>rx)) r  group by 1  order by 1';
  
  res := (execute immediate :sql);
  return table (res);
end;

giss:

idcol_0col_0col_1col_2col_3
带有额外和提及的顺序棘手
数据25nullnullnull
31004276nullnull
420nullnull50100

,所以是的,但这是最大的操作:

因此,如果我们返回原始数据

create or replace table sensor_data as 
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]');

并更改代码以不具有最大和组,

 select id
    ,iff(r.value[0] = 0, r.value[1], null) as col_0
    ,iff(r.value[0] = 1, r.value[1], null) as col_1
    ,iff(r.value[0] = 2, r.value[1], null) as col_2
 from sensor_data, table(flatten(input=>rx)) r
 order by 1;  

我们会看到:

IDcol_0col_1col_2
115nullnull
11 null50null
null 1nullnull34.2
220nullnull
2null2 null 2 null 2 null 25null
3100nullnull
3null 3 null3 null 42null
3null 3 nullnull76,

以便我们可以看到这些值被插入了,现在由ID进行了,现在是ID的组,将要滚动这三个ID,因此对于Col0,我们具有15,null,Null max采用最高的值,而15较高,而null则是null的。这是正在进行的过程。

Lets recreate that table:

create table sensor_data as 
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]');

Then get the distinct column keys that you have:

select distinct r.value[0] from sensor_data, table(flatten(input=>rx)) r;
R.VALUE[0]
0
1
2

Given we want this to be dynamic, the next step is just to check with static SQL it gives us the correct answer:

 select id
    ,max(iff(r.value[0] = 0, r.value[1], null)) as col_0
    ,max(iff(r.value[0] = 1, r.value[1], null)) as col_1
    ,max(iff(r.value[0] = 2, r.value[1], null)) as col_2
 from sensor_data, table(flatten(input=>rx)) r
 group by 1
 order by 1;
IDCOL_0COL_1COL_2
1155034.2
22025null
31004276

write now to do this dynamically:

declare
  sql string;
  c1 cursor for select distinct r.value[0] as key from sensor_data, table(flatten(input=>rx)) r;
begin
  sql := 'select id ';
  for record in c1 do
    sql := sql || ',max(iff(r.value[0] = '|| record.key::text ||', r.value[1], null)) as col_' || record.key::text;
  end for;
  sql := sql || ' from sensor_data, table(flatten(input=>rx)) r  group by 1  order by 1';
  return sql;
end;

which gives us:

select id ,max(iff(r.value[0] = 0, r.value1, null)) as col_0,max(iff(r.value[0] = 1, r.value1, null)) as col_1,max(iff(r.value[0] = 2, r.value1, null)) as col_2 from sensor_data, table(flatten(input=>rx)) r group by 1 order by 1

which when run gives us the expected results.

so now we want to run that:

declare
  sql string;
  res resultset;
  c1 cursor for select distinct r.value[0] as key from sensor_data, table(flatten(input=>rx)) r;
begin
  sql := 'select id ';
  for record in c1 do
    sql := sql || ',max(iff(r.value[0] = '|| record.key::text ||', r.value[1], null)) as col_' || record.key::text;
  end for;
  sql := sql || ' from sensor_data, table(flatten(input=>rx)) r  group by 1  order by 1';
  
  res := (execute immediate :sql);
  return table (res);
end;

gives:

IDCOL_0COL_1COL_2
1155034.2
22025null
31004276

based of code from these sections of the manual:

Working with loops

Working with Resultsets

With extra "tricky data" and the mentioned ORDER BY:

create or replace table sensor_data as 
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]')
             ,(4, '[[0,20],[30,50], [45, 100]]');
declare
  sql string;
  res resultset;
  c1 cursor for select distinct r.value[0] as key from sensor_data, table(flatten(input=>rx)) r order by key;
begin
  sql := 'select id ';
  for record in c1 do
    sql := sql || ',max(iff(r.value[0] = '|| record.key::text ||', r.value[1], null)) as col_' || record.key::text;
  end for;
  sql := sql || ' from sensor_data, table(flatten(input=>rx)) r  group by 1  order by 1';
  
  res := (execute immediate :sql);
  return table (res);
end;

gives:

IDCOL_0COL_1COL_2COL_30COL_45
1155034.2nullnull
22025nullnullnull
31004276nullnull
420nullnull50100

So yes, but that's the MAX doing:

so if we go back in to original data

create or replace table sensor_data as 
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]');

and alter the code to not have the MAX and the GROUP BY

 select id
    ,iff(r.value[0] = 0, r.value[1], null) as col_0
    ,iff(r.value[0] = 1, r.value[1], null) as col_1
    ,iff(r.value[0] = 2, r.value[1], null) as col_2
 from sensor_data, table(flatten(input=>rx)) r
 order by 1;  

we see:

IDCOL_0COL_1COL_2
115nullnull
1null50null
1nullnull34.2
220nullnull
2null25null
3100nullnull
3null42null
3nullnull76

so we can see those values getting unrolled, now the GROUP BY id, is going to roll those three id's up, thus for col0 we have 15, null, null MAX takes the higest value, and 15 is higher that null so that is what is keep. This is the process this is working on.

柠檬色的秋千 2025-02-15 15:39:57

使用[]访问数组元素:

SELECT ID, RX[0][1] AS col1,  RX[1][1] AS col1, RX[2][1] AS col2
FROM SENSOR_DATA;

Using [] to access array elements:

SELECT ID, RX[0][1] AS col1,  RX[1][1] AS col1, RX[2][1] AS col2
FROM SENSOR_DATA;
┊风居住的梦幻卍 2025-02-15 15:39:57

您的要求不完全是什么,但这很接近

with sensor_data as (
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]')
       as vals
),
flat as (
select id, val.value[1] arrvalue
  from sensor_data,
  lateral flatten(input => sensor_data.rx, outer => true) val
 )
select
     id
    ,listagg(arrvalue, ',') rx_list
from flat
group by id
order by id
  ;
ID  RX_LIST
1   15,50,34.2
2   20,25
3   100,42,76

Not exactly what you asked for but this is close

with sensor_data as (
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]')
       as vals
),
flat as (
select id, val.value[1] arrvalue
  from sensor_data,
  lateral flatten(input => sensor_data.rx, outer => true) val
 )
select
     id
    ,listagg(arrvalue, ',') rx_list
from flat
group by id
order by id
  ;
ID  RX_LIST
1   15,50,34.2
2   20,25
3   100,42,76
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文