PL/SQL 中的表转换/字段解析

发布于 2024-08-31 05:33:28 字数 386 浏览 1 评论 0原文

我有非规范化的表,类似于

CODES

ID  | VALUE
10  | A,B,C
11  | A,B
12  | A,B,C,D,E,F
13  | R,T,D,W,W,W,W,W,S,S

The job is to conversion ,其中 VALUE 中的每个标记将生成新行。示例:

CODES_TRANS

ID  | VALUE_TRANS
10  | A
10  | B
10  | C
11  | A
11  | B

在不使用自定义 pl/sql 包(最好使用纯 SQL)的情况下,在 PL/SQL 中执行此操作的最佳方法是什么?

显而易见的解决方案是通过游标来实现它。有什么想法吗?

I have de-normalized table, something like

CODES

ID  | VALUE
10  | A,B,C
11  | A,B
12  | A,B,C,D,E,F
13  | R,T,D,W,W,W,W,W,S,S

The job is to convert is where each token from VALUE will generate new row. Example:

CODES_TRANS

ID  | VALUE_TRANS
10  | A
10  | B
10  | C
11  | A
11  | B

What is the best way to do it in PL/SQL without usage of custom pl/sql packages, ideally with pure SQL?

Obvious solution is to implement it via cursors. Any ideas?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

奈何桥上唱咆哮 2024-09-07 05:33:28

另一种替代方法是使用 model 子句:

SQL> select id
  2       , value
  3    from codes
  4   model
  5         return updated rows
  6         partition by (id)
  7         dimension by (-1 i)
  8         measures (value)
  9         ( value[for i from 0 to length(value[-1])-length(replace(value[-1],',')) increment 1]
 10           = regexp_substr(value[-1],'[^,]+',1,cv(i)+1)
 11         )
 12   order by id
 13       , i
 14  /

        ID VALUE
---------- -------------------
        10 A
        10 B
        10 C
        11 A
        11 B
        12 A
        12 B
        12 C
        12 D
        12 E
        12 F
        13 R
        13 T
        13 D
        13 W
        13 W
        13 W
        13 W
        13 W
        13 S
        13 S

21 rows selected.

我在此博文中为此类查询编写了最多 6 个替代方法: http://rwijk.blogspot.com/2007/11/interval-based-row- Generation.html

问候,
抢。

Another alternative is to use the model clause:

SQL> select id
  2       , value
  3    from codes
  4   model
  5         return updated rows
  6         partition by (id)
  7         dimension by (-1 i)
  8         measures (value)
  9         ( value[for i from 0 to length(value[-1])-length(replace(value[-1],',')) increment 1]
 10           = regexp_substr(value[-1],'[^,]+',1,cv(i)+1)
 11         )
 12   order by id
 13       , i
 14  /

        ID VALUE
---------- -------------------
        10 A
        10 B
        10 C
        11 A
        11 B
        12 A
        12 B
        12 C
        12 D
        12 E
        12 F
        13 R
        13 T
        13 D
        13 W
        13 W
        13 W
        13 W
        13 W
        13 S
        13 S

21 rows selected.

I have written up to 6 alternatives for this type of query in this blogpost: http://rwijk.blogspot.com/2007/11/interval-based-row-generation.html

Regards,
Rob.

剑心龙吟 2024-09-07 05:33:28

我为您提供了一个纯 SQL 解决方案。

我改编了我在 上发现的技巧一个旧的 Ask Tom 网站,由 Mihail Bratu 发布。我的改编使用正则表达式来标记 VALUE 列,因此它需要 10g 或更高。

测试数据。

SQL> select * from t34
  2  /

        ID VALUE
---------- -------------------------
        10 A,B,C
        11 A,B
        12 A,B,C,D,E,F
        13 R,T,D,W1,W2,W3,W4,W5,S,S

SQL>

查询:

SQL> select   t34.id
  2            , t.column_value value
  3  from t34
  4       , table(cast(multiset(
  5              select regexp_substr (t34.value, '[^(,)]+', 1, level)
  6              from dual
  7              connect by level <= length(value)
  8         ) as sys.dbms_debug_vc2coll )) t
  9  where t.column_value != ','
 10  /

        ID VALUE
---------- -------------------------
        10 A
        10 B
        10 C
        11 A
        11 B
        12 A
        12 B
        12 C
        12 D
        12 E
        12 F
        13 R
        13 T
        13 D
        13 W1
        13 W2
        13 W3
        13 W4
        13 W5
        13 S
        13 S

21 rows selected.

SQL> 

I have a pure SQL solution for you.

I adapted a trick I found on an old Ask Tom site, posted by Mihail Bratu. My adaptation uses regex to tokenise the VALUE column, so it requires 10g or higher.

The test data.

SQL> select * from t34
  2  /

        ID VALUE
---------- -------------------------
        10 A,B,C
        11 A,B
        12 A,B,C,D,E,F
        13 R,T,D,W1,W2,W3,W4,W5,S,S

SQL>

The query:

SQL> select   t34.id
  2            , t.column_value value
  3  from t34
  4       , table(cast(multiset(
  5              select regexp_substr (t34.value, '[^(,)]+', 1, level)
  6              from dual
  7              connect by level <= length(value)
  8         ) as sys.dbms_debug_vc2coll )) t
  9  where t.column_value != ','
 10  /

        ID VALUE
---------- -------------------------
        10 A
        10 B
        10 C
        11 A
        11 B
        12 A
        12 B
        12 C
        12 D
        12 E
        12 F
        13 R
        13 T
        13 D
        13 W1
        13 W2
        13 W3
        13 W4
        13 W5
        13 S
        13 S

21 rows selected.

SQL> 
无戏配角 2024-09-07 05:33:28

根据 Celko 的书,这是我发现的并且效果很好!

  SELECT 
    TABLE1.ID
    , MAX(SEQ1.SEQ) AS START_POS
    , SEQ2.SEQ AS END_POS
    , COUNT(SEQ2.SEQ) AS PLACE
  FROM 
    TABLE1, V_SEQ SEQ1, V_SEQ SEQ2
  WHERE 
    SUBSTR(',' || TABLE1.VALUE || ',', SEQ1.SEQ, 1) = ','
    AND SUBSTR(',' || TABLE1.VALUE || ',', SEQ2.SEQ, 1) = ','
    AND SEQ1.SEQ < SEQ2.SEQ
    AND SEQ2.SEQ <= LENGTH(TABLE1.VALUE)
  GROUP BY TABLE1.ID, TABLE1.VALUE, SEQ2.SEQ

其中 V_SEQ 是一个只有一个字段的静态表:

SEQ, integer values 1 through N, where N >= MAX_LENGTH(VALUE).

这是基于 VALUE 两端都用“,”包裹的事实,如下所示:

,A,B,C,D,

如果您的标记是固定长度的(就像我的情况一样),我只需使用 PLACE 字段来计算实际的字符串。如果长度可变,则使用 start_pos 和 end_pos

所以,在我的例子中,标记是 2 个字符长,所以最终的 SQL 是:

SELECT 
    TABLE1.ID
    , SUBSTR(TABLE1.VALUE, T_SUB.PLACE * 3 - 2 , 2 ) AS SINGLE_VAL
FROM
(
  SELECT 
    TABLE1.ID
    , MAX(SEQ1.SEQ) AS START_POS
    , SEQ2.SEQ AS END_POS
    , COUNT(SEQ2.SEQ) AS PLACE
  FROM 
    TABLE1, V_SEQ SEQ1, V_SEQ SEQ2
  WHERE 
    SUBSTR(',' || TABLE1.VALUE || ',', SEQ1.SEQ, 1) = ','
    AND SUBSTR(',' || TABLE1.VALUE || ',', SEQ2.SEQ, 1) = ','
    AND SEQ1.SEQ < SEQ2.SEQ
    AND SEQ2.SEQ <= LENGTH(TABLE1.VALUE)
  GROUP BY TABLE1.ID, TABLE1.VALUE, SEQ2.SEQ
) T_SUB
INNER JOIN 
  TABLE1 ON TABLE1.ID = T_SUB.ID
ORDER BY TABLE1.ID, T_SUB.PLACE   

Based on Celko's book, here is what I found and it's working well!

  SELECT 
    TABLE1.ID
    , MAX(SEQ1.SEQ) AS START_POS
    , SEQ2.SEQ AS END_POS
    , COUNT(SEQ2.SEQ) AS PLACE
  FROM 
    TABLE1, V_SEQ SEQ1, V_SEQ SEQ2
  WHERE 
    SUBSTR(',' || TABLE1.VALUE || ',', SEQ1.SEQ, 1) = ','
    AND SUBSTR(',' || TABLE1.VALUE || ',', SEQ2.SEQ, 1) = ','
    AND SEQ1.SEQ < SEQ2.SEQ
    AND SEQ2.SEQ <= LENGTH(TABLE1.VALUE)
  GROUP BY TABLE1.ID, TABLE1.VALUE, SEQ2.SEQ

Where V_SEQ is a static table with one field:

SEQ, integer values 1 through N, where N >= MAX_LENGTH(VALUE).

This is based on the fact the the VALUE is wrapped by ',' on both ends, like this:

,A,B,C,D,

If your tokens are fixed length (like in my case) I simply used PLACE field to calculate the actual string. If variable length, use start_pos and end_pos

So, in my case, tokens are 2 char long, so the final SQL is:

SELECT 
    TABLE1.ID
    , SUBSTR(TABLE1.VALUE, T_SUB.PLACE * 3 - 2 , 2 ) AS SINGLE_VAL
FROM
(
  SELECT 
    TABLE1.ID
    , MAX(SEQ1.SEQ) AS START_POS
    , SEQ2.SEQ AS END_POS
    , COUNT(SEQ2.SEQ) AS PLACE
  FROM 
    TABLE1, V_SEQ SEQ1, V_SEQ SEQ2
  WHERE 
    SUBSTR(',' || TABLE1.VALUE || ',', SEQ1.SEQ, 1) = ','
    AND SUBSTR(',' || TABLE1.VALUE || ',', SEQ2.SEQ, 1) = ','
    AND SEQ1.SEQ < SEQ2.SEQ
    AND SEQ2.SEQ <= LENGTH(TABLE1.VALUE)
  GROUP BY TABLE1.ID, TABLE1.VALUE, SEQ2.SEQ
) T_SUB
INNER JOIN 
  TABLE1 ON TABLE1.ID = T_SUB.ID
ORDER BY TABLE1.ID, T_SUB.PLACE   
转角预定愛 2024-09-07 05:33:28

原始答案

在 SQL Server TSQL 中,我们解析字符串并创建一个表对象。这是示例代码 - 也许你可以翻译它。

http://rbgupta.blogspot.com/2007/ 10/tsql-parsing-delimited-string-into.html

第二个选项

计算每行的逗号数量。获取逗号的最大数量。假设整个表中有一行最多包含 5 个逗号。构建一个包含 5 个子字符串的 SELECT。这将使其成为基于集合的操作,并且应该比 rbar 快得多。

Original Answer

In SQL Server TSQL we parse strings and make a table object. Here is sample code - maybe you can translate it.

http://rbgupta.blogspot.com/2007/10/tsql-parsing-delimited-string-into.html

Second Option

Count the number of commas per row. Get the Max number of commas. Let's say that in the entire table you have a row with 5 commas max. Build a SELECT with 5 substrings. This will make it a set based operation and should be much faster than a rbar.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文