Oracle由数据定义的解析函数窗口

发布于 2024-08-18 21:08:03 字数 1058 浏览 13 评论 0原文

我有一个表，它表示从特定文本文件格式读取的数据的逐行转储。每行可以代表“主”行或“详细”行，通过 rec_type 代码指示。我想编写一个查询来获取“主”行以及相关的详细信息行。我想出了一些可以完成这项工作的方法，但它看起来有点老套，并且对更好的方法（如果有的话）感兴趣。

CREATE TABLE mdtest
 (rec_seq  NUMBER        PRIMARY KEY
 ,rec_type VARCHAR2(3)   NOT NULL
 ,rec_data VARCHAR2(100) NOT NULL);

INSERT INTO mdtest VALUES (1, '100', 'Bill Jones');
INSERT INTO mdtest VALUES (2, '200', '20080115,100.25');
INSERT INTO mdtest VALUES (3, '100', 'John Smith');
INSERT INTO mdtest VALUES (4, '200', '20090701,80.95');
INSERT INTO mdtest VALUES (5, '200', '20091231,110.35');

期望的结果：

SEQ_EMP  EMP_NAME    SEQ_DATA  EMP_DATA
=======  ==========  ========  ===============
      1  Bill Jones         2  20080115,100.25
      3  John Smith         4  20090701,80.95
      3  John Smith         5  20091231,110.35

假设：

按照rec_seq的顺序处理记录，
第一个记录类型是“100”，
每个“100”记录有1个或多个“200” code>" 记录如下

注意：这是针对 Oracle 9i 的，但今年我们应该升级到 11g R1。

原文

I have a table which represents a line-by-line dump of the data read from a particular text file format. Each line may represent a "master" or a "detail" line, indicated via rec_type code. I'd like to write a query that gets the "master" lines alongside the associated detail lines. I've come up with something that does the job, but it seems a bit hackish and am interested in better ways if any.

CREATE TABLE mdtest
 (rec_seq  NUMBER        PRIMARY KEY
 ,rec_type VARCHAR2(3)   NOT NULL
 ,rec_data VARCHAR2(100) NOT NULL);

INSERT INTO mdtest VALUES (1, '100', 'Bill Jones');
INSERT INTO mdtest VALUES (2, '200', '20080115,100.25');
INSERT INTO mdtest VALUES (3, '100', 'John Smith');
INSERT INTO mdtest VALUES (4, '200', '20090701,80.95');
INSERT INTO mdtest VALUES (5, '200', '20091231,110.35');

Desired Result:

SEQ_EMP  EMP_NAME    SEQ_DATA  EMP_DATA
=======  ==========  ========  ===============
      1  Bill Jones         2  20080115,100.25
      3  John Smith         4  20090701,80.95
      3  John Smith         5  20091231,110.35

Assumptions:

records are processed in sequence of rec_seq
first record type is a "100"
each "100" record has 1 or more "200" records following

Note: this is for Oracle 9i, however we should be upgrading to 11g R1 this year.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

过期以后 2024-08-25 21:08:03

这是我到目前为止所做的：

SELECT seq_emp 
      ,SUBSTR(emp_seq_name,10) emp_name 
      ,seq_data 
      ,emp_data 
FROM  (SELECT MAX(CASE WHEN rec_type = '100' THEN rec_seq END) 
              OVER (ORDER BY rec_seq 
                    ROWS BETWEEN UNBOUNDED PRECEDING 
                             AND CURRENT ROW) seq_emp 
             ,MAX(CASE 
                  WHEN rec_type = '100' 
                  THEN TO_CHAR(rec_seq,'fm00000000') || '|' || rec_data 
                  END) 
              OVER (ORDER BY rec_seq 
                    ROWS BETWEEN UNBOUNDED PRECEDING 
                             AND CURRENT ROW) emp_seq_name 
             ,rec_seq seq_data 
             ,rec_type 
             ,rec_data emp_data 
       FROM   mdtest) 
WHERE  rec_type = '200' 
ORDER BY seq_data;

如您所见，我使用 MAX 报告分析函数，窗口从集合的顶部开始一直到当前行，以获取当前“200”的相关“100”记录“ 记录;然后在外部查询中我将丢弃不需要的“100”记录。

为了获得 emp_name，我必须将数据附加到 rec_seq 中，以便 MAX 函数仍然选择正确的头记录；然后在外部查询中我砍掉了rec_seq。

我尝试过其他分析函数和语法，包括 FIRST_VALUE 和 KEEP 语法，但这些似乎都没有使这项工作变得更简单；困难在于窗口是由 rec_type 的值定义的，而不是某个常量偏移量。

Here's what I have so far:

SELECT seq_emp 
      ,SUBSTR(emp_seq_name,10) emp_name 
      ,seq_data 
      ,emp_data 
FROM  (SELECT MAX(CASE WHEN rec_type = '100' THEN rec_seq END) 
              OVER (ORDER BY rec_seq 
                    ROWS BETWEEN UNBOUNDED PRECEDING 
                             AND CURRENT ROW) seq_emp 
             ,MAX(CASE 
                  WHEN rec_type = '100' 
                  THEN TO_CHAR(rec_seq,'fm00000000') || '|' || rec_data 
                  END) 
              OVER (ORDER BY rec_seq 
                    ROWS BETWEEN UNBOUNDED PRECEDING 
                             AND CURRENT ROW) emp_seq_name 
             ,rec_seq seq_data 
             ,rec_type 
             ,rec_data emp_data 
       FROM   mdtest) 
WHERE  rec_type = '200' 
ORDER BY seq_data;

As you can see, I'm using the MAX reporting analytic function with a window starting from the top of the set down to the current row, to get the relevant "100" record for the current "200" record; then in the outer query I'm discarding the unneeded "100" records.

To get emp_name, I've had to append the rec_seq with the data so that the MAX function still picks the correct header record; then in the outer query I chop the rec_seq off.

I've played with other analytic functions and syntax including FIRST_VALUE and the KEEP syntax, but neither of these seem to make this job simpler; the difficulty is that the window is defined by the value of rec_type instead of being some constant offset.

回复收藏 0 原文

掌心的温暖 2024-08-25 21:08:03

为了简单起见，您认为值得在处理之前将每种记录类型加载到单独的导入表中吗？

create table mdtest100 as select * from mdtest where rec_type = 100;

create table mdtest200 as select * from mdtest where rec_type = 200;

with mdtest_detail as
    (
    select
        (select max(m.rec_seq) from mdtest100 m 
         where m.rec_seq < r200.rec_seq) master_rec_seq,
        r200.* 
    from 
        mdtest200 r200
    )
select
    m.rec_seq seq_emp,
    m.rec_data emp_name,
    d.rec_seq seq_data,
    d.rec_data emp_data
from
    mdtest_detail d
        inner join mdtest100 m on m.rec_seq = d.master_rec_seq
order by
    seq_emp,
    seq_data;


    SEQ_EMP  EMP_NAME    SEQ_DATA   EMP_DATA          
    1        Bill Jones  2          20080115,100.25          
    3        John Smith  4          20090701,80.95          
    3        John Smith  5          20091231,110.35

这可能有助于提供更易于维护的解决方案，并让您可以单独解析和验证通信分隔的 EMP_DATA 字段。

只是一个想法 - 抱歉，如果您只是在寻找分析解决方案。

In a desire for simplicity, do you think it's worth loading each record type into a separate import table before processing?

create table mdtest100 as select * from mdtest where rec_type = 100;

create table mdtest200 as select * from mdtest where rec_type = 200;

with mdtest_detail as
    (
    select
        (select max(m.rec_seq) from mdtest100 m 
         where m.rec_seq < r200.rec_seq) master_rec_seq,
        r200.* 
    from 
        mdtest200 r200
    )
select
    m.rec_seq seq_emp,
    m.rec_data emp_name,
    d.rec_seq seq_data,
    d.rec_data emp_data
from
    mdtest_detail d
        inner join mdtest100 m on m.rec_seq = d.master_rec_seq
order by
    seq_emp,
    seq_data;


    SEQ_EMP  EMP_NAME    SEQ_DATA   EMP_DATA          
    1        Bill Jones  2          20080115,100.25          
    3        John Smith  4          20090701,80.95          
    3        John Smith  5          20091231,110.35

This might lend itself to a more maintainable solution and will let you parse out and validate the comm-separated EMP_DATA field separately.

Just a thought - sorry if you're only looking for an Analytics solution.

回复收藏 0 原文

~没有更多了~