Oracle：将名称值对扁平化到表中

发布于 2024-07-17 01:35:59 字数 1589 浏览 13 评论 0原文

我希望有人可以提供一些建议，以更简单的方法来处理这个问题。我正在致力于创建高度标准化的数据集的扁平化视图。扁平化的目标是提供一个普通人可以用来开发报告的视图。源数据包含几个表，如下所示：

CREATE TABLE VARIABLES  ( 
    VARIABLE_ID INT NOT NULL IDENTITY, 

    VARIABLE_NAME VARCHAR(100) NOT NULL, 

    VARIABLE_DATATYPE VARCHAR(100) NOT NULL, 

    PRIMARY KEY (VARIABLE_ID), 

    UNIQUE (VARIABLE_NAME,VARIABLE_DATATYPE) 
)

CREATE TABLE RECORD_VALUES (
    RUN_ID INT NOT NULL REFERENCES RUNS (RUN_ID) ON DELETE CASCADE, 

    VARIABLE_ID INT NOT NULL REFERENCES VARIABLES(VARIABLE_ID) ON DELETE CASCADE, 

    RECORD_ID VARCHAR(100) NOT NULL, 

    VARIABLE_VALUE VARCHAR(1000), 

    PRIMARY KEY (RUN_ID,VARIABLE_ID,RECORD_ID)
)

记录值表中的variable_id 对应于原始输入流中的变量之一，例如地址或帐户余额。对于包含 12 个变量的输入记录，记录值表中将有 12 行。

原始过程的输入包括不同宽度和变量名称的记录。它们被分成 record_values 表中的名称/值元组。我正在编写一个过程，将变量重新组装回记录，看起来像

run_id
record_id (which is actually an underlying account number)
variable_value_1
variable_value_2
...
variable_value_n

我当前的方法是通过查找给定运行集的唯一变量来动态构建表（细节在这里并不重要），然后构建一个 SQL 字符串将创建该表。

我的挑战是如何有效地从原始数据加载结果工作表。由于变量的名称和数量随 run_id 的不同而变化，我能想到的解决此问题的唯一方法是通过以下方式：

create a cursor for the list of variables
for each variable in the list
   create a cursor to find all the record values for that variable
   for each record value
       update the appropriate record/column in the work table
   end
end

这将永远运行，因为父表有数百百万行。

有谁知道如何生成一种可用于对每个目标行进行一次更新的方法？

以免有人跳到原始表格的设计上 - 这样做是有商业原因的。我不喜欢这样做，但这样做有充分的理由。

感谢您提供的任何想法。

安德鲁

原文

I'm hoping someone can provide some advise for an easier way to deal with this problem. I am working on creating a flattened view of a highly normalized set of data. The goal of flattening is to provide a view which normal people can use to develop reports from. The source data contains a couple of tables as shown:

CREATE TABLE VARIABLES  ( 
    VARIABLE_ID INT NOT NULL IDENTITY, 

    VARIABLE_NAME VARCHAR(100) NOT NULL, 

    VARIABLE_DATATYPE VARCHAR(100) NOT NULL, 

    PRIMARY KEY (VARIABLE_ID), 

    UNIQUE (VARIABLE_NAME,VARIABLE_DATATYPE) 
)

CREATE TABLE RECORD_VALUES (
    RUN_ID INT NOT NULL REFERENCES RUNS (RUN_ID) ON DELETE CASCADE, 

    VARIABLE_ID INT NOT NULL REFERENCES VARIABLES(VARIABLE_ID) ON DELETE CASCADE, 

    RECORD_ID VARCHAR(100) NOT NULL, 

    VARIABLE_VALUE VARCHAR(1000), 

    PRIMARY KEY (RUN_ID,VARIABLE_ID,RECORD_ID)
)

The variable_id in the record values table corresponds to one of the variables in the original input stream, say an address or an account balance. For an input record that has 12 variables in it, there will be twelve rows in the record values table.

The inputs to the original process include records of varying widths and variable names. These are split out into name/value tuples in the record_values table. I am writing a procedure to reassemble the variables back into a record that looks like

run_id
record_id (which is actually an underlying account number)
variable_value_1
variable_value_2
...
variable_value_n

My current approach is to dynamically build the table by finding the unique variables for the given set of runs (details not important here), and then building a SQL string that will create the table.

My challenge is how to efficiently load this resulting work table from the original data. Since the names and number of the variables varies with run_id, the only way I can think of to approach this is through something like:

create a cursor for the list of variables
for each variable in the list
   create a cursor to find all the record values for that variable
   for each record value
       update the appropriate record/column in the work table
   end
end

This is going to run forever, as the parent tables have 100's of millions of rows.

Does anyone have an idea about how to generate an approach that I can use to do one update per destination row?

Lest anyone jump on the design of the original tables - there were business reasons to do it this way. I don't like it, but there were good reasons for doing it.

Thanks for any thoughts you can provide.

Andrew

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

风铃鹿 2024-07-24 01:35:59

在 Oracle 10g 及更高版本中：

SELECT  DISTINCT run_id, record_id, val1, val2, ..., val12
FROM    record_values
MODEL
IGNORE NAV
PARTITION BY
        (run_id, record_id)
DIMENSION BY
        (variable_id)
MEASURES
        (val, 0 AS val1, 0 AS val2, ..., 0 AS val12)
RULES UPDATE
        (
        val1[ANY] = val[1], /* Put real variable ID's in the square brackets */
        val2[ANY] = val[2],
        ...,
        val12[ANY] = val[12]
        )

In Oracle 10g and above:

SELECT  DISTINCT run_id, record_id, val1, val2, ..., val12
FROM    record_values
MODEL
IGNORE NAV
PARTITION BY
        (run_id, record_id)
DIMENSION BY
        (variable_id)
MEASURES
        (val, 0 AS val1, 0 AS val2, ..., 0 AS val12)
RULES UPDATE
        (
        val1[ANY] = val[1], /* Put real variable ID's in the square brackets */
        val2[ANY] = val[2],
        ...,
        val12[ANY] = val[12]
        )

回复收藏 0 原文

~没有更多了~