规范化表:在一系列行中查找唯一列 (Oracle 10.x)
我有一个具有以下结构的表:
WorkerPersons
-------------------------------
ID (PK)
PersonID (Indicates which version of Person the record describes)
SomeColumn1 (data specific to Worker)
SomeColumn2 (data specific to Person)
....
SomeColumnN
-------------------------------
正如您所看到的,它是一个非规范化的表,它在一个表中同时保存 Worker 和 Person(以及一个 Person 的许多版本)数据。我的愿望是规范化该表,但是,由于该表包含大量数据(许多列),我需要确定哪些列应进入 Workers 表,哪些列应进入 Persons 表。结果应该是这样的:
Workers Persons
----------------------- ---------------------
ID ID
PersonID (now a FK) PersonColumn1
WorkerColumn1 PersonColumn2
WorkerColumn2 ...
... PersonColumnN
WorkerColumnN
----------------------- ---------------------
为此,我需要分析哪些数据在所有唯一 Person 中的 Person 范围有所不同(在 WorkerPersons 中由 PersonID 分隔)。例如:
WorkerPersons
-------------------------------------------------------
ID PersonID Column1 Column2 Column3
-------------------------------------------------------
1 PersonA 10.1 John Doe Single
2 PersonA 10.1 John Doe Single
3 PersonA 10.1 John Doe Married
4 PersonB 09.2 Sully Single
5 PersonB 09.2 Sullivan Single
在本例中,PersonA 有 3 个版本,PersonB 有 2 个版本。 Column1 值在所有版本的 Person 中始终相同,我们可以将该列移至表 Worker。但是 Column 2 和 Column3 值在不同版本的 Person 中发生变化,因此这些值应该移动到 Person 表中。
想象不到,我有大约 10 个这样的表需要标准化,每个表大约有 40 列。每个表保存大约 500k 到 5m 行。
我需要一个脚本来帮助我分析哪些列要移动到哪里。我需要一个脚本来输出整个表中唯一 Person 范围内发生变化的所有列。我不知道如何做到这一点。我尝试使用 LAG 分析函数来与下一行进行比较,但到底如何输出更改的列超出了我的范围。
请指教。
最好的祝愿, 安德鲁
I have a table with the following structure:
WorkerPersons
-------------------------------
ID (PK)
PersonID (Indicates which version of Person the record describes)
SomeColumn1 (data specific to Worker)
SomeColumn2 (data specific to Person)
....
SomeColumnN
-------------------------------
As you can see, it's a denormalized table, which holds both Worker and Person (and many versions of one Person) data in one table. My wish is to normalize that table, however, as the table holds a lot of data (many many columns), I need to be sure which columns should go to Workers table and which columns to Persons table. The outcome should be like this:
Workers Persons
----------------------- ---------------------
ID ID
PersonID (now a FK) PersonColumn1
WorkerColumn1 PersonColumn2
WorkerColumn2 ...
... PersonColumnN
WorkerColumnN
----------------------- ---------------------
To do that, I need to analyze which data differs in scope of Person over all unique Persons (wich are separated by PersonID in WorkerPersons). For example:
WorkerPersons
-------------------------------------------------------
ID PersonID Column1 Column2 Column3
-------------------------------------------------------
1 PersonA 10.1 John Doe Single
2 PersonA 10.1 John Doe Single
3 PersonA 10.1 John Doe Married
4 PersonB 09.2 Sully Single
5 PersonB 09.2 Sullivan Single
In this case, there are 3 versions on PersonA and 2 versions of PersonB. Column1 values are always the same over all versions of Person, and we can move that column to table Worker. But Column 2 and Column3 values change over different versions of Person, so those values should be moved to Person table.
No imagine, I have about 10 tables like this that need to be normalized, with about 40 columns in each. Eeach table holds about 500k to 5m rows.
I need a script that helps me analyse which columns to move where. I need a script that outputs all columns that change in scope of unique Person over the whole table. I've no ideas however how to do that. I experimented with LAG analytical function to compare against the next row but how in the world to output changed columns is beyond me.
Please advise.
Best wishes,
Andrew
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
由于 10 个表并不是很多,这里是(某种)
带有信息模式和动态查询的伪代码,您可以制作上述正确的 PL/SQL 或采用核心查询并用您最喜欢的语言编写脚本。
编辑:
上面假设
column_name
中没有NULL
。编辑2:
核心查询的其他变体可以是,
如果每个 PersonID 的所有值都相同,则将返回一行。
(此查询也存在 NULL 问题,但我认为 NULL 是一个单独的问题;出于上述查询的目的,您始终可以将 NULL 转换为某个域外值)
上述查询也可以写为
Which will test multiple同时对于属于“Workers”的列返回 true,对于应该进入“Persons”的列返回 false。
Since 10 tables is not a lot, here is (some sort of) pseudo code
with information schema and dynamic queries you could make the above proper PL/SQL or take the core query and script it in your favourite language.
EDIT:
The above assumes no
NULL
s incolumn_name
.EDIT2:
Other variants of the core query can be
Which will return a row if all values per PersonID are the same.
(this query also has problems with NULLS, but I consider NULLs a separate issue; you can always cast a NULL to some out-of-domain value for purposes of the above query)
The above query can be also written as
Which will test multiple columns at the same time returning true for columns that belong to 'Workers' and false for columns that should go into 'Persons'.
谢谢,但我通过让 Excel 在表架构信息上创建一系列选择来解决这个问题。它生成的最终查询是一长串选择,但它有效(尽管运行了一个多小时)。 “核心查询”(实际上是 Excel 中创建核心查询的公式):
Thanks, but I solved it by letting Excel create series of selects over table schema information. The final query that it generated was a long list of selects but it works (although it run over a hour). The "core query" (actually a formula in Excel to create to core query):