Python code to determine which normal form tabular data is in

Posted on 2024-08-19 16:26:03


I'm looking for Python code that can take tabular data and establish which normal form(s) it is in (if any) and show any functional dependencies, etc.


烟花肆意 2024-08-26 16:26:03


There are logical tests for "normalization". However, they're not trivial programming exercises; they're relationships in the metadata that are imposed on the data. Rows can show that a dependency is violated, but they can't prove that it always holds. These tests require "thinking".

1NF -- no repeating groups. How does one identify a "repeating group"? It would be an array structure imposed on the columns of a table. How is that done? SQL doesn't provide a mechanism, so you'd have to look at the column names to check for a "pattern". COL_1, COL_2, COL_3, for example.
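As a rough illustration of that column-name check, here is a Python sketch that groups names ending in a number (COL_1, COL_2, ...) by their shared prefix. The naming pattern, the min_slots threshold and the function name find_repeating_groups are assumptions made for illustration. A real schema can hide a repeating group under names this heuristic won't catch, so treat a hit as a hint, not a verdict.

```python
import re
from collections import defaultdict

# Assumed naming convention: a non-digit prefix, an optional "_", then a slot number.
SLOT_PATTERN = re.compile(r"^(?P<prefix>\D+?)_?(?P<index>\d+)$")


def find_repeating_groups(columns, min_slots=2):
    """Group column names like COL_1, COL_2, COL_3 by their shared prefix (hypothetical helper)."""
    groups = defaultdict(list)
    for name in columns:
        match = SLOT_PATTERN.match(name)
        if match:
            groups[match.group("prefix")].append(name)
    # A prefix seen only once (e.g. ZIP5) is probably just a name, not an array of slots.
    return {prefix: names for prefix, names in groups.items() if len(names) >= min_slots}


if __name__ == "__main__":
    cols = ["ORDER_ID", "PHONE_1", "PHONE_2", "PHONE_3", "ZIP5"]
    print(find_repeating_groups(cols))
```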

1NF -- consistent layout of rows. Duh. SQL imposes this by the very nature of table definition.

2NF -- every non-key column depends on the whole key, not just on part of a composite key. You'd have to run a procedure something like this:

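For each non-key column:
   Query the distinct (key, non-key column) pairs
   Does each key value map to exactly one non-key value?
   That is, can you build a simple dict mapping key to non-key?
   If the key is composite, repeat the test with each proper subset of the key;
   if a subset also passes, that is a partial dependency, i.e. a 2NF violation.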
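A minimal Python sketch of that dict test, assuming rows arrive as a list of dicts (for example from csv.DictReader) and that you already know which columns form the key. The names holds_fd and partial_dependencies are hypothetical, not from any library. A passing test only means the dependency holds for these rows; it is not proof that it is a rule of the data.

```python
from itertools import combinations


def holds_fd(rows, lhs, rhs):
    """True if, in these rows, the lhs columns determine column rhs.

    This is the "simple dict" test: each distinct lhs value must map to exactly one rhs value.
    """
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (key subset, column) pairs where a non-key column depends on only part of the key."""
    violations = []
    for col in non_key:
        for size in range(1, len(key)):  # proper, non-empty subsets of the key
            for subset in combinations(key, size):
                if holds_fd(rows, subset, col):
                    violations.append((subset, col))
    return violations


if __name__ == "__main__":
    # Hypothetical order-line table keyed on (order_id, product_id).
    rows = [
        {"order_id": 1, "product_id": "A", "qty": 2, "customer": "Ann"},
        {"order_id": 1, "product_id": "B", "qty": 1, "customer": "Ann"},
        {"order_id": 2, "product_id": "A", "qty": 5, "customer": "Bob"},
    ]
    print(partial_dependencies(rows, ("order_id", "product_id"), ["qty", "customer"]))
```

Here customer depends on order_id alone, which is the classic 2NF violation; qty needs the whole key.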

The set-theoretic definitions behind this are here: http://en.wikipedia.org/wiki/Relational_model#Set-theoretic_formulation

3NF -- every non-key column depends ONLY on the key. This is worse, because you have to test every combination of non-key columns against every other non-key column to be sure that no non-key column (or group of them) determines another one.
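A brute-force Python sketch of that comparison, assuming the same list-of-dicts rows and a known set of non-key columns. The function name non_key_dependencies and the max_lhs cap are assumptions for illustration; without the cap the search is exponential in the number of non-key columns. A hit is a 3NF suspect, not an automatic violation: if the determining columns turn out to be another candidate key, the dependency is allowed.

```python
from itertools import combinations


def holds_fd(rows, lhs, rhs):
    """True if each distinct value of the lhs columns maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def non_key_dependencies(rows, non_key, max_lhs=2):
    """Find X -> A where X and A are both non-key columns (hypothetical helper).

    Only minimal X are reported: if X already determines A, supersets of X are skipped.
    """
    found = []
    for target in non_key:
        others = [c for c in non_key if c != target]
        for size in range(1, min(max_lhs, len(others)) + 1):
            for lhs in combinations(others, size):
                if any(set(prev) <= set(lhs) for prev, t in found if t == target):
                    continue
                if holds_fd(rows, lhs, target):
                    found.append((lhs, target))
    return found


if __name__ == "__main__":
    # Hypothetical employee table keyed on emp_id.
    rows = [
        {"emp_id": 1, "dept": "R&D", "dept_city": "Oslo", "salary": 100},
        {"emp_id": 2, "dept": "R&D", "dept_city": "Oslo", "salary": 120},
        {"emp_id": 3, "dept": "Ops", "dept_city": "Bergen", "salary": 100},
    ]
    print(non_key_dependencies(rows, ["dept", "dept_city", "salary"]))
```

On these three rows both dept -> dept_city and dept_city -> dept come back. The first is the transitive dependency you'd expect; the second may be a coincidence of the tiny sample, which is exactly where the "thinking" comes in.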

4NF and 5NF confuse me, so I'll stop here.

My point is that -- theoretically -- you can do it. Practically, asserting that the normal-form relationships actually hold means checking a huge number of column combinations against the data.
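To put a rough number on "a lot": counting only dependencies X -> A with a single target column A and a non-empty X drawn from the other n - 1 columns, there are n * (2^(n-1) - 1) candidates, each needing a pass over the data. This Python sketch just evaluates that count; it ignores the pruning a smarter search would use, and the function name is hypothetical.

```python
def candidate_fd_count(n_columns):
    """Non-trivial single-target dependencies X -> A (A not in X, X non-empty) over n columns."""
    return n_columns * (2 ** (n_columns - 1) - 1)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, candidate_fd_count(n))
```

Five columns give 75 candidates, ten give 5,110, and twenty give 10,485,740.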

It's much, much easier to have a hypothesis about a specific violation and probe just that issue with some SQL queries and some thinking.
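For example, if you suspect that customer_name depends only on customer_id, a single GROUP BY query probes just that hypothesis. This Python sketch runs it through the standard-library sqlite3 module; the orders table, its columns and the helper names are hypothetical. Table and column names can't be bound as query parameters, so the helper formats them into the SQL and must only be given trusted identifiers. COUNT(DISTINCT ...) ignores NULLs, so NULL versus non-NULL conflicts won't show up.

```python
import sqlite3


def fd_violations(conn, table, lhs, rhs):
    """Return (lhs value, distinct rhs count) for every lhs value with more than one rhs value."""
    # Identifiers are interpolated, not bound: pass trusted table/column names only.
    sql = (
        f"SELECT {lhs}, COUNT(DISTINCT {rhs}) FROM {table} "
        f"GROUP BY {lhs} HAVING COUNT(DISTINCT {rhs}) > 1 ORDER BY {lhs}"
    )
    return conn.execute(sql).fetchall()


def build_demo_db():
    """In-memory sample data where customer 11 appears under two different names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, customer_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10, "Ann"), (2, 10, "Ann"), (3, 11, "Bob"), (4, 11, "Robert")],
    )
    return conn


if __name__ == "__main__":
    print(fd_violations(build_demo_db(), "orders", "customer_id", "customer_name"))
```

An empty result means the hypothesis survives for the current data; a row like (11, 2) points at the exact key values to look at.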

The formal math is here:

http://en.wikipedia.org/wiki/Relational_model
