Python code for determining which normal form tabular data is in
I'm looking for Python code that can take tabular data and establish which normal form(s) it is in (if any) and show any functional dependencies, etc.
There are logical tests for "normalization". However, they're not trivial exercises in programming: normal forms are relationships in the metadata that are imposed on the data, so the data alone can't confirm them. They require "thinking".
1NF -- no repeating groups. How does one identify a "repeating group"? It would be an array structure imposed on the columns of a table. How is that done? SQL doesn't provide a mechanism for declaring one, so you'd have to look at the column names to check for a "pattern": COL_1, COL_2, COL_3, for example.
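The column-name check described above can be sketched in a few lines. This is only a heuristic, and it assumes that repeating groups show up as a shared stem followed by a number; the function name `find_repeating_groups` and the sample column names are made up for illustration.

```python
import re
from collections import defaultdict

def find_repeating_groups(column_names):
    """Group column names that share a stem and differ only by a trailing number."""
    # Lazy stem, optional underscore, then the trailing digits.
    pattern = re.compile(r"^(.*?)_?(\d+)$")
    groups = defaultdict(list)
    for name in column_names:
        match = pattern.match(name)
        if match:
            groups[match.group(1)].append(name)
    # A single numbered column is not a group; require at least two.
    return {stem: cols for stem, cols in groups.items() if len(cols) >= 2}

# Hypothetical column list for illustration.
columns = ["ID", "NAME", "COL_1", "COL_2", "COL_3", "PHONE1", "PHONE2", "ADDRESS_1"]
print(find_repeating_groups(columns))
# {'COL': ['COL_1', 'COL_2', 'COL_3'], 'PHONE': ['PHONE1', 'PHONE2']}
```

A match is only a hint that someone should look at those columns. Names such as HOME_PHONE and WORK_PHONE can form a repeating group too, and this pattern would not catch them.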
1NF -- consistent layout of rows. Duh. SQL imposes this by the very nature of table definition.
2NF -- data in a row depends on the whole key, not just part of it. You'd have to do a procedure something like this.
The full algorithm is here: http://en.wikipedia.org/wiki/Relational_model#Set-theoretic_formulation
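As a rough illustration of what such a procedure might look like (not the algorithm from the linked article), the sketch below assumes you already know the composite key and tests whether some proper subset of it determines a non-key column in the rows you have. The helper names `holds` and `partial_dependencies` and the sample order data are hypothetical.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """True if, in these rows, equal values on lhs always come with equal values on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True

def partial_dependencies(rows, key, non_key):
    """List (key_subset, column) pairs where a proper subset of the key determines the column."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for col in non_key:
                if holds(rows, subset, (col,)):
                    found.append((subset, col))
    return found

# Hypothetical order lines keyed on (order_id, product_id).
rows = [
    {"order_id": 1, "product_id": "A", "product_name": "Widget", "qty": 2},
    {"order_id": 1, "product_id": "B", "product_name": "Gadget", "qty": 1},
    {"order_id": 2, "product_id": "A", "product_name": "Widget", "qty": 5},
]
print(partial_dependencies(rows, ("order_id", "product_id"), ("product_name", "qty")))
# [(('product_id',), 'product_name')]
```

Data can only refute a dependency. If one appears to hold in a sample, that suggests a 2NF problem worth looking at, but it does not prove the rule is part of the design.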
3NF -- data in a row depends ONLY on the key. This is worse, because you have to compare all combinations of non-key columns against all other combinations of non-key columns to be sure that there are no dependencies among the non-key values.
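A brute-force version of that comparison might look like the sketch below. It assumes the non-key columns are already known, and it reports only the smallest left-hand sides it finds; the names `determines` and `non_key_dependencies` and the employee rows are invented for the example. The number of combinations grows exponentially with the number of columns, which is the "worse" part.

```python
from itertools import combinations

def determines(rows, lhs, rhs):
    """True if equal values on the lhs columns always come with the same rhs value."""
    mapping = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        if mapping.setdefault(left, row[rhs]) != row[rhs]:
            return False
    return True

def non_key_dependencies(rows, non_key):
    """Find minimal dependencies lhs -> rhs among non-key columns that the rows do not contradict."""
    found = []
    # Sizes go up from 1, so smaller left-hand sides are recorded first.
    for size in range(1, len(non_key)):
        for lhs in combinations(non_key, size):
            for rhs in non_key:
                if rhs in lhs:
                    continue
                # Skip if a subset of lhs is already known to determine rhs.
                if any(set(p) <= set(lhs) and r == rhs for p, r in found):
                    continue
                if determines(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found

# Hypothetical employee rows; emp_id is assumed to be the key.
rows = [
    {"emp_id": 1, "dept": "Sales", "dept_city": "Boston", "salary": 50},
    {"emp_id": 2, "dept": "Sales", "dept_city": "Boston", "salary": 60},
    {"emp_id": 3, "dept": "IT", "dept_city": "Austin", "salary": 50},
    {"emp_id": 4, "dept": "IT", "dept_city": "Austin", "salary": 70},
]
print(non_key_dependencies(rows, ("dept", "dept_city", "salary")))
# [(('dept',), 'dept_city'), (('dept_city',), 'dept')]
```

Each pair reported is a candidate 3NF violation, not a verdict. Whether dept really determines dept_city is a fact about the business, and four rows can't settle it.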
4NF and 5NF confuse me, so I'll stop here.
My point is that -- theoretically -- you can do it. Practically, it takes a lot of complex permutations of the data to assert that the normal form relationships actually hold.
It's much, much easier to have a hypothesis about a specific violation and probe just that issue with some SQL queries and some thinking.
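For example, if you suspect that a department column determines a city column, one GROUP BY query finds the counterexamples. The sketch below uses Python's built-in `sqlite3` module with an in-memory table; the `employee` table, its columns, and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, dept TEXT, dept_city TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Sales", "Boston"), (2, "Sales", "Boston"), (3, "IT", "Austin"), (4, "IT", "Denver")],
)

# Hypothesis: dept determines dept_city.
# Any row returned is a dept with more than one city, i.e. a counterexample.
violations = conn.execute(
    """
    SELECT dept, COUNT(DISTINCT dept_city) AS cities
    FROM employee
    GROUP BY dept
    HAVING COUNT(DISTINCT dept_city) > 1
    """
).fetchall()
print(violations)
# [('IT', 2)]
```

An empty result means the data doesn't contradict the hypothesis, and that's where the thinking comes in: you still have to decide whether the dependency is real or a coincidence of the current rows.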
The formal math is here:
http://en.wikipedia.org/wiki/Relational_model