Algorithm to organize a table into many tables so that there are fewer cells?
I'm not really trying to compress a database. This is more of a logical problem. Is there an algorithm that takes a data table with lots of columns and repeated data and finds a way to organize it into several tables keyed by IDs, such that the total number of cells is as small as possible and the tables can then be joined with a query to reproduce the original one?
I don't care about any particular database engine or language. I just want to see if there is a logical way of doing it. If you post code, I prefer C# and SQL, but you can use any language.
Comments (1)
I don't know of any automated algorithm, but what you really need to do is heavily normalize your database. That means looking at your actual functional dependencies and breaking them off into separate tables wherever it makes sense.
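To make "functional dependency" concrete, here is a minimal sketch in Python. It only tests whether a dependency holds in the rows you currently have, and then splits the table on it. The function names (`fd_holds`, `split_on_fd`) and the `orders` sample data are made up for illustration; they are not from any library.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, each lhs value combination maps to one rhs combination."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


def split_on_fd(rows, lhs, rhs):
    """Split rows into (main, lookup): main drops the rhs columns, lookup keeps lhs + rhs once per key."""
    lookup = {}
    for row in rows:
        lookup[tuple(row[c] for c in lhs)] = tuple(row[c] for c in rhs)
    lookup_rows = [dict(zip(lhs + rhs, key + value)) for key, value in lookup.items()]
    main_rows = [{c: v for c, v in row.items() if c not in rhs} for row in rows]
    return main_rows, lookup_rows


# Hypothetical sample data: zip determines city and state
orders = [
    {"order_id": 1, "zip": "10001", "city": "New York", "state": "NY"},
    {"order_id": 2, "zip": "10001", "city": "New York", "state": "NY"},
    {"order_id": 3, "zip": "94105", "city": "San Francisco", "state": "CA"},
    {"order_id": 4, "zip": "94105", "city": "San Francisco", "state": "CA"},
]

if fd_holds(orders, ["zip"], ["city", "state"]):
    main, lookup = split_on_fd(orders, ["zip"], ["city", "state"])
```

Joining `main` and `lookup` on `zip` gives back the original rows. Note that `fd_holds` can only tell you the dependency is not contradicted by the stored data, not that it is a real rule of the domain.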
The problem with trying to do this in a computer program is that it isn't always clear whether your currently stored data represents all possible cases. You can't just look at the number of distinct values either. For example, it makes little sense to break booleans off into their own table just because they have only two values, and that is only the tip of the iceberg.
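The boolean case is easy to check with a bit of arithmetic. The sketch below counts cells before and after moving one column into a lookup table with a surrogate ID. The helper names (`cell_count`, `extract_column`) and the `users` data are hypothetical, chosen only for this example.

```python
def cell_count(rows):
    """Total number of cells in a table stored as a list of dicts."""
    return sum(len(row) for row in rows)


def extract_column(rows, column):
    """Move `column` into a lookup table and replace it with a surrogate ID in the main table."""
    ids = {}
    main = []
    for row in rows:
        # Assign the next ID (1, 2, ...) the first time a value is seen
        surrogate = ids.setdefault(row[column], len(ids) + 1)
        new_row = {c: v for c, v in row.items() if c != column}
        new_row[column + "_id"] = surrogate
        main.append(new_row)
    lookup = [{column + "_id": i, column: value} for value, i in ids.items()]
    return main, lookup


# Hypothetical table with a boolean column
users = [
    {"user_id": 1, "active": True},
    {"user_id": 2, "active": False},
    {"user_id": 3, "active": True},
    {"user_id": 4, "active": True},
]

main, lookup = extract_column(users, "active")
before = cell_count(users)                    # 4 rows x 2 columns = 8
after = cell_count(main) + cell_count(lookup) # 8 + (2 rows x 2 columns) = 12
```

The main table keeps the same width because the boolean is simply swapped for an ID, so the lookup table is pure overhead. Extracting a single column this way never reduces the cell count; savings only appear when several dependent columns move out together behind one key.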
I think that, at this point, nothing is going to beat good ol' patient, hand-crafted normalization. This is something to do by hand. Any possible computer algorithm will either make a total mess of things or make you define the relationships yourself, at which point you might as well do it all yourself.