Automatic normalization of a MySQL database: how to do it?

Posted 2024-07-27 13:00:17

I have a MySQL database containing one huge table with 80 columns and 10 million rows. The data may have inconsistencies.

I would like to normalize the database in an automated and efficient way.

I could do it using Java/C++/..., but I would like to do as much as possible inside the database. I expect that any work done outside the database would slow things down considerably.

Suggestions on how to do it? What are good resources/tutorials to start with?

I am not looking for explanations of what normalization is (I found plenty of that using Google)!

Comments (3)

携余温的黄昏 2024-08-03 13:00:17

You need to study the columns to identify 'like' entities and break them out into separate tables. At best, an automated tool might identify groups of rows with identical values in some of the columns, but a person who understands the data would have to decide whether those truly belong in a separate entity.

Here's a contrived example: suppose your columns were first name, last name, address, city, state, and zip. An automated tool might find rows for members of the same family, who share the same last name, address, city, state, and zip, and incorrectly conclude that those five columns represent an entity. It might then split the table up into:

First Name, ReferenceID

and another table

ID, Last Name, Address, City, State, Zip

See what I mean?
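The following is a minimal sketch of the kind of check an automated tool might run. It shows why such a tool can be misled. The check asks whether a candidate functional dependency (a set of columns determining another column) holds in the current rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The people table, the fd_holds and build_people helpers, and the sample rows are all hypothetical, modeled on the example above.

```python
import sqlite3

# Hypothetical sample rows modeled on the answer's example.
SAMPLE = [
    ("Ann", "Lee", "1 Oak St", "Springfield", "IL", "62701"),
    ("Bob", "Lee", "1 Oak St", "Springfield", "IL", "62701"),
    ("Cara", "Diaz", "9 Elm Ave", "Portland", "OR", "97201"),
]


def build_people(rows):
    """Load rows into an in-memory 'people' table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (first_name TEXT, last_name TEXT, address TEXT,"
        " city TEXT, state TEXT, zip TEXT)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def fd_holds(conn, table, lhs, rhs):
    """True if every combination of the lhs columns maps to one rhs value in the data."""
    # Identifiers are interpolated directly; only pass trusted column names.
    cols = ", ".join(lhs)
    sql = (
        f"SELECT COUNT(*) FROM (SELECT {cols} FROM {table}"
        f" GROUP BY {cols} HAVING COUNT(DISTINCT {rhs}) > 1) AS v"
    )
    (violations,) = conn.execute(sql).fetchone()
    return violations == 0


if __name__ == "__main__":
    key = ["address", "city", "state", "zip"]
    print(fd_holds(build_people(SAMPLE), "people", key, "last_name"))   # True
    print(fd_holds(build_people(SAMPLE), "people", key, "first_name"))  # False
    # A roommate with a different surname breaks the apparent dependency.
    more = SAMPLE + [("Dan", "Kim", "1 Oak St", "Springfield", "IL", "62701")]
    print(fd_holds(build_people(more), "people", key, "last_name"))     # False
```

On the sample, address/city/state/zip appears to determine last_name. A tool would then split the five columns off as in the answer. One extra row shows that the dependency was a property of the sample, not of the real-world entity.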

苏别ゝ 2024-08-03 13:00:17

I can't think of any way to automate it. You would have to create the tables you want, and then go through and move each piece of data into them with manual queries.

e.g.,

INSERT INTO contact (first_name, last_name, phone)
SELECT DISTINCT first_name, last_name, phone
FROM massive_table;

Then you could add a contact_id column to the massive table, fill it by matching each row back to its contact row on those three columns, and only then drop the original columns.
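The extract-and-link steps can be sketched end to end. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL, so it runs without a server. The contact, massive_table, and column names come from the answer. The order_id and item columns, the sample rows, and the helper names are hypothetical.

The main MySQL differences are these. MySQL uses AUTO_INCREMENT rather than INTEGER PRIMARY KEY, and `<=>` rather than `IS` for NULL-safe comparison. On 10 million rows, a multi-table `UPDATE ... JOIN` with an index on the three columns would likely be faster than a correlated subquery.

```python
import sqlite3


def build_orders_demo():
    """Create a small, hypothetical massive_table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE massive_table (order_id INTEGER, first_name TEXT,"
        " last_name TEXT, phone TEXT, item TEXT)"
    )
    conn.executemany(
        "INSERT INTO massive_table VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Ann", "Lee", "555-0100", "lamp"),
            (2, "Ann", "Lee", "555-0100", "desk"),
            (3, "Bob", "Ray", None, "chair"),  # missing phone
        ],
    )
    return conn


def extract_contacts(conn):
    """Move (first_name, last_name, phone) into contact and link rows via contact_id.

    Returns the number of massive_table rows left without a contact_id.
    """
    # INTEGER PRIMARY KEY auto-assigns ids in SQLite (AUTO_INCREMENT in MySQL).
    conn.execute(
        "CREATE TABLE contact (contact_id INTEGER PRIMARY KEY,"
        " first_name TEXT, last_name TEXT, phone TEXT)"
    )
    # 1. One contact row per distinct combination.
    conn.execute(
        "INSERT INTO contact (first_name, last_name, phone)"
        " SELECT DISTINCT first_name, last_name, phone FROM massive_table"
    )
    # 2. Add the link column and fill it. IS is SQLite's NULL-safe equality
    #    (MySQL: <=>); a plain = would never match Bob's NULL phone.
    conn.execute("ALTER TABLE massive_table ADD COLUMN contact_id INTEGER")
    conn.execute(
        "UPDATE massive_table SET contact_id = ("
        " SELECT c.contact_id FROM contact AS c"
        " WHERE c.first_name IS massive_table.first_name"
        " AND c.last_name IS massive_table.last_name"
        " AND c.phone IS massive_table.phone)"
    )
    # 3. Verify every row was linked before dropping the original columns.
    (unmatched,) = conn.execute(
        "SELECT COUNT(*) FROM massive_table WHERE contact_id IS NULL"
    ).fetchone()
    return unmatched


if __name__ == "__main__":
    conn = build_orders_demo()
    print(extract_contacts(conn))  # expected: 0
    print(conn.execute("SELECT order_id, contact_id FROM massive_table").fetchall())
```

Only when the unmatched count is zero would you run `ALTER TABLE massive_table DROP COLUMN ...` for the three moved columns.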

You would have a similar process when pulling out rows that go into a one-to-many table.
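A common one-to-many case is a repeating group of columns in the wide table. The sketch below assumes hypothetical columns phone1, phone2, and phone3 and a person_id key; none of these come from the question. It again uses SQLite through Python's sqlite3 module as a stand-in for MySQL. Each column is selected separately, and the results are stacked into a child table with one row per phone.

```python
import sqlite3


def build_phones_demo():
    """Hypothetical wide table with a repeating group phone1..phone3."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE massive_table (person_id INTEGER,"
        " phone1 TEXT, phone2 TEXT, phone3 TEXT)"
    )
    conn.executemany(
        "INSERT INTO massive_table VALUES (?, ?, ?, ?)",
        [
            (1, "555-0100", "555-0101", None),
            (1, "555-0101", "555-0100", None),  # same phones, different columns
            (2, "555-0200", None, None),
            (3, None, None, None),              # no phones at all
        ],
    )
    return conn


def extract_phones(conn):
    """Unpivot phone1..phone3 into person_phone, one row per (person_id, phone)."""
    conn.execute("CREATE TABLE person_phone (person_id INTEGER, phone TEXT)")
    # UNION (not UNION ALL) also removes duplicate pairs across rows and columns.
    conn.execute(
        "INSERT INTO person_phone (person_id, phone)"
        " SELECT person_id, phone1 FROM massive_table WHERE phone1 IS NOT NULL"
        " UNION SELECT person_id, phone2 FROM massive_table WHERE phone2 IS NOT NULL"
        " UNION SELECT person_id, phone3 FROM massive_table WHERE phone3 IS NOT NULL"
    )
    return conn.execute(
        "SELECT person_id, phone FROM person_phone ORDER BY person_id, phone"
    ).fetchall()


if __name__ == "__main__":
    print(extract_phones(build_phones_demo()))
```

Using UNION rather than UNION ALL is a design choice here: it collapses the same number stored in different columns. If you need to keep duplicates, or the column position matters (e.g. "primary" phone), use UNION ALL and add a position column.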

嘿看小鸭子会跑 2024-08-03 13:00:17

When cleaning up messy data, I like to create user-defined MySQL functions for typical data-scrubbing tasks, so that you can reuse them later. This approach also lets you look for existing UDFs that others have written and that you can use, with or without modification; see, for example, mysqludf.org.
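Below is a minimal sketch of such a reusable scrubbing function. It uses Python's sqlite3 module as a stand-in: `Connection.create_function` registers a Python function that SQL can then call like a built-in. In MySQL the equivalent would be a stored function (CREATE FUNCTION) or a loadable UDF of the kind found at mysqludf.org. The scrub_name rule, the scrub_column helper, and the sample names are hypothetical. Scrubbing before any `SELECT DISTINCT` extraction matters, because variants like "ANN" and " ann " would otherwise become separate rows in the new table.

```python
import re
import sqlite3


def scrub_name(value):
    """Collapse whitespace, trim, and title-case a name; pass NULL through."""
    if value is None:
        return None
    return re.sub(r"\s+", " ", value).strip().title()


def scrub_column(conn, table, column):
    """Rewrite one column in place using the registered scrub_name function."""
    # Register the Python function under the SQL name scrub_name, taking 1 argument.
    conn.create_function("scrub_name", 1, scrub_name)
    # Identifiers are interpolated directly; only pass trusted names.
    conn.execute(f"UPDATE {table} SET {column} = scrub_name({column})")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE massive_table (first_name TEXT)")
    conn.executemany(
        "INSERT INTO massive_table VALUES (?)",
        [("  ann ",), ("ANN",), ("Ann",), ("mary  jo",), (None,)],
    )
    scrub_column(conn, "massive_table", "first_name")
    print(conn.execute("SELECT DISTINCT first_name FROM massive_table").fetchall())
```

The title-casing rule is deliberately naive. For example, it turns "McDonald" into "Mcdonald", so real scrubbing functions need rules that fit the actual data.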
