Is normalizing a person's name going too far?

Posted 2024-07-18 07:21:50

You usually normalize a database to avoid data redundancy, and a table full of names plainly contains plenty of it. If your goal were to build a catalog of the names of every person on the planet (good luck), I can see how normalizing names would pay off. But in the context of an average business database, is it overkill?

(Of course, you can take anything to an extreme, such as normalizing down to syllables or even adjacent character pairs. I can't see a benefit in going that far.)

Update:

One possible justification is a random name generator. That's all I could come up with off the top of my head.


Comments (19)

迷你仙 2024-07-25 07:21:51

Generally yes. Normalizing to that level would be going too far. Depending on the queries (such as phone books, where searches by last name are common), it might be worthwhile, but I expect that to be rare.

别再吹冷风 2024-07-25 07:21:51

I generally haven't seen a need to normalize names, mainly because it adds a performance hit from a join that every query will need, without giving any benefit in return.

If you have a great many repeated names and a real storage problem, it may be worth it, but the performance cost of that join still has to be weighed.

孤凫 2024-07-25 07:21:51

I would say it is absolutely overkill. Most applications display people's names so often that every query involved becomes that much more complex and harder to read.
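
To make the extra query complexity concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`person_flat`, `person_norm`, `first_name`, `last_name`) are invented for illustration and do not come from any schema in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Plain design: the names are stored directly on the person row.
    CREATE TABLE person_flat (
        id         INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );

    -- "Normalized names" design (hypothetical): each distinct name is
    -- stored once in a lookup table and referenced by id.
    CREATE TABLE first_name (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE last_name  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE person_norm (
        id            INTEGER PRIMARY KEY,
        first_name_id INTEGER NOT NULL REFERENCES first_name(id),
        last_name_id  INTEGER NOT NULL REFERENCES last_name(id)
    );

    INSERT INTO person_flat VALUES (1, 'Ada', 'Lovelace');
    INSERT INTO first_name  VALUES (1, 'Ada');
    INSERT INTO last_name   VALUES (1, 'Lovelace');
    INSERT INTO person_norm VALUES (1, 1, 1);
""")

# Displaying a name in the plain design: a single-table lookup.
flat_row = conn.execute(
    "SELECT first_name, last_name FROM person_flat WHERE id = ?", (1,)
).fetchone()

# Displaying the same name in the normalized design: two joins, every time.
norm_row = conn.execute(
    """
    SELECT fn.name, ln.name
    FROM person_norm AS p
    JOIN first_name AS fn ON fn.id = p.first_name_id
    JOIN last_name  AS ln ON ln.id = p.last_name_id
    WHERE p.id = ?
    """,
    (1,),
).fetchone()
```

Both queries return the same pair of strings; the second simply has to go through two extra tables to get there, and so would every report, search, and listing that shows a name.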

寂寞笑我太脆弱 2024-07-25 07:21:51

Yes, it is. It is commonly recognized that mechanically applying all of the rules of normalization can take you way too far and leave you with an overnormalized database. For example, it would be possible to normalize every instance of every character into a reference to a character enumeration table. It's easy to see that this is ridiculous.

Normalization needs to be performed at a level that is appropriate for your problem domain. Overnormalization is as much a problem as undernormalization (although, of course, for different reasons).

肥爪爪 2024-07-25 07:21:51

There might be a case where being able to link married and maiden names would be useful.
I recently had a case where I had to rename thousands of emails in Exchange because somebody got divorced and didn't want any emails listing her as [email protected]

雨后彩虹 2024-07-25 07:21:51

There's no need to normalize to that level unless the names make up a composite primary key and you have data that depends on only one of the names (e.g. anyone with the surname Plummer knows nothing about databases). In that case, by not normalizing, you would violate second normal form.

单挑你×的.吻 2024-07-25 07:21:51

I agree with the general response: you wouldn't do that.

One thing does come to mind, though: compression. If you had a billion people and found that 60% of first names were drawn from 5 very common names, you could use some tricky bit manipulation to reduce the size very significantly. It would also require very customized database software.

But that isn't for the purpose of normalization, just compression.
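
As a rough illustration of the idea, here is a plain dictionary-encoding sketch in Python. It swaps the "tricky bit manipulation" for something much simpler: the most common names are replaced by small integer codes and everything else is stored as-is. The function names and the `top_n` parameter are assumptions made up for this example.

```python
from collections import Counter


def build_codebook(names, top_n=5):
    """Map the top_n most frequent names to small integer codes."""
    most_common = [name for name, _count in Counter(names).most_common(top_n)]
    return {name: code for code, name in enumerate(most_common)}


def encode(names, codebook):
    """Replace common names with their integer code; keep rare names as strings."""
    return [codebook.get(name, name) for name in names]


def decode(encoded, codebook):
    """Invert encode(): integer codes become names again, strings pass through."""
    reverse = {code: name for name, code in codebook.items()}
    return [reverse[item] if isinstance(item, int) else item for item in encoded]


names = ["John", "Mary", "John", "Joejimbobjake", "Mary", "John"]
codebook = build_codebook(names, top_n=2)
encoded = encode(names, codebook)
```

A real storage engine would pack the codes into a few bits rather than Python integers, which is where the custom database software comes in.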

三生殊途 2024-07-25 07:21:51

You should normalize it out if you need to avoid the delete anomaly that comes with not breaking it out. That is, if you ever need to answer the question "has my database ever had a person named 'Joejimbobjake' in it?", you need to avoid that anomaly. Soft deletes are probably a much better way to do this than maintaining a comprehensive first-name table (for example), but you get my point.
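
A minimal sketch of the soft-delete alternative, using Python's built-in `sqlite3`. The `person` table, the `deleted_at` column, and the helper names are hypothetical and chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        deleted_at TEXT            -- NULL means the row is still active
    );
    INSERT INTO person (first_name) VALUES ('Joejimbobjake');
""")


def soft_delete(conn, person_id):
    """Mark the row as deleted instead of removing it."""
    conn.execute(
        "UPDATE person SET deleted_at = datetime('now') WHERE id = ?",
        (person_id,),
    )


def active_first_names(conn):
    """Names of people who have not been soft-deleted."""
    rows = conn.execute(
        "SELECT first_name FROM person WHERE deleted_at IS NULL"
    ).fetchall()
    return [row[0] for row in rows]


def ever_had_name(conn, first_name):
    """True if any row, active or soft-deleted, ever carried this name."""
    row = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM person WHERE first_name = ?)",
        (first_name,),
    ).fetchone()
    return bool(row[0])


soft_delete(conn, 1)
```

After the soft delete the person no longer shows up in `active_first_names`, yet `ever_had_name(conn, "Joejimbobjake")` can still answer the historical question without a separate name table.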

墨落画卷 2024-07-25 07:21:51

In addition to all the points everyone else has made, consider a data entry operation (for example). To insert a new contact, you would have to search your first-name and last-name tables to locate the correct ids and then use those values. This gets more complicated when the name is not yet in the FN and/or LN tables: then you have to insert the new first or last name and use the new id(s).

And if you think you have a comprehensive list of names, think again. I work with a list of over 200k unique first names, and I'd guess it covers 99.9% of the US population. But that remaining 0.1% is still a lot of people. And don't forget the foreign names and misspellings...
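
The lookup-or-insert dance described above might look roughly like this with Python's built-in `sqlite3`. The schema and the helper names (`get_or_create_name_id`, `insert_contact`) are assumptions for illustration, not anything from the answer's actual system.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE first_name (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE last_name  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE contact (
        id            INTEGER PRIMARY KEY,
        first_name_id INTEGER NOT NULL REFERENCES first_name(id),
        last_name_id  INTEGER NOT NULL REFERENCES last_name(id)
    );
"""

NAME_TABLES = ("first_name", "last_name")


def get_or_create_name_id(conn, table, name):
    """Return the id for `name`, inserting it first if it is not there yet."""
    if table not in NAME_TABLES:
        # The table name is interpolated into SQL, so only allow known tables.
        raise ValueError(f"unexpected table: {table}")
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = conn.execute(
        f"SELECT id FROM {table} WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def insert_contact(conn, first, last):
    """Insert one contact: two name lookups (or inserts) plus the real insert."""
    first_id = get_or_create_name_id(conn, "first_name", first)
    last_id = get_or_create_name_id(conn, "last_name", last)
    cursor = conn.execute(
        "INSERT INTO contact (first_name_id, last_name_id) VALUES (?, ?)",
        (first_id, last_id),
    )
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
insert_contact(conn, "Ada", "Lovelace")
insert_contact(conn, "Ada", "Byron")
```

Every single contact insert turns into up to five statements, and a misspelled name quietly becomes a new row in the name table.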

红颜悴 2024-07-25 07:21:50

Yes, it's overkill.

People don't change their names from Bill to Joe all at once.

掀纱窥君容 2024-07-25 07:21:50

Database normalization usually refers to normalizing the fields, not their content. In other words, you would normalize so that there is only one first-name field in the database. That is generally worthwhile. The content, however, should not be normalized, since it is individual to that person: you are not picking from a list, and you are not changing a list in one place to affect everybody. That would be a bug, not a feature.

寄意 2024-07-25 07:21:50

How do you normalize a name? Not all names have the same structure. Not all countries or cultures use the same rules for names. A first name is not necessarily just a first name. People have variable numbers of names. Some countries don't have the simple firstname/lastname pair. What if my first name just happens to be your last name: should they be considered the same in your database? If not, you run into the problem that "last name" can mean different things in different countries. In most countries I know of, it is a family name: your last name is the same as the last name of at least one of your parents. In Iceland, it is your father's first name followed by "son" or "daughter". So the same last name means completely different things depending on whether you encounter it in Iceland or in the US.

In some cultures it is common for a woman to take her husband's last name when getting married. In other cultures that's completely optional, or might even work the opposite way.

How can you normalize this? What information would it gain you? If you find someone in your database who has "Smith" as the last word of their name, what does that tell you? It might not be their family name. It might be only part of the family name. It might be an honorific in some language which, according to their culture, should be considered part of the name.

You can only normalize data if it follows a common structure.

治碍 2024-07-25 07:21:50

If you needed to perform queries based on diminutive names, I could see a need for normalizing the names. For example, a search for "Betty" may need to return results for "Betty", "Beth", and "Elizabeth".
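
One way this could be sketched, assuming a hypothetical `name_variant` lookup table that maps each diminutive to a canonical form (Python with the built-in `sqlite3`; all names here are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT NOT NULL);

    -- Hypothetical lookup: every known variant points at one canonical name.
    CREATE TABLE name_variant (
        variant   TEXT PRIMARY KEY,
        canonical TEXT NOT NULL
    );

    INSERT INTO person (first_name)
    VALUES ('Betty'), ('Beth'), ('Elizabeth'), ('Robert');

    INSERT INTO name_variant (variant, canonical)
    VALUES ('Betty', 'Elizabeth'), ('Beth', 'Elizabeth'), ('Elizabeth', 'Elizabeth');
""")


def search_by_first_name(conn, query):
    """Return first names that share a canonical form with `query`.

    Names missing from name_variant are treated as their own canonical form.
    """
    rows = conn.execute(
        """
        SELECT p.first_name
        FROM person AS p
        LEFT JOIN name_variant AS v ON v.variant = p.first_name
        WHERE COALESCE(v.canonical, p.first_name) =
              COALESCE((SELECT canonical FROM name_variant WHERE variant = ?), ?)
        ORDER BY p.first_name
        """,
        (query, query),
    ).fetchall()
    return [row[0] for row in rows]


betty_matches = search_by_first_name(conn, "Betty")
robert_matches = search_by_first_name(conn, "Robert")
```

Note that this keeps the name text on the `person` row and only adds a side table for matching, so ordinary display queries stay join-free.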

黯然 2024-07-25 07:21:50

Yes, definitely overkill. What's a few dozen bytes between friends?

蘸点软妹酱 2024-07-25 07:21:50

Maybe if you work in the Census office it might make sense. Otherwise, see every other answer :)

愛上了 2024-07-25 07:21:50

I would say yes, it is going too far in 95%+ of cases.

怀中猫帐中妖 2024-07-25 07:21:50

Yes. I cannot think of an instance where the benefits outweigh the problems and query complications.

流殇 2024-07-25 07:21:50

No, but you might want to normalise to a canonical record for a customer (so you don't get 5 different entries for 'Bloggs & Co.' in your database). This is a data cleansing issue that often bites on MIS projects.
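
A very rough sketch of what such cleansing could start from: derive a matching key from each customer name and group records that share it. The rules below (case folding, dropping punctuation and a small set of filler words) are assumptions picked for this example; real cleansing rules would need to be tuned against real data.

```python
import re
from collections import defaultdict

# Hypothetical filler words to ignore when comparing company names.
FILLER_WORDS = {"and", "co", "company", "ltd", "limited", "inc"}


def canonical_key(name):
    """Reduce a customer name to a crude key for duplicate detection."""
    key = name.casefold().replace("&", " and ")
    key = re.sub(r"[^\w\s]", " ", key)  # punctuation becomes whitespace
    words = [word for word in key.split() if word not in FILLER_WORDS]
    return " ".join(words)


def group_duplicates(names):
    """Group raw names by canonical key so likely duplicates sit together."""
    groups = defaultdict(list)
    for name in names:
        groups[canonical_key(name)].append(name)
    return dict(groups)


raw_names = [
    "Bloggs & Co.",
    "Bloggs and Co",
    "BLOGGS & CO",
    "Bloggs & Company",
    "bloggs co.",
    "Smith Ltd",
]
groups = group_duplicates(raw_names)
```

The point is that this is a matching problem on whole customer records, not a reason to split names out into their own tables.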

暖树树初阳… 2024-07-25 07:21:50

You often don't go beyond fourth normal form in a database, so something like "seventh normal form" is quite a bit overboard. The only place this might be even a remotely plausible idea is in some kind of massive data warehouse.
