Using a 1-digit code vs. a descriptive string for an ENUM in the database?
In my user_accounts table, I have a field called source, which is an ENUM. It indicates how the user was referred to the website. Possible values are via Facebook, via email, or via regular website signup.
There are 3 possible options for how I can store these values in the database:
1. As a full string, i.e. facebook, email, website
2. As a 1-letter code, e.g. F, E, W
3. As a 1-digit code, e.g. 1, 2, 3
Which approach is best from a database performance/maintenance point of view? Is there going to be any impact (such as faster queries) if I store the values as 1-letter or 1-digit codes rather than as the full strings? This column will be used in WHERE clauses.
Comments (3)
Performance-wise, ENUMs tend to be the most optimized: because the system knows the possible set of values, it can use different algorithms when searching (in MySQL, each ENUM value is stored internally as a small integer index). You can store the full string with ENUMs ('FACEBOOK', etc.) and they will take only 1 byte of space per row, provided there are fewer than 256 values in total. However, use ENUMs only if you know for sure that the three values in the question are the only options you will ever see. Adding a new ENUM value is a pain: you have to alter the table to update the list of values, and you have to preserve the existing order of values.
A close second is digits. The issue with digits is readability: you may end up having to keep another table, either in the database or in the application, to understand what each digit stands for.
Varchar is the worst when it comes to query speed, but it is the most readable and manageable.
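If the numeric codes are chosen, the "another table ... in the application" mentioned above could be as small as an enumeration. The following is only a sketch: the class name Source and the helpers source_label and source_code are hypothetical, and the mapping of 1, 2, 3 to facebook, email, website is assumed from the order given in the question.

```python
from enum import IntEnum


class Source(IntEnum):
    """Hypothetical application-side names for the stored codes 1, 2, 3."""
    FACEBOOK = 1
    EMAIL = 2
    WEBSITE = 3


def source_label(code: int) -> str:
    """Translate a stored numeric code into a readable lowercase label.

    Raises ValueError for a code that is not defined above.
    """
    return Source(code).name.lower()


def source_code(label: str) -> int:
    """Translate a readable label back into the numeric code to store.

    Raises KeyError for a label that is not defined above.
    """
    return int(Source[label.upper()])


print(source_label(2))          # email
print(source_code("website"))   # 3
```

Keeping the mapping in one place means the digits never have to appear as bare literals in queries or templates, which addresses most of the readability concern.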
Since you are going to use this field in a WHERE clause, you are better off using the numeric enumerator. It is generally true that you should index fields that will be filtered or sorted on (I say generally because you really need to verify with query execution plans whether the index is worth the overhead).
That said, an ENUM is a mess to use. If you decide down the road to change something, it is difficult to change. Also, if you pass in bad data, then unless strict SQL mode is enabled MySQL will not raise an error; it silently stores an empty string (its special error value) in that field. You also cannot attach other attributes to the ENUM data, such as whether a value is active or legacy. Finally, it is much more difficult to use the ENUM information directly in your applications (for example, to populate a drop-down menu with your ENUM choices).
Here is a good SO question about this topic:
MySQL ENUM type vs join tables
In the end, I would recommend that you use a joined lookup table and a PK/FK relationship. If you want a faster query, just skip the join and filter directly on the indexed numeric foreign key, treating it as your numeric enumerator. In my opinion, this fits much better into a good database design than using an ENUM.
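A minimal sketch of that lookup-table layout follows. It uses Python's built-in sqlite3 module as a stand-in so the example runs anywhere; it is not MySQL, and the table name sources, the column names, the index name, and the sample users are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE sources (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE,
    active INTEGER NOT NULL DEFAULT 1   -- extra attribute an ENUM cannot carry
);
CREATE TABLE user_accounts (
    id        INTEGER PRIMARY KEY,
    username  TEXT NOT NULL,
    source_id INTEGER NOT NULL REFERENCES sources(id)
);
CREATE INDEX idx_user_accounts_source ON user_accounts(source_id);
""")

conn.executemany("INSERT INTO sources (id, name) VALUES (?, ?)",
                 [(1, "facebook"), (2, "email"), (3, "website")])
conn.executemany("INSERT INTO user_accounts (username, source_id) VALUES (?, ?)",
                 [("alice", 1), ("bob", 2), ("carol", 1)])

# Fast path: filter on the indexed numeric key, no join needed.
facebook_users = [row[0] for row in conn.execute(
    "SELECT username FROM user_accounts WHERE source_id = ? ORDER BY username",
    (1,))]

# Readable path: join to the lookup table only when the name is wanted.
labelled = conn.execute(
    "SELECT u.username, s.name FROM user_accounts u "
    "JOIN sources s ON s.id = u.source_id ORDER BY u.username").fetchall()

# Bad data is rejected by the foreign key instead of being stored silently.
try:
    conn.execute("INSERT INTO user_accounts (username, source_id) VALUES (?, ?)",
                 ("dave", 99))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(facebook_users)  # ['alice', 'carol']
print(labelled)        # [('alice', 'facebook'), ('bob', 'email'), ('carol', 'facebook')]
print(rejected)        # True
```

Adding a new referral source in this layout is a single INSERT into sources rather than an ALTER TABLE, and the lookup rows can feed a drop-down menu directly.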
Well, if you use numbers, you might save a few CPU cycles, which is about how long it takes one of the neurons in your brain to talk to another. In layman's terms, unless you plan on doing it several billion times a second, it's better to go with whichever option is most readable and flexible. That will save you many seconds of debugging effort, which is most likely more than you would save by using numbers.