数据库优化:通过整数或短字符串搜索哪个更快?
我想知道我遇到的基本数据库设计/数据类型问题。
我有一个项目表,其中有一个名为“experience_required”的字段。我知道该字段将始终由以下选项之一填充:实习生、初级、高级或总监。随着时间的推移,这个列表可能会有所不同,但我预计其中的项目不会发生巨大变化。
我应该选择整数还是字符串?将来,当我有大量这样的记录并且需要通过 expeirence_required 检索它们时,将它们保存为整数会有所不同吗?
I am wondering about a basic database design / data type question I am having.
I have a porjects table with a field called "experience_required". I know this field will be always populated from one of these options: intern, junior, senior, or director. This list may vary a bit as time evolves but I don't expect dramatic changes to the items on it.
Should I go for integer or string? In the future when I have tons of records like this and need to retrieve them by expeirence_required, will it make a difference to have them in integers?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
绝对选择整数而不是字符串。
性能会更好,您的数据库将更接近规范化。
最终,您应该创建一个名为 ExperienceLevel 的新表,其中包含字段 Id 和 Title。现有表中的 experience_required 字段应更改为另一个表上的外键。
这将是一个更强大的设计,并且在您更改可用经验级别或决定重命名经验级别的情况下会更加宽容。
您可以在此处了解有关规范化的更多信息。
Definitely go for Integer over String.
Performance will be better, and your database will be closer to being normalized.
Ultimately, you should create a new table called ExperienceLevel, with fields Id and Title. The experience_required field in the existing table should be changed to a foreign key on the other table.
This will be a much stronger design, and will be more forgiving in the case that you change the experience levels available, or decide to rename an experience level.
You can read more about Normalization here.
您可能喜欢这个已索引的字段。一旦建立索引,整数和小字符字符串就没有太大的(读取可忽略不计的)性能差异。
You may like this field indexed. Once indexed Integer and small Char String don't have much (read negligible) performance difference.
整数。恕我直言,字符串应该仅用于存储文本数据(名称、地址、文本等)。
此外,在这种情况下,整数更适合排序、存储空间和维护。
Integers. Strings should IMHO only be used to store textual data (names, addresses, text, etc).
Besides, integers are in this case better for sorting, storage space and maintaining.
理论上,当你索引整数时,它们会占用更少的内存。
您还可以使用枚举(在 mysql 中),它看起来像字符串,但存储为整数。
In theory integers will take less memory when you index them.
You can also use enums (in mysql) which look like strings but stored as integers.
没关系。差异可以忽略不计。有什么区别有利于选择整数,但这是我更喜欢短文本键的少数情况之一,因为它会在许多报告情况下将 JOIN 保存回查找表。
Doesn't matter. The difference would be negligible. What difference there is would favor the choice of integer, but this is one of the few cases in which I prefer a short text key since it will save a JOIN back to a lookup table in many reporting situations.
为了让水变得浑浊一些,我建议混合使用。从 @GregSansom 的想法(已投票)开始,但使用
CHAR(1)
数据类型代替整数,其值为 I、J、S 和 D。这将为您提供与使用 tinyint 相同的性能,并在(如果)直接处理数据时提供简单易记的助记符的额外优势。通过一点使用,记住“S”意味着“高级”是很简单的,而 3 不带有任何内置含义 - 特别是如果,正如您所建议的,随着时间的推移添加额外的值。 (将试用期添加为 5,“低排名 = 低价值”范式就不再适用。)只有当您的项目列表非常短时,这才有效。获得太多或太相似,就很难编写出可用的代码。
当然,如果这些是连续值怎么办?当然听起来像这里。在这种情况下,不要将它们设置为 1、2、3、4,而将它们设置为 10、20、30、40,以便稍后插入新的分类。这还可以让您轻松实现范围,例如“每个人 < 30”(意味着小于“高级”)。
我想我的要点是:了解您的数据,如何使用它,随着时间的推移它可能或将如何变化,并相应地计划和编码!
To muddy the waters some, I'll suggest a mix. Start with @GregSansom's idea (upvoted), but instead of integers use the
CHAR(1)
datatype, with values I, J, S, and D. This will give you the same performance as using tinyint, and give the extra advantage of a simple to remember mnemonic when (if) working directly with the data. With a bit of use, it is trivial to remember that "S" means "senior", whereas 3 does not carry any built in meaning--particularly if, as you suggest, extra values are added over time. (Add Probationary as, say, 5, and the "low rank = low value" paradigm is out the window.)This only works if you have a very short list of items. Get too many or too similar, and it's hard to work up usable codes.
Of course, what if these are sequential values? Sure sounds like it here. In that case, don't make them 1,2,3,4, make them 10, 20, 30, 40, so you can insert new categorizations later on. This would also allow you to easily implement ranges, such as "everyone < 30" (meaning less than "senior").
I guess my main point is: know your data, how it will be used, how it may or will change over time, and plan and code accordingly!