SQL: Best way to store yes/no values? Focusing on performance in large databases
I have some columns where I have to store what are basically yes/no values.
For example, a user status of active or inactive, or a newsletter subscription status of subscribed or unsubscribed.
I want to know (considering tables with a lot of records) whether the best way is to use a tinyint with a length of 1, storing 1 for yes and 0 for no.
Is this the right idea? Or is there no impact on the performance of db queries when using words like yes, no, active, inactive, subscribed, etc.?
Thanks in advance.
Comments (6)
Semantically, I suggest you use `bit` if it's available to you. When looking at the column, any other developer can immediately determine that a boolean value is stored in it. If you don't have `bit`, try using `tinyint`. Ensuring that `1` is the only `true` value and `0` is the only `false` value will bring consistency. Otherwise, you could end up with a messy mixture of `true`/`false`, `yes`/`no`, `valid`/`invalid`, `y`/`n`, and/or `t`/`f`.
Comparing `bit` or `tinyint` values probably isn't slower than comparing strings, and even if it were slower than comparing strings, I can't imagine it having a significant effect on overall speed.
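To make the "1 is the only true value, 0 is the only false value" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The choice of SQLite is an assumption made only so the example runs anywhere; it has no dedicated `bit` or `tinyint` type, so `INTEGER` stands in for them. The `users` table, its column names and the `try_insert` helper are hypothetical. On other databases the same `CHECK` idea should carry over, but the exact syntax may differ.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table: each flag is a small integer restricted to 0 or 1.
conn.execute(
    """
    CREATE TABLE users (
        id            INTEGER PRIMARY KEY,
        is_active     INTEGER NOT NULL DEFAULT 1 CHECK (is_active IN (0, 1)),
        is_subscribed INTEGER NOT NULL DEFAULT 0 CHECK (is_subscribed IN (0, 1))
    )
    """
)


def try_insert(is_active, is_subscribed):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (is_active, is_subscribed) VALUES (?, ?)",
                (is_active, is_subscribed),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Valid flags are stored; anything other than 0 or 1 is refused.
accepted = [try_insert(1, 0), try_insert(0, 1)]
rejected = [try_insert(2, 0), try_insert("yes", 0)]

# Filtering on the flag is a plain integer comparison.
active_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE is_active = 1"
).fetchone()[0]
```

With the constraint in place, the messy mixture of spellings described above cannot get into the column in the first place, whatever the application code does.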
Is there something you don't like about the 'bit' data type?
The most commonly supported means is to use `CHAR(1)`. On most databases, it takes the same amount of space as `BIT` (assuming `BIT` is available, 1 byte) but supports more values (26 if case insensitive, 52 if not) if there's any chance of supporting more values. Unlike `BIT`, `CHAR(1)` is human readable. Also, `BIT` isn't supported on every database.
If your RDBMS supports bitmap indexes, go for BIT every time. If it doesn't, use whatever you like; there is really no difference between char(1) and tinyint (a byte).
Are you just asking in general what the most efficient way to store a yes/no flag is?
Or do you have a performance problem at hand?
If so, when does the performance problem occur (specific queries, inserts, maintenance, etc.)? What kind of performance gain are you looking for?
2%? 10%? 50%?
Changing datatypes will likely result in only a minor improvement unless we are talking about several hundred million rows. Let me give you an example. Say that whatever change you made shaved 3 bytes off each row, and the table contains 100,000,000 rows. That would be a saving of ~286 MB. Assuming the disk subsystem can deliver 100 MB/s, you have saved a whopping 3 seconds on a full table scan. Something tells me the users would consider 2 hours and 3 seconds versus 2 hours to be much the same :)
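The back-of-the-envelope numbers in that example can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the figures from the example (3 bytes per row, 100,000,000 rows, 100 MB/s) and treating a megabyte as 1024 × 1024 bytes, which is an assumption about the units. The variable names are made up for illustration.

```python
# Figures taken from the example above.
bytes_saved_per_row = 3
row_count = 100_000_000
disk_throughput_mb_per_s = 100  # assumed sequential read rate

# Total space saved, in bytes and in megabytes (1 MB taken as 1024 * 1024 bytes).
bytes_saved = bytes_saved_per_row * row_count
mb_saved = bytes_saved / (1024 * 1024)

# Time saved on one full table scan at the assumed throughput.
seconds_saved = mb_saved / disk_throughput_mb_per_s

print(f"{bytes_saved:,} bytes = {mb_saved:.1f} MB, about {seconds_saved:.1f} s per full scan")
```

That comes out at about 286 MB and just under 3 seconds per full scan, which is the point being made: negligible next to a scan that already takes hours.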
My intuition would have said performance would have been better with tinyints, but this post doesn't really bear that thought out. This SO post also offers some other interesting opinions.
I do think that performing analysis with data stored as numbers is typically easier than character data. What other programs are you going to have to interface with and use? For example, several of my analysis tools do not read character data at all, so we have to recode any data we receive in the format of "yes", "no", etc.
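For what it's worth, the recoding step mentioned above might look something like the sketch below. The word lists and the `recode_flag` name are hypothetical; the real set of spellings depends on whatever the upstream data actually contains.

```python
# Hypothetical spellings seen in incoming data; extend to match the real source.
TRUE_WORDS = {"yes", "y", "true", "t", "active", "subscribed", "1"}
FALSE_WORDS = {"no", "n", "false", "f", "inactive", "unsubscribed", "0"}


def recode_flag(value):
    """Map a textual yes/no style value to 1 or 0, failing loudly on anything else."""
    text = str(value).strip().lower()
    if text in TRUE_WORDS:
        return 1
    if text in FALSE_WORDS:
        return 0
    raise ValueError(f"unrecognised yes/no value: {value!r}")


recoded = [recode_flag(v) for v in ["Yes", "no", " ACTIVE ", "unsubscribed", 1]]
print(recoded)  # [1, 0, 1, 0, 1]
```

Raising on unknown values rather than guessing is a deliberate choice here: a silent default would hide exactly the kind of inconsistent data that storing 0/1 is meant to avoid.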