当前位置：文江博客话题详情

在每个数据库引擎中存储二进制标志/布尔值的最佳方法是什么？

发布于 2024-10-09 20:57:57 字数 645 浏览 2 评论 0原文

我已经看到了一些可能的方法（在某些数据库引擎中，其中一些是同义词）：

TINYINT(1)
BOOL
BIT(1)
ENUM(0,1)
CHAR(0) NULL

应该注意 PHP 支持的所有主要数据库引擎，但作为参考，如果能注明其他引擎就更好了。

我要求的是一种最适合阅读的设计。例如，在 WHERE 条件中使用标志字段进行选择，或使用标志进行 GROUP BY。性能比存储空间重要得多（除非大小对性能有影响）。

还有一些更多细节：

在创建表时，我不知道它是否会稀疏（如果大多数标志打开或关闭），但我可以稍后更改表，所以如果我知道有什么可以优化的对此，应当指出。

此外，如果每行只有一个（或几个）标志与许多（或很多）标志有区别，则应注意这一点。

顺便说一句，我在某处读过以下内容：

使用布尔值可以做同样的事情使用tinyint，但是它有语义传达的优势你的意图是什么，那就是值得一些东西。

好吧，就我而言，它没有任何价值，因为每个表都由我的应用程序中的一个类表示，并且所有内容都在类中显式定义并有详细记录。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

假装不在乎 2024-10-16 20:57:57

这个答案适用于 ISO/IEC/ANSI 标准 SQL，并且包括更好的免费软件假装 SQL。

第一个问题是您已经确定了两个类别，而不是一个类别，因此无法对它们进行合理比较。

A.类别一

(1)、(4)和(5)包含多个可能的值并且是一个类别。所有这些都可以在 WHERE 子句中轻松有效地使用。它们具有相同的存储，因此存储和读取性能都不是问题。因此，剩下的选择只是基于列的实际数据类型。

ENUM 是非标准的；更好或标准的方法是使用查找表；那么这些值在表中可见，而不是隐藏，并且可以通过任何报表工具进行枚举。由于内部处理，ENUM 的读取性能会受到轻微影响。

B. 第二类

(2) 和 (3) 是二值元素：True/False；男/女；死/活。该类别与第一类不同。它在数据模型和每个平台中的处理方式都是不同的。 BOOLEAN 只是 BIT 的同义词，它们是同一个东西。从法律上讲（SQL 方面），所有兼容 SQL 的平台都会进行相同的处理，并且在 WHERE 子句中使用它没有问题。

性能差异取决于平台。 Sybase 和 DB2 将最多 8 位打包为一个字节（这里存储并不重要），并动态映射 2 的幂，因此性能非常好。 Oracle 在每个版本中做了不同的事情，我看到建模者使用 CHAR(1) 而不是 BIT 来克服性能问题。 MS 在 2005 年之前表现还不错，但在 2008 年就打破了这一局面，因为结果是不可预测的；所以简短的答案可能是将其实现为 CHAR(1)。

当然，假设您不会做愚蠢的事情，例如将 8 个单独的列打包到一个 TINYINT 中。这不仅是一个严重的规范化错误，而且对编码人员来说是一场噩梦。保持每列离散且数据类型正确。

C. 多重指标&可空列

这与 (A) 和 (B) 无关且独立。列的正确数据类型是什么，与您拥有的数量以及是否可为 Null 无关。可空意味着（通常）该列是可选的。本质上您还没有完成建模或标准化练习。功能依赖性不明确。如果完成规范化练习，将不会有可为 Null 的列，也不会存在可选列；它们要么对于特定关系明显存在，要么不存在。这意味着使用普通的父类型-子类型的关系结构。

当然，这意味着更多的表，但没有空值。企业 DBMS 对于更多表或更多连接没有问题，这就是它们优化的目的。规范化数据库的性能比非规范化或非规范化数据库要好得多，并且无需“重构”即可扩展它们。您可以通过为每个子类型提供视图来简化使用。

如果您想了解有关此主题的更多信息，请查看此问题/答案如果。如果您需要建模方面的帮助，请提出一个新问题，在您的提问级别，我建议您坚持使用 5NF。

D. 空值的性能

如果性能对您很重要，则排除每个可空列。存储为可变长度；这需要对每行/列进行额外处理，以允许日志记录等在不妨碍固定行的情况下移动队列。索引中的长度列（包括可为 Null 的列）：需要在每次访问时进行解包。

E. 民意调查

最后，我认为这个问题没有民意调查的意义。很公平，你会得到技术答案，甚至意见，但民意调查是为了人气竞赛，而SO的响应者的技术能力涵盖了非常广泛的范围，所以最受欢迎的答案和技术上最正确的答案是在两个不同的答案频谱的两端。

This answer is for ISO/IEC/ANSI Standard SQL, and includes the better freeware pretend-SQLs.

First problem is you have identified two Categories, not one, so they cannot be reasonably compared.

A. Category One

(1) (4) and (5) contain multiple possible values and are one category. All can be easily and effectively used in the WHERE clause. They have the same storage so neither storage nor read performance is an issue. Therefore the remaining choice is simply based on the actual Datatype for the purpose of the column.

ENUM is non-standard; the better or standard method is to use a lookup table; then the values are visible in a table, not hidden, and can be enumerated by any report tool. The read performance of ENUM will suffer a small hit due to the internal processing.

B. Category Two

(2) and (3) are Two-Valued elements: True/False; Male/Female; Dead/Alive. That category is different to Category One. Its treatment both in your data model, and in each platform, is different. BOOLEAN is just a synonym for BIT, they are the same thing. Legally (SQL-wise) there are handled the same by all SQL-compliant platforms, and there is no problem using it in the WHERE clause.

The difference in performance depends on the platform. Sybase and DB2 pack up to 8 BITs into one byte (not that storage matters here), and map the power-of-two on the fly, so performance is really good. Oracle does different things in each version, and I have seen modellers use CHAR(1) instead of BIT, to overcome performance problems. MS was fine up to 2005 but they have broken it with 2008, as in the results are unpredictable; so the short answer may be to implement it as CHAR(1).

Of course, the assumption is that you do not do silly things such as pack 8 separate columns in to one TINYINT. Not only is that a serious Normalisation error, it is a nightmare for coders. Keep each column discrete and of the correct Datatype.

C. Multiple Indicator & Nullable Columns

This has nothing to do with, and is independent of, (A) and (B). What the columns correct Datatype is, is separate to how many you have and whether it is Nullable. Nullable means (usually) the column is optional. Essentially you have not completed the modelling or Normalisation exercise. The Functional Dependencies are ambiguous. if you complete the Normalisation exercise, there will be no Nullable columns, no optional columns; either they clearly exist for a particular relation, or they do not exist. That means using the ordinary Relational structure of Supertype-Subtypes.

Sure, that means more tables, but no Nulls. Enterpise DBMS have no problem with more tables or more joins, that is what they are optimised for. Normalised databases perform much better than unnormalised or denormalised ones, and they can be extended without "re-factoring'. You can ease the use by supplying a View for each Subtype.

If you want more information on this subject, look at this question/answer. If you need help with the modelling, please ask a new question. At your level of questioning, I would advise that you stick with 5NF.

D. Performance of Nulls

Separately, if performance is important to you, then exclude Nulls. Each Nullable column is stored as variable length; that requires additional processing for each row/column. The enterprise databases use a "deferred" handling for such rows, to allow the logging, etc to move thought the queues without impeding the fixed rows. In particular never use variable length columns (that includes Nullable columns) in an Index: that requires unpacking on every access.

E. Poll

Finally, I do not see the point in this question being a poll. It is fair enough that you will get technical answers, and even opinions, but polls are for popularity contests, and the technical ability of responders at SO covers a very range, so the most popular answers and the most technically correct answers are at two different ends of the spectrum.

回复收藏 0 原文