How to best represent constants (enums) in the database (INT vs VARCHAR)?
What is the best solution, in terms of performance and "readability/good coding style", for representing a (Java) enumeration (a fixed set of constants) on the DB layer: an integer (or any numeric datatype in general) or a string representation?
Caveat: Some database systems support "enums" directly, but this would require keeping the database enum definition in sync with the business-layer implementation. Furthermore, this kind of datatype might not be available on all database systems, and the syntax may differ between them. I am looking for a simple solution that is easy to manage and available on all database systems. (So my question only addresses the number vs. string representation.)
The number representation of a constant seems to me very efficient to store (for example, a small integer consumes only two bytes) and is most likely very fast in terms of indexing, but it is hard to read ("0" vs. "1", etc.).
The string representation is more readable (storing "enabled" and "disabled" instead of "0" and "1"), but it consumes much more storage space and is most likely also slower in terms of indexing.
My question is: did I miss some important aspects? What would you suggest using for an enum representation on the database layer?
Thank you very much!
Comments (2)
In most cases, I prefer to use a short alphanumeric code, and then have a lookup table with the expanded text. When necessary I build the enum table in the program dynamically from the database table.
For example, suppose we have a field that holds the transaction type, and the possible values are Sale, Return, Service, and Layaway. I'd create a transaction type table with code and description, make the codes maybe "SA", "RE", "SV", and "LY", and use the code field as the primary key. Then in each transaction record I'd store that code. This takes less space than an integer key in the record itself and in the index. Exactly how it is processed depends on the database engine, but it shouldn't be dramatically less efficient than an integer key. And because it's mnemonic, it's very easy to use. You can dump a record and easily see what the values are, and likely remember which is which. You can display the codes without translation in user output, and the users can make sense of them. Indeed, this can give you a performance gain over integer keys: in many cases the abbreviation is good for the users (they often want abbreviations to keep displays compact and avoid scrolling), so you don't need to join on the transaction type table to get a translation.
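A minimal Java sketch of building the lookup in the program from the database table at runtime, using the codes from the example. The table and column names (`transaction_type`, `code`, `description`) and the class name `TransactionTypeTable` are hypothetical; only standard JDBC calls are used, and the in-memory constructor lets the lookup be exercised without a database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * In-memory copy of a lookup table mapping short codes ("SA", "RE", ...) to descriptions.
 * Table and column names (transaction_type, code, description) are hypothetical.
 */
final class TransactionTypeTable {
    private final Map<String, String> descriptions;

    TransactionTypeTable(Map<String, String> codeToDescription) {
        // Defensive, read-only copy so callers cannot change the table after loading.
        this.descriptions = Collections.unmodifiableMap(new LinkedHashMap<>(codeToDescription));
    }

    /** Reads every row of the assumed transaction_type table once, e.g. at startup. */
    static TransactionTypeTable load(Connection conn) throws SQLException {
        Map<String, String> rows = new LinkedHashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT code, description FROM transaction_type ORDER BY code");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                rows.put(rs.getString("code"), rs.getString("description"));
            }
        }
        return new TransactionTypeTable(rows);
    }

    /** True if the code exists in the lookup table (useful for validating input). */
    boolean isValid(String code) {
        return descriptions.containsKey(code);
    }

    /** Expanded text for display; the short code itself is often readable enough. */
    String describe(String code) {
        String description = descriptions.get(code);
        if (description == null) {
            throw new IllegalArgumentException("Unknown transaction type code: " + code);
        }
        return description;
    }
}
```

The transaction records themselves would store only the short code, so `describe(...)` is needed only when a screen or report wants the full text.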
I would definitely NOT store a long text value in every record. In this example, I would not want to dispense with the transaction type table and store "Layaway" directly. Not only is this inefficient, but it is quite possible that someday the users will say they want it changed to "Layaway sale", or even some subtle difference like "Lay-away". Then you not only have to update every record in the database, but you also have to search through the program for every place this text occurs and change it. Also, the longer the text, the more likely it is that somewhere along the line a programmer will misspell it and create obscure bugs.
Also, having a transaction type table provides a convenient place to store additional information about the transaction type. Never ever ever write code that says "if whatevercode='A' or whatevercode='C' or whatevercode='X' then ...". Whatever it is that makes those three codes different from all other codes, put a field for it in the transaction type table and test that field. If you say, "Well, those are all the tax-related codes" or whatever, then fine: create a field called "tax_related" and set it to true or false for each code value as appropriate. Otherwise, when someone creates a new transaction type, they have to look through all those if/or lists and figure out which ones this type should be added to and which it shouldn't. I've read plenty of baffling programs where I had to figure out why some logic applied to these three code values but not others, and when you think a fourth value ought to be included in the list, it's very hard to tell whether it is missing because it really is different in some way, or because the programmer made a mistake.
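Applied to the tax example, the idea can be sketched in Java as follows, assuming the hypothetical `tax_related` column has been loaded into memory alongside each code. `TransactionTypeRow` and `TransactionRules` are illustrative names, not part of any library.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** One row of a hypothetical transaction_type table with an extra tax_related column. */
final class TransactionTypeRow {
    final String code;
    final String description;
    final boolean taxRelated;

    TransactionTypeRow(String code, String description, boolean taxRelated) {
        this.code = code;
        this.description = description;
        this.taxRelated = taxRelated;
    }
}

final class TransactionRules {
    private final Map<String, TransactionTypeRow> byCode;

    TransactionRules(List<TransactionTypeRow> rows) {
        // Index the rows by their short code (the table's primary key).
        // Collectors.toMap throws IllegalStateException on duplicate codes.
        this.byCode = rows.stream()
                .collect(Collectors.toMap(r -> r.code, Function.identity()));
    }

    // Avoid: if (code.equals("SA") || code.equals("RE") || ...) { ... }
    // Instead, test the attribute stored with the code, so a new transaction type
    // only needs its tax_related value set in the table.
    boolean isTaxRelated(String code) {
        TransactionTypeRow row = byCode.get(code);
        if (row == null) {
            throw new IllegalArgumentException("Unknown transaction type code: " + code);
        }
        return row.taxRelated;
    }
}
```

With this in place, adding a new transaction type means inserting one row with its `tax_related` value; no `if`/`or` chains have to be found and updated.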
The only time I don't create the translation table is when the list is very short, there is no additional data to keep, and it is clear from the nature of the universe that it is unlikely ever to change, so the values can be safely hard-coded. Think true/false, positive/negative/zero, or male/female. (And hey, even that last one, obvious as it seems, there are people insisting we now include "transgendered" and the like.)
Some people dogmatically insist that every table have an auto-generated sequential integer key. Such keys are an excellent choice in many cases, but for code lists, I prefer the short alpha key for the reasons stated above.
I would store the string representation, as it is easy to correlate back to the enum and much more stable. Using ordinal() would be bad because the ordinal changes if you add a new constant in the middle of the enum, so for a numeric column you would have to implement your own numbering system.
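The ordinal problem is easy to demonstrate. In this Java sketch, `StatusV1` and `StatusV2` are hypothetical before/after versions of the same enum; the `Status` enum shows one possible "own numbering system", where each constant carries an explicit, fixed code (the names and numbers are illustrative).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical first version of an enum, when some rows were already written.
enum StatusV1 { ENABLED, DISABLED }

// Later version: a constant added in the middle shifts the ordinal of DISABLED from 1 to 2.
enum StatusV2 { ENABLED, SUSPENDED, DISABLED }

// If a numeric column is required anyway, give each constant an explicit, fixed code
// instead of relying on ordinal().
enum Status {
    ENABLED(1),
    SUSPENDED(3), // inserted later; gets a new number, existing numbers are untouched
    DISABLED(2);

    private static final Map<Integer, Status> BY_CODE = new HashMap<>();

    static {
        for (Status s : values()) {
            BY_CODE.put(s.code, s);
        }
    }

    final int code;

    Status(int code) {
        this.code = code;
    }

    static Status fromCode(int code) {
        Status s = BY_CODE.get(code);
        if (s == null) {
            throw new IllegalArgumentException("Unknown status code: " + code);
        }
        return s;
    }
}
```

With the V1/V2 pair, a stored ordinal of `1` meant `DISABLED` but reads back as `StatusV2.SUSPENDED`, while a stored `name()` of `"DISABLED"` still resolves through `StatusV2.valueOf(...)`. Storing names has its own constraint: renaming a constant breaks the stored values unless they are migrated.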
In terms of performance, it all depends on what the enums are used for, but developing a whole separate representation with conversion, rather than just using the natural String representation, is most likely a premature optimization.