用于“状态”的类型SQL表中的列
我有一个(虚拟)表结构,如下所示:
ticket id: int(11) PK name: varchar(255) status: ?????????
问题是,我应该使用什么数据类型来表示状态?以下是我看到的选项:
- varchar 表示状态 - 不好,因为没有
- 表示状态的完整性枚举 - 不好,因为要更改值,我必须更改表,然后更改带有值下拉菜单的任何代码等等
- int FK 到状态表 - 好,因为它是动态的,不好,因为很难通过视觉检查(这可能有用)
- varchar FK 到状态表 - 好,因为它是动态的,并在检查时可见。不好,因为键是有意义的,这通常是不受欢迎的。有趣的是,在这种情况下,状态表完全有可能只有 1 列,使其成为一个美化的枚举
我是否准确地了解了情况?拥有一个有意义的密钥真的那么糟糕吗?因为虽然它确实让我起鸡皮疙瘩,但我没有任何理由这样做......
更新: 对于选项 4,建议的结构为状态表的status:char(4) FK。所以,
打开=> “打开”
CLOS => “已关闭”
“待处理”=> “待授权”
“PROG”=> “进行中
这种情况下的缺点是什么?在这种情况下,我认为使用 int 而不是 char 的唯一好处是性能稍差。
I have a (dummy) table structure as follows:
ticket id: int(11) PK name: varchar(255) status: ?????????
The question is, what data type should I use for status? Here are my options, as I see them:
- varchar representing the status - BAD because there's no integrity
- enum representing the status - BAD because to change the value, I'd have to alter the table, and then any code with dropdowns for the values, etc etc etc
- int FK to a status table - GOOD because it's dynamic, BAD because it's harder to inspect by sight (which may be useful)
- varchar FK to a status table - GOOD because it's dynamic, and visible on inspection. BAD because the keys are meaningful, which is generally frowned upon. Interestingly, in this case it's entirely possible for the status table to have just 1 column, making it a glorified enum
Have I got an accurate read of the situation? Is having a meaningful key really that bad? Because while it does give me goosebumps, I don't have any reason for it doing so...
Update:
For option 4, the proposed structure would be status: char(4) FK, to a status table. So,
OPEN => "Open"
CLOS => "Closed"
"PEND" => "Pending Authorization"
"PROG" => "In Progress
What's the disadvantage in this case ? The only benefit I can see of using int over char in this case is slight performance.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
我会选择数字 4,但我会使用
char(x)
列。如果您担心性能,char(4) 会占用与 int 一样多的空间(以及磁盘 I/O、带宽和处理时间),后者也需要 4 个字节来存储。如果您真的担心性能,请将其设为 char(2) 甚至 char(1)。不要将其视为“有意义的数据”,而应将其视为自然键的缩写。是的,数据有意义,但正如您所注意到的,在处理数据时这可能是一件好事——这意味着您不必总是通过连接(即使是一个很小的表)来从数据中提取含义。数据库。当然,外键约束确保数据有效,因为它必须位于查找表中。 (这也可以通过 CHECK 约束来完成,但随着时间的推移,查找表通常更容易管理和维护。)
缺点是您可能会陷入试图寻找含义的困境。 char(1) 具有很强的吸引力,但如果您获得十个或更多值,则可能很难想出好的有意义的值。 char(4) 的问题较少,但仍然是一个可能的问题。另一个缺点:如果数据可能会发生变化,那么是的,您的有意义的数据(“PEND”=“待授权”)可能会失去其意义(“PEND”=“转发到总部进行初步批准”)。这是一个很糟糕的例子;如果类似的代码确实发生变化,那么您最好重构系统以反映业务规则的变化。我想我的观点应该是,如果它是用户输入的查找值,代理键(整数)将是你的朋友,但如果它们是内部定义和维护的,你绝对应该考虑更人性化的值。否则,您需要在监视器上留下事后注释,以提醒您 Status = 31 到底意味着什么。 (我身上有三个,粘纸每隔几个月就会磨损。谈论维护成本......)
I would go with number 4, but I'd use a
char(x)
column. If you're worried about performance, a char(4) takes up as much space (and, or so one would think, disk i/o, bandwidth, and processing time) as an int, which also takes 4 bytes to store. If you're really worried about performance, make it a char(2) or even char(1).Don't think of it as "meaningful data", think of it as an abbreviation of the natural key. Yes, the data has meaning, but as you've noticed that can be a good thing when working with the data--it means you don't always have to join (even if to a trivially small table) to extract meaning from the database. And of course the foreign key constraint ensures that the data is valid, since it must be in the lookup table. (This can be done with CHECK constraints as well, but Lookup tables are generally easier to manage and maintain over time.)
The downside is that you can get caught up with trying to find meaning. char(1) has a strong appeal, but if you get to ten or more values, it can get hard to come up with good meaningful values. Less of a problem with char(4), but still a possible issue. Another downside: if the data is likely to change, then yes, your meaningful data ("PEND" = "Pending Authorization") can lose its meaning ("PEND" = "Forward to home office for initial approval"). That's a poor example; if codes like that do change, you're probably much better off refactoring your system to reflect the change in business rules. I guess my point should be, if it's a user-entered lookup value, surrogate keys (integers) will be your friend, but if they're internally defined and maintained you should definitely consider more human-friendly values. That, or you'll need post-em notes on your monitor to remind you what the heck Status = 31 is supposed to mean. (I've got three on mine, and the stickum wears out every few months. Talk about cost to maintain...)
选择数字 3。如果您想要检查某些内容,请创建一个加入状态值的视图。
Go with number 3. Create a view that join's in the status value if you want something inspectable.
我将使用 INT,并创建与状态表的外键关系。对于枚举状态列来说,INT 绝对应该是安全的。
I would use an INT, and create a foreign key relationship to the status table. An INT should definitely be safe for an enumerated status column.
我最近一直在使用很多需要很多状态的数据库,并且我有一些可能值得添加到对话中的注释。
INT:我发现的一件事是,如果应用程序正在进行大量跟踪,引用表的数量很快就会变得难以处理,并且正如您所提到的,使检查数据库一目了然不切实际的。 (对于我的一些客户来说,这比在处理时间中节省的几毫秒更重要。)
VARCHAR:编程的糟糕想法,但重要的是要考虑给定的状态是否实际上正在发生由代码或人眼使用。对于后者,您可以获得无限的范围,并且不必维持任何关系。
CHAR(4):使用描述性字符列实际上是一个非常好的方法。我通常只会在值范围较低且明显的情况下考虑它,但这只是因为我认为这是一种非标准方法(可能会让新开发人员感到困惑)。实际上,您可以使用 CHAR 值作为外键,就像 INT 一样,获得易读性并保持性能平价。
你不能做但我会错过的一件事是数学运算(如“<”和“>”)。
INT 范围:我尝试过的一种混合策略是使用 INT,但向数字添加一定程度的语义。因此,举例来说,
这里的问题是,如果您发现需要更多数字,那么您就是 SOL,因为下一个范围已经被占用。因此,我最终所做的是(在某种程度上)模仿 HTTP 响应:
与简单的 INT 相比,我更喜欢这种方式,虽然它比 CHAR 的描述性要少,但也不会那么含糊。虽然“PROG”可能意味着很多事情,好的、坏的或良性的,但如果我看到某些东西在 500 范围内,我可能不知道问题是什么,我将能够告诉你是< /em> 有问题。
I've been working with a lot of databases recently that require a lot of statuses AND I've got a few notes that might be worth adding to the conversation.
INT: One thing I found is that if an application has a lot of tracking going on, the number of reference tables can quickly get unwieldy and, as you've mentioned, make inspecting the database at a glance impractical. (Which, for some of my clients, has mattered much more than the scant milliseconds it's saved in processing time.)
VARCHAR: Terrible idea for programming, but it's important to consider if a given status is actually going to be used by the code, or just human eyes. For the latter, you get unlimited range and don't have to maintain any relationships.
CHAR(4): Using a descriptive char column can actually be a very good approach. I'd typically only consider it if the value range were going to be low and obvious, but only because I consider this a nonstandard approach (risking confusion to new devs). Realistically, you could use a CHAR value as a foreign key just the same as an INT, gain legibility and maintain performance parity.
The one thing you couldn't do that I'd miss is mathematical operations (like "<" and ">").
INT Range: A hybrid strategy I've tried out is to use INT, but adding a degree of semantics to the numbers. So, for instance,
The problem here is that if you discover you need more numbers, you're SOL, since the next range is already taken. So, what I ended up doing was (sort of) mimicking HTTP responses:
I prefer this to simple INT, and while it can be less descriptive than CHAR, it can also be less ambiguous. Whereas "PROG" could mean a number of things, good, bad or benign, if I can see something is in the 500 range, I may not known what the problem is, I will be able to tell you there is a problem.
我可以建议您使用 statusID 字段,并使用一个单独的表将 ID 映射到 varchar 吗?
编辑:我想这正是您在第 3 点中概述的内容。我认为这是最好的选择。
May I recommend you go with a statusID field instead, and have a separate table mapping the ID to a varchar?
EDIT: I guess that's exactly what you outlined in point 3. I think that is the best option.
我假设您的数据库有一些描述的前端,并且普通用户不会看到状态代码。
所以,你的便利只针对程序员和 DBA——重要的人,但我不会为他们优化我的设计。
更强 - 我会非常小心地使用“有意义”的缩写 - 我见过的最严重的数据混乱发生在开发人员清理一些数据时,并错误地解释了“有意义”的键;事实证明,“PROG”并不意味着“已编程”,而是“正在进行”。
选择选项 3。
I'm assuming that your database has a front end of some description, and that regular users are not exposed to the status code.
So, your convenience is only for programmers and DBAs - important people, but I wouldn't optimize my design for them.
Stronger - I would be very careful of using "meaningful" abbreviations - the most egregious data foul-up I've ever seen happened when a developer was cleansing some data, and interpreted the "meaningful" key incorrectly; turns out that "PROG" does not mean "programmed", but "in progress".
Go with option 3.
当您想要在 HTML 表单中显示状态列表时,创建一个包含状态的单独表是一个好主意。您可以从查找表中显示详细描述,如果有这样的需求,它将帮助用户选择状态。
从开发的角度来看,我想用整数作为主键。如果你知道它不会超过限制,你可以使用小/微小整数来优化它。
如果您使用缩写作为外键,那么您必须每次都考虑使其始终唯一,正如 @Philip Kelley 提到的它的缺点。
最后,如果您愿意,您可以声明表类型 MYISAM。
更新:
反映@Philip Kelley的观点,如果状态太多,那么最好使用整数作为外键。如果只有几个状态,则可以使用缩写作为外键。
Creating a separate table with status is a good idea when you want to show the list of the status in the HTML form. You can show the verbose description from the lookup table and it will help the user to choose status if the requirements are like that.
From the development perspective, I would like to go integer as a primary key. You can optimize it by using small/tiny integer if you know it will not exceed the limit.
If you use abbreviation as a foreign key then you have to think every time to make it unique all the time as @Philip Kelley had mentioned it as a downside of it.
Lastly, you can declare the table type MYISAM if you like.
Update:
Reflecting @Philip Kelley opinion, if there are too many status, then it's better to use integer as foreign key. If there are only couple of status, then may be use abbr as a foreign key.