What is the better way to set up a database for an application? Normalization vs. the real world
Please consider this setup for a database-backed application.
(In my case the DB is MySQL and the app is in Ruby (Rails 3), but I don't think it matters for this question.)
Let's say I have an app for a warehouse.
I have multiple items that would have categories and statuses.
For example, a table of parts would have a few statuses, such as in stock, discontinued, and backordered, and multiple categories, such as IT hardware, automotive, medical, etc.
Also I have other tables that need statuses and categories, such as
Vendor: approved, out of business, new
Order: open, processed, shipped, canceled.
Etc.
Here is the question:
I think that if I wanted to properly normalize my DB, I would have tables called categories, categories_types, statuses, statuses_types.
Then I would store all categories in one table, and any category of a certain type, such as all categories of parts, would have a foreign key to category_type - parts, and so on. Same for statuses.
This is the normalized way.
However, I often see that people create separate tables for specific categories. For example,
there would be tables called part_categories, vendor_categories, order_statuses, part_status. This is a less normalized DB, but I guess when you are dealing with a lot of tables, it might be clearer.
Which of these approaches is the better one? What are the pros and cons in your experience?
I usually go with the first setup, but I see the second one so often that I'm beginning to doubt my approach.
Thank you.
Comments (2)
I think this depends on how you want to interact with the data. The benefit of the second approach is that it's easy to see which categories and statuses are associated with a specific object (vendor, item, order). Keep in mind that if you do use the first approach, you will probably need a type identifier in your categories and statuses tables to identify which kind of object (vendor, item, order) each category or status row relates to.
The benefit of the first approach is that it's easier to add statuses and categories for new objects, and there is a simplicity in having only two tables. The problem arises when you want to add additional information to a specific category or status. For example, suppose order statuses need to have an effective_date, but item statuses should not. Once you reach this point, you'll either have to move to the second approach or add an effective_date column that will be null for every status to which the attribute does not apply.
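To make that trade-off concrete, here is a minimal plain-Ruby sketch of a single shared statuses table once one kind of status needs an extra attribute. There is no database involved, and the `Status` struct, the `owner_type` field, and the sample rows (including the dates) are all made up for illustration:

```ruby
require "date"

# Hypothetical in-memory stand-in for one shared `statuses` table.
# `owner_type` plays the role of the type identifier; `effective_date`
# only makes sense for order statuses, so it is nil everywhere else.
Status = Struct.new(:id, :owner_type, :name, :effective_date)

STATUSES = [
  Status.new(1, "Order", "open",         Date.new(2012, 1, 1)),
  Status.new(2, "Order", "shipped",      Date.new(2012, 1, 1)),
  Status.new(3, "Part",  "in stock",     nil),
  Status.new(4, "Part",  "discontinued", nil),
].freeze

# Every lookup has to filter on the type identifier.
def statuses_for(owner_type)
  STATUSES.select { |s| s.owner_type == owner_type }
end

# Rows for which the extra column carries no information.
def rows_with_unused_date
  STATUSES.select { |s| s.effective_date.nil? }
end
```

With separate tables, the part statuses would simply not have the column, and no lookup would need the `owner_type` filter.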
Keep in mind that another approach would be to not create statuses and categories tables at all, but to store the status and category values in the original tables. You can accomplish this with an enumeration (ENUM) in MySQL or in Rails. In MySQL, an ENUM is stored in the database as an integer, but it resolves to a word value, like 'processed', 'shipped' or 'canceled'. The benefit is that, if your statuses do not change often, you have one less join to do and both the database and the Ruby model are easier to read. In Ruby, an enum can simply be a list of constants that have a key (integer) and a value (string) associated with them. You can use the integer value to query and update the database, and the word value on the application side.
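The Ruby side of that idea could look roughly like the sketch below. The `OrderStatus` module, its method names, and the particular integer keys are assumptions chosen for illustration, not anything built into Rails:

```ruby
# Hypothetical order statuses: the integer key is what gets stored in
# the database, the string value is what the application displays.
module OrderStatus
  NAMES = {
    0 => "open",
    1 => "processed",
    2 => "shipped",
    3 => "canceled",
  }.freeze

  # Integer (as stored) -> word (as displayed).
  def self.name_for(key)
    NAMES.fetch(key)
  end

  # Word -> integer, for building queries and updates.
  def self.key_for(name)
    NAMES.key(name) || raise(KeyError, "unknown status: #{name}")
  end
end
```

A query would then use `OrderStatus.key_for("shipped")` as the value to match, while views call `OrderStatus.name_for` on whatever integer the row holds.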
I believe both approaches are legitimate; the path you take really depends on your needs. If you are set on storing the data in the database, then analyze how you will be interacting with statuses and categories, because the answers may point you to a different approach. Which approach will be faster and easier to query? Which one will be easier to update or modify? How often do you read, and how often do you write? Finally, keep in mind that you are Agile! Either approach can be transformed into the other with a simple migration and some refactoring. The approach that is simplest for your application now may not be the best one to use in the future, and that's perfectly okay. That's what's so great about being Agile!
In my experience, tables of enumerated names invariably evolve into full-fledged models of their own. Typically, it begins with adding boolean flags or, as mentioned in the answer above, referent types or valid date ranges.
From a relational perspective, neither approach - putting all status enums in one table, or breaking them into separate tables - is "more" normalized than the other. But from a type-theoretic standpoint, it makes more sense to put part_categories and vendor_categories in their own separate tables, if for no other reason than that it requires no code in the model to make sure you don't accidentally associate a vendor category with a part.
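As a rough illustration of the model code the shared-table design calls for, here is a plain-Ruby sketch. The `Category` and `Part` structs, the `owner_type` field, and the `valid?` method are hypothetical stand-ins, not ActiveRecord models:

```ruby
# Hypothetical row from a shared categories table: `owner_type` says
# which kind of record the category is meant for.
Category = Struct.new(:id, :owner_type, :name)

# Hypothetical part record that points at a category.
Part = Struct.new(:name, :category) do
  # With one shared table, this check has to live in application code.
  # With a dedicated part_categories table, a part could only ever
  # reference a part category, so no such check would be needed.
  def valid?
    category.owner_type == "Part"
  end
end
```

Here `Part.new("brake pad", Category.new(2, "Vendor", "wholesale"))` can be constructed without complaint, and only the explicit `valid?` call catches the mismatch.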
If you do end up putting them all in the same table, Rails has a nice feature called polymorphic associations that will automate the type and the id columns for you. It's a reasonable compromise between the two approaches.
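For readers who have not used polymorphic associations: they rest on a pair of columns, one holding a class name and one holding an id. The plain-Ruby sketch below only imitates that lookup so the idea is visible without a database. `PartRecord`, `VendorRecord`, `RECORDS`, and `CategoryLink` are invented names, and real Rails code would declare the association on ActiveRecord models instead:

```ruby
# Hypothetical owner records standing in for the parts and vendors tables.
PartRecord   = Struct.new(:id, :name)
VendorRecord = Struct.new(:id, :name)

# Lookup keyed by class name, then by id.
RECORDS = {
  "PartRecord"   => { 1 => PartRecord.new(1, "brake pad") },
  "VendorRecord" => { 1 => VendorRecord.new(1, "Acme Supply") },
}.freeze

# One shared row: the (type, id) pair identifies the owner, which is
# the same information a polymorphic association keeps in its
# *_type and *_id columns.
CategoryLink = Struct.new(:name, :categorizable_type, :categorizable_id) do
  def categorizable
    RECORDS.fetch(categorizable_type).fetch(categorizable_id)
  end
end
```

Calling `categorizable` on a link returns a part or a vendor depending on what the type column says, which is what lets every kind of owner share one table.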
Most importantly, I would contend that the enums will eventually take on a model life of their own, in which case you have a very messy job of finding all of them in the various tables, and recasting them in their own table. Tables are cheap; why be frugal with them?