Database design: flexibility vs. simplicity
I am trying to weigh up the relative pros and cons of a simple database structure such as this:
1.
CREATE TABLE x (
my_id INT PRIMARY KEY,
...,
text_attribute_blah TEXT,
text_attribute_blah_blah TEXT
);
vs:
2.
CREATE TABLE x (
my_id INT PRIMARY KEY,
...
);
CREATE TABLE attributes (
my_id INT, /* foreign key to x.my_id */
text_attribute_type INT,
text_attribute TEXT
);
where text_attribute_type identifies the attribute, e.g. blah or blah_blah.
Option 1 offers simplicity: the table is easier to read and write. Option 2 offers flexibility: if we want to add another attribute such as blah_blah_blah, we don't need to make schema changes, and so probably need fewer code changes.
Is there a right/wrong answer to this conundrum? Is one of these options considered better practice than the other? Can you point me at further reading that might help me determine the way forward?
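To make the read/write trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name x_wide, the sample values, and the type codes (1 = blah, 2 = blah_blah) are assumptions made up for the illustration; they are not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 1: one column per attribute (named x_wide here to avoid a clash)
    CREATE TABLE x_wide (
        my_id INTEGER PRIMARY KEY,
        text_attribute_blah TEXT,
        text_attribute_blah_blah TEXT
    );
    -- Option 2: one row per (entity, attribute type)
    CREATE TABLE x (my_id INTEGER PRIMARY KEY);
    CREATE TABLE attributes (
        my_id INTEGER REFERENCES x(my_id),
        text_attribute_type INTEGER,   -- assumed codes: 1 = blah, 2 = blah_blah
        text_attribute TEXT
    );
    INSERT INTO x_wide VALUES (1, 'a', 'b');
    INSERT INTO x VALUES (1);
    INSERT INTO attributes VALUES (1, 1, 'a'), (1, 2, 'b');
""")

# Option 1: reading both attributes is a plain single-table SELECT.
wide_row = conn.execute(
    "SELECT my_id, text_attribute_blah, text_attribute_blah_blah "
    "FROM x_wide WHERE my_id = 1"
).fetchone()

# Option 2: the same result needs a join plus a pivot over the attribute rows.
eav_row = conn.execute("""
    SELECT x.my_id,
           MAX(CASE WHEN a.text_attribute_type = 1 THEN a.text_attribute END),
           MAX(CASE WHEN a.text_attribute_type = 2 THEN a.text_attribute END)
    FROM x
    LEFT JOIN attributes a ON a.my_id = x.my_id
    WHERE x.my_id = 1
    GROUP BY x.my_id
""").fetchone()
```

Both queries return the same logical row; the difference is how much SQL it takes to get there, and the pivot grows by one CASE expression per attribute.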
Comments (7)
I'd almost always choose #1 - I just prefer to have attributes as columns in my tables. It makes querying, indexing for performance, and general handling much easier and more transparent.
The #2 option is called EAV (Entity Attribute Value), and it has some major drawbacks - see
It's interesting that you don't mention either performance or data integrity as concerns. For what it's worth, model #1 is the better approach on both counts.
Flexibility is vastly overrated with regard to data models. Most table structures are well known at the start of development and remain stable throughout the lifetime of a database. If you have an application where the model is genuinely fluid and unknowable, then you probably should not be using an RDBMS at all. Choose one of the NoSQL products instead.
So that's another vote for #1.
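As a rough illustration of the data integrity point, the sketch below (Python's built-in sqlite3) puts per-column constraints on the option 1 table and then tries a bad insert against each model. The NOT NULL and length rules, the table name x_wide, and the helper try_insert are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 1: each attribute is a column, so each can carry its own rules.
    CREATE TABLE x_wide (
        my_id INTEGER PRIMARY KEY,
        text_attribute_blah TEXT NOT NULL,
        text_attribute_blah_blah TEXT
            CHECK (length(text_attribute_blah_blah) <= 10)
    );
    -- Option 2: every attribute shares one generic value column.
    CREATE TABLE attributes (
        my_id INTEGER,
        text_attribute_type INTEGER,
        text_attribute TEXT
    );
""")

def try_insert(sql, params):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# A missing mandatory attribute is caught by the column constraint...
wide_result = try_insert(
    "INSERT INTO x_wide VALUES (?, ?, ?)", (1, None, "short")
)
# ...but the generic table has no way to know type 99 is meaningless
# or that a NULL value is not allowed for this attribute.
eav_result = try_insert(
    "INSERT INTO attributes VALUES (?, ?, ?)", (1, 99, None)
)
```

With EAV, rules like these typically end up in application code or triggers rather than in the column definitions.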
Every solution has a problem to solve. #1 is a good approach if you know the columns you need upfront. However, in some cases the columns are not known upfront, for example custom fields that a user adds to a feature.
Having said that, EAVs have an abundance of problems. When used properly, IMO, they are useful.
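For the custom-field case, a minimal sketch of how option 2 might be used at runtime, again with Python's built-in sqlite3. The attribute_types lookup table, the composite primary key, and the helper functions are assumptions added for illustration; the question's schema only has the attributes table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (my_id INTEGER PRIMARY KEY);
    -- Hypothetical lookup table giving each attribute type a name.
    CREATE TABLE attribute_types (
        text_attribute_type INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE attributes (
        my_id INTEGER NOT NULL REFERENCES x(my_id),
        text_attribute_type INTEGER NOT NULL
            REFERENCES attribute_types(text_attribute_type),
        text_attribute TEXT,
        PRIMARY KEY (my_id, text_attribute_type)
    );
    INSERT INTO x VALUES (1);
""")

def define_custom_field(name):
    """Register a new user-defined attribute; no schema change required."""
    cur = conn.execute(
        "INSERT INTO attribute_types (name) VALUES (?)", (name,)
    )
    return cur.lastrowid

def set_custom_field(my_id, type_id, value):
    """Store or overwrite one attribute value for one entity."""
    conn.execute(
        "INSERT OR REPLACE INTO attributes VALUES (?, ?, ?)",
        (my_id, type_id, value),
    )

def get_custom_fields(my_id):
    """Return all custom fields of an entity as a {name: value} dict."""
    rows = conn.execute("""
        SELECT t.name, a.text_attribute
        FROM attributes a
        JOIN attribute_types t
          ON t.text_attribute_type = a.text_attribute_type
        WHERE a.my_id = ?
        ORDER BY t.name
    """, (my_id,))
    return dict(rows.fetchall())

nickname = define_custom_field("nickname")
set_custom_field(1, nickname, "Bob")
fields = get_custom_fields(1)
```

Adding the "nickname" field is just an INSERT, which is the flexibility being described; the cost shows up later when these values need to be filtered, typed, or validated.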
Option 1 almost every time. Option 2 is very inefficient. It is also quite clumsy to query as soon as you have to do anything more involved. Having said that, I have seen a number of products that do this for user-defined attributes. Examples of systems that use the option 2 technique are Agresso and Kalido.
If you're doing a bespoke application, the best way by far to add attributes is simply to extend the database schema when you need to. As the change will be accompanied by modifications to the code, it can be done as part of the release process.
If you're doing a packaged application that you intend customers to configure themselves, there are three broad approaches you can take.
1. EAV structure like option 2. This is flexible, but is inefficient to query, particularly as the queries get complex with multiple joins.
2. Make a set of 'User' fields (User1, User2 etc.) on the tables. This limits you to a finite number, but this can be quite large (you could have User01-User99 if you wanted). However, it is the most efficient and simplest to query. The other con is that the fields are somewhat opaque. You have to have access to configuration information to know the meaning of 'User3'. It also sacrifices some type safety. On balance, however, your user field mechanism is going to have some of its own metadata and a generic framework of some sort, so some of that type safety can be provided through this.
   This looks the most inelegant but is the best way to do this in most cases as it has the best performance and simplest queries. It is by far the easiest scheme to work with.
3. XML. This is infinitely flexible but most of the tooling surrounding the database does a poor job of working with XML. It also stores the XML in separate allocation units from the main table, so it can cause significant issues with query performance. XML based strategies are very application-centric at the expense of other consumers of the data.
   In my experience storing significant amounts of data in XML fields in a database will significantly increase your application's TCO. Not recommended for user data fields in most cases.
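The 'User' fields approach with its own metadata could look roughly like the sketch below (Python's built-in sqlite3). The user_field_config table, its columns, the labels, and the CONVERTERS mapping are all hypothetical; they stand in for whatever configuration framework the application actually has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
    CREATE TABLE x (
        my_id INTEGER PRIMARY KEY,
        User1 TEXT,
        User2 TEXT,
        User3 TEXT
    );
    -- Hypothetical metadata: what each generic column means for this customer.
    CREATE TABLE user_field_config (
        column_name TEXT PRIMARY KEY,
        label TEXT NOT NULL,
        value_type TEXT NOT NULL CHECK (value_type IN ('text', 'int'))
    );
    INSERT INTO user_field_config VALUES
        ('User1', 'Cost centre', 'text'),
        ('User3', 'Badge number', 'int');
    INSERT INTO x VALUES (1, 'CC-42', NULL, '1007');
""")

# The metadata restores some type safety on top of the TEXT columns.
CONVERTERS = {"text": str, "int": int}

def labelled_user_fields(my_id):
    """Return the configured user fields of one row as {label: typed value}."""
    row = conn.execute(
        "SELECT * FROM x WHERE my_id = ?", (my_id,)
    ).fetchone()
    if row is None:
        return {}
    config = conn.execute(
        "SELECT column_name, label, value_type FROM user_field_config "
        "ORDER BY column_name"
    ).fetchall()
    result = {}
    for column_name, label, value_type in config:
        value = row[column_name]
        result[label] = None if value is None else CONVERTERS[value_type](value)
    return result

fields = labelled_user_fields(1)
```

Queries against x stay single-table, and the opacity of 'User3' is handled by the configuration lookup rather than by the schema.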
@marc_s
I don't believe one can "almost always" pick either of the above options. There is a case to be made for both solutions.
Option #1
Go for this when the entity X is well defined, i.e. you know exactly what you need to capture in order to define X. In such a case a single record of X pretty much captures everything an instance of X stands for.
Option #2
Go for this when the entity X cannot be completely defined, i.e. you don't know what set of attributes is required to define it "completely".
For example, take the employee record mentioned in the article "Five simple database design errors you should avoid" [link provided by @marc_s]. Yes, you will be tempted to go for Option 1, but consider employees working in large organizations: the employee information, both its definition and its content, is highly dynamic, so a combination of option #1 and option #2 is required.
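One way such a combination might look, sketched with Python's built-in sqlite3: stable attributes stay as columns, and only the open-ended ones go into an attribute table. The employee and employee_attributes tables, their columns, and the sample data are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Well-defined part of the entity: ordinary columns (option #1).
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        full_name TEXT NOT NULL,
        hire_date TEXT NOT NULL
    );
    -- Open-ended part: attribute rows (option #2).
    CREATE TABLE employee_attributes (
        employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
        attribute_name TEXT NOT NULL,
        attribute_value TEXT,
        PRIMARY KEY (employee_id, attribute_name)
    );
    INSERT INTO employee VALUES (1, 'Ada Example', '2020-01-15');
    INSERT INTO employee_attributes VALUES
        (1, 'parking_bay', 'B12'),
        (1, 'union_member', 'yes');
""")

def load_employee(employee_id):
    """Combine the fixed columns and the dynamic attributes into one record."""
    core = conn.execute(
        "SELECT employee_id, full_name, hire_date "
        "FROM employee WHERE employee_id = ?",
        (employee_id,),
    ).fetchone()
    if core is None:
        return None
    extras = conn.execute(
        "SELECT attribute_name, attribute_value "
        "FROM employee_attributes WHERE employee_id = ? "
        "ORDER BY attribute_name",
        (employee_id,),
    ).fetchall()
    return {
        "employee_id": core[0],
        "full_name": core[1],
        "hire_date": core[2],
        "extra": dict(extras),
    }

record = load_employee(1)
```

The core columns keep their constraints and can be indexed normally, while the drawbacks of EAV are confined to the attributes that really are unpredictable.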
@marc_s
Though I mentioned the example of the employee record, I am sure it is not very convincing.
Here is an example from the financial domain.
If you want to capture all the attributes of a deal, what you need depends on its instrument type. It is a lot easier to capture most Forex, Money Market, and even Bond instruments, as they are very structured.
But as we move towards derivative products it becomes very cumbersome. They are very exotic in nature and keep changing in structure (and hence in meaning). To capture such dynamically changing information we should opt for EAV. Of course, while making this choice one should be aware that it brings a lot of the negatives listed above in your comment.
I cannot talk about other domains, but I am sure you will find that IT systems in a lot of business domains face this situation, so having a good understanding of the EAV strategy, as opposed to rejecting it outright, is a good idea.
Shrini
As was said earlier, it depends on your requirements. You should choose #2 only if you need to, for example, add new types of attributes as part of your program workflow. Doing this by adding new columns to your tables is certainly worse than having one extra table and one extra join in your queries.