Database Design: Flexibility vs. Simplicity

Posted 2024-09-06 16:15:33

I am trying to weigh up the relative pros and cons of a simple database structure such as this:

CREATE TABLE x (
    my_id INT PRIMARY KEY,
    ...,
    text_attribute_blah TEXT,
    text_attribute_blah_blah TEXT
);

vs:

CREATE TABLE x (
    my_id INT PRIMARY KEY,
    ...
);

CREATE TABLE attributes (
    my_id INT REFERENCES x (my_id),  /* foreign key to x.my_id */
    text_attribute_type INT,         /* identifies blah or blah_blah */
    text_attribute TEXT
);

Where text_attribute_type indicates whether a row holds the blah or the blah_blah attribute.

Option 1 offers simplicity: the table is easier to read and write. Option 2 offers flexibility: if we want to add another attribute such as blah_blah_blah, we don't need to make a schema change, and so we probably need fewer code changes.
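The following is a minimal sketch of the trade-off. It uses Python's built-in sqlite3 module with in-memory databases. The question does not say how attribute types are encoded, so the sketch assumes hypothetical integer codes 1, 2 and 3 for blah, blah_blah and blah_blah_blah. It also leaves out the elided `...` columns. In option 1, adding the third attribute costs an `ALTER TABLE`. In option 2, it costs only one extra row.

```python
import sqlite3

# Hypothetical integer codes for text_attribute_type (the question does not define them).
BLAH, BLAH_BLAH, BLAH_BLAH_BLAH = 1, 2, 3

# Option 1: one column per attribute (the elided "..." columns are left out).
opt1 = sqlite3.connect(":memory:")
opt1.execute("""
    CREATE TABLE x (
        my_id INTEGER PRIMARY KEY,
        text_attribute_blah TEXT,
        text_attribute_blah_blah TEXT
    )""")
opt1.execute("INSERT INTO x VALUES (1, 'a', 'b')")
# A new attribute means a schema change...
opt1.execute("ALTER TABLE x ADD COLUMN text_attribute_blah_blah_blah TEXT")
opt1.execute("UPDATE x SET text_attribute_blah_blah_blah = 'c' WHERE my_id = 1")

# Option 2: one row per (entity, attribute type) pair.
opt2 = sqlite3.connect(":memory:")
opt2.execute("CREATE TABLE x (my_id INTEGER PRIMARY KEY)")
opt2.execute("""
    CREATE TABLE attributes (
        my_id INTEGER REFERENCES x (my_id),
        text_attribute_type INTEGER,
        text_attribute TEXT
    )""")
opt2.execute("INSERT INTO x VALUES (1)")
opt2.executemany("INSERT INTO attributes VALUES (?, ?, ?)",
                 [(1, BLAH, 'a'), (1, BLAH_BLAH, 'b')])
# ...whereas here it is just another row with a new type code.
opt2.execute("INSERT INTO attributes VALUES (?, ?, ?)", (1, BLAH_BLAH_BLAH, 'c'))

row1 = opt1.execute("SELECT * FROM x WHERE my_id = 1").fetchone()
rows2 = opt2.execute(
    "SELECT text_attribute_type, text_attribute FROM attributes "
    "WHERE my_id = 1 ORDER BY text_attribute_type").fetchall()
print(row1)   # (1, 'a', 'b', 'c')
print(rows2)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```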

Is there a right or wrong answer to this conundrum? Is one of these options considered better practice than the other? Can you point me to further reading that might help me determine the way forward?

九八野马 2024-09-13 16:15:33

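I almost always go with #1. I simply prefer having the attributes as columns in the table: it makes querying, indexing for performance, and general handling much easier and more transparent.

Option #2 is called EAV (Entity-Attribute-Value), and it has some major drawbacks. See:

Five Simple Database Design Errors You Should Avoid (the EAV approach is #3 on that list)
Joe Celko: Avoiding the EAV of Destruction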
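The sketch below illustrates the querying point. It uses Python's sqlite3 module, made-up sample values and hypothetical type codes. The two layouts are renamed x1 and x2 so that they can share one in-memory database. Option 1 filters directly on an indexed column. The EAV version needs one self-join on attributes for each attribute it wants to read back.

```python
import sqlite3

BLAH, BLAH_BLAH = 1, 2  # hypothetical type codes

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 1: attributes are ordinary columns and can be indexed one by one.
    CREATE TABLE x1 (
        my_id INTEGER PRIMARY KEY,
        text_attribute_blah TEXT,
        text_attribute_blah_blah TEXT
    );
    CREATE INDEX x1_blah ON x1 (text_attribute_blah);

    -- Option 2 (EAV): entity table plus one row per attribute value.
    CREATE TABLE x2 (my_id INTEGER PRIMARY KEY);
    CREATE TABLE attributes (
        my_id INTEGER REFERENCES x2 (my_id),
        text_attribute_type INTEGER,
        text_attribute TEXT
    );
""")
conn.executemany("INSERT INTO x1 VALUES (?, ?, ?)",
                 [(1, 'red', 'small'), (2, 'blue', 'large')])
conn.executemany("INSERT INTO x2 VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO attributes VALUES (?, ?, ?)",
                 [(1, BLAH, 'red'), (1, BLAH_BLAH, 'small'),
                  (2, BLAH, 'blue'), (2, BLAH_BLAH, 'large')])

# Option 1: a plain filter on a column.
q1 = conn.execute(
    "SELECT my_id, text_attribute_blah_blah FROM x1 "
    "WHERE text_attribute_blah = 'red'").fetchall()

# Option 2: one join per attribute just to rebuild the same row shape.
q2 = conn.execute("""
    SELECT x2.my_id, bb.text_attribute
    FROM x2
    JOIN attributes AS b
      ON b.my_id = x2.my_id AND b.text_attribute_type = ?
    JOIN attributes AS bb
      ON bb.my_id = x2.my_id AND bb.text_attribute_type = ?
    WHERE b.text_attribute = 'red'
""", (BLAH, BLAH_BLAH)).fetchall()

print(q1)  # [(1, 'small')]
print(q2)  # [(1, 'small')]
```

Each additional attribute in the result adds another join to the EAV query. The option 1 query simply gains a column.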

在梵高的星空下 2024-09-13 16:15:33

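It's interesting that you don't mention performance or data integrity as concerns. For what it's worth, model #1 is the best approach when those factors are taken into account.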
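One way to see the integrity difference is to ask the database to enforce a rule. This sketch uses Python's sqlite3 module, hypothetical tables x1 and x2, and made-up rules. In option 1 it declares blah as mandatory and limits blah_blah to 10 characters. The EAV layout has only one generic value column, so the same bad data goes in unchallenged.

```python
import sqlite3

BLAH, BLAH_BLAH = 1, 2  # hypothetical type codes

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 1: the schema itself says blah is mandatory and blah_blah is short.
    CREATE TABLE x1 (
        my_id INTEGER PRIMARY KEY,
        text_attribute_blah TEXT NOT NULL,
        text_attribute_blah_blah TEXT CHECK (length(text_attribute_blah_blah) <= 10)
    );
    -- Option 2: one generic value column, so no per-attribute rule can be declared.
    CREATE TABLE x2 (my_id INTEGER PRIMARY KEY);
    CREATE TABLE attributes (
        my_id INTEGER REFERENCES x2 (my_id),
        text_attribute_type INTEGER,
        text_attribute TEXT
    );
""")

def try_insert(sql: str, params: tuple) -> bool:
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# Missing blah, plus a blah_blah value longer than 10 characters.
bad_option1 = try_insert("INSERT INTO x1 VALUES (?, ?, ?)",
                         (1, None, 'far too long a value'))
conn.execute("INSERT INTO x2 VALUES (1)")
bad_option2 = try_insert("INSERT INTO attributes VALUES (?, ?, ?)",
                         (1, BLAH_BLAH, 'far too long a value'))

print(bad_option1)  # False: rejected by NOT NULL / CHECK
print(bad_option2)  # True: accepted, and nothing requires a BLAH row either
```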

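Flexibility is vastly overrated when it comes to data models. Most table structures are well known at the start of development and remain stable over the lifetime of the database. If your application's model really is fluid and unknowable, then you probably shouldn't be using an RDBMS at all. Pick one of the NoSQL products instead.

Another vote for #1.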

忆离笙 2024-09-13 16:15:33

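Every solution has a problem it is meant to solve. If you know the columns you need up front, #1 is a good approach. In some cases, however, the columns aren't known up front, for example custom fields that users add to a feature.

That said, EAVs have plenty of problems too. In my opinion they are useful when used properly: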

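Make sure you don't create an EAV for everything. It is only for the "unknown items".
Remember that EAV gives you no foreign key relationships you can rely on.
Queries are cumbersome, so performance is lower and there may be more maintenance.
Remember that EAV data has to be reshaped (usually pivoted back into columns) before it makes sense (well, in most cases).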
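The foreign-key point can be demonstrated with a sketch like the following. It assumes a hypothetical department table. It also assumes a name/value flavour of EAV (attribute_name, attribute_value) instead of the question's integer type codes. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    -- Known column: the reference is declared and enforced.
    CREATE TABLE x1 (
        my_id INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES department (dept_id)
    );
    -- EAV (hypothetical name/value form): the same fact is just text.
    CREATE TABLE x2 (my_id INTEGER PRIMARY KEY);
    CREATE TABLE attributes (
        my_id INTEGER REFERENCES x2 (my_id),
        attribute_name TEXT,
        attribute_value TEXT
    );
    INSERT INTO department VALUES (10, 'Sales');
    INSERT INTO x2 VALUES (1);
""")

def accepted(sql: str, params: tuple) -> bool:
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

fk_dangling = accepted("INSERT INTO x1 VALUES (?, ?)", (1, 99))  # no department 99
fk_valid = accepted("INSERT INTO x1 VALUES (?, ?)", (1, 10))
eav_dangling = accepted("INSERT INTO attributes VALUES (?, ?, ?)", (1, 'dept_id', '99'))

print(fk_dangling, fk_valid, eav_dangling)  # False True True
```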

暖心男生 2024-09-13 16:15:33

Option 1 almost every time. Option 2 is very inefficient, and it is also clumsy to query once you need to do anything more involved than simple lookups. Having said that, I have seen a number of products that do this for user-defined attributes. Examples of systems that use the option 2 technique are Agresso and Kalido.

If you're building a bespoke application, by far the best way to add attributes is simply to extend the database schema when you need to. As the change will be accompanied by modifications to the code, it can be done as part of the release process.
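Extending the schema as part of each release is usually done with versioned migrations. The following is a minimal sketch that assumes SQLite and a hand-rolled migration list; real projects often use a migration tool instead. It stores the current schema version in `PRAGMA user_version` and applies only the steps that haven't run yet. MIGRATIONS and migrate() are illustrative names and do not belong to any library.

```python
import sqlite3

# Hypothetical migrations shipped with the application; entry i upgrades
# the schema from version i to version i + 1.
MIGRATIONS = [
    """CREATE TABLE x (
           my_id INTEGER PRIMARY KEY,
           text_attribute_blah TEXT,
           text_attribute_blah_blah TEXT
       )""",
    # A later release needs blah_blah_blah: the code that uses it and
    # this schema change ship together.
    "ALTER TABLE x ADD COLUMN text_attribute_blah_blah_blah TEXT",
]

def migrate(conn: sqlite3.Connection) -> int:
    """Apply migrations newer than the stored schema version; return the new version."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for number, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute(statement)
        # Record progress so this step is never applied twice.
        conn.execute(f"PRAGMA user_version = {number}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]

conn = sqlite3.connect(":memory:")
first = migrate(conn)   # applies both steps
second = migrate(conn)  # nothing left to apply
columns = [row[1] for row in conn.execute("PRAGMA table_info(x)")]
print(first, second)  # 2 2
print(columns)
```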

If you're building a packaged application that you intend customers to configure themselves, there are three broad approaches you can take.

EAV structure like option 2. This is flexible, but inefficient to query, particularly as queries get complex with multiple joins.
Make a set of 'User' fields (User1, User2 etc.) on the tables. This limits you to a finite number, but that number can be quite large (you could have User01-User99 if you wanted). On the plus side, it is the most efficient and simplest to query. The downside is that the fields are somewhat opaque: you need access to the configuration information to know what 'User3' means. It also sacrifices some type safety. On balance, however, your user field mechanism is going to have some metadata of its own and a generic framework of some sort, so some of that type safety can be provided through this.
This looks the most inelegant, but in most cases it is the best way to do this, as it has the best performance and the simplest queries. It is by far the easiest scheme to work with.
XML. This is infinitely flexible, but most of the tooling surrounding the database does a poor job of working with XML. It also stores the XML in separate allocation units from the main table, so it can cause significant issues with query performance. XML-based strategies are very application-centric at the expense of other consumers of the data.
In my experience, storing significant amounts of data in XML fields in a database will significantly increase your application's total cost of ownership (TCO). It is not recommended for user data fields in most cases.
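The 'User' fields approach relies on metadata to say what each generic column means. The sketch below uses Python's sqlite3 module and assumes a hypothetical user_field_config table that holds a label and a declared type for each column. The application reads a row with a plain SELECT. It then uses the configuration to label and type-convert the values, and this is how the metadata can restore some of the lost type safety.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Packaged schema: fixed columns plus a block of generic user fields.
    CREATE TABLE x (
        my_id INTEGER PRIMARY KEY,
        user01 TEXT,
        user02 TEXT,
        user03 TEXT
    );
    -- Hypothetical per-customer configuration that gives the fields meaning.
    CREATE TABLE user_field_config (
        column_name TEXT PRIMARY KEY,
        label TEXT NOT NULL,
        data_type TEXT NOT NULL CHECK (data_type IN ('text', 'integer'))
    );
    INSERT INTO user_field_config VALUES ('user01', 'Region', 'text');
    INSERT INTO user_field_config VALUES ('user02', 'Priority', 'integer');
    INSERT INTO x VALUES (1, 'EMEA', '3', NULL);
""")

# Converters for the declared data types.
CASTS = {"text": str, "integer": int}

def labelled_user_fields(conn: sqlite3.Connection, my_id: int) -> dict:
    """Read one row and turn configured userNN columns into labelled, typed values."""
    config = conn.execute(
        "SELECT column_name, label, data_type FROM user_field_config "
        "ORDER BY column_name").fetchall()
    cur = conn.execute("SELECT * FROM x WHERE my_id = ?", (my_id,))
    row = dict(zip([d[0] for d in cur.description], cur.fetchone()))
    result = {}
    for column_name, label, data_type in config:  # unconfigured fields are ignored
        value = row[column_name]
        result[label] = None if value is None else CASTS[data_type](value)
    return result

# Filtering stays a simple single-table query.
emea_ids = [r[0] for r in conn.execute("SELECT my_id FROM x WHERE user01 = ?", ("EMEA",))]
print(emea_ids)                       # [1]
print(labelled_user_fields(conn, 1))  # {'Region': 'EMEA', 'Priority': 3}
```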

一梦等七年七年为一梦 2024-09-13 16:15:33

@marc_s
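I don't believe one can "almost always" pick either of the options above. There is a case for both solutions.

Option #1
Use it when entity X is well defined, i.e. when you know exactly what needs to be captured to define X. In that case, one record of X captures practically everything an instance of X represents.

Option #2
Use it when such an entity X cannot be fully defined, i.e. when you don't know what set of attributes is needed to define it "completely".

Take, for example, the employee record from the article "Five Simple Database Design Errors You Should Avoid" [link provided by @marc_s]. Yes, you would be tempted to go with option 1. But consider employees working in a large organization: once employee information is recorded, both its definition and its content turn out to be highly dynamic, and a combination of options 1 and 2 is needed.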
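For the employee case, a combination of options 1 and 2 might look like this sketch. Stable attributes that every employee has stay as typed columns on a hypothetical employee table. Attributes that vary by role or over time go into a keyed EAV side table. The table names, columns and sample data are all illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 1 part: attributes every employee has, as typed columns.
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        hire_date TEXT NOT NULL
    );
    -- Option 2 part: attributes that vary by role, site or over time.
    CREATE TABLE employee_attribute (
        emp_id INTEGER NOT NULL REFERENCES employee (emp_id),
        attribute_name TEXT NOT NULL,
        attribute_value TEXT,
        PRIMARY KEY (emp_id, attribute_name)
    );
    INSERT INTO employee VALUES (1, 'Ana', '2020-01-15'), (2, 'Bo', '2021-06-01');
    INSERT INTO employee_attribute VALUES
        (1, 'security_clearance', 'level 2'),
        (1, 'forklift_licence', 'yes'),
        (2, 'security_clearance', 'level 1');
""")

def employee_profile(conn: sqlite3.Connection, emp_id: int) -> dict:
    """Combine the fixed columns with whatever dynamic attributes exist."""
    name, hire_date = conn.execute(
        "SELECT name, hire_date FROM employee WHERE emp_id = ?", (emp_id,)).fetchone()
    extras = dict(conn.execute(
        "SELECT attribute_name, attribute_value FROM employee_attribute "
        "WHERE emp_id = ? ORDER BY attribute_name", (emp_id,)))
    return {"name": name, "hire_date": hire_date, **extras}

print(employee_profile(conn, 1))
# {'name': 'Ana', 'hire_date': '2020-01-15', 'forklift_licence': 'yes', 'security_clearance': 'level 2'}
print(employee_profile(conn, 2))
# {'name': 'Bo', 'hire_date': '2021-06-01', 'security_clearance': 'level 1'}
```

Queries that only touch the core columns stay simple and indexable. The EAV cost is paid only when the variable attributes are needed.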

╰つ倒转 2024-09-13 16:15:33

@marc_s

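Although I mentioned the employee record example, I'm sure it wasn't very convincing.

Here is an example from the financial domain.

If you want to capture all the attributes of a trade, what you need depends on the instrument type. Most FX, money market, and even bond instruments are much easier to capture because they are highly structured.
But when we move to derivatives, it gets very cumbersome. They are exotic by nature and constantly changing in structure (and therefore in meaning, etc.). To capture this dynamically changing information, we should go with EAV. Of course, in making this choice one should be aware that it brings many of the negatives you listed in your comment above.

I can't speak for other domains, but I'm sure you'll find that IT systems in many business domains face this situation, so it is a good idea to understand the EAV strategy well rather than reject it outright.

Shrini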
