实体属性值 (EAV) 的替代方案?

发布于 2024-09-30 02:05:14 字数 297 浏览 9 评论 0 原文

我们的数据库是基于EAV(实体-属性-值)模型设计的。那些使用过 EAV 模型的人都知道为了灵活性而带来的所有废话。

我问我的客户为什么使用EAV模型(灵活性),他们的回答是:他们的实体随着时间的推移而改变。因此,今天他们可能有一个包含一些属性的表,但一个月后,可能会添加一些新属性,或者可能会重命名现有属性。他们需要生成报告以及时返回到任何阶段,并根据该阶段实体的形状查询数据。

我知道这对于传统的关系模型来说是不可行的,但我个人认为 EAV 是反模式。是否有任何其他替代模型使我们能够捕获实体和实例变化的时间维度?

干杯, 莫什

Our database is designed based on EAV (Entity-Attribute-Value) model. Those who have worked with EAV models know all the crap that comes with for the purpose of flexibility.

I asked my client about the reasons why using EAV model (flexibility), and their response was: Their entities change over time. So, today they may have a table with a few attributes, but in a month time, a few new attributes may be added, or an existing attribute may be renamed. They need to produce reports to get back to any stage in time and query the data based on the shape of entities at that stage.

I understand this is not feasible with a conventional relational model, but I personally see EAV as anti-pattern. Are there any other alternative models that enables us to capture the time dimension in changes to the entities and instances?

Cheers,
Mosh

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

梦毁影碎の 2024-10-07 02:05:14

EAV 的正确与否是有区别的; 5NF 由熟练的人或无知的人完成。

第六范式是不可约范式(不可能进一步标准化)。它消除了许多常见问题,例如空问题,并提供了识别缺失值的终极方法。它是学术和技术上强大的 NF。没有产品支持它,并且不常用。为了正确且一致地实施,需要实施元数据目录。当然,导航所需的 SQL 变得更加麻烦(SQL 重新连接已经很麻烦),但是通过从元数据自动生成 SQL 可以轻松克服这一点。

EAV 是 6NF 的部分集或子集。问题是,通常这样做是有目的的(允许添加列而不必进行 DDL 更改),并且由不了解 6NF 且不实现元数据的人完成。关键是,6NF 和 EAV 作为原则和概念提供了实质性的好处,并且性能得到提高;但通常情况下,它没有得到适当的实施,其好处也没有实现。相当多的 EAV 实施都是灾难,不是因为 EAV 不好,而是因为实施很差。

例如。有些人认为从 6NF/EAV 数据库构造 3NF 行所需的 SQL 很复杂:不,它很麻烦但并不复杂。更重要的是,可以提供普通的SQL VIEW,以便所有用户和报表工具只能看到直接的3NF VIEW,而6NF/EAV问题对他们来说是透明的。最后,所需的SQL可以自动化,因此许多人承受的劳动力成本是完全不必要的。

所以答案确实是,第六范式作为 EAV 之父,也是一种更纯粹的形式,是它的替代品。警告是,确保正确完成。我有一个大型 6NF 数据库,它没有遇到人们发布的任何问题,它的性能非常好,客户非常满意(没有进一步的工作是功能完全满意的标志)。

我已经发布了另一个问题的非常详细的答案,该答案也适用于您的问题,您可能对此感兴趣。

其他 EAV 问题

There is a difference between EAV done faithfully or badly; 5NF done by skilled people or by those who are clueless.

Sixth Normal Form is the Irreducible Normal Form (no further Normalisation is possible). It eliminates many of the problems that are common, such as The Null Problem, and provides the ultimate method identifying missing values. It is the academically and technically robust NF. There are no products to support it, and it is not commonly used. To be implemented properly and consistently, it requires a catalogue for metadata to be implemented. Of course, the SQL required to navigate it becomes even more cumbersome (SQL already being cumbersome re joins), but this is easily overcome by automating the production of SQL from the metadata.

EAV is a partial set or a subset of 6NF. The problem is, usually it is done for a purpose (to allow columns to be added without having to make DDL changes), and by people who are not aware of the 6NF, and who do not implement metadata. The point is, 6NF and EAV as principles and concepts offer substantial benefits, and performance increases; but commonly it is not implemented properly, and the benefits are not realised. Quite a few EAV implementations are disasters, not because EAV is bad, but because the implementation is poor.

Eg. Some people think that the SQL required to construct the 3NF rows from the 6NF/EAV database is complex: no, it is cumbersome but not complex. More important, an ordinary SQL VIEW can be provided, so that all users and report tools see only the straight 3NF VIEW, and the 6NF/EAV issues are transparent to them. Last, the SQL required can be automated, so the labour cost that many people endure is quite unnecessary.

So the answer really is, Sixth Normal Form, being the father of EAV, and a purer form, is the replacement for it. The Caveat is, ensure it is done properly. I have one large 6NF db, and it suffers none of the problems people post about, it performs beautifully, the customer is very happy (no further work is a sign of complete functional satisfaction).

I have already posted a very detailed answer to another question which applies to your question as well, which you may be interested in.

Other EAV Question

╄→承喏 2024-10-07 02:05:14

无论您使用哪种关系模型,跟踪字段名称更改都需要大量元数据,您必须在事务日志或审核表中跟踪这些元数据。不幸的是,查询其中任何一个特定日期的状态都非常复杂。但是,如果您的客户端仅需要特定时间日期的状态,即整个状态,而不仅仅是名称更改,则您可以复制数据库并将事务日志回滚到所需的特定时间,然后在新实例上运行查询。但是,如果在指定日期之后添加的实体需要以旧字段名称显示在查询中,那么您将面临一个非常大的工程问题。在这种情况下,根据您在问题中提供的信息,我建议要么与客户协商替代方案,要么获取有关使用报告的更多信息来寻找替代解决方案。

您可以迁移到基于文档的数据存储,但这仍然无法解决第二种情况的问题。抱歉,这并不是一个真正的答案,但在经历过类似情况后,客户可能需要一个更现实的报告解决方案,或者需要一些愿意为工程提供资金的其他投资者。

当我们遇到这个问题时,我们保持数据库模式不变,并实现基于时间戳的实体映射工厂。最后,客户不断改变(每周到每月)关于如何计算聚合字段的要求,但从未完全满足。

Regardless of the kind of relational model you use, tracking field name changes requires a lot of meta data which you must keep track of in either transaction logs or audit tables. Unfortunately, querying either of those for state at a particular date is very complicated. If your client only requires state at a particular time date however, meaning the entire state, not just with respect to name changes, you can duplicate the database and roll back the transaction log to the particular time required and run your queries on the new instance. If entities added after the specified date need to show up in the query with the old field names however, you have a very large engineering problem ahead of you. In that case, with the information you provided in your question, I would suggest either negotiating alternatives with the client or getting more information about the use of the reports to find alternative solutions.

You could move to a document based datastore, but that still wouldn't solve the problem in the second case. Sorry this isn't really an answer, but having worked through similar situations, the client likely needs a more realistic reporting solution or a number of other investors willing to front the capital for the engineering.

When this problem came up for us, we kept the db schema constant and implemented an entity mapping factory based on a timestamp. In the end, the client continually changed requirements (on a weekly to monthly basis) as to how aggregate fields were calculated and were never fully satisfied.

淡墨 2024-10-07 02:05:14

添加到 @NickLarsen 和 @PerformanceDBA 的答案

如果您需要跟踪字段名称等内容的历史更改,您可能需要查看类似 缓慢改变尺寸。在我看来,您正在使用 EAV 来建模动态维度模型(可能是查找列表)。

实现此目的的最简单(也可能是效率最低)的方法是在 EAV 表上包含“截至”日期字段,并且每当发生更改时,就用当前日期插入新记录(而不是更新现有记录)。这意味着您需要更改查询以始终包含或查找“截至”日期,或者如果未提供则默认为“现在”。然后,连接到 EAV 对象的基本实体必须从 EAV 表中查询“top 1”,其中“截至”日期小于或等于该行的“上次更新”日期,按“截至”排序下降。最坏的情况是,如果您需要跟踪给定行的最新更改,其中名称(存储在“属性”表中)和值都已更改,则可以使用“上次修改”将此逻辑链接到值表查找该特定日期的适当值。

如果发生大量变化,这显然有可能生成大量数据。这就是为什么这种方法被称为“缓慢”改变。它适用于可能变化但不会经常变化的尺寸值。为了提高查询性能,“截至”和“上次修改”字段上的索引应该会有所帮助。

To add to the answers from @NickLarsen and @PerformanceDBA

If you need to track historical changes to things like field name, you may want to look into something like Slowly Changing Dimensions. It appears to me like you are using the EAV to model dynamic dimensional models (probably lookup lists).

The simplest (and probably least efficient) way of achieving this would be to include an "as of" date field on EAV tables, and whenever a change occurs, insert a new record (instead of updating an existing record) with the current date. This means that you need to alter your queries to always include or look for an "as of" date, or deafult to "now" if none provided. Your base entity that joins to the EAV objects would then have to query "top 1" from the EAV table where "as of" date is less than or equal to the 'last updated' date of the row, ordered by "as of" descending. Worst case scenario, if you need to track the most recent change to a given row where both the name (stored in the 'attribute' table) and the value have changed, you would chain this logic to the value table using 'last modified' of the row to find the appropriate value for that particular date.

This obviously has the potential to generate LARGE amounts of data if there are a lot of changes. That's why this approach is referred to as "slowly" changing. It's intended for dimensional values that may change, but not very often. To help with query performance, indexes on the "as of" and "last modified" fields should help.

江挽川 2024-10-07 02:05:14

如果您的客户需要这种灵活性,那么关系数据库可能不适合。

考虑存储 JSON 结构的 MongoDB。您可以无限制地添加或不添加字段。您甚至可以使用嵌套。

If your client needs such flexibility, then a relational database might not be the right match.

Consider MongoDB where JSON structures are stored. You can add or not add fields without limitations. You can even use nesting.

生寂 2024-10-07 02:05:14

为每个实体描述版本创建一个新的表描述
还有一张附加表告诉您哪个表是哪个版本。
查询系统也应该更新。

我认为创建一个生成表和查询的脚本是最好的选择。

Create a new table description for each Entity description Version
and one additional table that tells you which table is which version.
The query system should be updated as well.

I think creating a script that generates, tables and queries is your best shot.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文