如何克服EAV数据库报告的缺点?

发布于 2024-08-26 15:52:24 字数 2841 浏览 13 评论 0原文

SQL 中实体-属性-值数据库设计的主要缺点似乎都与能够高效、快速地查询和报告数据有关。由于这些问题以及几乎所有应用程序的查询/报告的共性,我读到的有关该主题的大多数信息都警告不要实施 EAV。

我目前正在设计一个系统,其中一个实体的字段在设计/编译时未知,并且由系统的最终用户定义。 EAV 似乎很适合此要求,但由于我读过的问题,我对实施它犹豫不决,因为该系统也有一些相当繁重的报告要求。我认为我已经想出了解决这个问题的方法,但想向 SO 社区提出这个问题。

鉴于典型的规范化数据库 (OLTP) 仍然并不总是运行报告的最佳选择,一个好的做法似乎是拥有一个“报告”数据库 (OLAP),其中来自规范化数据库的数据被复制到其中,广泛建立索引,并且可能会被非规范化以便于查询。是否可以使用相同的想法来解决 EAV 设计的缺点?

我看到的主要缺点是将数据从 EAV 数据库传输到报告的复杂性增加,因为在 EAV 数据库中定义新字段时,您可能最终不得不更改报告数据库中的表。但这几乎是不可能的,而且对于 EAV 设计所提供的灵活性的提高来说,这似乎是一个可以接受的折衷方案。如果我使用非 SQL 数据存储(即 CouchDB 或类似的)作为主要数据存储,这个缺点也存在,因为所有标准报告工具都期望 SQL 后端进行查询。

如果您有单独的报告数据库用于查询,EAV 系统的问题是否大部分都会消失?

编辑:感谢迄今为止的评论。关于我正在开发的系统,最重要的事情之一是我实际上只是谈论将 EAV 用于其中一个实体,而不是系统中的所有内容。

该系统的整个要点是能够从提前未知的多个不同来源提取数据,并处理数据以得出有关特定实体的一些“最知名”数据。因此,我正在处理的每个“字段”都是多值的,并且我还需要跟踪每个“字段”的历史记录。标准化设计最终是每个字段 1 个表,这使得查询变得有点痛苦。

以下是我正在查看的表架构和示例数据(显然与我正在处理的内容有所不同,但我认为它很好地说明了这一点):

EAV 表

Person
-------------------
-  Id - Name      -
-------------------
- 123 - Joe Smith -
-------------------

Person_Value
-------------------------------------------------------------------
- PersonId - Source - Field       - Value         - EffectiveDate -
-------------------------------------------------------------------
-      123 -    CIA - HomeAddress - 123 Cherry Ln -    2010-03-26 -
-      123 -    DMV - HomeAddress - 561 Stoney Rd -    2010-02-15 -
-      123 -    FBI - HomeAddress - 676 Lancas Dr -    2010-03-01 -
-------------------------------------------------------------------

报告表

Person_Denormalized
----------------------------------------------------------------------------------------
-  Id - Name      - HomeAddress   - HomeAddress_Confidence - HomeAddress_EffectiveDate - 
----------------------------------------------------------------------------------------
- 123 - Joe Smith - 123 Cherry Ln -                  0.713 -                2010-03-26 -
----------------------------------------------------------------------------------------

标准化设计

Person
-------------------
-  Id - Name      -
-------------------
- 123 - Joe Smith -
-------------------

Person_HomeAddress
------------------------------------------------------
- PersonId - Source - Value         - Effective Date - 
------------------------------------------------------
-      123 -    CIA - 123 Cherry Ln -     2010-03-26 -
-      123 -    DMV - 561 Stoney Rd -     2010-02-15 -
-      123 -    FBI - 676 Lancas Dr -     2010-03-01 -
------------------------------------------------------

这里的“置信度”字段是使用无法使用 SQL 轻松表达(如果有的话)的逻辑生成的,因此除了插入新值之外,我最常见的操作是提取有关一个人的所有数据所有字段,以便我可以生成报告表的记录。这实际上在 EAV 模型中更容易,因为我可以执行单个查询。在规范化设计中,我最终必须对每个字段执行 1 次查询,以避免大量笛卡尔积将它们连接在一起。

The major shortcomings with Entity-Attribute-Value database designs in SQL all seem to be related to being able to query and report on the data efficiently and quickly. Most of the information I read on the subject warn against implementing EAV due to these problems and the commonality of querying/reporting for almost all applications.

I am currently designing a system where the fields for one of the entities are not known at design/compile time and are defined by the end-user of the system. EAV seems like a good fit for this requirement but due to the problems I've read about, I am hesitant in implementing it as there are also some pretty heavy reporting requirements for this system as well. I think I've come up with a way around this but would like to pose the question to the SO community.

Given that typical normalized database (OLTP) still isn't always the best option for running reports, a good practice seems to be having a "reporting" database (OLAP) where the data from the normalized database is copied to, indexed extensively, and possibly denormalized for easier querying. Could the same idea be used to work around the shortcomings of an EAV design?

The main downside I see are the increased complexity of transferring the data from the EAV database to reporting as you may end up having to alter the tables in the reporting database as new fields are defined in the EAV database. But that is hardly impossible and seems to be an acceptable tradeoff for the increased flexibility given by the EAV design. This downside also exists if I use a non-SQL data store (i.e. CouchDB or similar) for the main data storage since all the standard reporting tools are expecting a SQL backend to query against.

Do the issues with EAV systems mostly go away if you have a seperate reporting database for querying?

EDIT: Thanks for the comments so far. One of the important things about the system I'm working on it that I'm really only talking about using EAV for one of the entities, not everything in the system.

The whole gist of the system is to be able to pull data from multiple disparate sources that are not known ahead of time and crunch the data to come up with some "best known" data about a particular entity. So every "field" I'm dealing with is multi-valued and I'm also required to track history for each. The normalized design for this ends up being 1 table per field which makes querying it kind of painful anyway.

Here are the table schemas and sample data I'm looking at (obviously changed from what I'm working on but I think it illustrates the point well):

EAV Tables

Person
-------------------
-  Id - Name      -
-------------------
- 123 - Joe Smith -
-------------------

Person_Value
-------------------------------------------------------------------
- PersonId - Source - Field       - Value         - EffectiveDate -
-------------------------------------------------------------------
-      123 -    CIA - HomeAddress - 123 Cherry Ln -    2010-03-26 -
-      123 -    DMV - HomeAddress - 561 Stoney Rd -    2010-02-15 -
-      123 -    FBI - HomeAddress - 676 Lancas Dr -    2010-03-01 -
-------------------------------------------------------------------

Reporting Table

Person_Denormalized
----------------------------------------------------------------------------------------
-  Id - Name      - HomeAddress   - HomeAddress_Confidence - HomeAddress_EffectiveDate - 
----------------------------------------------------------------------------------------
- 123 - Joe Smith - 123 Cherry Ln -                  0.713 -                2010-03-26 -
----------------------------------------------------------------------------------------

Normalized Design

Person
-------------------
-  Id - Name      -
-------------------
- 123 - Joe Smith -
-------------------

Person_HomeAddress
------------------------------------------------------
- PersonId - Source - Value         - Effective Date - 
------------------------------------------------------
-      123 -    CIA - 123 Cherry Ln -     2010-03-26 -
-      123 -    DMV - 561 Stoney Rd -     2010-02-15 -
-      123 -    FBI - 676 Lancas Dr -     2010-03-01 -
------------------------------------------------------

The "Confidence" field here is generated using logic that cannot be expressed easily (if at all) using SQL so my most common operation besides inserting new values will be pulling ALL data about a person for all fields so I can generate the record for the reporting table. This is actually easier in the EAV model as I can do a single query. In the normalized design, I end up having to do 1 query per field to avoid a massive cartesian product from joining them all together.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

寂寞美少年 2024-09-02 15:52:24

在这个方案中,首先我们提出一个系统,允许用户存储任何类型的数据,无论其结构如何,也无论未来的预期用途如何。然后,当需要发布报告时,我们必须弄清楚我们已经得到了什么,以及它与我们需要的内容有何关系。

由于您明确地将问题的性质归因于“处于此方案中”,因此在我看来,EAV 的问题确实是由 EAV 本身造成的。

事实上,想一想:“一个允许用户存储任何类型数据的系统”相当于一个允许用户定义其相关变量的系统。但是该系统的哪一部分允许用户定义每个属性的约束?哎呀,EAV 人群似乎错过了数据管理的一个不那么不重要的方面,看起来......

In this scheme, first we come up with a system that lets users store any kind of data whatsoever, regardless of its structure, and regardless of the future intended use. Then, when it's time to get the reports out, we have to figure out what we've got, and how that relates to what we need.

Since you clearly attribute the nature of the problem to "being in this scheme", it really seems to me as if the problem with EAV really is due to EAV as such.

In fact, come to think of it: "a system that lets users store any kind of data whatsoever" is the equivalent of a system that allows users to just define their relvars. But what portion of that system allows the users to define constraints on each attribute? Oops, the EAV crowd seems to have missed a not-so-unimportant aspect of data management, it seems...

娇妻 2024-09-02 15:52:24

简短的回答 - 是的,报告数据库是解决 EAV 数据模型报告问题的合理方法。

我花了很多年的时间研究一个信息管理解决方案,该解决方案允许最终用户完全自由地定义自己的数据模型,其中模式和数据都使用 EAV 模型存储。有趣的是,该产品提供了用于满足报告需求的元模式对象(例如提供对象导航的图表、执行投影的视图等)。这意味着最终用户可以使用他们在第一个实例中用于构建数据模型的相同术语和概念来自由定义查询。报告的行为本质上是通过导航这些定义来计算数据集,并将结果传递给传统的报告编写工具,就好像它是关系数据一样。

这种方法的优点之一是,将 EAV 模型转换为用户可以使用的模型的相同机制可以重复使用并应用于报告功能。

Short answer - yes, a reporting database is a reasonable approach to solving the problems of reporting from an EAV data model.

I spent a number of years working with an information management solution which allowed end users complete freedom to define their own data model, with both the schema and the data stored using an EAV model. Interestingly, this product provided meta-schema objects used to fulfill reporting requirements (e.g. graphs to provide object navigation, views to perform projection, etc.). This meant that the end user was free to define queries using the same terms and concepts that they'd used to build the data model in the first instance. The act of reporting was essentially to compute the data set by navigating these definitions, and hand the result over to a traditional report writing tool as if it were relational data.

One of the strengths of this approach was that the same mechanism that was already in place to transform the EAV model to something the user could work with could be reused and applied to the reporting function.

念﹏祤嫣 2024-09-02 15:52:24

EAV 的问题并不是由 EAV 本身造成的。这是由于在设计和构建数据库时没有了解数据的真正需求是什么,以及数据必须具有什么逻辑结构才能满足这些需求。 EAV 以及任何其他允许用户设计自己的数据的系统都颠覆了这一点。

在这个方案中,首先我们提出一个系统,允许用户存储任何类型的数据,无论其结构如何,也无论未来的预期用途如何。然后,当需要发布报告时,我们必须弄清楚我们已经得到了什么,以及它与我们需要的有何关系。

祝你好运。

The problem with EAV is not due to EAV as such. It's due to designing and building a database without understanding what the data requirements really are, and what logical structure the data must have in order to meet these requirements. EAV, and any other system that lets the users design their own data, turns this on its head.

In this scheme, first we come up with a system that lets users store any kind of data whatsoever, regardless of its structure, and regardless of the future intended use. Then, when it's time to get the reports out, we have to figure out what we've got, and how that relates to what we need.

Good luck with that.

怪我入戏太深 2024-09-02 15:52:24

EAV没有问题我花了相当多的时间从MASSIVE EAV数据库中查询。任何说从 EAV 进行报告很困难或不可能的人都会遇到以下两个问题之一:要么他们的 EAV 系统设计很糟糕,要么他们不明白如何从系统中进行查询。一旦您完成了几次,从 EAV DB 获取良好的可报告数据就变得非常容易。不需要报告数据库或任何特殊的东西,只需一些好的查询即可。

如果您在构建 EAV DB 时花费了大量时间来设计它,那么该设计要么会成就您的应用程序,要么会毁掉您的应用程序,并且尝试修复或处理设计不佳的应用程序将是一场噩梦。

There is no problem with EAV I spend quite a large amount of time querying from MASSIVE EAV databases. Anyone that says reporting from EAV is difficult or impossible has 1 of 2 problems, either they have a very poorly designed EAV system or they don't understand how to query from one. getting nice report-able data from an EAV DB is quite easy once you've done it a few times. There's no need for a reporting database or anything special, just a few good queries.

If you're building an EAV DB spend A LOT of time designing it, the design will either make or break your application and it will be a nightmare trying to fix or deal with a poorly designed one.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文