A "master" associative table?

Posted on 2024-10-04 20:16:28

Consider a model for matching clients and services. Clients may be both providers and consumers of services at various times. Clients may be individuals or groups (companies), the latter having multiple contacts. Contacts may have multiple addresses, phones, and e-mails. Some of these relationships will be one-to-one (e.g., service to provider), but most will be one-to-many or many-to-many (multiple contacts at a company would have the same address).

In this model, several associative tables would typically exist, e.g., client_contact, contact_addr, contact_phone, contact_email, service_provider, service_consumer, etc.

Say you issue a simple query for contact information for consumers of a given service. In addition to the six entity tables containing the data, the joins would reference five associative tables. Nothing particularly interesting about this kind of query, of course - we do it every day.

It occurred to me though: why not have a single "master" associative table holding all associations? It would require this master table to have an "association type" in addition to the two PKs, and for all PKs to be of the same type (ints, GUIDs, etc.).

On the one hand, queries would become more complicated because each join would need to specify the type and PK. On the other hand, all joins would access the same table, and with appropriate indexing and caching, performance could improve dramatically.
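To make the trade-off concrete, here is a minimal sketch of what such a master table and a query against it might look like, using Python's built-in sqlite3 module. The table name `assoc`, its column names, the type strings, and the sample rows are all hypothetical and chosen only for illustration; the question does not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE email   (id INTEGER PRIMARY KEY, addr TEXT);

-- Hypothetical "master" associative table: one row per link of any kind.
CREATE TABLE assoc (
    assoc_type TEXT    NOT NULL,  -- e.g. 'client_contact', 'contact_email'
    left_id    INTEGER NOT NULL,  -- PK of the row on one side of the link
    right_id   INTEGER NOT NULL,  -- PK of the row on the other side
    PRIMARY KEY (assoc_type, left_id, right_id)
);

INSERT INTO client  VALUES (1, 'Acme');
INSERT INTO contact VALUES (10, 'Ann'), (11, 'Bob');
INSERT INTO email   VALUES (100, 'ann@example.com'), (101, 'bob@example.com');
INSERT INTO assoc VALUES
    ('client_contact', 1, 10),
    ('client_contact', 1, 11),
    ('contact_email', 10, 100),
    ('contact_email', 11, 101);
""")

# Every hop joins the same table again and must name the association type.
rows = conn.execute("""
SELECT contact.name, email.addr
FROM client
JOIN assoc AS cc ON cc.assoc_type = 'client_contact' AND cc.left_id = client.id
JOIN contact     ON contact.id = cc.right_id
JOIN assoc AS ce ON ce.assoc_type = 'contact_email' AND ce.left_id = contact.id
JOIN email       ON email.id = ce.right_id
WHERE client.name = 'Acme'
ORDER BY contact.name
""").fetchall()
print(rows)  # [('Ann', 'ann@example.com'), ('Bob', 'bob@example.com')]
```

One consequence visible in this sketch: because `left_id` and `right_id` point at different tables depending on `assoc_type`, they cannot be declared as ordinary foreign keys, so referential integrity would have to be enforced some other way.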

I assumed there might be a pattern (or anti-pattern) describing this approach, but I haven't found anything online. Has anyone tried it? If so, does it scale?

Any references you can provide would be appreciated.

Comments (3)

若水般的淡然安静女子 2024-10-11 20:16:28

What you're describing reminds me of fact tables from data warehousing. My understanding is that you start with a typical transactional schema with a table to model every many-to-many relationship. Then, to restructure the data for easier dimensional analysis, you can aggregate some or all of the relationships in your schema into one wide table where each column is a key. This effectively performs all possible joins ahead of time and dumps them into a table, changing the purpose of query joins from following relationships to fetching the properties of your entities.

Anyway, my understanding of this stuff is hazy and my experience effectively nil, but maybe your idea is a fact table by another name, which would make fact tables worth investigating.

半岛未凉 2024-10-11 20:16:28

First off, I think you're definitely paying a price in maintainability. Any time I see a "type" column like that, I think red flag. It seems likely to lead to magic strings in your procedures: you need to make sure the type is consistent across inserts and selects, for example. So any performance increase needs to be big enough to justify this headache.
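One possible way to contain the magic-string risk, sketched here with Python's built-in sqlite3 module, is to keep the allowed type values in a small lookup table and reference it with a foreign key, so a misspelled type is rejected at insert time. This is only an illustration under assumed names (`assoc_type`, `assoc`, and the type strings are hypothetical), not something proposed in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Hypothetical lookup table listing every legal association type.
CREATE TABLE assoc_type (name TEXT PRIMARY KEY);
INSERT INTO assoc_type VALUES ('client_contact'), ('contact_email');

CREATE TABLE assoc (
    assoc_type TEXT    NOT NULL REFERENCES assoc_type(name),
    left_id    INTEGER NOT NULL,
    right_id   INTEGER NOT NULL,
    PRIMARY KEY (assoc_type, left_id, right_id)
);
""")

# A correctly spelled type is accepted.
conn.execute("INSERT INTO assoc VALUES ('client_contact', 1, 10)")

# A misspelled type violates the foreign key and is rejected.
try:
    conn.execute("INSERT INTO assoc VALUES ('client_contcat', 1, 11)")
    typo_rejected = False
except sqlite3.IntegrityError:
    typo_rejected = True
print(typo_rejected)  # True
```

This guards inserts only. A misspelled type in a SELECT would still silently return no rows, so the consistency concern for queries remains.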

Second, you're paying a price in storing more data--the extra "type" column for each association. And then this data needs to be retrieved when running a query, which affects how many rows can be in memory at once (maybe).

Third, each query probably needs to access the same total number of rows, regardless of whether they're stored in multiple tables or one. So, unless you know something about your data that will let you create clustered indexes or something, you're probably retrieving the same number of pages when you run queries.

Fourth, the likely performance gains come from assuming that the index has a logarithmic behavior, and noting that 5log(N) is greater than log(5N), so it's better to use one big index than 5 smaller ones. However, the addition of the type column is going to reduce this benefit. I'm not really sure how to analyze if it would eliminate it completely, or just reduce it.

Fifth, it seems pretty likely that for at least some queries, you're going to end up joining multiple copies of that huge table, which really seems like it's going to be a killer.

I'd be interested to see what results you get, but I'd be surprised if there's a performance benefit.

海螺姑娘 2024-10-11 20:16:28

This can be solved with abstraction and table inheritance.

An Individual Client, Organization Client, Service Provider are all Parties, which play Roles.

An Email Address, Telephone Number, Web Address, and Physical Address are all Addresses.
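As a rough illustration of one way to read this suggestion, the sketch below models parties, roles, and addresses as generic tables with a discriminator column, using Python's built-in sqlite3 module. All table names, column names, and sample rows are assumptions made for the example; the answer itself does not give a schema, and other forms of table inheritance (for example, one subtype table per kind) would be equally consistent with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supertype for individuals and organizations (hypothetical schema).
CREATE TABLE party (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('individual', 'organization')),
    name TEXT NOT NULL
);
-- A party may play several roles at once.
CREATE TABLE party_role (
    party_id INTEGER NOT NULL REFERENCES party(id),
    role     TEXT NOT NULL CHECK (role IN ('provider', 'consumer')),
    PRIMARY KEY (party_id, role)
);
-- Supertype for e-mail, phone, web, and physical addresses.
CREATE TABLE address (
    id    INTEGER PRIMARY KEY,
    kind  TEXT NOT NULL CHECK (kind IN ('email', 'phone', 'web', 'physical')),
    value TEXT NOT NULL
);
-- A single many-to-many link, with real foreign keys on both sides.
CREATE TABLE party_address (
    party_id   INTEGER NOT NULL REFERENCES party(id),
    address_id INTEGER NOT NULL REFERENCES address(id),
    PRIMARY KEY (party_id, address_id)
);

INSERT INTO party VALUES (1, 'individual', 'Ann'), (2, 'organization', 'Acme');
INSERT INTO party_role VALUES (1, 'provider'), (1, 'consumer'), (2, 'consumer');
INSERT INTO address VALUES
    (100, 'email', 'ann@example.com'),
    (101, 'physical', '1 Main St');
INSERT INTO party_address VALUES (1, 100), (1, 101), (2, 101);
""")

# Contact information for every party acting as a consumer.
rows = conn.execute("""
SELECT party.name, address.kind, address.value
FROM party
JOIN party_role    ON party_role.party_id = party.id
JOIN party_address ON party_address.party_id = party.id
JOIN address       ON address.id = party_address.address_id
WHERE party_role.role = 'consumer'
ORDER BY party.name, address.kind
""").fetchall()
print(rows)
```

Compared with a single master association table, this reading reduces the number of associative tables by generalizing the entities, so the remaining link table can still carry ordinary foreign keys.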
