Standardizing common ID types shared across tables

Posted 2024-07-08 15:58:50


This is a simplified version of the problem.

We have customers who send us lots of data and then query it. We are required by them to have several "public" ids they can query our data by. (Most want to query our system via the id they send along with the data, but not always). For simplicity, we'll call them "pid", "crid" and "musicbrainzid". We have an "entity" table which stores this information. It looks something like this (the "authority" is who sent the data):

entity 
-- 
entity_id   
authority  // who sent the data
type       // 'pid', 'crid', 'musicbrainz', etc.
value      // the actual id value

Then we have separate entities such as "episode", "series" and "broadcast" (actually, there's a lot more, but I'm keeping it simple here). Each of these has an entity_id pointing to the entity table.
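As a rough sketch of the layout described so far, the tables could look like this in an in-memory SQLite database driven from Python. Only entity_id, authority, type and value come from the description above; the title and channel columns, the sample values, and the helper entity_id_for are hypothetical. The sketch also assumes that an item with several public ids gets one entity row per id, all sharing the same entity_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    entity_id INTEGER NOT NULL,
    authority TEXT NOT NULL,  -- who sent the data
    type      TEXT NOT NULL,  -- 'pid', 'crid', 'musicbrainz', etc.
    value     TEXT NOT NULL   -- the actual id value
);
-- title/channel are hypothetical stand-ins for the real, differing structures
CREATE TABLE episode   (episode_id   INTEGER PRIMARY KEY, entity_id INTEGER NOT NULL, title TEXT);
CREATE TABLE series    (series_id    INTEGER PRIMARY KEY, entity_id INTEGER NOT NULL, title TEXT);
CREATE TABLE broadcast (broadcast_id INTEGER PRIMARY KEY, entity_id INTEGER NOT NULL, channel TEXT);
""")

# Assumption: one entity row per public id, sharing an entity_id (made-up values).
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?, ?)",
    [(1, "auth1", "pid", "p-0001"), (1, "auth1", "crid", "c-0001")],
)
conn.execute("INSERT INTO episode (entity_id, title) VALUES (1, 'Pilot')")


def entity_id_for(conn, id_type, value):
    """Return the entity_id for a public id, or None if it is unknown."""
    row = conn.execute(
        "SELECT entity_id FROM entity WHERE type = ? AND value = ?",
        (id_type, value),
    ).fetchone()
    return row[0] if row else None
```

This step is the easy part: entity_id_for(conn, "pid", "p-0001") and entity_id_for(conn, "crid", "c-0001") both resolve to the same entity id, but nothing in the result says which table that id lives in.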

How can external customers search via pid or crid and get the appropriate episode or series, along with proper identification of what it is? Given a pid, we can fetch the entity id, but then we need to search the episode, series and broadcast tables for this value. Further, not all ids will necessarily be related to all of the other tables, but any entity (e.g., an "episode") might have several ids (pid, crid, etc.).

Strategies:

1. Find the entity id for a pid and search every other table for that entity id.
2. Put an "entity_type" column on entity, but what if it's a pid for a row in the episode table and we accidentally set the entity_type to series? We don't want to duplicate data, and I don't want to put database metadata into column values.

Option number 1 is slow and seems wrong (further, the various tables have different structures, which makes this problematic).
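To make the cost of option 1 concrete, here is a minimal sketch using Python's sqlite3 module. The title and channel columns, the sample rows, and the function find_by_public_id are made up for illustration; the real tables have more columns and there are more of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity    (entity_id INTEGER NOT NULL, authority TEXT, type TEXT, value TEXT);
CREATE TABLE episode   (episode_id   INTEGER PRIMARY KEY, entity_id INTEGER NOT NULL, title TEXT);
CREATE TABLE series    (series_id    INTEGER PRIMARY KEY, entity_id INTEGER NOT NULL, title TEXT);
CREATE TABLE broadcast (broadcast_id INTEGER PRIMARY KEY, entity_id INTEGER NOT NULL, channel TEXT);
INSERT INTO entity VALUES (1, 'auth1', 'crid', 'c-episode');
INSERT INTO entity VALUES (2, 'auth1', 'pid',  'p-series');
INSERT INTO episode (entity_id, title) VALUES (1, 'Pilot');
INSERT INTO series  (entity_id, title) VALUES (2, 'Some Series');
""")

# Every table that can own an entity has to be listed (and probed) here.
CONTENT_TABLES = ("episode", "series", "broadcast")


def find_by_public_id(conn, id_type, value):
    """Option 1: resolve the entity id, then probe each table in turn."""
    row = conn.execute(
        "SELECT entity_id FROM entity WHERE type = ? AND value = ?",
        (id_type, value),
    ).fetchone()
    if row is None:
        return None
    for table in CONTENT_TABLES:
        # Table names come from the fixed tuple above, never from user input.
        hit = conn.execute(
            f"SELECT * FROM {table} WHERE entity_id = ?", (row[0],)
        ).fetchone()
        if hit is not None:
            return table, hit  # row shape differs per table
    return None
```

Each lookup costs one query against entity plus up to one query per content table, and the caller gets back rows of different shapes depending on which table matched.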

Option 2 means duplicate data, and this data can get out of sync. We can use triggers to enforce this, but that seems really nasty and, in any event, bugs in the implementation of MySQL triggers have hit us several times. We're using this strategy right now, but without triggers.
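The out-of-sync risk in option 2 can be shown with a small sketch, again in Python with sqlite3. The sample rows, the title columns, and the helper mislabeled_entities are hypothetical; the check is only an illustration of how drift could be detected after the fact, not something we actually run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    entity_id   INTEGER NOT NULL,
    authority   TEXT NOT NULL,
    type        TEXT NOT NULL,
    value       TEXT NOT NULL,
    entity_type TEXT NOT NULL  -- duplicated fact: which table owns this entity
);
CREATE TABLE episode (episode_id INTEGER PRIMARY KEY, entity_id INTEGER NOT NULL, title TEXT);
CREATE TABLE series  (series_id  INTEGER PRIMARY KEY, entity_id INTEGER NOT NULL, title TEXT);
-- Entity 1 is really an episode but is labeled 'series' (the accident).
INSERT INTO entity VALUES (1, 'auth1', 'pid', 'p-episode', 'series');
INSERT INTO entity VALUES (2, 'auth1', 'pid', 'p-series',  'series');
INSERT INTO episode (entity_id, title) VALUES (1, 'Pilot');
INSERT INTO series  (entity_id, title) VALUES (2, 'Some Series');
""")

KNOWN_TABLES = {"episode", "series"}


def mislabeled_entities(conn):
    """Return (entity_id, entity_type) pairs whose named table has no such row."""
    rows = conn.execute(
        "SELECT DISTINCT entity_id, entity_type FROM entity ORDER BY entity_id"
    ).fetchall()
    bad = []
    for entity_id, entity_type in rows:
        # Only interpolate table names that are in the fixed allow-list.
        found = entity_type in KNOWN_TABLES and conn.execute(
            f"SELECT 1 FROM {entity_type} WHERE entity_id = ?", (entity_id,)
        ).fetchone() is not None
        if not found:
            bad.append((entity_id, entity_type))
    return bad
```

Nothing in the schema stops the first insert from claiming that an episode is a series; without triggers or some other constraint, the mistake only surfaces when a check like this is run or a lookup goes to the wrong table.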

What's option 3?

Side note: we know we need to break "authority" out into a separate table because not all authority/type combinations are valid.
