Pros/cons and approaches for implementing globally unique identifiers in a relational database?

Posted on 2024-10-21 06:22:16

Regarding the first part of my question: I recently asked myself what the benefits and trade-offs are of having a unique identifier shared across certain tables in a relational database. As an example, the Facebook (FB) Graph API allows fetching different types of objects such as "Users", "Events", "Pages", etc. using the same URL pattern, e.g. https://domain/251906384206 returns an object of type "Event" whereas https://domain/195466193802264 returns an object of type "Group".

What is the benefit of this approach compared to providing a less "generic" API, one that would be used like this: https://domain/event/251906384206 or https://domain/group/195466193802264? In this case, the same identifier value might be used for objects of different types, because each object type has its own identifier scope.

Regarding the second part of the question: What are the options for implementing a globally unique identifier?

Two options that come to my mind are:

  1. Using an inheritance-based approach (table-per-class, single table, etc.). Assuming a table-per-class approach is used (the super table contains only the unique identifier as its primary key; each sub table, representing an object type, contains the same identifier as the super table plus additional data), joins are required between the super and sub tables, which seems to scale badly because the super table becomes a bottleneck?

  2. Providing a table with 3 columns, containing

    • unique identifier,
    • object-type-specific primary key, and
    • table name.

    Additional tables, one per object type, each contain a column referencing the unique identifier as a foreign key. Each object-type-specific table has its own primary key scope.

Both approaches would allow providing a generic API like the FB API mentioned above. The second approach would allow using table-specific primary keys internally and exposing only the globally unique identifier. However, if the globally unique identifier were also used internally, the second approach would require a join as well.
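To make the second option concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (`global_id`, `event`, `page`), the column names, and the helper functions are all hypothetical, and the back-reference column from each type table to the directory table is omitted for brevity. It only illustrates how a generic lookup by global ID could be resolved; it is not a recommendation.

```python
import sqlite3

# Allowlist of type-specific tables; table names are interpolated into SQL,
# so they must never come straight from user input.
TYPE_TABLES = {"event", "page"}

def make_directory_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE global_id (
        gid        INTEGER PRIMARY KEY,  -- globally unique identifier
        table_name TEXT NOT NULL,        -- which type table holds the row
        local_id   INTEGER NOT NULL,     -- primary key inside that table
        UNIQUE (table_name, local_id)
    );
    CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE page  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    """)
    return conn

def create_object(conn, table, name):
    """Insert into the type table, then register the row in the directory."""
    if table not in TYPE_TABLES:
        raise ValueError(f"unknown object type: {table}")
    local_id = conn.execute(
        f"INSERT INTO {table} (name) VALUES (?)", (name,)
    ).lastrowid
    return conn.execute(
        "INSERT INTO global_id (table_name, local_id) VALUES (?, ?)",
        (table, local_id),
    ).lastrowid

def fetch_object(conn, gid):
    """Resolve a global ID to its type and row; needs two lookups."""
    row = conn.execute(
        "SELECT table_name, local_id FROM global_id WHERE gid = ?", (gid,)
    ).fetchone()
    if row is None:
        return None
    table, local_id = row
    if table not in TYPE_TABLES:
        raise ValueError(f"unexpected table in directory: {table}")
    (name,) = conn.execute(
        f"SELECT name FROM {table} WHERE id = ?", (local_id,)
    ).fetchone()
    return {"type": table, "gid": gid, "name": name}
```

Note that both type tables can reuse the same local primary key values, while every fetch by global ID costs an extra lookup in the directory table.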

Are there any experiences regarding the pros/cons of a globally unique identifier, and what are the best practices for implementing it?

Comments (2)

〃安静 2024-10-28 06:22:16

Both of your proposed ways of implementing a global identifier involve joins of big tables and effectively double the number of records in your database (each object exists on its own, but so does its parent record with its global ID).

I have a feeling it would be better to enforce global IDs in the application/data access layer.
This can be done trivially by enforcing that IDs for each specific type of object come only from a subset of the possible IDs. You could, for instance, reserve the last/first x bits of every ID to specify the object type. The remaining part of the ID would be the "actual ID".
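A minimal sketch of the reserved-bits idea, assuming the lowest 8 bits carry the type. The bit width and the type codes below are made up for illustration:

```python
TYPE_BITS = 8                       # assumed: lowest 8 bits hold the object type
TYPE_MASK = (1 << TYPE_BITS) - 1

# Hypothetical type codes; any stable mapping would do.
TYPE_CODES = {"user": 1, "event": 2, "group": 3}
CODE_TO_TYPE = {code: name for name, code in TYPE_CODES.items()}

def encode_global_id(object_type: str, local_id: int) -> int:
    """Combine a per-table ID with its type code into one global ID."""
    if local_id < 0:
        raise ValueError("local_id must be non-negative")
    return (local_id << TYPE_BITS) | TYPE_CODES[object_type]

def decode_global_id(global_id: int) -> tuple[str, int]:
    """Split a global ID back into (object type, per-table ID)."""
    return CODE_TO_TYPE[global_id & TYPE_MASK], global_id >> TYPE_BITS
```

With this layout, the same per-table ID yields different global IDs for different types, and the type can be read from the ID without touching the database.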

If you're afraid of errors while assigning IDs for a specific table, you can add a check constraint that enforces that the ID is in the correct range (e.g. ID > 4000 AND ID < 10000).
If you are concerned about the bits/bytes wasted on the object type in the identifier, you could expose the global ID only in your database access API, which would concatenate each object's ID (actually stored in a table) with its type ID (derived from the object type).
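As a rough illustration of the check-constraint idea, the sketch below uses SQLite through Python's sqlite3 module and assumes that event IDs are meant to fall strictly between 4000 and 10000. The table and column names are hypothetical.

```python
import sqlite3

def make_event_table():
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint rejects any ID outside this table's reserved range.
    conn.execute("""
        CREATE TABLE event (
            id    INTEGER PRIMARY KEY CHECK (id > 4000 AND id < 10000),
            title TEXT NOT NULL
        )
    """)
    return conn

def try_insert_event(conn, event_id, title):
    """Return True if the row was accepted, False if the constraint refused it."""
    try:
        conn.execute(
            "INSERT INTO event (id, title) VALUES (?, ?)", (event_id, title)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

The database then refuses an ID from another type's range even if the application layer has a bug in its ID assignment.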

蓝颜夕 2024-10-28 06:22:16

"A problem well stated is a problem already half solved."

It seems to me that you are mixing several concepts. You looked at other database apps, but it seems you got more confused instead of more informed.

You have several objects of different classes, and you want to know how to store them in a database. This usually goes by the "fancy name" of Object-Relational Mapping (ORM).

Additionally, you want to use a Globally Unique Identifier (GUID) to identify an object both as a business/programming object and as a row in a table.

Additionally, you also want to use a GUID to identify a class or an object of a certain type.

Let's say you are building an app where you have several objects. There are several classes of objects, such as "Users", "Events", "Pages", and others. You can have several objects of the same class/type, but you need a way to tell one from another: to distinguish "John Doe" from Michigan from "John Doe" from Queensland. Let's say that your objects are going to use a property of type GUID.

So let's suppose you create a table for each class ("user" for "Users", "event" for "Events", and so on; the usual convention for table names is singular and lowercase, although you may ignore it). Each table has several fields that represent the properties of each object. So "user" will have a field like "user_key GUID", and maybe "user_name varchar(100)" and "user_birthdate datetime". The same goes for the other tables.

I have used a "supertable", but only for very specific, uncommon apps. I don't think you need a table that mixes "users", "events", and "pages". I had a case where we had a supertable "customers", plus "company" and "person" subtables with specific additional fields. Sometimes we had to check sales for all customers and make joins with the "customers" table. Sometimes we had to offer a corporate discount for products and browse the "company" subtable.

In case you want this generalization / "is a" supertable, you don't need a different field for the supertable primary key and the detail table primary key; they can be of the same type.
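A small sketch of such a supertable, assuming each subtable simply reuses the supertable's key as its own primary key. It uses SQLite through Python's sqlite3 module; the column names (`customer_key`, `display_name`, `tax_number`, `birthdate`) and the helper functions are invented for illustration.

```python
import sqlite3

def make_customer_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript("""
    CREATE TABLE customers (
        customer_key INTEGER PRIMARY KEY,
        display_name TEXT NOT NULL
    );
    -- Each subtable's primary key is also a foreign key to the supertable.
    CREATE TABLE company (
        customer_key INTEGER PRIMARY KEY REFERENCES customers(customer_key),
        tax_number   TEXT NOT NULL
    );
    CREATE TABLE person (
        customer_key INTEGER PRIMARY KEY REFERENCES customers(customer_key),
        birthdate    TEXT
    );
    """)
    return conn

def add_company(conn, name, tax_number):
    key = conn.execute(
        "INSERT INTO customers (display_name) VALUES (?)", (name,)
    ).lastrowid
    conn.execute(
        "INSERT INTO company (customer_key, tax_number) VALUES (?, ?)",
        (key, tax_number),
    )
    return key

def add_person(conn, name, birthdate):
    key = conn.execute(
        "INSERT INTO customers (display_name) VALUES (?)", (name,)
    ).lastrowid
    conn.execute(
        "INSERT INTO person (customer_key, birthdate) VALUES (?, ?)",
        (key, birthdate),
    )
    return key

def list_companies(conn):
    """Join the subtable back to the supertable on the shared key."""
    return conn.execute("""
        SELECT c.display_name, co.tax_number
        FROM customers AS c
        JOIN company AS co ON co.customer_key = c.customer_key
        ORDER BY c.customer_key
    """).fetchall()
```

Queries over all customers read only `customers`, while company-specific queries join on the shared key.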

I suggest avoiding composite/compound keys ("master key" plus "other" fields) at all costs; use a single-field primary key. I also suggest assigning the GUID key in application code, not in the database.
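For instance, assigning the key in application code could look roughly like this. It uses Python's uuid module and SQLite, stores the GUID as text, and reuses the "user" table fields mentioned above; the helper names are hypothetical.

```python
import sqlite3
import uuid

def make_user_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE user (
            user_key       TEXT PRIMARY KEY,  -- GUID stored as text
            user_name      TEXT NOT NULL,
            user_birthdate TEXT
        )
    """)
    return conn

def add_user(conn, name, birthdate=None):
    """Generate the key in the program, then insert the row with it."""
    user_key = str(uuid.uuid4())  # random (version 4) UUID
    conn.execute(
        "INSERT INTO user (user_key, user_name, user_birthdate) VALUES (?, ?, ?)",
        (user_key, name, birthdate),
    )
    return user_key
```

Because the key exists before the insert, the application can reference the new object (for example in related rows) without first asking the database which ID was assigned.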

A GUID uses more memory and disk space than an integer key, but it is very fast and easy to generate a key that is very unlikely to be duplicated.

Again, your question is more about how to represent objects in a database than about the usage of GUIDs.
