How to use UUIDs but stay compatible with existing database IDs?
We are currently designing an API for our product. The API offers access to a graph consisting of relations between types, such as users, posts or the like.
Now, those objects are referenced by primary key id in our database, but since every type of object lives in a different table, those ids collide unless the type of object (= table) is known as well.
This might not seem to be a problem, but it really is for us: the API design gets much more inconsistent with those ID collisions and with type information in the wrong place.
Now the idea of using UUIDs came up. Since we will probably move away from an SQL database to a K/V store in the future, that might not be the worst idea; UUIDs also offer far better uniqueness and would scale better. So implementing UUIDs in our API wouldn't be the worst thing from various perspectives.
However, for the transition period we still need to access the objects by DB id, so the UUID should be generated from the id, and it should be possible to infer the id from the UUID and vice versa.
Something like 550e8400-e29b-11d4-YYYY-XXXXXXXXXXXX came to mind, where X would be the primary key from the DB, and YYYY would be a code for the type of object.
Is there a "right" way to do this? Can I break anything with this approach? Saving additional UUID information in whole or part is not really an option.
Thanks for your insight,
Philip
Comments (4)
Two things come to mind:
Create a single sequence, and use that to populate all primary keys. This will not be a true UUID, but it will be unique within the current system (i.e. you will be fine within one instance, but sharing data with other systems may be an issue).
Create a map table, something like EXTERNAL_KEY with internal_id, internal_table, external_id. Populate this with a row from ON INSERT triggers on every table, and use the external_id as the UUID in your application.
By the way, I would advise strongly against creating an 'intelligent' key with substrings for certain values etc.
hth
By definition, a surrogate key has no relation to the data itself. So, whether you are using an integer or a GUID, the presumption with a surrogate key is that there is no means to determine anything about the entity itself solely from the key. This is why you must have a unique constraint on something else in the table beyond the surrogate primary key. Thus, I would not try to bake any information into a new surrogate key scheme. Instead, I would make the GUIDs "unique" and use them just like the integer key. The argument against trying to bake information into the GUID is simplicity and thereby cost effectiveness. I doubt that you will ever use the magic baked into the GUID, but it will make generating them a pain.
Now, by "unique" I am hinting at a problem with GUIDs: they do not index well. While their uniqueness is an advantage, it is also a weakness in this regard. The common solution is to use what is called a COMB GUID, which replaces part of the GUID with a datetime value. The GUID is still mostly unique, but a portion of it is now sequential and will index well.
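As a rough illustration of the COMB idea, the Python sketch below keeps ten random bytes and overwrites the last six bytes with a millisecond timestamp. The function name comb_guid and the choice of the last six bytes are assumptions for this example; which bytes should carry the timestamp depends on how your database orders GUID values, and the result does not carry RFC 4122 version bits.

```python
import os
import time
import uuid
from typing import Optional


def comb_guid(now_ms: Optional[int] = None) -> uuid.UUID:
    # Millisecond timestamp, truncated to 48 bits so it fits in six bytes.
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    time_part = (now_ms & ((1 << 48) - 1)).to_bytes(6, "big")
    # Ten random bytes keep the value (mostly) unique.
    random_part = os.urandom(10)
    # Sequential part goes last; move it if your database sorts differently.
    return uuid.UUID(bytes=random_part + time_part)
```

GUIDs generated later end in a larger six-byte value, so an index ordered on that part appends new rows near the end instead of scattering them.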
I don't know of any specific problems, but one recommendation would be to use version 4 GUIDs:
You could put your data in the "random" sections. Just claim that you have a really rubbish random number generator. You could also include a checksum to protect against data-corruption collisions, since your GUIDs will be near each other numerically.
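A minimal sketch of that idea in Python, under assumptions that are not from the question: a 16-bit type code, a primary key that fits in 48 bits, and a CRC-32 truncated to 16 bits as the checksum. The names encode_id and decode_id are made up for illustration. Passing version=4 to uuid.UUID makes the standard library set the version and variant bits, so the payload only goes into fields those bits do not touch.

```python
import uuid
import zlib
from typing import Tuple


def _checksum(type_code: int, db_id: int) -> int:
    # CRC-32 over the payload, truncated to 16 bits (an arbitrary choice here).
    payload = type_code.to_bytes(2, "big") + db_id.to_bytes(6, "big")
    return zlib.crc32(payload) & 0xFFFF


def encode_id(type_code: int, db_id: int) -> uuid.UUID:
    if not 0 <= type_code < (1 << 16):
        raise ValueError("type_code must fit in 16 bits")
    if not 0 <= db_id < (1 << 48):
        raise ValueError("db_id must fit in 48 bits")
    # First 32 bits: type code + checksum. Last 48 bits (node field): the id.
    time_low = (type_code << 16) | _checksum(type_code, db_id)
    return uuid.UUID(fields=(time_low, 0, 0, 0, 0, db_id), version=4)


def decode_id(value: uuid.UUID) -> Tuple[int, int]:
    type_code = value.time_low >> 16
    checksum = value.time_low & 0xFFFF
    db_id = value.node
    if value.version != 4 or checksum != _checksum(type_code, db_id):
        raise ValueError("not a UUID produced by encode_id")
    return type_code, db_id
```

For example, decode_id(encode_id(1, 42)) gives back (1, 42), and the value parses as a version 4 UUID in other tools. Nothing about it is random, though, so it is only unique as long as the (type code, id) pairs are.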
Maybe I'm being a bit too simplistic, but if you are willing to change the data type of the identifier being passed around, how about creating a computed column of type varchar(?) that holds ServerName/Instance.db_name.schema_name.table_name.id? Uniqueness is guaranteed given that the server name/instance is unique to your environment, the db to your instance, the schema to your db, the table to your schema and the id to your table, which they all should be.
I don't have ANY problem with de-normalizing data for this sort of use. What I have a huge problem with is anyone using a field such as this to ask questions in code, such as "give me all of the records that came from Server02". This would be a string field and won't be very searchable, much like a GUID. However, depending on where and how the values are stored outside of their originating table, I suspect they would be less of a cause of fragmentation in the indexes to which they belong.
For the 'ordered' GUIDs in SQL Server, see Creating and Modifying Identifier Columns:
CREATE TABLE dbo.Globally_Unique_Data
(guid uniqueidentifier CONSTRAINT Guid_Default DEFAULT NEWSEQUENTIALID() ROWGUIDCOL,
Employee_Name varchar(60),
CONSTRAINT Guid_PK PRIMARY KEY (guid) );