Distributed primary keys: UUID, simple auto-increment, or custom sequence values?
I know this type of question has been asked before, but I could not find one that compares the options I have in mind. So I am going to post them here; please post links if there are duplicates.
This has ended up being a rather long post. If you have time, please read it through, as the question is at the end.
EDIT2: I've accepted an answer, as I think that will be the best solution for now. I would also like to link to two other questions that answer my query about concatenating numbers: "Combine two integers to create a unique number" and "Concatenate integers in C#". If I were going to try encoding the number (as below, like 51122222), I think these would be useful. Though maybe just using something like String.Format in C# would be fast enough for my small application.
I'm currently trying to find a way to set up distributed applications that use the same database schema and can synchronise, perhaps with one master database that all the others sync with.
The program I am planning currently will start as a fairly simple program to track information. The first version might contain two tables: Items and ItemHistory. This is an example of possible fields:
Items
    ItemID (PK)          ?
    Name                 String
    Content              String
ItemHistory
    ItemHistoryID (PK)   ?
    ItemID (FK)          ?
    EventName            String
    CreatedOn            DateTime
I've listed the field names and types; the PK types are shown as "?" because they are what this question is about.
The first version will be a standard desktop app; I'm currently planning on using C# with a WPF front end and SQLite for the database. Eventually I also want to create a version to run on my Android phone. This is where the distributed part comes in: I don't always have a signal, so the app will need to run offline and synchronise when it is online again.
Here are the ideas I have so far on how to deal with the IDs:
- Use a UUID for the IDs so there are no merge conflicts
- Use an auto-increment field and set the starting number on each version of the app in some increment, e.g. 1 for the first app, 10000 for the second, 20000 for the third etc.
- Use an auto-increment field with an offset value to avoid conflicts without the large gaps between numbers (MySQL has auto_increment_increment and auto_increment_offset for this)
- Generate my own ID that encodes an ID for each database, so that each database can have its own auto-increment value without causing a conflict. I found someone else who had the same idea: "What data type is recommended for ID columns?"
While option 1 would work, and I have used it in the past, I want to look at other options to avoid the issues with UUIDs. I would like a solution that is easier to read while debugging and is sortable.
Option 2 would work, but it does force a limit on the number of records. I know that in my small application it will almost never go over that many, but I would like to see if there's a solution that does not require such a limit. Option 3 avoids the limit by using alternating numbers, but I think you would need to know how many databases are going to be used, or you might use up all the numbers. Using a start of 1 and an increment of 2 on DB1, and a start of 2 and an increment of 2 on DB2, would use every number alternately. You could use 50 as the increment, but then you just have another limit, this time on the number of applications that can use it. Again, I know it's a limit that is not going to be hit in my situation, but it could be an issue in an application that suddenly becomes very popular.
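As a rough illustration of option 3, here is a minimal Python sketch of interleaved sequences. It assumes every database uses the same increment (the planned number of databases) and its own offset, which is how MySQL's auto_increment_increment and auto_increment_offset are intended to be combined. The function name `interleaved_ids` is made up for this example.

```python
from itertools import count, islice


def interleaved_ids(offset, step):
    """Return an endless iterator: offset, offset + step, offset + 2*step, ...

    `step` is the total number of databases sharing the key space and
    `offset` (1..step) identifies this particular database.
    """
    if not 1 <= offset <= step:
        raise ValueError("offset must be between 1 and step")
    return count(offset, step)


# Two databases: both use step 2, with offsets 1 and 2.
db1 = list(islice(interleaved_ids(1, 2), 5))  # [1, 3, 5, 7, 9]
db2 = list(islice(interleaved_ids(2, 2), 5))  # [2, 4, 6, 8, 10]
```

The step has to be fixed up front: a third database cannot join later without renumbering, which is the limit on the number of applications described above.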
Option 4 seems like it could solve the issue for me, but I'm not sure whether it would work in practice. One idea I had was to allow a prefix to be set on each application, which could then be combined with an auto-incrementing value, e.g. PC1, PC2 for records on a PC and maybe PHONE1, PHONE2 etc. for records from the Android. This would work, but using numbers in strings causes the sorting issue where 1, 11 and 100 show up next to each other. That is, unless leading zeros are used, and then it's back to a limited number of records again.
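The sorting issue with prefixed string IDs can be reproduced in a few lines of Python. This is only a sketch: the `natural_key` helper is hypothetical, and the six-digit padding width is an arbitrary assumption that caps each prefix at 999,999 records.

```python
import re

ids = ["PC1", "PC11", "PC100", "PC2"]

# Plain string comparison orders character by character.
plain = sorted(ids)  # ['PC1', 'PC100', 'PC11', 'PC2']


def natural_key(record_id):
    """Split 'PC11' into ('PC', 11) so the numeric part compares as a number."""
    prefix, number = re.fullmatch(r"([A-Za-z]+)(\d+)", record_id).groups()
    return prefix, int(number)


natural = sorted(ids, key=natural_key)  # ['PC1', 'PC2', 'PC11', 'PC100']

# Zero-padding makes plain string order match numeric order,
# at the cost of a fixed maximum number of records per prefix.
padded = sorted(f"PC{n:06d}" for n in (1, 11, 100, 2))
```

Storing the prefix and the number in two separate columns would give the same ordering as `natural_key` without parsing, if a composite key is acceptable.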
I have wondered whether it would be possible to combine a number for the DB ID with the auto-increment value, e.g. PC = 1 and PHONE = 2. Then we would have 11, 12, 13 etc. for the PC, with maybe 111 for the 11th record, and 2304 for the 304th record on PHONE. But I don't know how this would be done, or whether it can be done easily without causing excess overhead when generating values.
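One way to combine a database number with a local counter is to give each part a fixed number of bits inside a single 64-bit integer. Plain decimal concatenation is ambiguous (111 could be record 11 on database 1 or record 1 on database 11), whereas fixed-width fields are not. The sketch below is one possible layout, not an established scheme: the 16/47 bit split and the names `make_id` and `split_id` are assumptions chosen so the result fits SQLite's signed 64-bit INTEGER.

```python
NODE_BITS = 16  # assumption: room for 65,536 databases
SEQ_BITS = 47   # assumption: the rest of the 63 usable bits of a signed 64-bit int


def make_id(node_id, seq):
    """Pack a database number and a local counter into one integer."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id out of range")
    if not 0 <= seq < (1 << SEQ_BITS):
        raise OverflowError("local sequence exhausted")
    return (node_id << SEQ_BITS) | seq


def split_id(key):
    """Recover (node_id, seq) from a packed key."""
    return key >> SEQ_BITS, key & ((1 << SEQ_BITS) - 1)


pc_11 = make_id(1, 11)        # 11th record created on the PC
phone_304 = make_id(2, 304)   # 304th record created on the phone
```

Keys built this way sort by database first and by creation order within a database. The cost is a single shift and OR per key, and the limits are still there, but they are set by the bit widths rather than by a decimal prefix.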
At work they use a similar numbering system, with IDs like 51122222. Here 5 refers to the instance of the application, 11 is a 2-digit year, and the remaining digits are an auto-incrementing number. I've not got a clear answer yet about what happens if we go over 99999 records in a year. I think they might have figured that it's not going to happen and are happy with the calculated risk.
So finally a question: is there a way to create a primary key system for a distributed application that allows for sorting and does not enforce limits (besides the size of the data type itself, e.g. max integer)?
EDIT: Here's a little bit more info on the app I plan to write. I want to create something that will let me store just about any type of information that I might gain, and the system will include the ability to tag the entries so I can search on a topic. Types of information I see so far could be recommendations on books, DVDs, websites etc., or maybe local tips for the place I'm living. One overall idea is to stop keeping these bits of information spread across multiple computers/laptops/phones in different formats.
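The 51122222 layout can be sketched with integer arithmetic, assuming the sequence part is exactly five decimal digits (which is what the 99999 limit implies). The function names are made up, and the overflow check is this sketch's own choice rather than anything known about the system at work.

```python
SEQ_DIGITS = 5  # assumption: five digits for the yearly counter


def encode(instance, year, seq):
    """Build an ID laid out as <instance><2-digit year><5-digit sequence>."""
    if not 0 <= year <= 99:
        raise ValueError("year must have two digits")
    if not 0 <= seq < 10 ** SEQ_DIGITS:
        # Without this check, sequence 100000 would carry into the year
        # field and produce the same key as sequence 0 of the next year.
        raise OverflowError("yearly sequence exhausted")
    return (instance * 100 + year) * 10 ** SEQ_DIGITS + seq


def decode(key):
    """Split an ID back into (instance, year, seq)."""
    rest, seq = divmod(key, 10 ** SEQ_DIGITS)
    instance, year = divmod(rest, 100)
    return instance, year, seq


key = encode(5, 11, 22222)  # 51122222
parts = decode(key)         # (5, 11, 22222)
```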
Comments (1)
In broad terms, there are two approaches.
You use sequential values. These may be divided up into groups, interleaved, whatever. They are the most efficient approach, but require collaboration and coordination.
You use random values (this includes UUIDs). These are much simpler but require more space. From "birthday collisions" we know that if you need to store N values then a random key must be chosen from a range of (more than) N*N - http://en.wikipedia.org/wiki/Birthday_problem. Working backwards, a 64 bit integer can hold about 32 bits of data if used as a random key - that's about 4 billion values. But that's for a probability of roughly 50% that some collision occurs. You want a much lower probability, so a practical limit is around 10 million entries.
So, in simple terms, if you have a 64 bit key, a random approach would work for around 10 million entries and a sequential approach for many more. In either case, that is probably more than you need.
If you have a 32 bit key then a random approach works for around a thousand values (a sequential approach goes to about 4 billion, as above).
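These figures can be checked with the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2N), for n random keys drawn uniformly from N possible values. The sketch below assumes the keys really are uniform and independent, and the function name is made up for the example.

```python
import math


def collision_probability(n, bits):
    """Approximate chance that n uniformly random `bits`-bit keys contain a duplicate."""
    space = 2 ** bits
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-n * (n - 1) / (2 * space))


p64 = collision_probability(10_000_000, 64)  # about 2.7e-06
p32 = collision_probability(1_000, 32)       # about 1.2e-04
p_sqrt = collision_probability(2 ** 32, 64)  # about 0.39
```

With 2^32 keys in a 64 bit space the approximation gives a bit under 40%, the same order of magnitude as the 50% figure used above.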
Obviously if you have a text value then you need to modify this accordingly, but UUIDs are designed to have "enough" values anyway - http://en.wikipedia.org/wiki/Universally_unique_identifier
Typically a database will provide a sequential ID and that is all you need. If not, the 64 bit random approach is usually simplest and worth the extra space.