Distributed sequence number generation?
I've generally implemented sequence number generation using database sequences in the past.
e.g. Using Postgres SERIAL type http://www.neilconway.org/docs/sequences/
I'm curious, though, how to generate sequence numbers for large distributed systems where there is no database. Does anybody have any experience or suggestions for a best practice for achieving sequence number generation in a thread-safe manner for multiple clients?
OK, this is a very old question, which I'm first seeing now.
You'll need to differentiate between sequence numbers and unique IDs that are (optionally) loosely sortable by a specific criterion (typically generation time). True sequence numbers imply knowledge of what all other workers have done, and as such require shared state. There is no easy way of doing this in a distributed, high-scale manner. You could look into things like network broadcasts, windowed ranges for each worker, and distributed hash tables for unique worker IDs, but it's a lot of work.
Unique IDs are another matter, there are several good ways of generating unique IDs in a decentralized manner:
a) You could use Twitter's Snowflake, an ID network service for generating unique ID numbers at high scale.
b) You could generate the unique IDs on the clients themselves, using an approach derived from how UUIDs and Snowflake's IDs are made. There are multiple options, but something along the lines of:
The most significant 40 or so bits: A timestamp; the generation time of the ID. (We're using the most significant bits for the timestamp to make IDs sort-able by generation time.)
The next 14 or so bits: A per-generator counter, which each generator increments by one for each new ID generated. This ensures that IDs generated at the same moment (same timestamps) do not overlap.
The last 10 or so bits: A unique value for each generator. Using this, we don't need to do any synchronization between generators (which is extremely hard), as all generators produce non-overlapping IDs because of this value.
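A minimal sketch of that layout in Java; the exact bit widths (40/14/10 here), the custom epoch, and how generator IDs get assigned are all assumptions you would adjust for your own system.

```java
// Sketch of the timestamp / counter / generator-id layout described above.
public class GeneratorIds {
    private static final long EPOCH = 1420070400000L; // custom epoch, e.g. 2015-01-01 (assumption)
    private final long generatorId;                   // unique per generator, low 10 bits
    private long lastTimestamp = -1L;
    private long counter = 0L;                        // 14-bit per-generator counter

    public GeneratorIds(long generatorId) {
        this.generatorId = generatorId & 0x3FF;
    }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis() - EPOCH;
        if (ts == lastTimestamp) {
            counter = (counter + 1) & 0x3FFF;         // same millisecond: bump the counter
        } else {
            counter = 0;
            lastTimestamp = ts;
        }
        // Timestamp in the most significant bits, then the counter, then the generator id,
        // so IDs sort roughly by generation time.
        return (ts << 24) | (counter << 10) | generatorId;
    }
}
```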
c) You could generate the IDs on the clients, using just a timestamp and random value. This avoids the need to know all generators, and assign each generator a unique value. On the flip side, such IDs are not guaranteed to be globally unique, they're only very highly likely to be unique. (To collide, one or more generators would have to create the same random value at the exact same time.) Something along the lines of:
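A similarly minimal sketch of the timestamp-plus-random variant; the 42/22 split between timestamp and random bits is an arbitrary choice, not a standard.

```java
import java.util.concurrent.ThreadLocalRandom;

// Timestamp in the high bits keeps IDs roughly time-sortable; the random
// low bits make collisions very unlikely, but not impossible.
public final class TimestampRandomIds {
    public static long nextId() {
        long ts = System.currentTimeMillis();                          // millisecond timestamp
        long random = ThreadLocalRandom.current().nextLong(1L << 22);  // 22 random bits
        return (ts << 22) | random;
    }
}
```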
d) The easy way out, use UUIDs / GUIDs.
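For completeness, the UUID route needs no layout decisions or coordination at all; `java.util.UUID` ships with the JDK and `randomUUID()` is thread safe (note that it produces a random, version-4 UUID).

```java
import java.util.UUID;

// No coordination between generators needed; uniqueness is probabilistic.
public final class Uuids {
    public static String nextId() {
        return UUID.randomUUID().toString();
    }
}
```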
You could have each node have a unique ID (which you may have anyway) and then prepend that to the sequence number.
For example, node 1 generates sequence 001-00001 001-00002 001-00003 etc. and node 5 generates 005-00001 005-00002
Unique :-)
Alternatively, if you want some sort of centralized system, you could consider having your sequence server give out sequence numbers in blocks. This reduces the overhead significantly. For example, instead of requesting a new ID from the central server for each ID that must be assigned, you request IDs in blocks of 10,000 from the central server and then only have to do another network request when you run out.
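A sketch of what the client side of such block allocation could look like; `fetchBlockStart()` is a placeholder for whatever call your central sequence server actually exposes.

```java
// Hands out IDs locally and only talks to the central server once per block.
public class BlockAllocator {
    private final long blockSize;
    private long next = 0;   // next id to hand out
    private long end = 0;    // exclusive end of the currently reserved block

    public BlockAllocator(long blockSize) {
        this.blockSize = blockSize;
    }

    public synchronized long nextId() {
        if (next >= end) {
            next = fetchBlockStart(blockSize);  // one network round trip per block
            end = next + blockSize;
        }
        return next++;
    }

    // Placeholder: atomically advance the server-side counter by `size`
    // and return the start of the reserved range.
    private long fetchBlockStart(long size) {
        throw new UnsupportedOperationException("wire this up to your sequence server");
    }
}
```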
Now there are more options.
Though this question is "old", I got here, so I think it might be useful to leave the options I know of (so far):
Cheers
It can be done with Redisson. It implements a distributed and scalable version of AtomicLong. Here is an example:
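(A minimal sketch; the Redis address and counter name below are placeholders.)

```java
import org.redisson.Redisson;
import org.redisson.api.RAtomicLong;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

// A shared counter on top of Redisson's distributed AtomicLong.
public class RedissonSequence {
    public static void main(String[] args) {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        RAtomicLong counter = redisson.getAtomicLong("sequence");
        long next = counter.incrementAndGet();   // atomic across all clients
        System.out.println(next);

        redisson.shutdown();
    }
}
```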
If it really has to be globally sequential, and not simply unique, then I would consider creating a single, simple service for dispensing these numbers.
Distributed systems rely on lots of little services interacting, and for this simple kind of task, do you really need or would you really benefit from some other complex, distributed solution?
There are a few strategies, but none that I know of can be truly distributed and give a real sequence.
memcached has a fast atomic counter; in the vast majority of cases it is fast enough for your entire cluster.
Personally, I'd lean toward UUIDs, or memcached if I want to have a mostly contiguous space.
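For illustration, a sketch of that counter using the spymemcached client; the key name, default value, and server address are assumptions.

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

// The increment happens server-side in memcached, so it is atomic across
// all clients; the 3-argument incr() seeds the key if it does not exist yet.
public class MemcachedCounter {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        long next = client.incr("seq", 1, 1);
        System.out.println(next);
        client.shutdown();
    }
}
```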
Why not use a (thread safe) UUID generator?
I should probably expand on this.
UUIDs are guaranteed to be globally unique (if you avoid the ones based on random numbers, where the uniqueness is just highly probable).
Your "distributed" requirement is met, regardless of how many UUID generators you use, by the global uniqueness of each UUID.
Your "thread safe" requirement can be met by choosing "thread safe" UUID generators.
Your "sequence number" requirement is assumed to be met by the guaranteed global uniqueness of each UUID.
Note that many database sequence number implementations (e.g. Oracle) do not guarantee either monotonically increasing, or (even) increasing sequence numbers (on a per "connection" basis). This is because a consecutive batch of sequence numbers gets allocated in "cached" blocks on a per-connection basis. This guarantees global uniqueness and maintains adequate speed. But the sequence numbers actually allocated (over time) can be jumbled when they are being allocated by multiple connections!
Distributed ID generation can be achieved with Redis and Lua. The implementation is available on GitHub. It produces distributed, k-sortable unique IDs.
I know this is an old question, but we were also facing the same need and were unable to find a solution that fulfilled it.
Our requirement was to get a unique sequence (0, 1, 2, 3 ... n) of IDs, so Snowflake did not help.
We created our own system to generate the IDs using Redis. Redis is single-threaded, so its list/queue mechanism always gives us one pop at a time.
What we do is create a buffer of IDs: initially the queue holds 0 to 20 IDs, ready to be dispatched when requested. Multiple clients can request an ID and Redis pops one ID at a time; after every pop from the left, we insert BUFFER + currentId on the right, which keeps the buffer list going. Implementation here.
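Since the linked implementation isn't reproduced above, here is a rough sketch of the described scheme with the Jedis client; the key name and buffer size are assumptions, and the list must be pre-seeded with the first BUFFER ids (0..19).

```java
import redis.clients.jedis.Jedis;

// Pop an id from the left, push (id + BUFFER) back on the right so the
// list never runs dry; Redis being single-threaded serializes the pops.
public class RedisIdBuffer {
    private static final long BUFFER = 20;

    public static long nextId(Jedis jedis) {
        String popped = jedis.lpop("id_buffer");
        long id = Long.parseLong(popped);
        jedis.rpush("id_buffer", String.valueOf(id + BUFFER));
        return id;
    }
}
```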
I have written a simple service which can generate semi-unique, non-sequential 64-bit long numbers. It can be deployed on multiple machines for redundancy and scalability. It uses ZeroMQ for messaging. For more information on how it works, see the GitHub page: zUID
Using a database you can reach 1,000+ increments per second with a single core. It is pretty easy. You can use its own database as a backend to generate that number (as it should be its own aggregate, in DDD terms).
I had what seems a similar problem: I had several partitions and I wanted an offset counter for each one. I implemented a small counter table and then executed a single atomic increment statement against it, along the lines of the sketch below.
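The original table and statement are not reproduced here; this is a minimal sketch of one way to do it with Postgres (which the question mentions) and JDBC, using a hypothetical partition_offsets table with columns partition_id and next_offset.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Atomically advance the counter for one partition and return the end of the
// reserved block; the range [end - blockSize, end) can then be used locally.
public final class PartitionOffsets {
    public static long allocateBlock(String jdbcUrl, int partitionId, long blockSize) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = conn.prepareStatement(
                     "UPDATE partition_offsets SET next_offset = next_offset + ? " +
                     "WHERE partition_id = ? RETURNING next_offset")) {
            ps.setLong(1, blockSize);
            ps.setInt(2, partitionId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getLong(1);  // exclusive end of the block just reserved
            }
        }
    }
}
```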
If your application allows it, you can allocate a block at once (that was my case).
If you need further throughput and cannot allocate offsets in advance, you can implement your own service using Flink for real-time processing. I was able to get around 100K increments per partition.
Hope it helps!
The problem is similar to one in the iSCSI world, where each LUN/volume has to be uniquely identifiable by the initiators running on the client side.
The iSCSI standard says that the first few bits have to represent the storage provider/manufacturer information, and the rest must increase monotonically.
Similarly, in a distributed system of nodes, one can use the initial bits to represent the node ID and let the rest increase monotonically.
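A small sketch of that idea, with the node ID in the high bits and a local counter in the low bits; the 16/48 split is an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

// IDs from different nodes can never collide because the node id occupies
// the top bits; within a node they increase monotonically.
public class NodePrefixedIds {
    private final long nodeBits;                       // node id pre-shifted into the top 16 bits
    private final AtomicLong counter = new AtomicLong();

    public NodePrefixedIds(long nodeId) {
        this.nodeBits = nodeId << 48;
    }

    public long nextId() {
        return nodeBits | counter.incrementAndGet();
    }
}
```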
One decent solution is to use long, time-based ID generation.
It can be done with the backing of a distributed database.
My two cents for GCloud: use a Storage file.
Implemented as a Cloud Function; it can easily be converted into a library.
https://github.com/zaky/sequential-counter