What are the tradeoffs when generating unique sequence numbers in a distributed and concurrent environment?

Posted on 2024-09-08 18:38:01

I am curious about the constraints and tradeoffs involved in generating unique sequence numbers in a distributed and concurrent environment.

Imagine this: I have a system whose only job is to return a unique sequence number every time you ask for one. Here is an ideal spec for such a system (constraints):

Stay up under high load.
Allow as many concurrent connections as possible.
Distributed: spread load across multiple machines.
Performance: run as fast as possible and have as much throughput as possible.
Correctness: numbers generated must:
1. not repeat.
2. be unique per request (there must be a way to break ties if any two requests happen at the exact same time).
3. be in (increasing) sequential order.
4. have no gaps between requests: 1,2,3,4... (effectively a counter of the total number of requests).
Fault tolerant: if one, several, or all machines go down, the system can resume from the state it was in before the failure.
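
To make the spec concrete, here is a minimal sketch of the simplest design I can think of that meets the correctness and fault-tolerance constraints while giving up distribution entirely: a single process with a lock that persists the counter before returning each number. The class name `DurableCounter` and the file-based storage are assumptions for illustration, not a recommendation.

```python
import os
import threading


class DurableCounter:
    """Single-node, gap-free counter that survives restarts (illustrative only)."""

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()
        self._value = self._load()

    def _load(self):
        # Resume from the last persisted value, or start at 0 on first run.
        try:
            with open(self._path) as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def next(self):
        # The lock serializes concurrent callers, which also breaks ties.
        with self._lock:
            new_value = self._value + 1
            tmp_path = self._path + ".tmp"
            with open(tmp_path, "w") as f:
                f.write(str(new_value))
                f.flush()
                os.fsync(f.fileno())  # force the value to disk before replying
            os.replace(tmp_path, self._path)  # atomically swap in the new value
            self._value = new_value
            return new_value
```

Every call pays for a disk sync and all callers queue behind one lock, so throughput and availability are exactly what this design sacrifices. Note also that a crash after the value is persisted but before the caller receives it still leaves a gap from the caller's point of view.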

Obviously, this is an idealized spec, and not all constraints can be satisfied fully; see the CAP theorem. However, I would love to hear your analysis of various relaxations of the constraints: what types of problems are we left with, and what algorithms would we use to solve them? For example, if we get rid of the counter constraint (no gaps), the problem becomes much easier: since gaps are allowed, we can just partition the numeric range and map the partitions onto different machines.
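
The range-partitioning idea could be sketched roughly as follows. This is only an illustration under my own assumptions: `BlockAllocator` stands in for whatever coordinator hands out ranges (in a real deployment it would be a remote, replicated service), and `Node` represents one machine serving numbers from its current block.

```python
import threading


class BlockAllocator:
    """Hypothetical coordinator that hands out disjoint blocks of numbers."""

    def __init__(self, block_size=1000):
        self._block_size = block_size
        self._next_start = 1
        self._lock = threading.Lock()

    def allocate(self):
        # Returns a half-open range [start, end) that no other node receives.
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return start, start + self._block_size


class Node:
    """One machine: serves numbers locally, contacting the allocator rarely."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._lock = threading.Lock()
        self._current, self._end = allocator.allocate()

    def next_id(self):
        with self._lock:
            if self._current >= self._end:
                # Block exhausted: fetch a fresh one from the coordinator.
                self._current, self._end = self._allocator.allocate()
            value = self._current
            self._current += 1
            return value
```

With a block size of 3 and two nodes, the first node returns 1, 2, 3 and then jumps to 7, because the second node already holds 4 to 6. Numbers never repeat, but they are no longer globally ordered, and a node that crashes mid-block leaves a gap.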

Any references (papers, books, code) are welcome. I'd also like to keep a list of existing software (open source or not).

Software:

Snowflake: a network service for generating unique ID numbers at high scale with some simple guarantees.
keyspace: a publicly accessible, unique 128-bit ID generator, whose IDs can be used for any purpose
RFC 4122 (UUID) implementations exist in many languages. The RFC is probably a really good base: it avoids the need for any inter-system coordination, the UUIDs are 128 bits, and IDs generated by software implementing certain versions of the spec include a timestamp portion that makes sorting by time possible.
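
For comparison, here is a rough sketch of the two coordination-free approaches in the list. The first class packs a millisecond timestamp, a worker ID, and a per-millisecond sequence into one integer, using the 41/10/12 bit split commonly described for Snowflake; it is not Snowflake's actual code, and the class name, the `epoch_ms` default, and the injectable `clock` are my own assumptions. The second function uses Python's standard `uuid` module to sort version 1 UUIDs by their embedded timestamp.

```python
import threading
import time
import uuid


class SnowflakeLikeGenerator:
    """Timestamp | worker | sequence IDs (illustrative, not Snowflake itself)."""

    WORKER_BITS = 10    # assumed layout: up to 1024 workers
    SEQUENCE_BITS = 12  # assumed layout: up to 4096 IDs per ms per worker

    def __init__(self, worker_id, epoch_ms=0, clock=None):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._epoch_ms = epoch_ms
        # The clock is injectable so the behavior can be tested deterministically.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & ((1 << self.SEQUENCE_BITS) - 1)
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - self._epoch_ms
            return (
                (timestamp << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self._worker_id << self.SEQUENCE_BITS)
                | self._sequence
            )


def sort_uuid1_by_time(uuids):
    """Order version 1 UUIDs by their embedded 60-bit timestamp."""
    return sorted(uuids, key=lambda u: u.time)
```

Both approaches give up the "no gaps" and strict global ordering constraints: IDs are unique and only roughly time-ordered, and the ordering is only as good as the machines' clocks. Sorting version 1 UUIDs as plain strings does not give time order, because the low bits of the timestamp come first in the textual form, which is why the sketch sorts on the `time` attribute.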
