What are the tradeoffs when generating unique sequence numbers in a distributed and concurrent environment?

Posted on 2024-09-08 18:38:01

I am curious about the constraints and tradeoffs for generating unique sequence numbers in a distributed and concurrent environment.

Imagine this: I have a system whose only job is to give back a unique sequence number every time you ask it. Here is an ideal spec for such a system (constraints):

  • Stay up under high load.
  • Allow as many concurrent connections as possible.
  • Distributed: spread load across multiple machines.
  • Performance: run as fast as possible and have as much throughput as possible.
  • Correctness: numbers generated must:
    1. not repeat.
    2. be unique per request (there must be a way to break ties if any two requests happen at the exact same time).
    3. be in (increasing) sequential order.
    4. have no gaps between requests: 1,2,3,4... (effectively a counter for the total number of requests)
  • Fault tolerant: if one or more, or even all, machines go down, the system can resume from the state before the failure.

Obviously, this is an idealized spec and not all constraints can be satisfied fully. See the CAP theorem. However, I would love to hear your analysis of various relaxations of the constraints: what types of problems are we left with, and what algorithms would we use to solve them? For example, if we get rid of the counter constraint, the problem becomes much easier: since gaps are allowed, we can just partition the numeric ranges and map them onto different machines.
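To make the range-partitioning idea concrete, here is a minimal single-process sketch. The names `BlockAllocator`, `Node`, and `block_size` are made up for illustration, and the in-memory allocator stands in for what would really be a small coordinated service. Each node reserves a block of numbers once and then serves IDs locally, so coordination happens only once per block.

```python
import threading


class BlockAllocator:
    """Hypothetical coordinator that hands out disjoint ranges of numbers."""

    def __init__(self, block_size=1000):
        self._block_size = block_size
        self._next_start = 1
        self._lock = threading.Lock()

    def reserve(self):
        """Return a half-open range [start, end) that no other caller receives."""
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return start, start + self._block_size


class Node:
    """Hypothetical machine that serves IDs from its own reserved block."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._lock = threading.Lock()
        self._next, self._end = allocator.reserve()

    def next_id(self):
        with self._lock:
            # Only talk to the allocator when the local block is used up.
            if self._next >= self._end:
                self._next, self._end = self._allocator.reserve()
            value = self._next
            self._next += 1
            return value
```

IDs produced this way never repeat and increase on each node, but they are neither gapless nor globally ordered: a node that crashes abandons the rest of its block, and two nodes interleave their ranges in whatever order requests arrive.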

Any references (papers, books, code) are welcome. I'd also like to keep a list of existing software (open source or not).


Software:

  • Snowflake: a network service for generating unique ID numbers at high scale with some simple guarantees.
  • keyspace: a publicly accessible, unique 128-bit ID generator, whose IDs can be used for any purpose
  • RFC 4122 implementations exist in many languages. The RFC is probably a really good base, as it avoids the need for any inter-system coordination: the UUIDs are 128-bit, and IDs from software implementing certain versions of the spec include a timestamp portion that makes sorting possible.

Comments (2)

苏别ゝ 2024-09-15 18:38:01

If you must be sequential (per machine) but can drop the gap/counter requirements, look for an implementation of the Version 1 UUID as specified in RFC 4122.

If you're working in .NET and can eliminate the sequential and gap/counter requirements, just use System.Guid. It implements RFC 4122 Version 4, and its values are already unique (with very low collision probability) across machines and requests. This could easily be implemented as a web service or just used locally.
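As an illustration of the two UUID versions mentioned here, the sketch below uses Python's standard `uuid` module rather than .NET (a swap made only to keep the example short and runnable). The helper names `new_sequential_id`, `new_random_id`, and `time_sort_key` are made up for this example. One detail worth knowing: a Version 1 UUID stores the low bits of its timestamp first, so sorting by the raw value does not give time order, and the sort key has to use the decoded timestamp.

```python
import uuid


def new_sequential_id():
    """Version 1 UUID: built from a timestamp, a clock sequence, and a node ID."""
    return uuid.uuid1()


def new_random_id():
    """Version 4 UUID: random bits, no ordering information at all."""
    return uuid.uuid4()


def time_sort_key(u):
    """Sort key for Version 1 UUIDs: timestamp, then clock sequence, then node."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    # u.time is the 60-bit count of 100 ns intervals since 1582-10-15.
    return (u.time, u.clock_seq, u.node)


ids = [new_sequential_id() for _ in range(5)]
ordered = sorted(ids, key=time_sort_key)
```

Ordering across machines is only as good as their clock synchronization, so this gives per-machine sequencing, as the answer says, rather than a global order.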

迷爱 2024-09-15 18:38:01

Here's a high-level idea for an approach that may fulfill all the requirements, albeit with a significant caveat that may not match many use cases.

If you can tolerate having two sequence numbers (a logical one, returned immediately and guaranteed unique and ordered but with gaps, and a separate physical one, guaranteed to be in sequential order with no gaps and available a short while later), then the solution seems straightforward:

  • One distributed system that serves up a high-resolution clock value + machine ID as the logical sequence number.
  • Stream all the logical sequence numbers into a separate distributed system that orders them and maps them to the physical sequence numbers.

The mapping from logical to physical can happen on-demand as soon as the second system is done with processing.
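The two stages can be sketched in a single process as follows. This is only an illustration under stated assumptions: `LogicalClock`, `Sequencer`, and their methods are hypothetical names, machine clocks are assumed to be synchronized well enough for the ordering to be meaningful, and each batch handed to the sequencer is assumed to be closed, meaning no logical ID smaller than one already processed arrives later.

```python
import threading
import time


class LogicalClock:
    """Stage 1 (hypothetical): issues (timestamp_ns, machine_id) pairs."""

    def __init__(self, machine_id, now_ns=time.time_ns):
        self._machine_id = machine_id
        self._now_ns = now_ns
        self._last = 0
        self._lock = threading.Lock()

    def next_logical(self):
        with self._lock:
            # Bump past the previous value if the clock stalled or went backwards,
            # so values from one machine are unique and strictly increasing.
            self._last = max(self._now_ns(), self._last + 1)
            return (self._last, self._machine_id)


class Sequencer:
    """Stage 2 (hypothetical): orders logical IDs and assigns gapless numbers."""

    def __init__(self):
        self._physical = {}
        self._next = 1

    def process_batch(self, logical_ids):
        # Tuples sort by timestamp first; the machine ID breaks ties.
        for logical in sorted(logical_ids):
            if logical not in self._physical:
                self._physical[logical] = self._next
                self._next += 1

    def physical_for(self, logical):
        """Return the physical number, or None if it is not available yet."""
        return self._physical.get(logical)
```

A caller gets the logical ID immediately from `next_logical()` and later looks up the physical number with `physical_for()`, which returns `None` until the second stage has caught up. That delay is the caveat the answer describes.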
