Using ticket servers to generate primary IDs?
I am building a distributed application on top of Java and Cassandra. To generate unique, sequential 32-bit and 64-bit IDs, is an approach like Flickr's ticket servers for generating primary IDs a good one? I am particularly excited about it because it lets me keep the IDs at 32 or 64 bits as required, whereas UUIDs take 128 bits. I don't need these IDs to be perfectly sequential, but they should at least be increasing!
However, using a single database server to hand out IDs may introduce a single point of failure, which is exactly what Cassandra eliminates. That may be acceptable for the initial stage of our application; later we may introduce a second server to alleviate the problem.
Does this sound like a good strategy? In short, we would be mixing MySQL and Cassandra in one application, and I know that if MySQL is down for some reason, we can't carry on with Cassandra alone.
We have looked at other solutions such as Snowflake, but it did not fully match our requirements.
EDIT: I am seeking advice on whether using MySQL to generate unique primary IDs for keying the data/entities stored in a Cassandra database is a good approach. What are the downsides, if any, of an approach like Flickr's ticket servers?
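For readers unfamiliar with the approach, a minimal sketch of a Flickr-style ticket-server client over JDBC might look like the following. It assumes a MySQL table named Tickets64 with an AUTO_INCREMENT id column and a unique one-character stub column, as we recall it from Flickr's write-up; treat the exact schema as an assumption. The class name TicketClient is hypothetical, and connection pooling, retries and failover between servers are deliberately left out.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Hypothetical client for a Flickr-style MySQL ticket server.
 * Assumed schema (MySQL):
 *   CREATE TABLE Tickets64 (
 *     id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
 *     stub CHAR(1) NOT NULL DEFAULT '',
 *     UNIQUE KEY stub (stub)
 *   );
 */
public final class TicketClient {
    // REPLACE removes the existing row with the same unique stub and inserts a new one,
    // so the table stays at a single row while the AUTO_INCREMENT counter keeps advancing.
    static final String BUMP_SQL = "REPLACE INTO Tickets64 (stub) VALUES ('a')";
    // LAST_INSERT_ID() is tracked per connection, so concurrent clients get their own values.
    static final String READ_SQL = "SELECT LAST_INSERT_ID()";

    private final Connection connection;

    public TicketClient(Connection connection) {
        this.connection = connection;
    }

    /** Asks the ticket server for the next 64-bit ID (one network round trip per statement). */
    public long nextId() throws SQLException {
        try (Statement st = connection.createStatement()) {
            st.executeUpdate(BUMP_SQL);
            try (ResultSet rs = st.executeQuery(READ_SQL)) {
                if (!rs.next()) {
                    throw new SQLException("LAST_INSERT_ID() returned no row");
                }
                return rs.getLong(1);
            }
        }
    }
}
```

Note that every ID costs a round trip to MySQL, and if that server is unreachable, nextId() fails, which is the single point of failure the question worries about. Running two such servers, one handing out odd and one even IDs, is how Flickr describes mitigating it.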
Comments (2)
I'm not a big fan of trying to attach meaning to surrogate keys (which is what you're doing if you want them to increase over time). As you're seeing, it makes the problem of generating keys more complicated. Assuming you want the keys to increase over time simply so that you can sort data, why not record a timestamp of when each object was created and store it in your data store? This simplifies key generation significantly and lets you do pretty much everything you could do with keys that increase over time, with the added bonus that it will be crystal clear to whoever has to maintain your code how objects should be sorted.
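A minimal sketch of that idea, assuming a hypothetical Entity record: the key is an opaque random UUID that carries no ordering meaning, and ordering comes entirely from an explicit createdAt timestamp (with the id as a tie-breaker so the sort is deterministic).

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

public final class TimestampOrdering {
    /** Hypothetical entity: opaque surrogate key plus an explicit creation time. */
    record Entity(UUID id, Instant createdAt, String payload) {}

    /** Returns a copy sorted by creation time; equal timestamps fall back to comparing ids. */
    static List<Entity> sortedByCreation(List<Entity> entities) {
        List<Entity> copy = new ArrayList<>(entities);
        copy.sort(Comparator.comparing(Entity::createdAt).thenComparing(Entity::id));
        return copy;
    }
}
```

In Cassandra, such a creation timestamp would typically be declared as a clustering column, which orders rows within a partition (not across the whole table).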
In general, you can't have both "always increasing" and "no SPOF and no complex synchronization".
If you want several ID generators that don't have to consult each other for every new ID, each of them really needs a separate ID pool.
A really simple example is mentioned in the article you linked: one server creates odd IDs while the other creates even ones (this extends trivially to more servers). Of course, you then can't be sure that one server doesn't run ahead of the other, which can lead to a non-increasing sequence like 111, 120, 113, 122, 115, 124, ...
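The odd/even split generalizes to N servers by giving each server its own residue class modulo N. Here is a minimal in-process sketch, assuming a hypothetical StripedIdGenerator where server i of N issues i, i + N, i + 2N, and so on (with N = 2, server 0 issues the even IDs and server 1 the odd ones).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-server generator: issues serverIndex, serverIndex + n, serverIndex + 2n, ... */
public final class StripedIdGenerator {
    private final long serverCount;
    private final AtomicLong next;

    public StripedIdGenerator(int serverIndex, int serverCount) {
        if (serverCount <= 0 || serverIndex < 0 || serverIndex >= serverCount) {
            throw new IllegalArgumentException("need 0 <= serverIndex < serverCount");
        }
        this.serverCount = serverCount;
        this.next = new AtomicLong(serverIndex);
    }

    /** Unique across servers: every ID this server emits is congruent to its index mod serverCount. */
    public long nextId() {
        return next.getAndAdd(serverCount);
    }
}
```

Uniqueness needs no coordination, but ordering is not global: if the even server has already issued 0, 2, ..., 18 while the odd one has only issued 1, their next IDs are 20 and 3, so a client reading from both sees the sequence go backwards.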
If you only want "roughly increasing" IDs, you can implement a scheme where each server periodically (say, every minute or every 10,000 IDs) tells the other(s) its current ID, and a server that has fallen too far behind then jumps its own counter forward (never backward). This should be done in a way that does not interrupt ID generation, so that it stays robust if the other server is down.
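A minimal sketch of the forward-only jump, assuming a hypothetical RoughlyIncreasingGenerator built on the same residue-class scheme. How peers exchange their current IDs (the transport and the schedule) is left out, and maxLag is an illustrative tuning parameter, not something prescribed here. nextId() never waits on peers, so generation continues if a peer is down.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical generator for server serverIndex of serverCount that can catch up with its peers. */
public final class RoughlyIncreasingGenerator {
    private final long serverIndex;
    private final long serverCount;
    private final AtomicLong next;

    public RoughlyIncreasingGenerator(int serverIndex, int serverCount) {
        if (serverCount <= 0 || serverIndex < 0 || serverIndex >= serverCount) {
            throw new IllegalArgumentException("need 0 <= serverIndex < serverCount");
        }
        this.serverIndex = serverIndex;
        this.serverCount = serverCount;
        this.next = new AtomicLong(serverIndex);
    }

    /** Never blocks on peers. */
    public long nextId() {
        return next.getAndAdd(serverCount);
    }

    /** The value this server would periodically report to its peers. */
    public long current() {
        return next.get();
    }

    /** Handles a peer's report; jumps forward (never backward) only if we lag by more than maxLag. */
    public void onPeerReport(long peerNext, long maxLag) {
        next.getAndUpdate(cur -> peerNext - cur <= maxLag ? cur : Math.max(cur, alignUp(peerNext)));
    }

    /** Smallest x >= value with x congruent to serverIndex mod serverCount, so IDs stay unique. */
    private long alignUp(long value) {
        long base = value - Math.floorMod(value - serverIndex, serverCount);
        return base >= value ? base : base + serverCount;
    }
}
```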
Ah, and for the "free bits at the end": simply multiply your ID by some number (the same one each time, and a power of two if you really want "free bits" and not only "space for data"), then add your data (which should be less than that number). But of course, you'll then run out of ID space quite a bit earlier (by a factor of that number).
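A minimal sketch of that trick in Java, assuming a hypothetical IdPacking helper and a power-of-two multiplier 2^dataBits, so that multiplying is a left shift and the data occupies the low dataBits bits. With a signed 64-bit long, only 63 - dataBits bits remain for the sequence, which is the "run out of ID space earlier" cost.

```java
/** Hypothetical helper that reserves the low dataBits bits of a 64-bit ID for extra data. */
public final class IdPacking {
    static long pack(long sequence, long data, int dataBits) {
        if (dataBits < 0 || dataBits > 62) throw new IllegalArgumentException("dataBits must be in [0, 62]");
        long number = 1L << dataBits; // the multiplier, a power of two
        if (data < 0 || data >= number) throw new IllegalArgumentException("data must be < number");
        if (sequence < 0 || sequence > (Long.MAX_VALUE >> dataBits)) {
            throw new IllegalArgumentException("sequence no longer fits: ID space is smaller by a factor of number");
        }
        return sequence * number + data; // equivalent to (sequence << dataBits) | data
    }

    static long sequenceOf(long id, int dataBits) {
        return id >>> dataBits;
    }

    static long dataOf(long id, int dataBits) {
        return id & ((1L << dataBits) - 1);
    }
}
```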