There is a trivial problem:
- assign a uniqueidentifier to any externalId
- do not overwrite the uniqueidentifier once it is assigned - just return the existing uniqueidentifier
Imagine a table
ExternalId | Guid
--------------------------------
some1 | accf-0334-dfdf-....
Now, the twist is the scale. We want billions of externalIds to be mapped like this, and we need to be able to assign these identifiers fast (thousands/sec).
We started off with a simple SQL Server table, but it was not performing well. We moved the same schema to a Cassandra ColumnFamily - the writes are super fast and it's sharded, but: before writing we have to read (to make sure the externalId is not assigned already), so we hit the read seek I/O limit again.
Hashing (to determine the uniqueidentifier) is unfortunately not possible, as we already have hundreds of millions assigned. Caching is problematic because in most cases we are assigning a 'brand new externalId', so it wouldn't be in the database at all.
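For concreteness, each assignment today boils down to a read followed by a conditional write, roughly like this (illustrative SQL only - the table and parameter names are made up, and the Cassandra version has the same shape):

    -- Step 1: read to see whether this externalId already has a Guid assigned
    SELECT Guid FROM ExternalIdMap WHERE ExternalId = @externalId;

    -- Step 2: only if nothing came back, write a brand new mapping
    INSERT INTO ExternalIdMap (ExternalId, Guid) VALUES (@externalId, NEWID());

It is that first read, a lookup that usually finds nothing, that runs into the seek I/O limit.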
Does anybody have any suggestions for the solution here?
Comments (2)
Use SQL Server, and create your table with IGNORE_DUP_KEY = ON on its clustered primary key.
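A minimal sketch of such a table (the names, varchar length and NEWID() default are just illustrative; the essential parts are the clustered primary key on ExternalId and IGNORE_DUP_KEY = ON):

    -- Sketch only: identifiers and types are illustrative
    CREATE TABLE ExternalIdMap (
        ExternalId varchar(100)     NOT NULL,
        Guid       uniqueidentifier NOT NULL DEFAULT NEWID(),
        CONSTRAINT PK_ExternalIdMap PRIMARY KEY CLUSTERED (ExternalId)
            WITH (IGNORE_DUP_KEY = ON)
    );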
With this scenario, you always do the same two (super-fast) operations (sketched below):
1 - Insert the ExternalID
2 - Query the GUID for the ExternalID
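Roughly, against the sketch table above (again, names are illustrative):

    -- 1: attempt the insert; the Guid column fills itself via its default, and with
    --    IGNORE_DUP_KEY = ON an existing ExternalId is silently skipped rather than erroring
    INSERT INTO ExternalIdMap (ExternalId) VALUES (@externalId);

    -- 2: read back whichever Guid is now stored for that ExternalId
    SELECT Guid FROM ExternalIdMap WHERE ExternalId = @externalId;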
It won't allow duplicates, but they won't throw an error. It will also be a highly optimized seek because of the clustered index.
You will need to rebuild the index frequently because you will get a high degree of fragmentation over time (since you are clustering on a non-incremental varchar) but it should meet your other requirements.
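For completeness, that periodic maintenance can be as simple as this (assuming the constraint and table names from the sketch above):

    -- Rebuild the clustered index to undo the fragmentation from random-order inserts
    ALTER INDEX PK_ExternalIdMap ON ExternalIdMap REBUILD;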
This is an interesting problem, and honestly, beyond the scope of my knowledge. However, I found it interesting, and stumbled across this link - http://blogs.msdn.com/b/miah/archive/2008/02/17/sql-if-exists-update-else-insert.aspx
It seems like this method skips the initial full table scan, which should increase performance. This isn't the best answer, but perhaps you can customize the general idea to use for your specific DB implementation. (I've never heard of Cassandra, so it looks like I need to do some research.) Even if it doesn't, it might give you some ideas (I hope). Don't know if you have tried something like this already, but I thought I'd share the link. I wish you the best of luck.
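For context, the pattern that post is usually summarized as looks roughly like this, adapted to the mapping table sketched above (illustrative names; since the Guid must never be overwritten, the UPDATE is only an existence probe here):

    -- Touch the row first; @@ROWCOUNT then tells us whether it already existed,
    -- without a separate EXISTS lookup
    UPDATE ExternalIdMap
       SET ExternalId = ExternalId          -- deliberate no-op: nothing may be overwritten
     WHERE ExternalId = @externalId;

    IF @@ROWCOUNT = 0
        INSERT INTO ExternalIdMap (ExternalId, Guid)
        VALUES (@externalId, NEWID());

    SELECT Guid FROM ExternalIdMap WHERE ExternalId = @externalId;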