Auto-increment on Azure Table Storage

Posted on 2024-08-14 09:45:41

I am currently developing an application for Azure Table Storage. In that application I have a table that will have relatively few inserts (a couple of thousand per day), and the primary key of these entities will be used in another table, which will have billions of rows.

Therefore I am looking for a way to use an auto-incremented integer instead of a GUID as the primary key in the small table (since it will save lots of storage, and scalability of the inserts is not really an issue).
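A rough sense of the savings at stake, assuming the key is stored in the large table as a typed property (Edm.Guid is 128 bits, Edm.Int32 is 32 bits) and ignoring per-property and per-entity overhead, which this estimate does not model:

```latex
\underbrace{10^{9}}_{\text{rows}} \times \underbrace{(16 - 4)}_{\text{bytes saved per row}} = 1.2 \times 10^{10}\ \text{bytes} \approx 12\ \text{GB}\ (\approx 11.2\ \text{GiB})
```

Using Edm.Int64 instead of Edm.Int32 would save 8 bytes per row, i.e. about 8 GB per billion rows.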

There've been some discussions on the topic, e.g. on http://social.msdn.microsoft.com/Forums/en/windowsazure/thread/6b7d1ece-301b-44f1-85ab-eeb274349797.

However, since concurrency problems can be really hard to debug and spot, I am a bit uncomfortable implementing this on my own. My question is therefore: is there a well-tested implementation of this?

Answers (5)

梦醒灬来后我 2024-08-21 09:45:41

For anyone who finds this via search: there is a better solution. The minimum duration of the lock (a blob lease) is 15 seconds, which is awful, because a client that dies while holding it blocks everyone else for at least that long. Do not use it if you want a truly scalable solution. Use ETags!

Create a single entity in the table to hold the ID counter (you can even name it "ID" or whatever you like).

1) Read it.

2) Increment.

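3) Write it back with a conditional update that specifies the ETag from the read (an If-Match condition). An unconditional upsert such as Insert Or Replace ignores the ETag, so it cannot detect a concurrent change.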

If the last operation succeeds, you have a new, unique, auto-incremented ID. If it fails with HTTP status code 412 (Precondition Failed), some other client changed the entity in the meantime, so repeat steps 1, 2 and 3.
The usual time for the read plus the write is less than 200 ms. My test utility, with source, is on GitHub.
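The read/increment/conditional-write loop can be sketched as follows. This is a minimal in-memory simulation, not the answerer's GitHub utility: `InMemoryIdEntity`, `replace_if_match` and `PreconditionFailed` are hypothetical stand-ins for the ID entity, a replace carrying an If-Match ETag, and the HTTP 412 error.

```python
import itertools
import threading
from dataclasses import dataclass


class PreconditionFailed(Exception):
    """Hypothetical: the If-Match ETag no longer matches (HTTP 412 in Table Storage)."""


@dataclass(frozen=True)
class Snapshot:
    value: int  # current counter value stored in the ID entity
    etag: str   # opaque version tag returned with the read


class InMemoryIdEntity:
    """Hypothetical stand-in for the single "ID" entity in a table.

    The service compares the ETag and writes atomically; a mutex models that here.
    """

    def __init__(self, initial: int = 0) -> None:
        self._mutex = threading.Lock()
        self._versions = itertools.count(1)
        self._value = initial
        self._etag = self._new_etag()

    def _new_etag(self) -> str:
        return f'W/"{next(self._versions)}"'

    def read(self) -> Snapshot:
        with self._mutex:
            return Snapshot(self._value, self._etag)

    def replace_if_match(self, value: int, etag: str) -> None:
        with self._mutex:
            if etag != self._etag:
                raise PreconditionFailed(etag)  # someone wrote since our read
            self._value = value
            self._etag = self._new_etag()       # every successful write gets a new ETag


def next_id(entity: InMemoryIdEntity, max_attempts: int = 1000) -> int:
    """Read, increment, conditionally write; retry when the ETag is stale."""
    for _ in range(max_attempts):
        snap = entity.read()                               # 1) read value + ETag
        candidate = snap.value + 1                         # 2) increment locally
        try:
            entity.replace_if_match(candidate, snap.etag)  # 3) write with If-Match
            return candidate                               # we won the race
        except PreconditionFailed:
            continue                                       # another client won; start over
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

The key design point is that step 3 is conditional: if two clients read the same value, only one write matches the ETag and the other retries, so no ID is handed out twice. With the Python `azure-data-tables` package, step 3 should correspond to `TableClient.update_entity` with `etag=` and `match_condition=MatchConditions.IfNotModified`; check the SDK reference before relying on that mapping.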

相思故 2024-08-21 09:45:41

See the UniqueIdGenerator class by Josh Twist.

梦情居士 2024-08-21 09:45:41

I haven't implemented this yet but am working on it ...

You could seed a queue with your next ids to use, then just pick them off the queue when you need them.

You need a table that stores the biggest number added to the queue so far. If you know you won't be using a ton of the integers, you could have a worker wake up every so often and make sure the queue still has integers in it. You could also have a queue of used integers that the worker could check to keep an eye on usage.

You could also hook that worker up so that, if the queue happened to be empty when your code needed an id, it could interrupt the worker's nap to create more keys asap.

If that call failed, you would need a way to tell the worker that you are going to do the work yourself (lock), then do the worker's job of getting the next id, and unlock:

  1. lock
  2. get the last key created from the table
  3. increment and save
  4. unlock

Then use the new value.
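A single-process sketch of the pre-seeded queue idea, under loud assumptions: `IdPool` is hypothetical, `queue.Queue` stands in for a storage queue, and the `_high_water` attribute guarded by a `threading.Lock` stands in for the table row holding the largest queued number and the lock on it. In a real deployment that row would need its own cross-process concurrency control (for example a lease or an ETag check).

```python
import queue
import threading


class IdPool:
    """Hypothetical in-process model of a pre-seeded ID queue with a locked fallback."""

    def __init__(self, batch_size: int = 100) -> None:
        self._queue: "queue.Queue[int]" = queue.Queue()
        self._high_water = 0           # stands in for the row with the largest queued id
        self._lock = threading.Lock()  # stands in for the lock on that row
        self._batch_size = batch_size

    def refill(self) -> None:
        """The worker's job when it wakes up: queue the next batch of ids."""
        with self._lock:
            start = self._high_water + 1
            self._high_water += self._batch_size
            for i in range(start, self._high_water + 1):
                self._queue.put(i)

    def needs_refill(self, low_mark: int) -> bool:
        """Cheap check the worker can use to decide whether to refill early."""
        return self._queue.qsize() < low_mark

    def take(self) -> int:
        try:
            return self._queue.get_nowait()  # normal path: pop a pre-seeded id
        except queue.Empty:
            # Fallback when the queue is empty: lock, get the last key,
            # increment and save it, unlock, then use the new value.
            with self._lock:
                self._high_water += 1
                return self._high_water
```

A background worker would call `needs_refill` periodically and `refill` when it returns True; `take` only falls back to the locked increment when the queue runs dry, and because both paths advance the same high-water mark under the lock, they never hand out the same number.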

爱格式化 2024-08-21 09:45:41

The solution I found that prevents duplicate ids and lets you auto-increment them is to:

  1. Lock (lease) a blob and let it act as a logical gate.

  2. Then read the value.

  3. Write the incremented value.

  4. Release the lease.

  5. Use the value in your app/table.

If your worker role crashes during this process, the worst case is a missing ID in your store. IMHO that is better than duplicates.

Here is a code sample and more information on this approach from Steve Marx.
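The five lease steps can also be sketched with an in-memory stand-in. This is not Steve Marx's sample: `InMemoryCounterBlob`, its methods, and `LeaseConflict` are hypothetical, and the 15-second default only mirrors the fact that real blob leases last 15 to 60 seconds or are infinite. The expiry check is what keeps a crashed holder from blocking everyone forever.

```python
import threading
import time
import uuid


class LeaseConflict(Exception):
    """Hypothetical: another client holds the lease (HTTP 409 in Blob Storage)."""


class InMemoryCounterBlob:
    """Hypothetical stand-in for a blob that stores the counter and supports leases."""

    def __init__(self, value: int = 0) -> None:
        self._mutex = threading.Lock()
        self._value = value
        self._lease_id: "str | None" = None
        self._lease_expires = 0.0

    def acquire_lease(self, duration_s: float = 15.0) -> str:
        with self._mutex:
            now = time.monotonic()
            if self._lease_id is not None and now < self._lease_expires:
                raise LeaseConflict()
            self._lease_id = str(uuid.uuid4())
            self._lease_expires = now + duration_s  # an abandoned lease lapses here
            return self._lease_id

    def read(self) -> int:
        with self._mutex:
            return self._value  # reads do not need the lease

    def write(self, value: int, lease_id: str) -> None:
        with self._mutex:
            if lease_id != self._lease_id or time.monotonic() >= self._lease_expires:
                raise LeaseConflict()  # writes require a live lease
            self._value = value

    def release_lease(self, lease_id: str) -> None:
        with self._mutex:
            if lease_id == self._lease_id:  # no-op if someone else holds it now
                self._lease_id = None


def next_id(blob: InMemoryCounterBlob, poll_s: float = 0.005) -> int:
    while True:
        try:
            lease_id = blob.acquire_lease()  # 1) lock (lease) the blob
        except LeaseConflict:
            time.sleep(poll_s)               # gate is closed; wait and retry
            continue
        try:
            value = blob.read() + 1          # 2) read the value
            blob.write(value, lease_id)      # 3) write the incremented value
        finally:
            blob.release_lease(lease_id)     # 4) release the lease
        return value                         # 5) the caller uses the value
```

Compared with the ETag approach, every caller waits on a single gate, and a holder that dies mid-update stalls the others until its lease lapses; in exchange, the read-increment-write itself never has to be redone.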

缺⑴份安定 2024-08-21 09:45:41

If you really need to avoid GUIDs, have you considered using something based on date/time and then leveraging partition keys to minimize the concurrency risk?

Your partition key could be based on the user, year, month, day, hour, etc., and the row key could be the rest of the date/time, at a fine enough resolution to control concurrency.
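A minimal sketch of that key scheme, assuming one partition per user per hour and microsecond resolution for the row key; `time_based_keys` is a hypothetical helper, and the granularity is an illustrative choice rather than something the answer prescribes.

```python
from datetime import datetime, timezone


def time_based_keys(user: str, ts: datetime) -> tuple[str, str]:
    """Hypothetical helper: split an aware timestamp into (PartitionKey, RowKey)."""
    utc = ts.astimezone(timezone.utc)
    partition_key = f"{user}_{utc:%Y%m%d%H}"  # one partition per user per hour
    row_key = f"{utc:%M%S%f}"                 # minute, second, microsecond within that hour
    return partition_key, row_key
```

The zero-padded fields keep keys in chronological order when compared as strings. Two inserts for the same user in the same microsecond would still collide; a plain Insert (not an upsert) fails with 409 Conflict in that case, so the caller can retry with a fresh timestamp.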

Of course, you have to ask yourself whether, given the price of data storage in Azure, avoiding a GUID is really worth all of this extra effort (assuming a GUID would just work).
