Questions about Azure scalability targets and using multiple Azure storage accounts?
The Windows Azure Storage Abstractions and their Scalability Targets blog post says there is a transaction limit of 5,000 entities/second for a single storage account and a limit of 500 entities/second for a single table partition. To get around the first limit you should use multiple accounts, and to stay within the partition limit you should design your partitions carefully.
I'd like to hear from others who have experience with the 5,000 limit on a single storage account. Right now I'm designing a community of blogs/wikis, and say one day the site becomes popular and attracts a lot of traffic. Should I split the user-related tables into one storage account, the blog-related tables into another, and the wiki-related tables into yet another to avoid hitting this limit, starting now? Or should I add more accounts as the need arises? By the way, is there a way to transfer Azure storage tables from one account to another? The article says that when you hit the limit you get "503 server busy" responses. Is there a way to know the limit is getting close, so I could do something in advance without actually running into 503 errors?
Comments (2)
I haven't hit the account limit overall, but I have hit the limit for number of transactions on a Queue by trying to set the number of worker roles reading from that queue to a ridiculous level.
As far as I know there is no "you're about to hit the limit" warning. The first you know of it is when you get the 503 error.
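Since the 503 is the only signal you get, the usual way to cope is to retry with a growing delay. Here is a minimal sketch of that idea, not tied to any particular storage SDK: `ServerBusyError` is a hypothetical stand-in for whatever your client library raises on a 503, and the delay values are arbitrary assumptions.

```python
import random
import time


class ServerBusyError(Exception):
    """Hypothetical stand-in for the error your storage client raises on HTTP 503."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call operation(); on ServerBusyError wait and retry with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ServerBusyError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller see the failure
            # Delay doubles on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter so many workers do not all retry at the same instant.
            sleep(delay * random.uniform(0.5, 1.0))
```

You would wrap each storage call, for example `call_with_backoff(lambda: client.insert(entity))`, where `client.insert` is again a placeholder for your own code.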
As for transferring data from one account to another, there is no built-in functionality that will do it for you. You either have to roll your own solution that reads every row in the source table and writes it to the destination table, or use something like Cerebrata Cloud Storage Studio, which lets you download and upload the contents of tables, or their cmdlets, which let you do the same thing but are cheaper/free.
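If you do roll your own, the core of it is a loop like the sketch below. It assumes entities are plain dicts with a `PartitionKey` field, and that you supply two hypothetical callables: `read_entities`, which yields every entity from the source table, and `write_batch`, which writes one batch to the destination. Batches are kept to a single partition key and at most 100 entities, which matches the limits on table batch (entity group) transactions.

```python
def batches_by_partition(entities, batch_size=100):
    """Group entities into batches that share one PartitionKey.

    Works best when the input is ordered by PartitionKey (as a full table
    scan is); unordered input still gives valid, just smaller, batches.
    """
    current_key = None
    batch = []
    for entity in entities:
        key = entity["PartitionKey"]
        # Start a new batch when the partition changes or the batch is full.
        if batch and (key != current_key or len(batch) == batch_size):
            yield batch
            batch = []
        current_key = key
        batch.append(entity)
    if batch:
        yield batch


def copy_table(read_entities, write_batch, batch_size=100):
    """Copy every entity from source to destination; return the number copied."""
    copied = 0
    for batch in batches_by_partition(read_entities(), batch_size):
        write_batch(batch)  # hypothetical: one batch write against the new account
        copied += len(batch)
    return copied
```

For a live site you would also need to decide what happens to writes that arrive while the copy is running; this sketch does not handle that.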
If you're just starting out, you have a logical way of partitioning the data across storage accounts, and it doesn't make the code too complicated, then do it. But I wouldn't worry about it too much at this stage. If your site does become popular and you start hitting the transaction limit, chances are it will come from an area you hadn't expected, or from too many transactions against just one table. Since you said this is for a community of blogs, the area likely to get the most transactions is wherever you store comments. If you get more than 5,000 transactions a second against your comments table, you may need to partition the comments across multiple storage accounts. Of course, if the blogs are that popular, chances are you'll have other problems to deal with as well.
If scalability is what you are after, you might consider SQL Azure Federations instead of Azure Table Storage. The Federations feature has been available since December 2011. You can find a good overview here.
With SQL Azure Federations you have better control over the amount of resources you use. In Table Storage you are encouraged to create many partitions so that the underlying engine can at some point distribute your data across multiple machines and give you increased throughput. However, a partition is just a hint to the Table Storage engine. It will not necessarily move the data to a new machine. It might do so, based on usage and its internal algorithms, but you can never be sure when. With SQL Azure Federations you are the one controlling the number of instances you use, and so the balance between a small number of instances (low cost) and a large number of instances (high throughput).
With Federations you can still enjoy most of the benefits of a relational database. That is, you can still have transactions, joins, and indexes. In fact, you get all the functionality of a standalone SQL Azure database. The only limit is that you can act on only one federation instance at a time (at the moment there is no built-in cross-instance select support inside a federation).
It is true that you can increase Table Storage throughput by creating multiple accounts, but you will have to manage that manually. You will be responsible for moving the data between accounts when making a split, and for implementing the application-level logic that routes to the correct account when looking up data. With Federations that is managed automatically.
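To give an idea of what that application-level routing looks like, here is a minimal sketch that maps a key to one of several storage accounts by hashing it. The account names and the choice of key (a blog identifier) are assumptions for illustration only.

```python
import hashlib


def account_for(key, accounts):
    """Pick a storage account for a key using a stable hash.

    The same key always maps to the same account as long as the
    list of accounts does not change.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(accounts)
    return accounts[index]


# Hypothetical account names; every read and write for a blog goes through this lookup.
ACCOUNTS = ["commentsacct0", "commentsacct1", "commentsacct2"]
print(account_for("blog-42", ACCOUNTS))
```

Note that with plain modulo hashing, adding an account remaps most keys, which is exactly the data movement you would have to handle yourself.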
Probably the only reason to consider Table Storage is its price per GB, which is a lot lower than SQL Azure's (Table Storage pricing is described here, SQL Azure pricing here). So if you plan to store huge amounts of data, you might indeed consider Table Storage (as long as you can live with its limitations).
Strictly from a throughput perspective, a single SQL Azure instance can provide performance similar to that of a Table Storage account. As long as you get a good distribution of requests, with Federations you can multiply the throughput of a single database by the total number of instances in use.
If you are interested in some numbers, a few months ago I wrote a benchmark and ran it against a federated database. The results can be found here.