SQL Azure performance considerations
What performance considerations should I keep in mind when planning a SQL Azure application? Azure Storage and the worker and web roles look very scalable, but if in the end they all use a single database, that database looks like the bottleneck.
I was trying to find numbers about:
- How many concurrent connections does SQL Azure support?
- What is the bandwidth?
But no luck.
For example, I'm planning an application with a very high rate of inserts, but I need to return the result of an aggregate function each time (e.g. the sum of all records with the same key in a column), so I can't go with table storage.
Batching is an option, but response time is critical as well, so I'm afraid the database will be overwhelmed by a lot of connections.
Sharding is another option, but even though the number of inserts is massive, the amount of data is very small: 4 to 6 columns with one PK and no FK. So even a 1 GB DB would be overkill (and an overpay :D) for a partition.
What are the key performance points I should keep in mind when facing this kind of application?
Cheers.
Comments (2)
Achieving both scalability and performance can be very difficult, even in the cloud. Your question was primarily about scalability, so you may want to design your application in such a way that your data becomes "eventually" consistent, using queues for example. A worker role would listen for incoming insert requests and would perform the insert asynchronously.
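To make the queue idea concrete, here is a minimal sketch. It is not Azure code: an in-process `queue.Queue` stands in for an Azure Queue, a thread stands in for the worker role, and an in-memory SQLite database stands in for SQL Azure. The table `entries` and the names `run_worker` and `demo` are made up for illustration.

```python
import queue
import sqlite3
import threading

STOP = object()  # sentinel that tells the worker to shut down


def run_worker(pending, conn):
    """Play the worker role: take insert requests off the queue and apply them."""
    while True:
        item = pending.get()
        if item is STOP:
            break
        key, amount = item
        with conn:  # commits the insert when the block exits
            conn.execute(
                "INSERT INTO entries (k, amount) VALUES (?, ?)", (key, amount)
            )


def demo():
    # check_same_thread=False lets the worker thread use this connection;
    # only the worker touches it until join() returns.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE entries (k TEXT, amount INTEGER)")
    pending = queue.Queue()
    worker = threading.Thread(target=run_worker, args=(pending, conn))
    worker.start()
    # The front end only enqueues and returns immediately.
    for i in range(5):
        pending.put(("a", i))
    pending.put(STOP)
    worker.join()
    # Once the queue has drained, the aggregate reflects every insert.
    return conn.execute(
        "SELECT SUM(amount) FROM entries WHERE k = 'a'"
    ).fetchone()[0]
```

The trade-off is the "eventual" part: a sum read while requests are still sitting in the queue will not include them yet.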
To minimize the number of round trips to the database and make the most of connection pooling, make sure to batch your inserts as well, so you could send 100 inserts in one shot. Also keep in mind that SQL Azure now supports MARS (Multiple Active Result Sets), so that you can return the results of multiple SELECTs in a single batch back to the calling code. The use of batching and MARS should reduce the number of database connections to a minimum.
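A rough sketch of the batching part, again with SQLite as a stand-in so that it runs anywhere. The table `entries` and the helpers `chunks` and `insert_batched` are hypothetical. How much a batch really saves in round trips depends on the driver used against SQL Azure, which this example does not model.

```python
import sqlite3


def chunks(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_batched(conn, rows, batch_size=100):
    """Insert rows in batches, one transaction per batch.

    Returns the number of batches sent.
    """
    batches = 0
    for batch in chunks(rows, batch_size):
        with conn:  # commit once per batch rather than once per row
            conn.executemany(
                "INSERT INTO entries (k, amount) VALUES (?, ?)", batch
            )
        batches += 1
    return batches


def total_for(conn, key):
    """Return the sum of `amount` for one key (0 if the key has no rows)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM entries WHERE k = ?", (key,)
    ).fetchone()
    return row[0]
```

With 250 rows and the default batch size, `insert_batched` sends three batches (100, 100 and 50 rows).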
Sharding usually helps for read operations, not so much for inserts (although I have never benchmarked inserts with sharding). So I am not sure sharding will help you that much with your requirements.
Remember that the Azure offering is designed first for scalability and reasonable performance in a multi-tenant environment, where your database is shared with others on the same server. So if you need strong performance with guaranteed response times, you may need to reevaluate your hosting choices, or indeed test the performance boundaries of Azure for your needs, as suggested by tijmenvdk.
SQL Azure will throttle your connections if any form of resource contention occurs (this includes heavy load, but might also occur when your database is physically moved around). Throttling is non-deterministic, meaning that you cannot predict if and when it happens. When throttling, SQL Azure will drop your connection, requiring you to perform a retry. The number of supported connections and the bandwidth are not published, "by design", because of the flexible nature of the underlying infrastructure. Having said that, the setup is optimized for high availability, not high throughput.
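Since a dropped connection means the caller has to retry, a small retry helper is usually the first thing to write. The sketch below is an assumption-laden illustration: `ThrottledError` is a hypothetical stand-in for whatever exception your driver raises when the connection is dropped, and the attempt count and delays are arbitrary.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for the driver error raised on a dropped connection."""


def with_retry(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call `operation`, retrying on ThrottledError with exponential backoff.

    The last failure is re-raised once `attempts` calls have been made.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == attempts - 1:
                raise
            # Back off 0.1s, 0.2s, 0.4s, ... plus a little random jitter so
            # that many clients do not all retry at the same moment.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

`sleep` is a parameter only so that the helper can be exercised in tests without actually waiting.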
If the bursts happen at a known time, you might consider sharding just during those bursts and consolidating the data after the burst has happened. Another way to handle this is to start queueing/batching writes if and only if throttling occurs. You can use an Azure Queue for that, plus a worker role to empty the queue later. This "overflow mechanism" has the advantage of engaging automatically when throttling occurs.
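The overflow mechanism could look roughly like this. It is a sketch only: `queue.Queue` stands in for an Azure Queue, `ThrottledError` is a hypothetical stand-in for the driver's throttling error, and `OverflowWriter` with its `insert` callable are names invented for the example.

```python
import queue


class ThrottledError(Exception):
    """Hypothetical stand-in for the driver error raised when throttled."""


class OverflowWriter:
    """Write directly to the database; park rows in a queue only when throttled."""

    def __init__(self, insert):
        self._insert = insert          # callable that writes one row
        self.overflow = queue.Queue()  # stand-in for an Azure Queue

    def write(self, row):
        """Return True if the row was written directly, False if it was queued."""
        try:
            self._insert(row)
            return True
        except ThrottledError:
            self.overflow.put(row)
            return False

    def drain(self):
        """Worker role: replay queued rows, stopping at the first throttled one.

        Returns the number of rows written during this call.
        """
        written = 0
        while True:
            try:
                row = self.overflow.get_nowait()
            except queue.Empty:
                return written
            try:
                self._insert(row)
            except ThrottledError:
                # Still throttled: put the row back (at the end of the queue,
                # so ordering is not preserved) and try again later.
                self.overflow.put(row)
                return written
            written += 1
```

While rows sit in the overflow queue, any sum computed from the database is missing them, so response-time-critical reads see a slightly stale total during throttling.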
As an alternative, you could use Azure Table Storage and keep a separate table of running totals that you can report back, instead of performing an aggregation over the data to return the required sum of all records (this might be tricky due to the lack of locking on the tables, though).
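One common way around the missing locks is optimistic concurrency: read the total together with a version, write back only if the version is unchanged, and retry otherwise. The sketch below models that with an in-memory class. `TotalsTable`, `ConflictError` and `add_to_total` are hypothetical names, and the version number is an assumption standing in for whatever conditional-update token the storage service provides. The in-memory check is not itself thread-safe; it only models a store that performs the conditional write atomically.

```python
class ConflictError(Exception):
    """Raised when the stored version no longer matches the one that was read."""


class TotalsTable:
    """In-memory stand-in for a table of running totals, one entry per key."""

    def __init__(self):
        self._rows = {}  # key -> (version, total)

    def read(self, key):
        """Return (version, total); a missing key reads as (0, 0)."""
        return self._rows.get(key, (0, 0))

    def replace_if_unchanged(self, key, expected_version, new_total):
        """Store new_total only if nobody has updated the key since it was read."""
        version, _ = self._rows.get(key, (0, 0))
        if version != expected_version:
            raise ConflictError(key)
        self._rows[key] = (version + 1, new_total)


def add_to_total(table, key, amount, max_attempts=10):
    """Add `amount` to the running total for `key` and return the new total."""
    for _ in range(max_attempts):
        version, total = table.read(key)
        try:
            table.replace_if_unchanged(key, version, total + amount)
            return total + amount
        except ConflictError:
            continue  # another writer got in first: re-read and try again
    raise ConflictError(key)
```

Under a very high insert rate on a single key, many writers will collide and retry, so this is worth load testing before relying on it.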
Apologies for stating the obvious, but the first step would be to test if you run into throttling at all in your scenario. I would give the overflow solution a try.