Performance - Table Service, SQL Azure - inserts. Query speed on large amounts of data
I'd read many posts and articles about comparing SQL Azure and Table Service and most of them told that Table Service is more scalable than SQL Azure.
- http://www.silverlight-travel.com/blog/2010/03/31/azure-table-storage-sql-azure/
- http://www.intertech.com/Blog/post/Windows-Azure-Table-Storage-vs-Windows-SQL-Azure.aspx
- Microsoft Azure Storage vs. Azure SQL Database
- https://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/2fd79cf3-ebbb-48a2-be66-542e21c2bb4d
- https://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx
- https://stackoverflow.com/questions/2711868/azure-performance
- http://vermorel.com/journal/2009/9/17/table-storage-or-the-100x-cost-factor.html
- Azure Tables or SQL Azure?
- http://www.brentozar.com/archive/2010/01/sql-azure-frequently-asked-questions/
- https://code.google.com/p/lokad-cloud/wiki/FatEntities
Sorry for the plain http links, I'm a new user >_<
But the http://azurescope.cloudapp.net/BenchmarkTestCases/ benchmarks show a different picture.
My case. Using SQL Azure: one table with many inserts, about 172,000,000 per day (2,000 per second). Can I expect good performance for inserts and selects when I have 2 million records, or 9999....9 billion records, in one table?
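A quick sanity check on those numbers (a sketch assuming a uniform insert rate over the day):

```python
# Sanity check: 172,000,000 inserts/day expressed as a per-second rate.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

inserts_per_day = 172_000_000
inserts_per_second = inserts_per_day / SECONDS_PER_DAY
print(f"{inserts_per_second:.0f} inserts/second")  # → "1991 inserts/second"
```

So "2000 per second" is consistent with the daily figure.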
Using Table Service: one table with some number of partitions. Number of partitions can be large, very large.
Question #1: does Table Service have any limitations or best practices for creating many, many, many partitions in one table?
Question #2: in a single partition I have a large number of small entities, as in the SQL Azure example above. Can I expect good performance for inserts and selects when I have 2 million records, or 9999 billion entities, in one partition?
I know about sharding and partitioning solutions, but this is a cloud service; shouldn't the cloud be powerful enough to do all that work without custom code on my side?
Question #3: can anybody show me benchmarks for querying large amounts of data in SQL Azure and Table Service?
Question #4: maybe you could suggest a better solution for my case.
Short Answer
Long Answer
In your case I suspect that SQL Azure is not going to work for you, simply because of the limits on the size of a SQL Azure database. If each of the rows you're inserting is 1K with indexes, then at your insert rate you will hit the 50GB limit in well under a day. It's true that Microsoft are talking about databases bigger than 50GB, but they've given no time frames for that. SQL Azure also has a throughput limit that I'm unable to find at this point (I'm pretty sure it's less than what you need, though). You might be able to get around this by partitioning your data across more than one SQL Azure database.
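A quick check of that capacity math (a sketch assuming ~1 KB per row including index overhead, and the 172 million inserts/day rate from the question):

```python
# How long until a 50 GB database fills at the stated insert rate,
# assuming ~1 KB per row including index overhead.
ROW_SIZE_BYTES = 1024
DB_LIMIT_BYTES = 50 * 1024**3          # 50 GB
inserts_per_day = 172_000_000

bytes_per_day = inserts_per_day * ROW_SIZE_BYTES   # ~172 GB/day
days_to_fill = DB_LIMIT_BYTES / bytes_per_day
print(f"{days_to_fill:.2f} days")      # → "0.30 days", i.e. a few hours
```

At roughly 172 GB of new data per day, a 50 GB cap is exhausted almost immediately; only a much smaller row size or insert rate would stretch that out.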
The advantage SQL Azure does have though is the ability to run aggregate queries. In AZT you can't even write a
select count(*) from customer
without loading every customer. AZT also has a limit of 500 transactions per second per partition, and a limit of "several thousand" per second per account.
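The aggregate-query point can be sketched as follows: with no server-side COUNT, the client must page through every entity. Here `fetch_page` is a hypothetical stand-in for a table query with a continuation token, not a real SDK call:

```python
def count_entities(fetch_page):
    """Count entities the only way AZT allows: by retrieving all of them.

    fetch_page(token) is a hypothetical stand-in for a table query;
    it returns (entities, next_token), with next_token=None on the last page.
    """
    total, token = 0, None
    while True:
        entities, token = fetch_page(token)
        total += len(entities)
        if token is None:
            return total

# Simulate a table of 2,500 entities served in pages of 1,000.
data = list(range(2500))

def fake_fetch(token):
    start = token or 0
    page = data[start:start + 1000]
    nxt = start + 1000 if start + 1000 < len(data) else None
    return page, nxt

print(count_entities(fake_fetch))  # → 2500
```

With billions of entities, this download-everything pattern is exactly why client-side aggregation is impractical.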
I've found that choosing what to use for your partition key (PK) and row key (RK) depends on how you're going to query the data. If you want to access each of these items individually, simply give each row its own partition key and a constant row key. This will mean that you have lots of partitions.
For the sake of example, suppose the rows you're inserting are orders, and the orders belong to customers. If it's more common for you to list orders by customer, you would have PK = CustomerId, RK = OrderId. This means that to find a customer's orders you simply query on the partition key. To get a specific order you'd need to know both the CustomerId and the OrderId. The more orders a customer has, the slower finding any particular order will be.
If you just needed to access orders by OrderId, then you would use PK = OrderId, RK = string.Empty and put the CustomerId in another property. While you can still write a query that brings back all orders for a customer, because AZT doesn't support indexes other than on PartitionKey and RowKey, any query that doesn't use the PartitionKey (and sometimes even one that does, depending on how you write it) will cause a table scan. With the number of records you're talking about, that would be very bad.
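The key schemes above can be sketched as plain dictionaries. The OrderId/CustomerId names come from the example; the dict shape is illustrative, not a real SDK entity type:

```python
# Scheme A: every row its own partition (fast point lookups, no grouping).
def key_per_row(order_id):
    return {"PartitionKey": order_id, "RowKey": ""}

# Scheme B: group orders under their customer (fast "orders for customer X";
# finding one order needs both keys).
def key_by_customer(customer_id, order_id):
    return {"PartitionKey": customer_id, "RowKey": order_id}

# Scheme C: look up by OrderId alone; CustomerId demoted to a plain property,
# so queries filtering only on it will scan the table.
def key_by_order(order_id, customer_id):
    return {"PartitionKey": order_id, "RowKey": "", "CustomerId": customer_id}

e = key_by_customer("cust-42", "order-0001")
print(e["PartitionKey"], e["RowKey"])  # → cust-42 order-0001
```

The trade-off is the one described above: whichever attribute you fold into the PartitionKey is the one you can query efficiently.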
In all of the scenarios I've encountered, having lots of partitions doesn't seem to worry AZT too much.
Another way you can partition your data in AZT that is not often mentioned is to put the data in different tables. For example, you might want to create one table for each day. If you want to run a query for last week, run the same query against the 7 different tables. If you're prepared to do a bit of work on the client end you can even run them in parallel.
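A sketch of that table-per-day scheme. The `Orders<yyyyMMdd>` naming and the `query_table` callable are hypothetical stand-ins; the parallelism is a plain client-side thread pool:

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

def table_names_for_last_week(today):
    """One table per day, named e.g. 'Orders20100510' (hypothetical scheme)."""
    return [f"Orders{(today - timedelta(days=i)):%Y%m%d}" for i in range(7)]

def query_last_week(query_table, today):
    """Run the same query against all 7 daily tables in parallel.

    query_table(name) is a hypothetical stand-in for executing the real
    table query; results are simply concatenated on the client.
    """
    names = table_names_for_last_week(today)
    with ThreadPoolExecutor(max_workers=7) as pool:
        results = pool.map(query_table, names)   # preserves input order
    return [row for rows in results for row in rows]

names = table_names_for_last_week(date(2010, 5, 10))
print(names[0], names[-1])  # → Orders20100510 Orders20100504
```

Since each day's data lives in its own table, dropping old data is also cheap: delete the whole table instead of deleting rows.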
Azure SQL can easily ingest that much data and more. Here's a video I recorded a few months ago that shows a sample (available on GitHub) demonstrating one way you can do that.
https://www.youtube.com/watch?v=vVrqa0H_rQA
Here's the full repo:
https://github.com/Azure-Samples/streaming-at-scale/tree/master/eventhubs-streamanalytics-azuresql