Performance - Table Service, SQL Azure - inserts. Query speed on large amounts of data
I'd read many posts and articles about comparing SQL Azure and Table Service and most of them told that Table Service is more scalable than SQL Azure.
- http://www.silverlight-travel.com/blog/2010/03/31/azure-table-storage-sql-azure/
- http://www.intertech.com/Blog/post/Windows-Azure-Table-Storage-vs-Windows-SQL-Azure.aspx
- Microsoft Azure Storage vs. Azure SQL Database
- https://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/2fd79cf3-ebbb-48a2-be66-542e21c2bb4d
- https://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx
- https://stackoverflow.com/questions/2711868/azure-performance
- http://vermorel.com/journal/2009/9/17/table-storage-or-the-100x-cost-factor.html
- Azure Tables or SQL Azure?
- http://www.brentozar.com/archive/2010/01/sql-azure-frequently-asked-questions/
- https://code.google.com/p/lokad-cloud/wiki/FatEntities
Sorry for the plain http links, I'm a new user >_<
But the http://azurescope.cloudapp.net/BenchmarkTestCases/ benchmarks show a different picture.
My case. Using SQL Azure: one table with many inserts, about 172,000,000 per day (2,000 per second). Can I expect good performance for inserts and selects when I have 2 million records, or 9999....9 billion records, in one table?
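A quick sanity check on those numbers (a sketch assuming a uniform insert rate over the day):

```python
# Sanity check: 172,000,000 inserts/day expressed as a per-second rate.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

inserts_per_day = 172_000_000
inserts_per_second = inserts_per_day / SECONDS_PER_DAY
print(f"{inserts_per_second:.0f} inserts/second")  # → "1991 inserts/second"
```

So "2000 per second" is consistent with the daily figure.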
Using Table Service: one table with some number of partitions. Number of partitions can be large, very large.
Question #1: does Table Service have any limitations or best practices for creating many, many, many partitions in one table?
Question #2: in a single partition I have a large number of small entities, as in the SQL Azure example above. Can I expect good performance for inserts and selects when I have 2 million records, or 9999 billion entities, in one partition?
I know about sharding and partitioning solutions, but this is a cloud service; shouldn't the cloud be powerful enough to do all that work without custom code on my side?
Question #3: can anybody show me benchmarks for querying large amounts of data in SQL Azure and Table Service?
Question #4: maybe you could suggest a better solution for my case.
Short Answer
Long Answer
In your case I suspect that SQL Azure is not going to work for you, simply because of the limits on the size of a SQL Azure database. If each of the rows you're inserting is 1K with indexes, then at your insert rate you will hit the 50GB limit in well under a day. It's true that Microsoft are talking about databases bigger than 50GB, but they've given no time frames for that. SQL Azure also has a throughput limit that I'm unable to find at this point (I'm pretty sure it's less than what you need, though). You might be able to get around this by partitioning your data across more than one SQL Azure database.
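A quick check of that capacity math (a sketch assuming ~1 KB per row including index overhead, and the 172 million inserts/day rate from the question):

```python
# How long until a 50 GB database fills at the stated insert rate,
# assuming ~1 KB per row including index overhead.
ROW_SIZE_BYTES = 1024
DB_LIMIT_BYTES = 50 * 1024**3          # 50 GB
inserts_per_day = 172_000_000

bytes_per_day = inserts_per_day * ROW_SIZE_BYTES   # ~172 GB/day
days_to_fill = DB_LIMIT_BYTES / bytes_per_day
print(f"{days_to_fill:.2f} days")      # → "0.30 days", i.e. a few hours
```

At roughly 172 GB of new data per day, a 50 GB cap is exhausted almost immediately; only a much smaller row size or insert rate would stretch that out.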
The advantage SQL Azure does have though is the ability to run aggregate queries. In AZT you can't even write a
select count(*) from customer
without loading every customer. AZT also has a limit of 500 transactions per second per partition, and a limit of "several thousand" per second per account.
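The aggregate-query point can be sketched as follows: with no server-side COUNT, the client must page through every entity. Here `fetch_page` is a hypothetical stand-in for a table query with a continuation token, not a real SDK call:

```python
def count_entities(fetch_page):
    """Count entities the only way AZT allows: by retrieving all of them.

    fetch_page(token) is a hypothetical stand-in for a table query;
    it returns (entities, next_token), with next_token=None on the last page.
    """
    total, token = 0, None
    while True:
        entities, token = fetch_page(token)
        total += len(entities)
        if token is None:
            return total

# Simulate a table of 2,500 entities served in pages of 1,000.
data = list(range(2500))

def fake_fetch(token):
    start = token or 0
    page = data[start:start + 1000]
    nxt = start + 1000 if start + 1000 < len(data) else None
    return page, nxt

print(count_entities(fake_fetch))  # → 2500
```

With billions of entities, this download-everything pattern is exactly why client-side aggregation is impractical.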
I've found that choosing what to use for your partition key (PK) and row key (RK) depends on how you're going to query the data. If you want to access each of these items individually, simply give each row its own partition key and a constant row key. This will mean that you have lots of partitions.
For the sake of example, suppose the rows you're inserting are orders, and the orders belong to customers. If it's more common for you to list orders by customer, you would have PK = CustomerId, RK = OrderId. This means that to find a customer's orders you simply query on the partition key. To get a specific order you'd need to know both the CustomerId and the OrderId. The more orders a customer has, the slower finding any particular order will be.
If you just needed to access orders by OrderId, then you would use PK = OrderId, RK = string.Empty and put the CustomerId in another property. While you can still write a query that brings back all orders for a customer, because AZT doesn't support indexes other than on PartitionKey and RowKey, any query that doesn't use the PartitionKey (and sometimes even one that does, depending on how you write it) will cause a table scan. With the number of records you're talking about, that would be very bad.
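The key schemes above can be sketched as plain dictionaries. The OrderId/CustomerId names come from the example; the dict shape is illustrative, not a real SDK entity type:

```python
# Scheme A: every row its own partition (fast point lookups, no grouping).
def key_per_row(order_id):
    return {"PartitionKey": order_id, "RowKey": ""}

# Scheme B: group orders under their customer (fast "orders for customer X";
# finding one order needs both keys).
def key_by_customer(customer_id, order_id):
    return {"PartitionKey": customer_id, "RowKey": order_id}

# Scheme C: look up by OrderId alone; CustomerId demoted to a plain property,
# so queries filtering only on it will scan the table.
def key_by_order(order_id, customer_id):
    return {"PartitionKey": order_id, "RowKey": "", "CustomerId": customer_id}

e = key_by_customer("cust-42", "order-0001")
print(e["PartitionKey"], e["RowKey"])  # → cust-42 order-0001
```

The trade-off is the one described above: whichever attribute you fold into the PartitionKey is the one you can query efficiently.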
In all of the scenarios I've encountered, having lots of partitions doesn't seem to worry AZT too much.
Another way you can partition your data in AZT that is not often mentioned is to put the data in different tables. For example, you might want to create one table for each day. If you want to run a query for last week, run the same query against the 7 different tables. If you're prepared to do a bit of work on the client end you can even run them in parallel.
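A sketch of that table-per-day scheme. The `Orders<yyyyMMdd>` naming and the `query_table` callable are hypothetical stand-ins; the parallelism is a plain client-side thread pool:

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

def table_names_for_last_week(today):
    """One table per day, named e.g. 'Orders20100510' (hypothetical scheme)."""
    return [f"Orders{(today - timedelta(days=i)):%Y%m%d}" for i in range(7)]

def query_last_week(query_table, today):
    """Run the same query against all 7 daily tables in parallel.

    query_table(name) is a hypothetical stand-in for executing the real
    table query; results are simply concatenated on the client.
    """
    names = table_names_for_last_week(today)
    with ThreadPoolExecutor(max_workers=7) as pool:
        results = pool.map(query_table, names)   # preserves input order
    return [row for rows in results for row in rows]

names = table_names_for_last_week(date(2010, 5, 10))
print(names[0], names[-1])  # → Orders20100510 Orders20100504
```

Since each day's data lives in its own table, dropping old data is also cheap: delete the whole table instead of deleting rows.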
Azure SQL can easily ingest that much data and more. Here's a video I recorded a few months ago that shows a sample (available on GitHub) demonstrating one way you can do that.
https://www.youtube.com/watch?v=vVrqa0H_rQA
Here's the full repo:
https://github.com/Azure-Samples/streaming-at-scale/tree/master/eventhubs-streamanalytics-azuresql