Azure - Querying 200 million entities
I have a need to query a store of 200 million entities in Windows Azure. Ideally, I would like to use the Table Service, rather than SQL Azure, for this task.
The use case is this: a POST containing a new entity will be incoming from a web-facing API. We must query about 200 million entities to determine whether or not we may accept the new entity.
The Table Service returns at most 1,000 entities per query. Does that limit apply to this type of query, i.e. do I have to query 1,000 entities at a time and perform my comparisons / business rules on each batch, or can I query all 200 million entities in one shot? I think I would hit a timeout in the latter case.
Ideas?
Comments (2)
Expanding on Shiraz's comment about Table storage: tables are organized into partitions, and within a partition your entities are indexed by a row key. So, each row can be found extremely fast using the combination of partition key + row key. The trick is to choose the best possible partition key and row key for your particular application.
For your example above, where you're searching by telephone number, you can make TelephoneNumber the partition key. You could very easily find all rows related to that telephone number (though, not knowing your application, I don't know just how many rows you'd be expecting). To refine things further, you'd want to define a row key that you can index into, within the partition key. This would give you a very fast response to let you know whether a record exists.
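To make the key layout concrete, here is a minimal in-memory sketch of the idea, not the real Table service API. The class `InMemoryTable`, the telephone number, and the row key format `order-0001` are all made up for illustration; the point is only that a partition key + row key lookup touches one entry and never scans the rest of the table.

```python
from collections import defaultdict


class InMemoryTable:
    """Toy stand-in for a table: each partition maps row keys to entities."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def insert(self, partition_key, row_key, entity):
        self._partitions[partition_key][row_key] = entity

    def get(self, partition_key, row_key):
        # Point lookup: two dictionary lookups, no scan over other entities.
        return self._partitions.get(partition_key, {}).get(row_key)

    def exists(self, partition_key, row_key):
        return self.get(partition_key, row_key) is not None


# Hypothetical data: telephone number as partition key, record id as row key.
table = InMemoryTable()
table.insert("+15550100", "order-0001", {"status": "open"})

print(table.exists("+15550100", "order-0001"))  # True
print(table.exists("+15550100", "order-9999"))  # False
```

With this layout the cost of the existence check does not depend on whether the table holds 200 entities or 200 million.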
Table storage (actually Azure Storage in general: tables, blobs, queues) has a well-known SLA. You can execute up to 500 transactions per second on a given partition. With the example above, the query for rows for a given telephone number would equate to one transaction, unless more than 1,000 rows are returned, in which case you'd need additional fetches to see all of them. Adding a row key to narrow the search would, indeed, yield a single transaction. So would inserting a new row. You can also batch up multiple row inserts, within a single partition, and save them in a single transaction.
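The "additional fetches" for partitions with more than 1,000 rows can be sketched as a paging loop. This is a simulation over a plain Python list, not a call into any Azure SDK: `query_partition` and `fetch_all` are hypothetical helpers, and the integer offset stands in for the continuation token that the real service hands back (which is opaque, not an offset).

```python
PAGE_LIMIT = 1000  # maximum rows returned by one query, per the text above


def query_partition(rows, continuation=0, limit=PAGE_LIMIT):
    """Return one page of rows plus a continuation token (None when done)."""
    page = rows[continuation:continuation + limit]
    next_token = continuation + limit
    return page, (next_token if next_token < len(rows) else None)


def fetch_all(rows):
    """Follow continuation tokens until the partition is exhausted."""
    results, token, round_trips = [], 0, 0
    while token is not None:
        page, token = query_partition(rows, token)
        results.extend(page)
        round_trips += 1  # each page is a separate request
    return results, round_trips


# Hypothetical partition holding 2,500 rows.
rows = [f"row-{i:05d}" for i in range(2500)]
results, round_trips = fetch_all(rows)
print(len(results), round_trips)  # 2500 3
```

Each round trip counts as its own transaction, which is why narrowing the query with a row key is preferable to paging through a large partition.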
For a nice overview of Azure Table Storage, with some good labs, check out the Platform Training Kit.
For more info about transactions within tables, see this MSDN blog post.
The limit of 1000 is the number of rows returned from a query, not the number of rows queried.
Pulling all of the 200 million rows into the web server to check them will not work.
The trick is to store the rows with a key that can be used to check if the record should be accepted.
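As a rough sketch of that idea, assume the acceptance rule is "reject the POST if a record with the same telephone number and record id already exists". That rule, the field names `telephone` and `record_id`, and the `ToyStore` class are all assumptions for illustration. The store below is an in-memory stand-in whose insert fails on a duplicate key, which mirrors how a table insert is rejected with a conflict when the partition key + row key pair is already taken.

```python
class EntityAlreadyExists(Exception):
    """Raised by the toy store when the (partition key, row key) pair is taken."""


class ToyStore:
    def __init__(self):
        self._rows = {}

    def insert(self, partition_key, row_key, entity):
        key = (partition_key, row_key)
        if key in self._rows:
            raise EntityAlreadyExists(key)
        self._rows[key] = entity


def keys_for(entity):
    # Assumed rule: normalise the phone number to digits so that differently
    # formatted versions of the same number map to the same partition.
    digits = "".join(ch for ch in entity["telephone"] if ch.isdigit())
    return digits, entity["record_id"]


def try_accept(store, entity):
    """Accept the entity only if its derived key is not already stored."""
    partition_key, row_key = keys_for(entity)
    try:
        store.insert(partition_key, row_key, entity)
        return True
    except EntityAlreadyExists:
        return False


store = ToyStore()
first = try_accept(store, {"telephone": "+1 (555) 010-0100", "record_id": "a1"})
second = try_accept(store, {"telephone": "15550100100", "record_id": "a1"})
print(first, second)  # True False
```

The check and the write happen in one keyed operation, so nothing close to 200 million rows is ever pulled into the web server. If the real business rule is more involved than uniqueness, the same approach applies as long as the rule can be encoded in how the keys are derived.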