How to get the next 1000 records as fast as possible
I'm using Azure Table Storage.
Let's say I have a Partition in my Table with 10,000 records, and I would like to get records 1000 to 1999. And next time I would like to get records 4000 to 4999, etc.
What is the fastest way of doing that?
All I can find so far are two options, neither of which I like very much:
1. Run a query which returns all 10,000 records, and filter out what I want once I have them all.
2. Run a query which returns 1000 records at a time, and use a continuation token to get the next 1000 records.

Is it possible to get a continuation token without downloading all the corresponding records? It would be great if I could get Continuation Token 1, then get Continuation Token 2, and with CT2 get records 2000 to 2999.
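For reference, option 2 looks roughly like this with the azure-data-tables Python SDK (connection string, table name, and partition key are illustrative). Note that the pages being skipped are still fully downloaded, which is exactly the overhead in question:

```python
from azure.data.tables import TableClient

# Illustrative names; substitute your own connection string and table.
conn_str = "<your-connection-string>"
table = TableClient.from_connection_string(conn_str, table_name="mytable")

pages = table.query_entities(
    query_filter="PartitionKey eq 'p1'",
    results_per_page=1000,
).by_page()

# To reach records 4000-4999 (page 4), pages 0-3 are downloaded
# and thrown away.
for i, page in enumerate(pages):
    if i == 4:
        batch = list(page)
        break
```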
1 Answer
Theoretically you should be able to use continuation tokens without downloading the actual data for the first 1000 records by closing the connection you have after the first request. I mean closing it at the TCP level, before you read all the data. Then open a new connection and use the continuation token there. Two WebRequests will not do it, since the HTTP implementation will likely use keep-alive, which means all your data is going to be read in the background even though you don't read it in your code. You can, however, configure your HTTP requests to not use keep-alive.
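Here is a sketch of that idea against the Table service REST API (SAS-token auth assumed; account, table, and filter are placeholders). `requests` with `stream=True` returns as soon as the response headers arrive, and the continuation token travels in the `x-ms-continuation-*` response headers, so you can grab it and close the connection without pulling down the 1000 entities. The server still does the work of producing each page; you only avoid transferring and parsing the bodies:

```python
import requests

# Placeholder endpoint and SAS token; substitute your own.
URL = "https://myaccount.table.core.windows.net/mytable()"
SAS = "<sas-token>"

def token_after(pages_to_skip, page_size=1000):
    """Walk forward pages_to_skip pages, reading only response headers."""
    params = {"$filter": "PartitionKey eq 'p1'", "$top": str(page_size)}
    headers = {
        "Accept": "application/json;odata=nometadata",
        "x-ms-version": "2019-02-02",
    }
    next_pk = next_rk = None
    for _ in range(pages_to_skip):
        if next_pk is not None:
            params["NextPartitionKey"] = next_pk
            params["NextRowKey"] = next_rk
        # stream=True: return once headers are in, without reading the body.
        resp = requests.get(f"{URL}?{SAS}", params=params,
                            headers=headers, stream=True)
        next_pk = resp.headers.get("x-ms-continuation-NextPartitionKey")
        next_rk = resp.headers.get("x-ms-continuation-NextRowKey")
        # Closing with the body unread forces the connection to be dropped
        # rather than kept alive, per the keep-alive caveat above.
        resp.close()
    return next_pk, next_rk
```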
Another way, naturally, is if you know the RowKeys and can search on them, but I assume you don't know which row keys will be in each batch of 1000 entities.
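If the RowKeys happened to be written with a fixed-width, zero-padded numbering scheme (an assumption, not something the question states), a range filter would fetch any 1000-record slice directly:

```python
def fetch_slice(table, start, count=1000):
    """Assumes RowKeys are zero-padded (e.g. '0000004000') so that
    lexicographic order matches numeric order."""
    lo, hi = f"{start:010d}", f"{start + count:010d}"
    flt = (f"PartitionKey eq 'p1' and "
           f"RowKey ge '{lo}' and RowKey lt '{hi}'")
    return list(table.query_entities(query_filter=flt))

# e.g. records 4000-4999 in one query:
# batch = fetch_slice(table, 4000)
```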
Lastly, I would ask why you have this problem in the first place, and what your access pattern is. If inserts are common and getting these records is rare, I wouldn't bother making it more efficient. If this is like a paging problem, I would probably get all the data on the first request and cache it (in the cloud). If inserts are rare but you need to run this query often, I would consider making the insertion of data use one partition for every 1000 entities, and rebalance as needed (due to sorting) as entities are inserted.
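A minimal sketch of that last layout, assuming entities carry a monotonically increasing sequence number: derive the PartitionKey from the sequence number so each batch of 1000 lives in its own partition, and fetching a batch becomes a single-partition query:

```python
def keys_for(seq):
    """Records 0-999 -> partition '0000000', 1000-1999 -> '0000001', ..."""
    return f"{seq // 1000:07d}", f"{seq % 1000:04d}"  # (PartitionKey, RowKey)

# Fetching records 4000-4999 is then just:
# batch = list(table.query_entities(query_filter="PartitionKey eq '0000004'"))
```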