Merging cached GQL queries instead of using IN
I'm generating a feed that merges the comments of many users, so your feed might be of comments by user1+user2+user1000 whereas mine might be user1+user2. So I have the line:
some_comments = Comment.gql("WHERE username IN :1",user_list)
I can't just memcache the whole thing since everyone will have different feeds, even if the feeds for user1 and user2 would be common to many viewers. According to the documentation:
...the IN operator executes a separate underlying datastore query for every item in the list. The entities returned are a result of the cross-product of all the underlying datastore queries and are de-duplicated. A maximum of 30 datastore queries are allowed for any single GQL query.
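In other words, per the quoted docs, a single-property IN filter fans out into one underlying query per list item, and the results are combined and de-duplicated. A rough pure-Python model of that behavior (not App Engine code; `query_by_user` is a hypothetical stand-in for one underlying datastore query):

```python
def query_by_user(comments, user):
    # Stand-in for one underlying datastore query on a single username.
    return [c for c in comments if c[0] == user]

def in_query(comments, user_list):
    # "A maximum of 30 datastore queries are allowed for any single GQL query."
    assert len(user_list) <= 30, "at most 30 underlying queries per GQL query"
    seen, results = set(), []
    for user in user_list:                # one sub-query per item in the list
        for c in query_by_user(comments, user):
            if c not in seen:             # de-duplication across sub-queries
                seen.add(c)
                results.append(c)
    return results
```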
Is there a library function to merge some sorted and cached queries, or am I going to have to:
results = []
for user in user_list:
    cached = memcache.get(user)
    if cached is not None:
        results.extend(cached)
    else:
        comments = Comment.gql("WHERE username = :1", user).fetch(30)
        results.extend(comments)
        memcache.set(user, comments)  # cache it too
# sort the results by whatever ordering the feed uses
(In the worst case, when nothing is cached, I expect that sending off 30 separate GQL queries is slower than one giant IN query.)
2 Answers
There's nothing built-in to do this, but you can do it yourself, with one caveat: if you do an IN query and return 30 results, these will be the 30 records that sort lowest, according to your sort criteria, across all the subqueries. If you want to assemble the result set from cached individual queries, though, either you are going to have to cache as many results for each user as the total result set (e.g., 30) and throw away most of those results, or you're going to have to store fewer results per user and accept that sometimes you'll throw away newer results from one user in favor of older results from another.

That said, here's how you can do this:
- Use memcache.get_multi to retrieve cached result sets for all the users.
- Run the individual per-user queries for any users whose results weren't cached.
- Use memcache.set_multi to cache those result sets.

Currently, IN queries are executed serially, so this approach won't be any slower than executing an IN query, even when none of the results are cached. This may change in future, though. If you want to improve performance now, you'll probably want to use Guido's NDB project, which will allow you to execute all the subqueries in parallel.
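The steps above can be sketched as follows. This is a minimal illustration, not App Engine code: the cache is simulated with a plain dict (on App Engine you'd use memcache.get_multi / memcache.set_multi), and `run_query` is a hypothetical stand-in for a per-user datastore query such as `Comment.gql("WHERE username = :1", user).fetch(limit)`.

```python
fake_cache = {}  # stands in for memcache

def run_query(user, limit):
    # Hypothetical stand-in for one per-user datastore query.
    return [(user, i) for i in range(limit)]

def merged_feed(user_list, limit=30):
    # 1. One round trip for all cached per-user result sets
    #    (memcache.get_multi(user_list) on App Engine).
    cached = {u: fake_cache[u] for u in user_list if u in fake_cache}
    # 2. Individual queries only for the cache misses.
    missing = [u for u in user_list if u not in cached]
    fetched = {u: run_query(u, limit) for u in missing}
    # 3. Cache the fresh result sets in one call (memcache.set_multi(fetched)).
    fake_cache.update(fetched)
    # 4. Merge and keep the top `limit` overall, as the IN query would.
    merged = [c for rs in list(cached.values()) + list(fetched.values()) for c in rs]
    merged.sort(key=lambda c: c[1])  # sort by the assumed ordering field
    return merged[:limit]
```

Note the caveat from above baked into step 1: each cached per-user set must hold up to `limit` results, most of which get discarded after the merge.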
You can use memcache.get_multi() to see which of the users' feeds are already in memcache. Then use set().difference() on the original user list vs. the user list found in memcache to find out which weren't retrieved, and finally fetch the missing user feeds from the datastore in a batch get.

From there you can combine the two lists and, if it isn't too long, sort it in memory. If you're working on something Ajaxy, you could hand off sorting to the client.