Merging cached GQL queries instead of using IN

Posted 2024-11-01 05:54:46


I'm generating a feed that merges the comments of many users, so your feed might contain comments by user1+user2+user1000 whereas mine might contain only those by user1+user2. So I have the line:

some_comments = Comment.gql("WHERE username IN :1", user_list)

I can't just memcache the whole thing since everyone will have different feeds, even if the feeds for user1 and user2 would be common to many viewers. According to the documentation:

...the IN operator executes a separate underlying datastore query for every item in the list. The entities returned are a result of the cross-product of all the underlying datastore queries and are de-duplicated. A maximum of 30 datastore queries are allowed for any single GQL query.

Is there a library function to merge some sorted and cached queries, or am I going to have to:

results = []
for user in user_list:
    comments = memcache.get(user)
    if comments is None:
        comments = Comment.gql("WHERE username = :1", user).fetch(30)
        memcache.set(user, comments)  # cache it too
    results.extend(comments)
results.sort(key=lambda c: c.created, reverse=True)  # assuming a date property

(In the worst case, where nothing is cached, I expect sending 30 GQL queries off is slower than one giant IN query.)


2 Answers

长不大的小祸害 2024-11-08 05:54:46


There's nothing built-in to do this, but you can do it yourself, with one caveat: if you do an IN query and return 30 results, these will be the 30 records that sort lowest, according to your sort criteria, across all the subqueries. If you want to assemble the result set from cached individual queries, though, either you will have to cache as many results for each user as the total result set (e.g., 30) and throw away most of those results, or you will have to store fewer results per user and accept that sometimes you'll throw away newer results from one user in favor of older results from another.

That said, here's how you can do this:

  1. Do a memcache.get_multi to retrieve cached result sets for all the users
  2. For each user that doesn't have a result set cached, execute the individual query, fetching as many results as you need. Use memcache.set_multi to cache the new result sets.
  3. Do a merge-join on all the result sets and take the top n results as your final result set. Because username is presumably not a list field (i.e., every comment has a single author), you don't need to worry about duplicates.

Currently, IN queries are executed serially, so this approach won't be any slower than executing an IN query, even when none of the results are cached. This may change in the future, though. If you want to improve performance now, you'll probably want to use Guido's NDB project, which will allow you to execute all the subqueries in parallel.
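The three steps above can be sketched as follows. This is a minimal, hypothetical sketch rather than App Engine code: `cache` is a plain dict standing in for memcache's `get_multi`/`set_multi`, `fetch_comments(user, n)` stands in for the per-user datastore query, and comments are `(timestamp, username, text)` tuples sorted newest-first.

```python
import heapq
from itertools import islice

def merged_feed(user_list, cache, fetch_comments, n):
    """Merge per-user comment lists (each sorted newest-first) into a feed.

    `cache` is a plain dict standing in for memcache.get_multi/set_multi,
    and `fetch_comments(user, n)` stands in for the per-user datastore
    query; comments are (timestamp, username, text) tuples.
    """
    # Step 1: one batched lookup for every user's cached result set.
    cached = {u: cache[u] for u in user_list if u in cache}

    # Step 2: query and cache the misses. Each user needs n results --
    # per the caveat above, storing fewer per user means sometimes
    # dropping a newer comment in favor of an older one.
    for user in user_list:
        if user not in cached:
            results = fetch_comments(user, n)
            cache[user] = results
            cached[user] = results

    # Step 3: merge-join the sorted lists and keep the top n. heapq.merge
    # streams the merge without materializing the full cross-user union.
    merged = heapq.merge(*cached.values(), key=lambda c: c[0], reverse=True)
    return list(islice(merged, n))
```

Because each per-user list is already sorted, the final merge is linear in the number of results inspected, not in the total size of all feeds.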

他不在意 2024-11-08 05:54:46


You can use memcache.get_multi() to see which of the users' feeds are already in memcache. Then use set().difference() on the original user list vs. the user list found in memcache to find out which weren't retrieved. Then finally fetch the missing users' feeds from the datastore in a batch get.

From there you can combine the two lists and, if it isn't too long, sort it in memory. If you're working on something Ajaxy, you could hand off sorting to the client.
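A minimal sketch of that flow, under stated assumptions: `cache` is a plain dict standing in for memcache.get_multi, and `fetch_feeds_batch(users)` is a hypothetical stand-in for the batched datastore get returning `{username: [(timestamp, text), ...]}`.

```python
def build_feed(user_list, cache, fetch_feeds_batch):
    """Combine cached and freshly fetched per-user feeds, newest first.

    `cache` is a plain dict standing in for memcache.get_multi, and
    `fetch_feeds_batch(users)` stands in for a batched datastore get
    returning {username: [(timestamp, text), ...]}.
    """
    # memcache.get_multi(): one round trip for all the cached feeds.
    hits = {u: cache[u] for u in user_list if u in cache}

    # set().difference() finds the users whose feeds were not cached.
    missing = set(user_list).difference(hits)

    # Batch-fetch the missing feeds from the datastore in one get.
    if missing:
        hits.update(fetch_feeds_batch(missing))

    # Combine the lists and sort in memory (or hand this to the client).
    combined = [c for feed in hits.values() for c in feed]
    return sorted(combined, key=lambda c: c[0], reverse=True)
```

The in-memory sort is the step this answer suggests offloading to the client when the combined list gets long.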
