Looking for ideas/alternatives to providing a page/item count/navigation of items matching a GAE datastore query

Posted on 2025-01-07 12:54:12

I like the datastore simplicity, scalability and ease of use; and the enhancements found in the new ndb library are fabulous.

As I understand datastore best practices, one should not write code to provide item and/or page counts of matching query results when the number of items that match a query is large, because the only way to do this is to retrieve all the results, which is resource intensive.

However, in many applications, including ours, it is a common desire to see a count of matching items and provide the user with the ability to navigate to a specific page of those results. The datastore paging issue is further complicated by the requirement to work around limitations of fetch(limit, offset=X) as outlined in the article Paging Through Large Datasets. To support the recommended approach, the data must include a uniquely valued column that can be ordered in the way the results are to be displayed. This column will define a starting value for each page of results; saving it, we can fetch the corresponding page efficiently, allowing navigation to a specific or next page as requested. Therefore, if you want to show results ordered in multiple ways, several such columns may need to be maintained.

It should be noted that as of SDK v1.3.1, Query Cursors are the recommended way to do datastore paging. They have some limitations, including lack of support for IN and != filter operators. Currently some of our important queries use IN, but we'll try writing them using OR for use with query cursors.

Following the suggested guidelines, a user could be given (Next) and (Prev) navigation buttons, as well as specific page buttons as navigation proceeds. For example, if the user pressed (Next) 3 times, the app could show the following buttons, remembering the unique starting record or cursor for each to keep the navigation efficient: (Prev) (Page-1) (Page-2) (Page-3) (Page-4) (Next).
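The bookkeeping described above can be sketched in a few lines. This is only an illustration under stated assumptions: `PageCursors` and `fake_fetch_page` are hypothetical names, the "cursor" is a plain integer offset into an in-memory list, and the stand-in merely mimics the `(results, cursor, more)` shape that NDB's `fetch_page()` returns. A real cursor is an opaque token.

```python
class PageCursors:
    """Remembers the start cursor of every page reached so far."""

    def __init__(self):
        # Page 1 starts with no cursor at all.
        self._starts = {1: None}

    def record(self, page, next_cursor, more):
        """After fetching `page`, store where the following page begins."""
        if more and next_cursor is not None:
            self._starts[page + 1] = next_cursor

    def start_of(self, page):
        """Return the saved start cursor; raises KeyError for unvisited pages."""
        return self._starts[page]

    def known_pages(self):
        return sorted(self._starts)


def fake_fetch_page(items, page_size, start_cursor=None):
    """In-memory stand-in returning (results, cursor, more); not the NDB API."""
    start = start_cursor or 0
    results = items[start:start + page_size]
    end = start + len(results)
    more = end < len(items)
    return results, end, more


items = list(range(1, 36))  # 35 fake records
cursors = PageCursors()
for page in (1, 2, 3):  # the user presses (Next) three times
    results, next_cursor, more = fake_fetch_page(items, 10, cursors.start_of(page))
    cursors.record(page, next_cursor, more)

buttons = ["(Prev)"] + ["(Page-%d)" % p for p in cursors.known_pages()] + ["(Next)"]
```

Only pages that have been reached get a button, so a jump is always a single fetch from a saved cursor and never an offset scan.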

Some have suggested keeping track of counts separately, but this approach isn't practical when users will be allowed to query on a rich set of fields that will vary the results returned.

I'm looking for insights on these issues in general and the following questions specifically:

  1. What navigational options of query results do you provide in your datastore apps to work around these limitations?

  2. If providing users with efficient result counts and page navigation
    of the entire query result set is a priority, should use of the datastore
    be abandoned in favor of the GAE MySQL solution now being offered?

  3. Are there any upcoming changes in the Bigtable architecture or
    datastore implementation that will provide additional capability for
    counting results of a query efficiently?

Many thanks in advance for your help.

Comments (4)

故笙诉离歌 2025-01-14 12:54:12

It all depends on how many results you typically get. E.g. by passing .count() a suitable limit you can provide an exact count if the number of items is, say, <= 100, and "many" if there are more. It sounds like you cannot pre-compute all possible counts, but at least you could cache them, thereby saving many datastore ops.
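A minimal sketch of the "exact up to a cap, otherwise many" idea, using an in-memory iterator in place of a query. The names `capped_count` and `count_label` are made up for illustration. With NDB the counted value would presumably come from something like `query.count(limit=cap + 1)` rather than from iterating.

```python
from itertools import islice


def capped_count(matches, cap=100):
    """Count at most cap + 1 matches and stop; the extra one detects overflow."""
    return sum(1 for _ in islice(matches, cap + 1))


def count_label(counted, cap=100):
    """Exact number up to `cap`, otherwise the word "many"."""
    return str(counted) if counted <= cap else "many"


small = count_label(capped_count(iter(range(42))))
large = count_label(capped_count(iter(range(5000))))
```

Asking for one item more than the cap is what distinguishes "exactly 100" from "more than 100" without counting the whole result set.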

Using NDB, the most efficient approach may be either to request the first page of entities using fetch_page() and then use the resulting cursor as a starting point for a count() call, or to run the fetch() of the first page and the count() concurrently using NDB's async facilities. The second option may be your only choice if your query does not support cursors. Most IN / OR queries don't currently support cursors, but they do if you order by __key__.

In terms of UI options, I think it's sufficient to offer next and previous page options; the "Gooooooogle" UI that affords skipping ahead several pages is cute but I almost never use it myself. (To implement "previous page", reverse the order of the query and use the same cursor you used for the current page. I'm pretty sure this is guaranteed to work.)

人海汹涌 2025-01-14 12:54:12

Maybe just aim for this style of paging:

(first)(Prev)(Page1)(Page2)(Page3)....(Last)(next)

That way the total number is not required: you only need your code to know that there are enough results for another 3+ pages. With a page size of 10 items per page, you just need to know there are 30+ items.

If you have 60 items (enough for 6 pages), when you're already on page 4 your code would look forward and realise there are only another 20 records to go, so you could then show the last page number:

(first)(Prev)(Page4)(Page5)(Page6)(next)(last)

Basically, for each fetch of the current page, just fetch enough records for another 3 pages of data, count them to see how many more pages you actually have, then display your pager accordingly.

Also, if you just fetch the keys, it will be more efficient than fetching extra items.
Hope that makes some sense! :)
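The look-ahead pager can be sketched as a pure function. `pager_buttons` and its parameters are hypothetical names. The sketch also assumes the look-ahead asks for one record more than `pages_ahead` full pages, so that "exactly three more pages" can be told apart from "more than three".

```python
def pager_buttons(current_page, page_size, items_ahead, pages_ahead=3):
    """Build pager labels from a bounded look-ahead.

    items_ahead is how many records were found beyond the current page when
    asking for at most page_size * pages_ahead + 1 of them (keys only is enough).
    """
    # Round up: a partly filled page still needs its own button.
    extra_pages = -(-items_ahead // page_size)
    # The end is known only if the look-ahead did not hit its limit.
    end_known = items_ahead <= page_size * pages_ahead
    last_shown = current_page + (extra_pages if end_known else pages_ahead - 1)

    labels = ["(first)", "(Prev)"]
    labels += ["(Page%d)" % p for p in range(current_page, last_shown + 1)]
    if not end_known:
        labels.append("....")  # more pages exist than the look-ahead could see
    labels += ["(next)", "(last)"]
    return labels


on_page_4_of_60_items = pager_buttons(4, 10, 20)
on_page_1_of_many = pager_buttons(1, 10, 31)
```

With 60 items and the user on page 4, the look-ahead finds 20 records, so the pager ends at Page6, as in the example above.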

哭了丶谁疼 2025-01-14 12:54:12

  1. I notice that Gmail is ready with some counts - it can tell you how many total emails you've received, and how many are in your inbox, etc - but on other counts, like full-text searches, it says you're looking at "1-20 of many" or "1-20 of about 130". Do you really need to display counts for every query, or could you pre-calculate just the important ones?

陌伤ぢ 2025-01-14 12:54:12

Since the question was "looking for ideas/alternatives to providing a page", maybe the very simple alternative of fetching 10 pages' worth of keys_only items, then handling navigation within this set, is worth considering.

I have elaborated on this in answering a similar question; you will find sample code there:

Backward pagination with cursor is working but missing an item

The sample code would be more appropriate for this question. Here is a piece of it:

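from flask import render_template, request
from google.appengine.ext import ndb

# Session (an ndb.Model) and generic_list_paging are not shown here;
# see the referenced question.

def session_list():
    page = request.args.get('page', 0, type=int)

    # Fetch only the keys of up to 100 sessions, newest first.
    sessions_keys = Session.query().order(-Session.time_opened).fetch(100, keys_only=True)
    # generic_list_paging will select the proper sublist.
    sessions_keys, paging = generic_list_paging(sessions_keys, page)
    # Load the entities of the current page only, in one batch.
    sessions = ndb.get_multi(sessions_keys)

    return render_template('generic_list.html', objects=sessions, paging=paging)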

See the referenced question for more code.

Of course, if the result set is potentially huge, some limit to the fetch must still be given, the hard limit being 1000 items I think. Obviously, if the result is more than some 10 pages long, the user will be asked to refine by adding criteria.

Dealing with paging within a few hundred keys_only items is really so much simpler that it's definitely worth considering. It makes it quite easy to provide direct page navigation as mentioned in the question. The actual entity items are only fetched for the actual current page, the rest is only keys, so it's not so costly. And you may consider keeping the keys_only result set in memcache for a few minutes so that a user quickly browsing through pages will not require the same query to be performed again.
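To make the "paging within a fetched key list" step concrete, here is a rough sketch of a slicing helper. It is not the generic_list_paging from the referenced question, which is not reproduced here. `slice_page` and the fields of its paging dict are hypothetical, and plain strings stand in for keys.

```python
def slice_page(keys, page, page_size=10):
    """Return (keys for `page`, paging info); `page` is zero-based."""
    # Round up so a partly filled last page is counted; always at least 1 page.
    page_count = max(1, -(-len(keys) // page_size))
    page = min(max(page, 0), page_count - 1)  # clamp out-of-range requests
    start = page * page_size
    paging = {
        "page": page,
        "page_count": page_count,
        "has_prev": page > 0,
        "has_next": page < page_count - 1,
    }
    return keys[start:start + page_size], paging


keys = ["key-%d" % i for i in range(47)]  # stand-in for a keys_only fetch
page_keys, paging = slice_page(keys, 4)
```

Because the whole key list is in hand, the page count is exact within the fetched window, and any page button is just a different slice.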
