How do I set and use the chunk size in FetchOptions on Google App Engine (Java)?



I'm running a query that currently returns 1400 results, and because of this I am getting the following warning in the log file:

com.google.appengine.api.datastore.QueryResultsSourceImpl
logChunkSizeWarning: This query does not have a chunk size set in
FetchOptions and has returned over 1000 results. If result sets of
this size are common for this query, consider setting a chunk size to
improve performance.

I can't find any examples anywhere of how to actually implement this. There is a question on here about Python, but as I'm using Java and don't understand Python, I am struggling to translate it.

Also, this query (below) takes 17226 cpu_ms to execute, which feels far too long. I can't even imagine what would happen if I had, say, 5000 contacts and needed to search through them on the client side (like you do with Google Mail contacts!)

The code I have is:

    int index=0;
    int numcontacts=0;
    String[][] DetailList;

    PersistenceManager pm = PMF.get().getPersistenceManager();


    try {
        Query query = pm.newQuery(Contact.class, "AdminID == AID");
        query.declareParameters("Long AID");
        query.setOrdering("Name asc");
        List<Contact> Contacts = (List<Contact>) query.execute(AdminID);
        numcontacts=Contacts.size();
        DetailList=new String[numcontacts][5];

        for (Contact contact : Contacts) 
        {
            DetailList[index][0]=contact.getID().toString();
            DetailList[index][1]=Encode.EncodeString(contact.getName());
            index++;
        }
    } finally {
        pm.close();
    }
    return (DetailList);

I found the following two entries on here:

but neither actually goes into any detail about how to implement or use these options.
I'm guessing it's a server-side process, and that you are meant to set up some kind of loop to grab the results one chunk at a time, but how do I actually do that?

  • Do I call the query inside a loop?
  • How do I know how many times to loop?
  • Do I just check for the first chunk that comes back with less than the chunk size number of entries?

How am I meant to go about finding out stuff like this without an actual example to follow?
It seems to me that other people on here "just know" how to do it...!

Sorry if I am not asking the question in the right way, or I'm just being a dim newbie about this, but I don't know where else to turn to figure this out!


夏至、离别 2024-12-08 20:58:38


I hit the same problem, and the last comment was from a month ago, so here is what I found out about querying heavy datasets.

I guess I'm going to use the "query cursor" technique after reading these lines in the Google docs article (the Python one mentioned above, by the way):

This article was written for SDK version 1.1.7. As of release 1.3.1,
query cursors (Java | Python) have superseded the techniques described
below and are now the recommended method for paging through large
datasets.

In the Google docs about "Query Cursors", the first line explains precisely why cursors are needed:

Query cursors allow an app to perform a query and retrieve a batch of
results, then fetch additional results for the same query in a
subsequent web request without the overhead of a query offset.

The documentation also provides a Java example of a servlet using the cursor technique, a tip on how to generate a web-safe cursor for the client, and finally the limitations of cursors.

Hope this gives you a lead to resolve your problem.
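As a rough sketch of the cursor technique from that article, using the low-level datastore API (the kind name `"Contact"` and the page size of 100 are my own assumptions, not taken from the docs):

```java
import com.google.appengine.api.datastore.Cursor;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.QueryResultList;

public class ContactPager {
    // Returns the next page of "Contact" entities, resuming from a
    // web-safe cursor string (pass null for the first page).
    public static QueryResultList<Entity> nextPage(String webSafeCursor) {
        DatastoreService datastore =
            DatastoreServiceFactory.getDatastoreService();
        PreparedQuery pq = datastore.prepare(new Query("Contact"));

        FetchOptions options = FetchOptions.Builder.withLimit(100);
        if (webSafeCursor != null) {
            // Resume exactly where the previous request stopped,
            // without the cost of a query offset.
            options.startCursor(Cursor.fromWebSafeString(webSafeCursor));
        }

        QueryResultList<Entity> page = pq.asQueryResultList(options);
        // page.getCursor().toWebSafeString() can be handed to the client
        // and passed back in on the next request.
        return page;
    }
}
```

Each request only pays for the page it fetches; the cursor string round-trips through the client between requests.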

A small reminder about range and offset, which has quite an impact on performance if forgotten (and I forgot ^^):

The starting offset has implications for performance: the Datastore
must retrieve and then discard all results prior to the starting
offset. For example, a query with a range of 5, 10 fetches ten results
from the Datastore, then discards the first five and returns the
remaining five to the application.
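In JDO terms that range would be expressed with setRange; a small sketch (the numbers just mirror the quote above):

```java
Query query = pm.newQuery(Contact.class, "AdminID == AID");
query.declareParameters("Long AID");
// Fetches results 0..9 from the datastore, then discards the first
// five: only results 5..9 reach the application, but all ten are
// retrieved and paid for.
query.setRange(5, 10);
List<Contact> page = (List<Contact>) query.execute(AdminID);
```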


Edit: As I am working with JDO, I kept looking for a way to allow my previous code to load more than 1000 results in a single query. So, if you're using JDO too, I found this old issue:

Query query = pm.newQuery(...);
// I would use a value below 1000 (the GAE limit)
query.getFetchPlan().setFetchSize(numberOfRecordByFetch); 
清晨说晚安 2024-12-08 20:58:38


This is how I apply FetchOptions; compared to your example code, you might need to tweak it a bit:

// ..... build the Query object
FetchOptions fetch_options =
    FetchOptions.Builder.withPrefetchSize(100).chunkSize(100);
QueryResultList<Entity> returned_entities =
    datastore_service_instance.prepare(query).asQueryResultList(fetch_options);

Of course, the figures (100) may be changed.
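To consume the returned list and keep a handle for the next page, something like this should work (variable names follow the snippet above; the `"Name"` property is an assumption):

```java
for (Entity entity : returned_entities) {
    // Process each entity in the current batch,
    // e.g. read a property: entity.getProperty("Name")
}

// The result list also carries a cursor, so a follow-up request can
// resume from here instead of re-reading everything:
Cursor next_cursor = returned_entities.getCursor();
String web_safe = next_cursor.toWebSafeString();
```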

If my answer isn't what you're looking for then you're welcome to rephrase your question (edit).

By the way I'm the one who wrote the first linked question.

愁杀 2024-12-08 20:58:38


If you are using the dataStore directly, without JDO, then you would do something like the following to set the chunk-size when you are iterating through the data:

Query query = new Query("entityname");
PreparedQuery preparedQuery = dataStore.prepare(query);
// the 200 should be less than 1000
FetchOptions options = FetchOptions.Builder.withChunkSize(200);
for (Entity result : preparedQuery.asIterable(options)) {
    ...
}