Error downloading data from Google App Engine using bulkloader
I am trying to download some data from the datastore using the following command:

appcfg.py download_data --config_file=bulkloader.yaml --application=myappname \
    --kind=mykindname --filename=myappname_mykindname.csv \
    --url=http://myappname.appspot.com/_ah/remote_api
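For reference, the bulkloader.yaml passed via --config_file is essentially a minimal CSV exporter along these lines (a sketch; the property names are placeholders for my actual fields):

python_preamble:
- import: google.appengine.ext.bulkload.transform

transformers:
- kind: mykindname
  connector: csv
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string
    - property: myproperty          # placeholder for a real property
      external_name: myproperty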
When I didn't have much data in this particular kind/table, I could download it in one shot, only occasionally running into the following error:
.................................[ERROR ] [Thread-11] ExportProgressThread:
Traceback (most recent call last):
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 1448, in run
    self.PerformWork()
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 2216, in PerformWork
    item.key_end)
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 2011, in StoreKeys
    (STATE_READ, unicode(kind), unicode(key_start), unicode(key_end)))
OperationalError: unable to open database file
This is what I see in the server log:
Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/remote_api/handler.py", line 277, in post
    response_data = self.ExecuteRequest(request)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/remote_api/handler.py", line 308, in ExecuteRequest
    response_data)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 86, in MakeSyncCall
    return stubmap.MakeSyncCall(service, call, request, response)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 286, in MakeSyncCall
    rpc.CheckSuccess()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 126, in CheckSuccess
    raise self.exception
ApplicationError: ApplicationError: 4 no matching index found.
When that error appeared, I would simply re-run the download and things would work out fine.
Of late, I am noticing that as the size of my kind increases, the download tool fails much more often. For instance, with a kind of ~3500 entities I had to run the command 5 times - only the last attempt succeeded. Is there a way around this error? Previously, my only worry was that I wouldn't be able to automate downloads in a script because of the occasional failures - now I am scared I won't be able to get my data out at all.
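For context, this is the sort of wrapper I was hoping to script around it - a rough sketch, assuming appcfg.py is invokable on the PATH and that --db_filename (which keeps the bulkloader's progress database, so a re-run can resume rather than start over) behaves as the SDK docs describe:

import subprocess
import sys

# Same command as above, plus an explicit progress database so a failed
# attempt can be resumed by the next one instead of starting from scratch.
CMD = [
    "appcfg.py", "download_data",
    "--config_file=bulkloader.yaml",
    "--application=myappname",
    "--kind=mykindname",
    "--filename=myappname_mykindname.csv",
    "--url=http://myappname.appspot.com/_ah/remote_api",
    "--db_filename=bulkloader-progress.sql3",
]

MAX_ATTEMPTS = 5  # matches the number of manual re-runs I needed

for attempt in range(1, MAX_ATTEMPTS + 1):
    print("Attempt %d of %d..." % (attempt, MAX_ATTEMPTS))
    if subprocess.call(CMD) == 0:
        print("Download finished successfully.")
        sys.exit(0)

sys.exit("All %d attempts failed." % MAX_ATTEMPTS)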
This issue was discussed previously here, but the post is old and I am not sure what the suggested flag does - hence posting my similar query again.
Some additional details. As mentioned here, I tried the suggestion to proceed with interrupted downloads (in the section "Downloading Data from App Engine"). When I resume after the interruption, I get no errors, but the number of rows downloaded is lower than the entity count the datastore admin shows me. This is the message I get:
[INFO ] Have 3220 entities, 3220 previously transferred
[INFO ] 3220 entities (1003 bytes) transferred in 2.9 seconds
The datastore admin tells me this particular kind has ~4300 entities. Why aren't the remaining entities getting downloaded?
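To check whether entities are actually missing or the admin's figure is just stale (as far as I can tell, the datastore admin count comes from periodically updated datastore statistics), I was planning to count the kind directly through the remote API shell - a sketch, using a stand-in Expando model since I only need keys:

# Run inside: remote_api_shell.py -s myappname.appspot.com
from google.appengine.ext import db

class mykindname(db.Expando):
    """Stand-in model; the kind name is taken from the class name."""
    pass

query = mykindname.all(keys_only=True)
total = 0
batch = query.fetch(1000)
while batch:
    total += len(batch)
    query = query.with_cursor(query.cursor())
    batch = query.fetch(1000)

print("Entities in datastore: %d" % total)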
Thanks!
1 Answer
I am going to make a completely uneducated guess at this, based solely on the fact that I saw the word "unicode" in the first error: I had an issue that was related to my data being user-generated from the web. A user put in a few unicode characters and a whole load of stuff started breaking - probably my fault, as I had implemented pretty-looking repr functions and a load of other stuff. If you can (it's only ~4k records), take a quick scan of your data via the console utility in your live app and try converting all of the data to ascii strings, to find any values that don't conform.
And after that, I started "sanitising" user inputs (sorry, but my "public handle" field needs to be ascii-only, players!).
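Something like the following, run from the remote API shell, is the kind of scan I mean - a sketch, with an Expando stand-in for your model (the kind and property handling are assumptions; adjust to your schema):

# Run inside: remote_api_shell.py -s myappname.appspot.com
from google.appengine.ext import db

class mykindname(db.Expando):
    """Stand-in model so the kind can be iterated without its real class."""
    pass

for entity in mykindname.all():
    for name in entity.dynamic_properties():
        value = getattr(entity, name)
        if isinstance(value, unicode):
            try:
                value.encode('ascii')
            except UnicodeEncodeError:
                # Flag values that would break ascii-only assumptions.
                print("Non-ascii value: key=%s property=%s value=%r"
                      % (entity.key(), name, value))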