Error downloading data from Google App Engine using bulkloader
I am trying to download some data from the datastore using the following command:

appcfg.py download_data --config_file=bulkloader.yaml --application=myappname \
    --kind=mykindname --filename=myappname_mykindname.csv \
    --url=http://myappname.appspot.com/_ah/remote_api
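For reference, the bulkloader.yaml passed via --config_file is essentially a minimal CSV exporter along these lines (a sketch; the property names are placeholders for my actual fields):

python_preamble:
- import: google.appengine.ext.bulkload.transform

transformers:
- kind: mykindname
  connector: csv
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string
    - property: myproperty          # placeholder for a real property
      external_name: myproperty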
When I didn't have much data in this particular kind/table, I could download it in one shot, only occasionally running into the following error:
.................................[ERROR ] [Thread-11] ExportProgressThread:
Traceback (most recent call last):
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 1448, in run
    self.PerformWork()
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 2216, in PerformWork
    item.key_end)
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 2011, in StoreKeys
    (STATE_READ, unicode(kind), unicode(key_start), unicode(key_end)))
OperationalError: unable to open database file
This is what I see in the server log:
Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/remote_api/handler.py", line 277, in post
    response_data = self.ExecuteRequest(request)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/remote_api/handler.py", line 308, in ExecuteRequest
    response_data)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 86, in MakeSyncCall
    return stubmap.MakeSyncCall(service, call, request, response)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 286, in MakeSyncCall
    rpc.CheckSuccess()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 126, in CheckSuccess
    raise self.exception
ApplicationError: ApplicationError: 4 no matching index found.
When that error appeared, I would simply re-run the download and things would work out fine.
Of late, I am noticing that as the size of my kind increases, the download tool fails much more often. For instance, with a kind of ~3500 entities I had to run the command 5 times - only the last attempt succeeded. Is there a way around this error? Previously, my only worry was that I wouldn't be able to automate downloads in a script because of the occasional failures - now I am scared I won't be able to get my data out at all.
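For context, this is the sort of wrapper I was hoping to script around it - a rough sketch, assuming appcfg.py is invokable on the PATH and that --db_filename (which keeps the bulkloader's progress database, so a re-run can resume rather than start over) behaves as the SDK docs describe:

import subprocess
import sys

# Same command as above, plus an explicit progress database so a failed
# attempt can be resumed by the next one instead of starting from scratch.
CMD = [
    "appcfg.py", "download_data",
    "--config_file=bulkloader.yaml",
    "--application=myappname",
    "--kind=mykindname",
    "--filename=myappname_mykindname.csv",
    "--url=http://myappname.appspot.com/_ah/remote_api",
    "--db_filename=bulkloader-progress.sql3",
]

MAX_ATTEMPTS = 5  # matches the number of manual re-runs I needed

for attempt in range(1, MAX_ATTEMPTS + 1):
    print("Attempt %d of %d..." % (attempt, MAX_ATTEMPTS))
    if subprocess.call(CMD) == 0:
        print("Download finished successfully.")
        sys.exit(0)

sys.exit("All %d attempts failed." % MAX_ATTEMPTS)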
This issue was discussed previously here, but the post is old and I am not sure what the suggested flag does - hence posting my similar query again.
Some additional details. As mentioned here, I tried the suggestion to proceed with interrupted downloads (in the section "Downloading Data from App Engine"). When I resume after the interruption, I get no errors, but the number of rows downloaded is lower than the entity count the datastore admin shows me. This is the message I get:
[INFO ] Have 3220 entities, 3220 previously transferred
[INFO ] 3220 entities (1003 bytes) transferred in 2.9 seconds
The datastore admin tells me this particular kind has ~4300 entities. Why aren't the remaining entities getting downloaded?
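To check whether entities are actually missing or the admin's figure is just stale (as far as I can tell, the datastore admin count comes from periodically updated datastore statistics), I was planning to count the kind directly through the remote API shell - a sketch, using a stand-in Expando model since I only need keys:

# Run inside: remote_api_shell.py -s myappname.appspot.com
from google.appengine.ext import db

class mykindname(db.Expando):
    """Stand-in model; the kind name is taken from the class name."""
    pass

query = mykindname.all(keys_only=True)
total = 0
batch = query.fetch(1000)
while batch:
    total += len(batch)
    query = query.with_cursor(query.cursor())
    batch = query.fetch(1000)

print("Entities in datastore: %d" % total)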
Thanks!
1 Answer
I am going to make a completely uneducated guess at this, based solely on the fact that I saw the word "unicode" in the first error: I had an issue that was related to my data being user-generated from the web. A user put in a few unicode characters and a whole load of stuff started breaking - probably my fault, as I had implemented pretty-looking repr functions and a load of other stuff. If you can (it's only ~4k records), take a quick scan of your data via the console utility in your live app and try converting all of the data to ascii strings, to find any values that don't conform.
And after that, I started "sanitising" user inputs (sorry, but my "public handle" field needs to be ascii-only, players!).
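Something like the following, run from the remote API shell, is the kind of scan I mean - a sketch, with an Expando stand-in for your model (the kind and property handling are assumptions; adjust to your schema):

# Run inside: remote_api_shell.py -s myappname.appspot.com
from google.appengine.ext import db

class mykindname(db.Expando):
    """Stand-in model so the kind can be iterated without its real class."""
    pass

for entity in mykindname.all():
    for name in entity.dynamic_properties():
        value = getattr(entity, name)
        if isinstance(value, unicode):
            try:
                value.encode('ascii')
            except UnicodeEncodeError:
                # Flag values that would break ascii-only assumptions.
                print("Non-ascii value: key=%s property=%s value=%r"
                      % (entity.key(), name, value))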