AppEngine bulkloader: uploading entities by setting key_name
I have spent at least two hours now trying to get this to work. I have seen quite a few different questions on SO and in the Google groups, but none of the answers seem to work for me.
Question: How do I bulk upload the data in the CSV file below to the datastore, creating entities whose key_name is taken from the CSV file (the same result as calling the add function below)?
This is my model:
from google.appengine.ext import db


class RegisteredDomain(db.Model):
    """
    Domain object class. It has no fields because its existence is
    proof that it has been registered. Individual registered domains
    can be found using keys.
    """
    pass
Here is how I usually add/delete domains, etc:
def add(domains):
    """
    Add domains. This function accepts a single domain string or a
    list of domain strings and adds them to the database. The domain(s)
    must be valid unicode strings (a ValueError is thrown if the domain
    strings are not valid).
    """
    if not isinstance(domains, list):
        domains = [domains]

    cleaned_domains = []
    for domain in domains:
        clean_domain_ = clean_domain(domain)
        is_valid_domain(clean_domain_)  # raises ValueError for an invalid domain
        cleaned_domains.append(clean_domain_)

    domains = cleaned_domains

    # Each entity is identified by its key name only and carries no properties.
    db.put([RegisteredDomain(key_name=make_key(domain)) for domain in domains])


def get(domains):
    """
    Get domains. This function accepts a single domain string or a list
    of domain strings and queries the database for them. It returns a
    dictionary mapping each domain name to its RegisteredDomain object, or
    to None if the entity was not found.
    """
    if not isinstance(domains, list):
        domains = [domains]

    entities = db.get([db.Key.from_path('RegisteredDomain', make_key(domain)) for domain in domains])
    return dict(zip(domains, entities))
Note: in the above code make_key simply makes the domain lowercase and prepends a 'd'.
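The helper itself is not shown in the question. A minimal sketch of what the note describes, assuming it does nothing beyond lowercasing and adding the prefix (the body is a guess, not the asker's code):

```python
def make_key(domain):
    """Build a datastore key name from a domain string.

    Assumed behaviour, taken from the question's description:
    lowercase the domain and prepend 'd'.
    """
    return 'd' + domain.lower()


if __name__ == '__main__':
    print(make_key('Google.com'))
```

With this rule, a domain such as 11street.com becomes d11street.com, so the key name never begins with a digit.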
So there's that. Now I am going crazy trying to upload some RegisteredDomain entities from a CSV file. Here is the CSV file (note that the leading 'd' is there because a key name may not start with a number):
key
dgoogle.com
dgoogle11.com
dfacebook.com
dcool.com
duuuuuuu.com
dsdsdsds.com
dffffooo.com
dgmail.com
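To sanity-check a file like this before handing it to the bulkloader, the key column can be read with Python's csv module. This is only a sketch in plain Python 3 with no App Engine dependency; read_key_names and key_name_to_domain are hypothetical helper names, and the sample text is a shortened copy of the file above.

```python
import csv
import io

# Shortened copy of the CSV above: one header row, one key name per line.
SAMPLE_CSV = "key\ndgoogle.com\ndgoogle11.com\ndfacebook.com\n"


def read_key_names(csv_text):
    """Return the values of the 'key' column, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row['key'] for row in reader]


def key_name_to_domain(key_name):
    """Strip the leading 'd' that the key-naming scheme prepends."""
    if not key_name.startswith('d'):
        raise ValueError('key name lacks the d prefix: %r' % key_name)
    return key_name[1:]


if __name__ == '__main__':
    for name in read_key_names(SAMPLE_CSV):
        print(name, '->', key_name_to_domain(name))
```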
I have not been able to auto-generate the bulkloader YAML file because App Engine has still not updated my datastore stats (1 day plus a few hours). So this (and many similar permutations) is what I have come up with (mostly changing the import_transform bit):
python_preamble:
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.api.datastore
- import: google.appengine.ext.db
- import: utils
- import: bulk_helper

transformers:
- kind: RegisteredDomain
  connector: csv
  connector_options:
    encoding: utf-8
  property_map:
    - property: __key__
      external_name: key
      export_transform: bulk_helper.key_to_reverse_str
      import_template: transform.create_foreign_key('RegisteredDomain')
Now for some reason, when I try to upload, it says that everything went fine and that x entities were transferred, but nothing gets updated in the datastore (as I can see from the admin console). Here is how I upload:
appcfg.py upload_data --application=domain-sandwich --kind=RegisteredDomain --config_file=bulk.yaml --url=http://domain-sandwich.appspot.com/remote_api --filename=data.csv
And finally this is what my datastore viewer looks like:
Note: I am doing this both on the dev-server and on appengine (whatever works...).
Thanks for any help!
Comments (1)
The problem is a bug in the App Engine bulkloader (or the datastore API). I posted a few issues about this problem (issue 1, issue 2, issue 3, issue 4), but here is the text of the bulkloader bug report for future reference:
The bulkloader will not import models without properties. Example:
In an application you can use these entities like this:
Now the problem occurred when I tried to import data using the bulkloader. After looking through the bulkloader code, the bug turned out to be in the EncodeContent method (lines 1400-1406):
Because the datastore Entity object subclasses dict without overriding the __nonzero__ or __len__ methods (I will also post an issue for this one), an Entity that has no properties but does have a key evaluates to False. The check "if not entity" is therefore true even when a key has been set, and the entity is never appended to entities.
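The effect can be reproduced without the SDK. The sketch below uses a hypothetical FakeEntity class and a simplified encode_content loop as stand-ins; neither is the actual bulkloader or datastore code, they only mimic the "if not entity" pattern described above.

```python
class FakeEntity(dict):
    """Stand-in for a datastore Entity: a dict of properties plus a key."""

    def __init__(self, key_name):
        super(FakeEntity, self).__init__()
        self.key_name = key_name


def encode_content(candidates):
    """Simplified stand-in for the filtering step described above."""
    entities = []
    for entity in candidates:
        if not entity:  # an empty dict is falsy, whether or not a key is set
            continue
        entities.append(entity)
    return entities


empty = FakeEntity('dgoogle.com')       # key set, no properties
with_prop = FakeEntity('dgmail.com')
with_prop['note'] = 'has one property'  # a single property makes it truthy

kept = encode_content([empty, with_prop])
print(bool(empty))                  # False
print([e.key_name for e in kept])   # ['dgmail.com']
```

The property-less entity is silently dropped while the caller still sees a successful run, which matches the symptom in the question.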
Here is a diff that fixes this, either in the bulkloader or by overriding __nonzero__ in Entity (either one works):
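As a rough illustration of the two options named above (this is not the original diff), the sketch below again uses hypothetical stand-in classes rather than SDK code. Option 1 changes the check so that only a missing entity is skipped; option 2 makes the entity truthy regardless of its properties.

```python
class FakeEntity(dict):
    """Stand-in for a datastore Entity (hypothetical, not SDK code)."""

    def __init__(self, key_name):
        super(FakeEntity, self).__init__()
        self.key_name = key_name


def collect_entities(candidates):
    """Option 1: skip only missing entities, not empty ones."""
    return [entity for entity in candidates if entity is not None]


class AlwaysTrueEntity(FakeEntity):
    """Option 2: an entity is truthy even when it has no properties."""

    def __bool__(self):        # Python 3 spelling
        return True

    __nonzero__ = __bool__     # Python 2 spelling, as named in the answer


if __name__ == '__main__':
    print(len(collect_entities([FakeEntity('dgoogle.com'), None])))
    print(bool(AlwaysTrueEntity('dgoogle.com')))
```

Option 1 keeps the change local to the loader; option 2 changes truthiness everywhere the class is used, which may affect other code that relies on an empty entity being falsy.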
Posted bug reports: