AppEngine bulkloader: uploading entities with key_name set

Posted 2024-12-01 10:40:05

I have spent at least two hours now trying to get this to work. I have seen quite a few different questions on SO and in the Google groups, but none of the answers seem to work for me.

Question: How do I bulk upload the data in the CSV file below to the datastore so that each entity is created with the key_name given in the CSV file (the same result as calling the add function below)?

This is my model:

class RegisteredDomain(db.Model):
    """
    Domain object class. It has no fields because its existence is
    proof that it has been registered. Individual registered domains
    can be found using keys.
    """
    pass

Here is how I usually add/delete domains, etc:

def add(domains):
    """
    Add domains. This function accepts a single domain string or a
    list of domain strings and adds them to the database. The domain(s)
    must be valid unicode strings (a ValueError is thrown if the domain
    strings are not valid).
    """
    if not isinstance(domains, list):
        domains = [domains]

    cleaned_domains = []
    for domain in domains:
        clean_domain_ = clean_domain(domain)
        is_valid_domain(clean_domain_)
        cleaned_domains.append(clean_domain_)

    domains = cleaned_domains

    db.put([RegisteredDomain(key_name=make_key(domain)) for domain in domains])


def get(domains):
    """
    Get domains. This function accepts a single domain string or a list
    of domain strings and queries the database for them. It returns a
    dictionary containing the domain name and RegisteredDomain object or
    None if the entity was not found.
    """
    if not isinstance(domains, list):
        domains = [domains]

    entities = db.get([Key.from_path('RegisteredDomain', make_key(domain)) for domain in domains])
    return dict(zip(domains, entities))

Note: in the above code make_key simply makes the domain lowercase and prepends a 'd'.
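make_key itself is not shown in the question. A minimal sketch that matches the description above might look like the following; the body is an assumption based on that one sentence, not the asker's actual code.

```python
def make_key(domain):
    """Hypothetical make_key: lowercase the domain and prepend 'd'.

    The prefix guarantees that the key name never starts with a digit.
    """
    return 'd' + domain.lower()


print(make_key('Google.com'))
```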

So there's that. Now I am going crazy trying to upload some RegisteredDomain entities from a CSV file. Here is the CSV file (note that the first char 'd' is there because a key name may not start with a digit):

key
dgoogle.com
dgoogle11.com
dfacebook.com
dcool.com
duuuuuuu.com
dsdsdsds.com
dffffooo.com
dgmail.com
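For reference, a file in this shape (a "key" header followed by one key name per row) could be produced with a small script along these lines. This is only a sketch: domains_to_csv is a hypothetical helper, make_key is the assumed stand-in described in the note above, and the script targets a current Python 3 interpreter rather than the SDK's runtime.

```python
import csv
import io


def make_key(domain):
    # Assumed behaviour: lowercase the domain and prepend 'd'.
    return 'd' + domain.lower()


def domains_to_csv(domains):
    """Return CSV text with a 'key' header and one key name per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(['key'])
    for domain in domains:
        writer.writerow([make_key(domain)])
    return buf.getvalue()


csv_text = domains_to_csv(['Google.com', 'gmail.com'])
print(csv_text)
```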

I have not been able to auto-generate the bulkloader yaml file because App Engine has still not updated my datastore stats (1 day plus a few hours). So this (and many similar permutations) is what I have come up with (mostly changing the import_transform bit):

python_preamble:
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.api.datastore
- import: google.appengine.ext.db
- import: utils
- import: bulk_helper

transformers:
- kind: RegisteredDomain
  connector: csv
  connector_options:
    encoding: utf-8
  property_map:
    - property: __key__
      external_name: key
      export_transform: bulk_helper.key_to_reverse_str
      import_template: transform.create_foreign_key('RegisteredDomain')

Now for some reason, when I try to upload, it says that all goes fine and x entities have been transferred etc., but nothing gets updated in the datastore (as I can see from the admin console). Here is how I upload:

appcfg.py upload_data --application=domain-sandwich --kind=RegisteredDomain --config_file=bulk.yaml --url=http://domain-sandwich.appspot.com/remote_api --filename=data.csv 

And finally this is what my datastore viewer looks like: [image: Datastore Viewer]

Note: I am doing this both on the dev-server and on appengine (whatever works...).

Thanks for any help!


Comments (1)

难如初 2024-12-08 10:40:06

The problem is a bug in the appengine bulkloader (or datastore API). I posted a few issues about this problem (issue 1, issue 2, issue 3, issue 4), but here is the text of the bulkloader bug report for future reference:

VERSION:
release: "1.5.2"
timestamp: 1308730906
api_versions: ['1']

The bulkloader will not import models without properties. Example:

class MetaObject(db.Model):
    """
    Property-less object. Identified by application set key.
    """
    pass

In an application you can use these entities like this:

db.put([MetaObject(key_name=make_key(obj)) for obj in objs])
db.get([Key.from_path('MetaObject', make_key(obj)) for obj in objs])
db.delete([Key.from_path('MetaObject', make_key(obj)) for obj in objs])

Now the problem occurred when I tried to import data using the bulkloader. After looking through the bulkloader code, the bug turned out to be in the EncodeContent method (lines 1400-1406):

1365   def EncodeContent(self, rows, loader=None):
1366     """Encodes row data to the wire format.
1367
1368     Args:
1369       rows: A list of pairs of a line number and a list of column values.
1370       loader: Used for dependency injection.
1371
1372     Returns:
1373       A list of datastore.Entity instances.
1374
1375     Raises:
1376       ConfigurationError: if no loader is defined for self.kind
1377     """
1378     if not loader:
1379       try:
1380         loader = Loader.RegisteredLoader(self.kind)
1381       except KeyError:
1382         logger.error('No Loader defined for kind %s.' % self.kind)
1383         raise ConfigurationError('No Loader defined for kind %s.' % self.kind)
1384     entities = []
1385     for line_number, values in rows:
1386       key = loader.generate_key(line_number, values)
1387       if isinstance(key, datastore.Key):
1388         parent = key.parent()
1389         key = key.name()
1390       else:
1391         parent = None
1392       entity = loader.create_entity(values, key_name=key, parent=parent)
1393
1394       def ToEntity(entity):
1395         if isinstance(entity, db.Model):
1396           return entity._populate_entity()
1397         else:
1398           return entity
1399
1400       if not entity:
1401
1402         continue
1403       if isinstance(entity, list):
1404         entities.extend(map(ToEntity, entity))
1405       elif entity:
1406         entities.append(ToEntity(entity))
1407
1408     return entities

The datastore Entity class subclasses dict without overriding __nonzero__ or __len__, so an Entity that has a key but no properties is an empty dict and evaluates to False. As a result, "if not entity" is true even when a key has been set, and the entity is never appended to entities. (I will also post an issue for this one.)
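The effect can be reproduced without the SDK. The sketch below uses FakeEntity, a hypothetical stand-in that, like datastore.Entity, subclasses dict and carries its key separately; encode is a simplified imitation of the filtering at the end of EncodeContent, not the real method.

```python
class FakeEntity(dict):
    """Stand-in for datastore.Entity: a dict of properties plus a key."""

    def __init__(self, key_name):
        dict.__init__(self)
        self.key_name = key_name


def encode(candidates):
    """Imitates the truthiness filter at the end of EncodeContent."""
    entities = []
    for entity in candidates:
        if not entity:  # an empty dict is falsy, key or no key
            continue
        entities.append(entity)
    return entities


no_props = FakeEntity('dgoogle.com')    # key only, no properties
with_props = FakeEntity('dgmail.com')
with_props['registered'] = True         # one property makes it truthy

kept = encode([no_props, with_props])
print(bool(no_props), len(kept))
```

The property-less entity is dropped silently, which is consistent with an upload that reports success while nothing shows up in the datastore.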

Here is a diff that fixes this either in the bulkloader or by overriding __nonzero__ in Entity (either one works):

--- bulkloader.py       2011-08-27 18:21:36.000000000 +0200
+++ bulkloader_fixed.py 2011-08-27 18:22:48.000000000 +0200
@@ -1397,12 +1397,9 @@
         else:
           return entity

-      if not entity:
-
-        continue
       if isinstance(entity, list):
         entities.extend(map(ToEntity, entity))
-      elif entity:
+      else:
         entities.append(ToEntity(entity))

     return entities
--- datastore.py        2011-08-27 18:41:16.000000000 +0200
+++ datastore_fixed.py  2011-08-27 18:40:50.000000000 +0200
@@ -644,6 +644,12 @@

     self.__key = Key._FromPb(ref)

+  def __nonzero__(self):
+      # Truthy when the entity has properties or a key; always returns a bool.
+      if len(self):
+          return True
+      return bool(self.__key)
+
   def app(self):
     """Returns the name of the application that created this entity, a
     string or None if not set.
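To see that either change is enough, here is a small SDK-free sketch of both. FixedEntity and PlainEntity are hypothetical stand-ins for datastore.Entity, and encode_without_check imitates the patched loop; none of these names exist in the SDK. The __bool__ alias is only there so the sketch also runs on Python 3, where the truth hook has a different name.

```python
class PlainEntity(dict):
    """Unpatched stand-in: an empty instance is falsy."""

    def __init__(self, key_name=None):
        dict.__init__(self)
        self.key_name = key_name


class FixedEntity(PlainEntity):
    """Patched stand-in: truthy when it has properties or a key."""

    def __nonzero__(self):  # Python 2 truth hook
        return bool(len(self) or self.key_name)

    __bool__ = __nonzero__  # same hook under its Python 3 name


def encode_without_check(candidates):
    """Imitates the patched bulkloader loop: no truthiness filter."""
    entities = []
    for entity in candidates:
        if isinstance(entity, list):
            entities.extend(entity)
        else:
            entities.append(entity)
    return entities


# Fix 1: the loop no longer drops a property-less entity.
kept = encode_without_check([PlainEntity('dgoogle.com')])

# Fix 2: the entity itself reports True once a key is set.
print(len(kept), bool(FixedEntity('dgoogle.com')), bool(FixedEntity()))
```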

Posted bug reports:

Issue 1: http://code.google.com/p/googleappengine/issues/detail?id=5712

Issue 2: http://code.google.com/p/googleappengine/issues/detail?id=5713

Issue 3: http://code.google.com/p/googleappengine/issues/detail?id=5714

Issue 4: http://code.google.com/p/googleappengine/issues/detail?id=5715
