google-app-engine data-migration google-cloud-datastore

Can I have 2 models loaded for one entity kind to support data migration?

Posted 2025-01-01 00:08:59

I am writing a datastore migration for our current production App Engine application.

We made some fairly extensive changes to the data model so I am trying to put in place an architecture to allow easier migrations in the future. This includes test suites for the migrations and common class structures for migration scripts.

I am running into a problem with my current strategy. For both the migrations and the test scripts I need a way to load the Model classes from the old schema and the Model classes for the new data schema into memory at the same time and load entities using either.

Here is an example set of schemas.

rev1.py

class Account(db.Model):
  _version      = db.IntegerProperty(default = 1)
  user          = db.UserProperty(auto_current_user_add = True, required = True)
  name          = db.StringProperty()
  contact_email = db.EmailProperty()

rev2.py

class Account(db.Model):
  _version = db.IntegerProperty(default = 2)
  auth_id  = db.StringProperty()
  name     = db.StringProperty()
  pwd_hash = db.StringProperty(required = True, indexed = False)

A migration script may look something like:

import rev1
import rev2

class MyMigration(...):
   def isNeeded(self):
      num_accounts = num_entities_with_version(rev1.Account, 1)
      return num_accounts > 0

   def run(self):
       rev1_accounts = rev1.Account.all()
       # Keep only the entities that are still at schema version 1
       for account in [a for a in rev1_accounts if a._version == 1]:
           auth_id = account.contact_email
           # Fall back to the user's email when no contact email is set
           if auth_id is None or auth_id == '':
               auth_id = account.user.email()

           new_account = rev2.Account.create(auth_id = auth_id,
                                             name    = account.name)

And a test suite would look something like this:

import rev1
import rev2

class MyTest(...):
   def testIt(self):
      # Setup data
      act1 = rev1.Account(name = '..', contact_email = '..')
      act1.put()
      act2 = rev1.Account(name = '..', contact_email = '..')
      act2.put()

      # Run migration
      migration.run()

      # Check results
      accounts = rev2.Account.all().fetch(99)

So as you can see, I am using the old revision in two ways. In the migration, I use it to read data in the old format and convert it into the new format. (Note: I can't read it in the new format because of things like the required pwd_hash field and other field changes.) In the test suite, I use it to set up test data in the old format before running the migration.

It all seems great in theory, but in practice it falls apart because GAE doesn't allow multiple models to be loaded for the same kind. More specifically, queries only return instances of the most recently defined model.

In the development server this seems to be due to the fact that the process of calling get() on a query for an entity (ex: Account.get(my_key)) calls a result hook that builds the result Model object by calling class_for_kind on the entity kind name from the data. So even though I may call rev2.Account.get(), it may build up rev1.Account Model objects because the kind 'Account' maps to rev1.Account in the _kind_map dictionary.

This has made me rethink my migration strategy a bit and I wanted to ask if anyone has thoughts. Specifically:

Would it be safe to manually override google.appengine.ext.db._kind_map at runtime in test and on the production servers to allow this migration method to work?
Is there some better way to keep two versions of a Model in memory at the same time?
Is there a different migration method that may be a smarter way to go about this work?

Other methods I have thought of trying include:

Change the entity kind when the version changes. (use kind() to change it) Then when we migrate we move all classes to the new kind name.
Find a way to query the entities and get back a 'raw' object (proto buffers??) that has not been built into a full object. (would not work with tests)
'Just Do It Live': Don't write tests for any of this and just try to migrate using the latest schema, loading the older data and working around issues as they come up.

溺ぐ爱和你が 2025-01-08 00:09:00

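I think there are a couple of questions within this larger question. There seem to be two key ones, though: one is how to test, and the other is how to actually do it.

I would not define the kind multiple times; as you've noted, there are nuances to doing this, and if you wind up with the wrong model loaded you'll get all kinds of trouble. That said, it is entirely possible for you to manipulate the kind_map. I've done this in some special cases, but I try to avoid it when possible.

For a live migration where you've got significant schema changes, you have two choices: use Expando or use the lower-level API. When adding required fields, you might find it easier to switch to Expando, run a migration to add the new information, and then switch back to a plain db.Model. The lower-level API sits right under ext.db, and it presents entities as Python dicts, which can be very convenient for manipulating them. Use whichever method you're more comfortable with. I prefer Expando when possible, since it is a higher-level interface, but it is a two-step process.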

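For testing, I'd personally suggest you focus on the actual conversion routines. So instead of testing the method from the query on down, test that the conversion routines themselves function correctly. You might even choose to pass in the old entity as a Python dict and return the new entity.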
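As a rough illustration of that idea, the conversion can be written as a pure function from an old-format dict to a new-format dict, which needs no datastore to test. The function name, the user_email key (a hypothetical flattened form of the rev1 user property) and the caller-supplied pwd_hash are assumptions made for this sketch; the question does not say how pwd_hash is derived.

```python
def convert_account_v1_to_v2(old, pwd_hash):
    """Map a rev1 Account dict to a rev2 Account dict.

    Assumptions for this sketch: `old` carries the user's email under the
    hypothetical key 'user_email', and `pwd_hash` is supplied by the caller.
    """
    # Prefer contact_email; fall back to the user's email if it is missing or empty.
    auth_id = old.get('contact_email') or old.get('user_email')
    return {
        '_version': 2,
        'auth_id': auth_id,
        'name': old.get('name'),
        'pwd_hash': pwd_hash,
    }


# A test then only needs plain dicts, with no model classes loaded at all.
old = {'name': 'Ann', 'contact_email': '', 'user_email': 'ann@example.com'}
new = convert_account_v1_to_v2(old, pwd_hash='<placeholder>')
assert new['auth_id'] == 'ann@example.com'
assert new['_version'] == 2
```

Because the routine never touches rev1.Account or rev2.Account, the test sidesteps the problem of having two classes registered for the same kind.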

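I'd make one other adjustment here as well. I'd rather use a query to find all of my rev 1 accounts. That's the benefit of having an indexed _version on your models: you can easily find what needs to be migrated.

Also, check out Google's article on updating schemas. It is old but still good.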
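A small sketch of that adjustment, assuming the model passed in is a db.Model subclass like the question's Account with an indexed _version property. The helper name rev1_accounts is made up for illustration; all() and filter() with a 'property operator' string are the ext.db query calls.

```python
def rev1_accounts(account_model):
    """Return a query for Account entities still at schema version 1.

    `account_model` is assumed to be a db.Model subclass with an indexed
    `_version` property, so the datastore does the filtering instead of
    a Python list comprehension over every entity.
    """
    return account_model.all().filter('_version =', 1)
```

In the migration's run() method, this would replace fetching every Account and checking _version in Python.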