I am writing a datastore migration for our current production App Engine application.
We made some fairly extensive changes to the data model so I am trying to put in place an architecture to allow easier migrations in the future. This includes test suites for the migrations and common class structures for migration scripts.
I am running into a problem with my current strategy. For both the migrations and the test scripts I need a way to load the Model classes from the old schema and the Model classes for the new data schema into memory at the same time and load entities using either.
Here is an example set of schemas.
rev1.py

    from google.appengine.ext import db

    class Account(db.Model):
        _version = db.IntegerProperty(default=1)
        user = db.UserProperty(auto_current_user_add=True, required=True)
        name = db.StringProperty()
        contact_email = db.EmailProperty()

rev2.py

    from google.appengine.ext import db

    class Account(db.Model):
        _version = db.IntegerProperty(default=2)
        auth_id = db.StringProperty()
        name = db.StringProperty()
        pwd_hash = db.StringProperty(required=True, indexed=False)
A migration script may look something like:
    import rev1
    import rev2

    class MyMigration(...):
        def isNeeded(self):
            num_accounts = num_entities_with_version(rev1.Account, 1)
            return num_accounts > 0

        def run(self):
            rev1_accounts = rev1.Account.all()
            # Only migrate entities that are still at version 1.
            for account in [a for a in rev1_accounts if a._version == 1]:
                # Prefer the contact email; fall back to the user's email.
                auth_id = account.contact_email
                if auth_id is None or auth_id == '':
                    auth_id = account.user.email()
                new_account = rev2.Account.create(auth_id=auth_id,
                                                  name=account.name)
And a test suite would look something like this:
    import rev1
    import rev2

    class MyTest(...):
        def testIt(self):
            # Setup data
            act1 = rev1.Account(name='..', contact_email='..')
            act1.put()
            act2 = rev1.Account(name='..', contact_email='..')
            act2.put()

            # Run migration
            migration.run()

            # Check results
            accounts = rev2.Account.all().fetch(99)
As you can see, I am using the old revision in two ways. In the migration, I use it to read data in the old format and convert it to the new format. (Note: I can't read the data with the new model because of things like the required pwd_hash field and other field changes.) In the test suite, I use it to set up test data in the old format before running the migration.
It all seems great in theory, but in practice it falls apart because GAE doesn't allow multiple model classes to be loaded for the same kind; more specifically, queries only return instances of the most recently defined model class.
In the development server, this seems to happen because fetching an entity (e.g. Account.get(my_key)) calls a result hook that builds the Model object by calling class_for_kind on the kind name stored in the entity data. So even though I may call rev2.Account.get(), I may get back rev1.Account objects if the kind 'Account' maps to rev1.Account in the _kind_map dictionary.
This has made me rethink my migration strategy a bit and I wanted to ask if anyone has thoughts. Specifically:
- Would it be safe to manually override google.appengine.ext.db._kind_map at runtime in test and on the production servers to allow this migration method to work?
- Is there some better way to keep two versions of a Model in memory at the same time?
- Is there a different migration method that may be a smarter way to go about this work?
Other methods I have thought of trying include:
- Change the entity kind when the version changes (override kind() to change it). Then, when we migrate, we move all entities to the new kind name.
- Find a way to query the entities and get back a 'raw' object (protocol buffers?) that has not been built into a full Model object. (This would not work with tests.)
- 'Just Do It Live': don't write tests for any of this, and just try to migrate using the latest schema to load the older data, working around issues as they come up.
Comments (2)
I think there are actually several questions within the larger question. The two key ones seem to be how to test the migration and how to actually perform it.
I wouldn't define the kind multiple times. As you've noted, there are nuances to doing this, and if you wind up with the wrong model loaded, you'll get all sorts of headaches. That said, it is entirely possible to manipulate the _kind_map. I've done this in some special cases, but I try to avoid it when possible.
For a live migration with significant schema changes, you've got two choices: use Expando or use the lower-level API. When adding required fields, you might find it easier to switch the model to Expando, run a migration to add the new information, then switch back to a plain db.Model. The lower-level API sits right under ext.db and presents each entity as a Python dict, which can be very convenient for manipulating an entity. Use whichever method you're more comfortable with. I prefer Expando when possible, since it is a higher-level interface, but it is a two-step process.
For testing, I'd personally suggest you focus on the actual conversion routines. Instead of testing the whole path from the query down, test that the conversion routines themselves work correctly. You might even choose to pass in the old entity as a Python dict and return the new entity.
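As a rough illustration of that idea, a conversion routine can be written as a plain function from a dict of old fields to a dict of new fields, so it can be tested without touching the datastore. This is only a sketch: the function name convert_account, the 'user_email' key (standing in for account.user.email()) and the caller-supplied pwd_hash argument are assumptions, since the question does not say how pwd_hash gets populated.

```python
def convert_account(old, pwd_hash):
    """Build rev 2 Account fields from a dict of rev 1 Account fields.

    `old` is a plain dict; 'user_email' is a hypothetical key standing in
    for account.user.email(). `pwd_hash` is supplied by the caller because
    rev 2 requires it and rev 1 has nothing to derive it from.
    """
    # Prefer the contact email; fall back to the user's email if it is
    # missing or empty, mirroring the logic in the migration script.
    auth_id = old.get('contact_email') or old.get('user_email')
    if not auth_id:
        raise ValueError('rev 1 account has no usable email address')
    return {
        '_version': 2,
        'auth_id': auth_id,
        'name': old.get('name'),
        'pwd_hash': pwd_hash,
    }


# Example: an account with an empty contact email falls back to the user email.
old_account = {'_version': 1, 'name': 'Ann', 'contact_email': '',
               'user_email': 'ann@example.com'}
new_account = convert_account(old_account, pwd_hash='placeholder-hash')
```

Because the function never imports a model class, the test does not depend on which Account class is currently registered for the kind.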
I'd make one other adjustment here as well: I'd use a query to find all my rev 1 accounts rather than filtering in Python. That's the great thing about having an indexed _version on your models; you can trivially find the things that need to be migrated.
Also, check out Google's article on updating schemas. It is old, but still good.
Another approach is to simply do the migration on version 2, leaving the old attributes on the model and setting them to None after you update the version. This will clear out the space they use but will still leave them defined. Then, in a following release, you can just remove them from the model.
This method is pretty simple, but it does require two releases to remove the old attributes completely, so it is more akin to deprecating the existing attributes.
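The first of those two releases could look roughly like the sketch below, which works on a plain dict rather than a real model instance. The names upgrade_in_place and DEPRECATED_FIELDS, and the 'user_email' key, are hypothetical and only illustrate the idea.

```python
# Fields that stay defined in release 2 but are cleared after the upgrade.
DEPRECATED_FIELDS = ('user_email', 'contact_email')


def upgrade_in_place(account):
    """Upgrade a rev 1 account dict to rev 2, keeping deprecated keys as None."""
    if account.get('_version', 1) >= 2:
        return account  # already migrated, so the step is safe to re-run

    # Copy the data into the new field before clearing the old ones.
    account['auth_id'] = account.get('contact_email') or account.get('user_email')
    for field in DEPRECATED_FIELDS:
        account[field] = None  # frees the stored value, keeps the definition
    account['_version'] = 2
    return account


# Example: the old fields remain present but empty after the upgrade.
account = upgrade_in_place({'_version': 1, 'name': 'Ann',
                            'contact_email': 'ann@example.com',
                            'user_email': 'ann@corp.example.com'})
```

A later release would then drop the deprecated fields from the model once every entity has reached version 2.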