Schema migration on the GAE datastore
First off, this is my first post on Stack Overflow, so please forgive any newbish mis-steps. If I can be clearer in terms of how I frame my question, please let me know.
I'm running a large application on Google App Engine, and have been adding new features that are forcing me to modify old data classes and add new ones. In order to clean our database and update old entries, I've been trying to write a script that can iterate through instances of a class, make changes, and then re-save them. The problem is that Google App Engine times out when you make calls to the server that take longer than a few seconds.
I've been struggling with this problem for several weeks. The best solution that I've found is here: http://code.google.com/p/rietveld/source/browse/trunk/update_entities.py?spec=svn427&r=427
I created a version of that code for my own website, which you can see here:
def schema_migration(self, target, batch_size=1000):
    last_key = None
    calls = {"Affiliate": Affiliate, "IPN": IPN, "Mail": Mail, "Payment": Payment, "Promotion": Promotion}
    while True:
        q = calls[target].all()
        if last_key:
            q.filter('__key__ >', last_key)
        q.order('__key__')
        this_batch_size = batch_size
        while True:
            try:
                batch = q.fetch(this_batch_size)
                break
            except (db.Timeout, DeadlineExceededError):
                logging.warn("Query timed out, retrying")
                if this_batch_size == 1:
                    logging.critical("Unable to update entities, aborting")
                    return
                this_batch_size //= 2
        if not batch:
            break
        keys = None
        while not keys:
            try:
                keys = db.put(batch)
            except db.Timeout:
                logging.warn("Put timed out, retrying")
        last_key = keys[-1]
        print "Updated %d records" % (len(keys),)
Strangely, the code works perfectly for classes with between 100 and 1,000 instances, and the script often takes around 10 seconds. But when I try to run the code for classes in our database with more like 100K instances, the script runs for 30 seconds, and then I receive this:
"Error: Server Error
The server encountered an error and could not complete your request.
If the problem persists, please report your problem and mention this error message and the query that caused it."
Any idea why GAE is timing out after exactly thirty seconds? What can I do to get around this problem?
Thank you!
Keller
Comments (2)
By the sound of it, you are hitting the second DeadlineExceededError. App Engine requests can only run for 30 seconds each. When DeadlineExceededError is raised, it's your job to stop processing and tidy up, because you are running out of time; the next time it is raised, you cannot catch it.
You should look at using the Mapper API to split your migration into batches and run each batch using the Task Queue.
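A rough sketch of "stop and tidy up, then resume later" is shown below. It is not App Engine code: `DeadlineExceeded` is a hypothetical stand-in for `google.appengine.runtime.DeadlineExceededError`, a plain dict stands in for the datastore, and `migrate_batch`, `upgrade` and `fail_after` are names made up for illustration. It is written for Python 3 so it can run outside App Engine. The point is only that the function hands back the last key it finished, so a later request or task can pick up from there.

```python
class DeadlineExceeded(Exception):
    """Hypothetical stand-in for google.appengine.runtime.DeadlineExceededError."""


def upgrade(entity):
    # Hypothetical schema change: make sure every entity has a 'status' field.
    entity.setdefault("status", "active")
    return entity


def migrate_batch(store, last_key=None, batch_size=2, fail_after=None):
    """Migrate entities in key order, starting after last_key.

    Returns (last_key_done, finished). fail_after simulates the deadline
    being raised after that many batches; it exists only for the demo.
    """
    batches_done = 0
    try:
        while True:
            # Stand-in for a key-ordered query filtered on '__key__ >'.
            keys = sorted(k for k in store if last_key is None or k > last_key)
            batch = keys[:batch_size]
            if not batch:
                return last_key, True  # nothing left to migrate
            if fail_after is not None and batches_done >= fail_after:
                raise DeadlineExceeded()
            for k in batch:
                store[k] = upgrade(store[k])
            last_key = batch[-1]  # checkpoint only after the batch is saved
            batches_done += 1
    except DeadlineExceeded:
        # Out of time: stop immediately and hand back the checkpoint.
        return last_key, False


store = {"a": {}, "b": {}, "c": {}, "d": {}, "e": {}}
checkpoint, finished = migrate_batch(store, fail_after=1)
print(checkpoint, finished)
checkpoint, finished = migrate_batch(store, last_key=checkpoint)
print(checkpoint, finished)
```

The first call is interrupted partway and reports where it stopped; the second call resumes from that checkpoint. In a real handler the checkpoint would presumably be passed to the next task rather than returned to the caller.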
The start of your solution will be to migrate to using GAE's Task Queues. This feature will allow you to queue some more work to happen at a later time.
That won't actually solve the problem immediately, because even task queues are limited to short timeslices. However, you can unroll your loop to process a handful of rows in your database at a time. After completing each batch, the task can check how long it has been running, and if it's been long enough, it can add a new task to the queue to continue where the current task leaves off.
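The time-check-and-requeue loop described above might look roughly like this. Everything here is simulated so it runs outside App Engine (Python 3): `task_queue` is a hypothetical in-memory stand-in for the Task Queue, a dict stands in for the datastore, and `migrate_task`, `run_queue`, `budget` and `clock` are illustrative names. In a real app the re-enqueue step would more likely go through `taskqueue.add(...)` or `deferred.defer(...)`, with the last key passed as a parameter.

```python
import time
from collections import deque

task_queue = deque()  # hypothetical in-memory stand-in for the Task Queue


def migrate_task(store, last_key=None, batch_size=2, budget=0.5,
                 clock=time.monotonic):
    """Process batches until the time budget is spent, then enqueue a follow-up."""
    started = clock()
    while True:
        # Stand-in for a key-ordered query filtered on '__key__ >'.
        keys = sorted(k for k in store if last_key is None or k > last_key)
        batch = keys[:batch_size]
        if not batch:
            return  # migration complete, nothing more to enqueue
        for k in batch:
            store[k]["migrated"] = True  # hypothetical per-entity change
        last_key = batch[-1]
        if clock() - started >= budget:
            # Budget used up: let a fresh task continue after last_key.
            task_queue.append(
                (migrate_task, store, last_key, batch_size, budget, clock))
            return


def run_queue():
    """Drain the stand-in queue; returns how many tasks were executed."""
    ran = 0
    while task_queue:
        func, *args = task_queue.popleft()
        func(*args)
        ran += 1
    return ran


store = {n: {} for n in range(10)}
# A budget of 0.0 forces a new task after every batch, to show the chaining.
task_queue.append((migrate_task, store, None, 3, 0.0))
print(run_queue(), all(e.get("migrated") for e in store.values()))
```

The budget should sit comfortably below the platform's request deadline, so that the task finishes its current batch and enqueues its successor before time runs out.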
An alternative solution is to not migrate the data. Change the implementing logic so that each entity knows whether or not it has been migrated. Newly created entities, or old entities that get updated, will take the new format. Since GAE doesn't require that entities have all the same fields, you can do this easily, whereas on a relational database that wouldn't be practical.
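One way to picture this lazy approach is a version marker on each entity that is checked whenever the entity is read. The sketch below uses plain dicts in place of datastore entities so that it runs on its own (Python 3). The `schema_version` field, the `name` to `first_name`/`last_name` change, and the `upgrade_if_needed` and `load` helpers are all assumptions made for illustration. On App Engine something similar could presumably be done with `db.Expando` models or with default values on new properties.

```python
CURRENT_VERSION = 2  # hypothetical schema version number


def upgrade_if_needed(entity):
    """Bring an entity up to the current schema the first time it is touched."""
    version = entity.get("schema_version", 1)  # old entities carry no marker
    if version < CURRENT_VERSION:
        # Hypothetical v2 change: split 'name' into first and last name.
        first, _, last = entity.pop("name", "").partition(" ")
        entity["first_name"] = first
        entity["last_name"] = last
        version = CURRENT_VERSION
    entity["schema_version"] = version
    return entity


def load(store, key):
    """Read path: upgrade on access and write the new format back."""
    entity = upgrade_if_needed(store[key])
    store[key] = entity
    return entity


store = {
    "k1": {"name": "Ada Lovelace"},  # old format
    "k2": {"first_name": "Alan", "last_name": "Turing", "schema_version": 2},
}
print(load(store, "k1"))
print(load(store, "k2"))
```

Entities that are never read stay in the old format indefinitely, so any query that filters on the new fields has to tolerate unmigrated rows.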