Google App Engine Timeout: the datastore operation timed out, or the data was temporarily unavailable

This is a common exception I'm getting in my application's log, usually 5-6 times a day, with traffic of about 1K visits/day:

db error trying to store stats
Traceback (most recent call last):
  File "/base/data/home/apps/stackprinter/1b.347728306076327132/app/utility/worker.py", line 36, in deferred_store_print_statistics
    dbcounter.increment()
  File "/base/data/home/apps/stackprinter/1b.347728306076327132/app/db/counter.py", line 28, in increment
    db.run_in_transaction(txn)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 1981, in RunInTransaction
    DEFAULT_TRANSACTION_RETRIES, function, *args, **kwargs)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 2067, in RunInTransactionCustomRetries
    ok, result = _DoOneTry(new_connection, function, args, kwargs)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 2105, in _DoOneTry
    if new_connection.commit():
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1585, in commit
    return rpc.get_result()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 530, in get_result
    return self.__get_result_hook(self)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1613, in __commit_hook
    raise _ToDatastoreError(err)
Timeout: The datastore operation timed out, or the data was temporarily unavailable.

The function raising the exception above is the following:

from google.appengine.ext import db

def store_printed_question(question_id, service, title):
    def _store_TX():
        # Look the question up by its composite key name.
        entity = Question.get_by_key_name('%s_%s' % (question_id, service))
        if entity:
            # Already printed before: bump the counter.
            entity.counter += 1
            entity.put()
        else:
            # First print: create the entity with an initial count of 1.
            Question(key_name='%s_%s' % (question_id, service),
                     question_id=question_id,
                     service=service,
                     title=title,
                     counter=1).put()
    # Run the read-modify-write atomically.
    db.run_in_transaction(_store_TX)

Basically, the store_printed_question function checks whether a given question has been printed before and, if so, increments the related counter in a single transaction.
This function is added by a WebHandler to a deferred worker using the predefined default queue, which, as you might know, has a throughput rate of five task invocations per second.
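
For reference, a minimal sketch of how such a handler might enqueue the work with the deferred library (the handler class name and the request parameter names are illustrative, not taken from the original app):

from google.appengine.ext import deferred, webapp

class PrintHandler(webapp.RequestHandler):  # hypothetical handler name
    def get(self):
        # ... render the printable question ...
        # Hand the counter update off to the default push queue;
        # deferred.defer pickles the callable and its arguments into a task.
        deferred.defer(store_printed_question,
                       self.request.get('question_id'),
                       self.request.get('service'),
                       self.request.get('title'))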

On an entity with six attributes (two indexed), I thought that using transactions, throttled by the deferred task rate limit, would let me avoid datastore timeouts; but, looking at the log, this error is still showing up daily.

The counter I'm storing is not that important, so I'm not worried about these timeouts; still, I'm curious why Google App Engine can't handle this task properly even at a low rate like 5 tasks per second, and whether lowering the rate could be a possible solution.
A sharded counter on each question just to avoid timeouts seems like overkill to me.
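
Worth noting: tasks enqueued with deferred are retried automatically if they raise, so a worker that logs the Timeout but still lets the queue retry it later might look like this (the worker name comes from the traceback above; its body here is an assumption):

import logging
from google.appengine.api import datastore_errors

def deferred_store_print_statistics(question_id, service, title):
    try:
        store_printed_question(question_id, service, title)
    except datastore_errors.Timeout:
        logging.warning('db error trying to store stats')
        raise  # re-raising makes the task queue retry the task later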

EDIT:
I have set the rate limit to 1 task per second on the default queue; I'm still getting the same error.
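
For reference, that cap is configured in queue.yaml; a minimal sketch matching the 1 task/second limit described above:

# queue.yaml - throttle the default push queue to one task invocation per second
queue:
- name: default
  rate: 1/s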


Comments (2)

℉絮湮 2024-10-20 06:56:20

A query can only live for 30 seconds. See my answer to this question for some sample code to break a query up using cursors.
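
The referenced sample code isn't reproduced here, but the cursor technique looks roughly like this (a sketch; the batch size and the idea of deferring the next batch are my assumptions):

from google.appengine.ext import deferred

def process_questions(cursor=None, batch_size=100):
    query = Question.all()
    if cursor:
        query.with_cursor(cursor)  # resume where the previous batch stopped
    batch = query.fetch(batch_size)
    for entity in batch:
        pass  # ... work on each entity, well inside the 30-second limit ...
    if len(batch) == batch_size:
        # There may be more results: continue from the new cursor
        # in a fresh task rather than one long-running query.
        deferred.defer(process_questions, query.cursor(), batch_size)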

孤凫 2024-10-20 06:56:20

Generally speaking, a timeout like this is usually because of write contention. If you've got a transaction going and you're writing a bunch of stuff to the same entity group concurrently, you run into write contention issues (a side effect of optimistic concurrency). In most cases, if you make your entity group smaller, that will usually minimize this problem.

In your specific case, based on the code above, it's most probably because you should be using a sharded counter to avoid serialized writes stacking up.
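
To make that concrete, here is a minimal sketch of the classic sharded-counter pattern from the App Engine docs, adapted to this question counter (the shard count, model name, and key scheme are illustrative):

import random
from google.appengine.ext import db

NUM_SHARDS = 20  # illustrative; more shards = more write throughput

class QuestionCounterShard(db.Model):  # hypothetical shard model
    question_key = db.StringProperty(required=True)
    count = db.IntegerProperty(default=0)

def increment(question_key):
    # Each shard is its own entity group, so concurrent increments
    # rarely contend on the same entity.
    index = random.randint(0, NUM_SHARDS - 1)
    shard_name = '%s_%d' % (question_key, index)
    def txn():
        shard = QuestionCounterShard.get_by_key_name(shard_name)
        if shard is None:
            shard = QuestionCounterShard(key_name=shard_name,
                                         question_key=question_key)
        shard.count += 1
        shard.put()
    db.run_in_transaction(txn)

def get_count(question_key):
    # Reading is a (non-transactional) sum over all shards.
    total = 0
    for shard in QuestionCounterShard.all().filter(
            'question_key =', question_key):
        total += shard.count
    return total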

Another far less likely possibility (mentioned here only for completeness) is that the tablet your data is on is being moved.
