Google App Engine Timeout: the datastore operation timed out, or the data was temporarily unavailable

This is a common exception I'm getting in my application's log, usually 5-6 times a day, with traffic of about 1K visits/day:

db error trying to store stats
Traceback (most recent call last):
  File "/base/data/home/apps/stackprinter/1b.347728306076327132/app/utility/worker.py", line 36, in deferred_store_print_statistics
    dbcounter.increment()
  File "/base/data/home/apps/stackprinter/1b.347728306076327132/app/db/counter.py", line 28, in increment
    db.run_in_transaction(txn)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 1981, in RunInTransaction
    DEFAULT_TRANSACTION_RETRIES, function, *args, **kwargs)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 2067, in RunInTransactionCustomRetries
    ok, result = _DoOneTry(new_connection, function, args, kwargs)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 2105, in _DoOneTry
    if new_connection.commit():
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1585, in commit
    return rpc.get_result()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 530, in get_result
    return self.__get_result_hook(self)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1613, in __commit_hook
    raise _ToDatastoreError(err)
Timeout: The datastore operation timed out, or the data was temporarily unavailable.

The function raising the exception above is the following:

from google.appengine.ext import db

def store_printed_question(question_id, service, title):
    def _store_TX():
        # Look the question up by its composite key name.
        entity = Question.get_by_key_name('%s_%s' % (question_id, service))
        if entity:
            # Already printed before: bump the counter.
            entity.counter += 1
            entity.put()
        else:
            # First print: create the entity with an initial count of 1.
            Question(key_name='%s_%s' % (question_id, service),
                     question_id=question_id,
                     service=service,
                     title=title,
                     counter=1).put()
    # Run the read-modify-write atomically.
    db.run_in_transaction(_store_TX)

Basically, the store_printed_question function checks whether a given question has been printed before and, if so, increments the related counter in a single transaction.
This function is added by a WebHandler to a deferred worker using the predefined default queue, which, as you might know, has a throughput rate of five task invocations per second.
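
For reference, a minimal sketch of how such a handler might enqueue the work with the deferred library (the handler class name and the request parameter names are illustrative, not taken from the original app):

from google.appengine.ext import deferred, webapp

class PrintHandler(webapp.RequestHandler):  # hypothetical handler name
    def get(self):
        # ... render the printable question ...
        # Hand the counter update off to the default push queue;
        # deferred.defer pickles the callable and its arguments into a task.
        deferred.defer(store_printed_question,
                       self.request.get('question_id'),
                       self.request.get('service'),
                       self.request.get('title'))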

On an entity with six attributes (two indexed), I thought that using transactions, throttled by the deferred task rate limit, would let me avoid datastore timeouts; but, looking at the log, this error is still showing up daily.

The counter I'm storing is not that important, so I'm not worried about these timeouts; still, I'm curious why Google App Engine can't handle this task properly even at a low rate like 5 tasks per second, and whether lowering the rate could be a possible solution.
A sharded counter on each question just to avoid timeouts seems like overkill to me.
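
Worth noting: tasks enqueued with deferred are retried automatically if they raise, so a worker that logs the Timeout but still lets the queue retry it later might look like this (the worker name comes from the traceback above; its body here is an assumption):

import logging
from google.appengine.api import datastore_errors

def deferred_store_print_statistics(question_id, service, title):
    try:
        store_printed_question(question_id, service, title)
    except datastore_errors.Timeout:
        logging.warning('db error trying to store stats')
        raise  # re-raising makes the task queue retry the task later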

EDIT:
I have set the rate limit to 1 task per second on the default queue; I'm still getting the same error.
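
For reference, that cap is configured in queue.yaml; a minimal sketch matching the 1 task/second limit described above:

# queue.yaml - throttle the default push queue to one task invocation per second
queue:
- name: default
  rate: 1/s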


Comments (2)

℉絮湮 2024-10-20 06:56:20

A query can only live for 30 seconds. See my answer to this question for some sample code to break a query up using cursors.
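
The referenced sample code isn't reproduced here, but the cursor technique looks roughly like this (a sketch; the batch size and the idea of deferring the next batch are my assumptions):

from google.appengine.ext import deferred

def process_questions(cursor=None, batch_size=100):
    query = Question.all()
    if cursor:
        query.with_cursor(cursor)  # resume where the previous batch stopped
    batch = query.fetch(batch_size)
    for entity in batch:
        pass  # ... work on each entity, well inside the 30-second limit ...
    if len(batch) == batch_size:
        # There may be more results: continue from the new cursor
        # in a fresh task rather than one long-running query.
        deferred.defer(process_questions, query.cursor(), batch_size)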

孤凫 2024-10-20 06:56:20

Generally speaking, a timeout like this is usually because of write contention. If you've got a transaction going and you're writing a bunch of stuff to the same entity group concurrently, you run into write contention issues (a side effect of optimistic concurrency). In most cases, if you make your entity group smaller, that will usually minimize this problem.

In your specific case, based on the code above, it's most probably because you should be using a sharded counter to avoid serialized writes stacking up.
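
To make that concrete, here is a minimal sketch of the classic sharded-counter pattern from the App Engine docs, adapted to this question counter (the shard count, model name, and key scheme are illustrative):

import random
from google.appengine.ext import db

NUM_SHARDS = 20  # illustrative; more shards = more write throughput

class QuestionCounterShard(db.Model):  # hypothetical shard model
    question_key = db.StringProperty(required=True)
    count = db.IntegerProperty(default=0)

def increment(question_key):
    # Each shard is its own entity group, so concurrent increments
    # rarely contend on the same entity.
    index = random.randint(0, NUM_SHARDS - 1)
    shard_name = '%s_%d' % (question_key, index)
    def txn():
        shard = QuestionCounterShard.get_by_key_name(shard_name)
        if shard is None:
            shard = QuestionCounterShard(key_name=shard_name,
                                         question_key=question_key)
        shard.count += 1
        shard.put()
    db.run_in_transaction(txn)

def get_count(question_key):
    # Reading is a (non-transactional) sum over all shards.
    total = 0
    for shard in QuestionCounterShard.all().filter(
            'question_key =', question_key):
        total += shard.count
    return total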

Another far less likely possibility (mentioned here only for completeness) is that the tablet your data is on is being moved.
