Asynchronous urlfetch on App Engine


My app needs to do many datastore operations on each request. I'd like to run them in parallel to get better response times.

For datastore updates I'm doing batch puts so they all happen asynchronously which saves many milliseconds. App Engine allows up to 500 entities to be updated in parallel.
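For example, the batch put is a single call (a minimal sketch; `entities` stands for the list of model instances built up during the request):

    from google.appengine.ext import db

    # One RPC writes the whole batch instead of one RPC per entity.
    db.put(entities)  # entities: a list of up to 500 model instances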

But I haven't found a built-in function that allows datastore fetches of different kinds to execute in parallel.

Since App Engine does allow urlfetch calls to run asynchronously, I created a getter URL for each kind which returns the query results as JSON-formatted text. Now my app can do async urlfetch calls to these URLs which could parallelize the datastore fetches.
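Roughly, the fan-out looks like this (a minimal sketch; the getter URL list and the JSON decoding are elided):

    from google.appengine.api import urlfetch

    def fetch_parallel(urls):
        # Start every fetch without blocking.
        rpcs = []
        for url in urls:
            rpc = urlfetch.create_rpc()
            urlfetch.make_fetch_call(rpc, url)  # returns immediately
            rpcs.append(rpc)

        # Block on each RPC in turn; the fetches proceed in parallel meanwhile.
        results = []
        for rpc in rpcs:
            response = rpc.get_result()  # may raise DownloadError
            results.append(response.content)  # JSON text from the getter URL
        return results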

This technique works well with small numbers of parallel requests, but App Engine throws errors when attempting to run more than 5 or 10 of these urlfetch calls at the same time.

I'm only testing now, so each urlfetch is the identical query; since they work fine in small volumes but start failing with more than a handful of simultaneous requests, I'm thinking it must have something to do with the async urlfetch calls.

My questions are:

  1. Is there a limit to the number of urlfetch.create_rpc() calls that can run asynchronously?
  2. The synchronous urlfetch.fetch() function has a 'deadline' parameter that will allow the function to wait up to 10 seconds for a response before failing. Is there any way to tell urlfetch.create_rpc() how long to wait for a response?
  3. What do the errors shown below mean?
  4. Is there a better server-side technique to run datastore fetches of different kinds in parallel?

    File "/base/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 501, in get_result
    return self.__get_result_hook(self)
    File "/base/python_lib/versions/1/google/appengine/api/urlfetch.py", line 331, in _get_fetch_result
    raise DownloadError(str(err))
    InterruptedError: ('The Wait() request was interrupted by an exception from another callback:', DownloadError('ApplicationError: 5 ',))


Comments (3)

╰ゝ天使的微笑 2024-08-23 07:11:13


While I am afraid that I can't directly answer any of the questions that you pose, I think I ought to tell you that all of your research along these lines may not lead you to a working solution for your problem.

The problem is that datastore writes take much longer than reads, so if you find a way to max out the number of reads that can happen, your code will very likely run out of time long before it is able to make the corresponding writes to all of the entities that you have read.

I would seriously consider rethinking the design of your datastore classes to reduce the number of reads and writes that need to happen, as this will quickly become a bottleneck for your application.

心欲静而疯不止 2024-08-23 07:11:13


Have you considered using TaskQueues to do the work of queuing the requests to be executed later?

If the task returns a 4xx status it will be considered failed and will be retried later - so you could pass the error back up and have the task queue handle retrying the requests until they succeed. Also, with some experimentation with bucket sizes and rates, you can probably have the Task Queue slow down the requests enough that you don't max out the database.

There's also a nice wrapper (deferred.defer) which makes things even simpler - you can make a deferred call to (almost) any function in your app.
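For example (a sketch; `fetch_kind` and the 'fetch-queue' queue name are hypothetical, with the queue's rate and bucket_size configured in queue.yaml):

    from google.appengine.ext import deferred

    def fetch_kind(kind_name):
        # Hypothetical worker: run the query for one kind and store the results.
        # Raising an exception here makes the task queue retry the task later.
        pass

    # Each defer() call enqueues one task; failed tasks are retried until they succeed.
    deferred.defer(fetch_kind, 'Photo')
    deferred.defer(fetch_kind, 'Tag', _queue='fetch-queue')  # a throttled named queue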

琉璃梦幻 2024-08-23 07:11:12


Since App Engine allows async urlfetch calls but does not allow async datastore gets, I was trying to use urlfetch RPCs to retrieve from the datastore in parallel.

The lack of async datastore gets is an acknowledged issue:

http://code.google.com/p/googleappengine/issues/detail?id=1889

And there's now a third-party tool that allows async queries:

http://code.google.com/p/asynctools/

"asynctools is a library allowing you to execute Google App Engine API calls in parallel. API calls can be mixed together and queued up and then all are kicked off in parallel."

This is exactly what I was looking for.
