Using the task queue to boost processing power?
I've got a situation where I want to make 1000 different queries to the datastore, do some calculations on the results of each individual query (to get 1000 separate results), and return the list of results.
I would like the list of results to be returned as the response from the same 30-second user request that started the calculation, for better client-side performance. Hah!
I have a bold plan.
Each of these operations individually will usually have no problem finishing in under a second, none of them need to write to the same entity group as any other, and none of them need any information from any of the other queries. Might it be possible to start 1000 independent tasks, each taking on one of these queries, doing its calculations, and storing the result in some sort of temporary collection of entities? The original request could wait 10 seconds, and then do a single query for the results from the datastore (maybe they all set a unique value I can query on). Any results that aren't in yet would be noticed at the client end, and the client could just ask for those values again in another ten seconds.
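Concretely, the flow I have in mind looks something like the sketch below. It's untested, and the TempResult model, the /start, /worker and /poll URLs, and the compute() helper are all placeholder names I've made up for illustration:

```python
from google.appengine.api import taskqueue
from google.appengine.ext import ndb
import uuid
import webapp2


class TempResult(ndb.Model):
    """Temporary entity holding one task's result, tagged with a batch id."""
    batch_id = ndb.StringProperty()
    index = ndb.IntegerProperty()
    value = ndb.FloatProperty()


def compute(index):
    # Placeholder for the real work: run query number `index` against the
    # datastore and do the per-query calculation on its results.
    return float(index)


class StartHandler(webapp2.RequestHandler):
    def get(self):
        batch_id = uuid.uuid4().hex
        # Fan out: one task per query. Each task writes its own entity, so
        # no two tasks touch the same entity group.
        for i in range(1000):
            taskqueue.add(url='/worker',
                          params={'batch_id': batch_id, 'index': i})
        self.response.write(batch_id)  # the client polls with this id


class WorkerHandler(webapp2.RequestHandler):
    def post(self):
        batch_id = self.request.get('batch_id')
        index = int(self.request.get('index'))
        TempResult(batch_id=batch_id, index=index,
                   value=compute(index)).put()


class PollHandler(webapp2.RequestHandler):
    def get(self):
        batch_id = self.request.get('batch_id')
        # One query for whatever has arrived so far; the client re-asks
        # later for any indexes that are still missing.
        results = TempResult.query(TempResult.batch_id == batch_id).fetch(1000)
        self.response.write(','.join('%d:%s' % (r.index, r.value)
                                     for r in results))


app = webapp2.WSGIApplication([('/start', StartHandler),
                               ('/worker', WorkerHandler),
                               ('/poll', PollHandler)])
```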
The questions I hope experienced appengineers can answer are:
- Is this ludicrous? If so, is it ludicrous for any number of tasks? Would 50 at once be reasonable?
- I won't run into datastore contention if I'm reading the same entity 20 times a second, right? That contention stuff is all for writing?
- Is there an easier way to get a response from a task?
2 Answers
Yep, sounds pretty ludicrous :)
You shouldn't rely on the Taskqueue to operate like that. You can't rely on 1000 tasks being spawned that quickly (although they most likely will).
Why not use the Channel API to wait for your response? So your solution becomes something like:

- the request creates a channel, spawns the tasks, and returns the channel token to the client
- each task runs its query, does its calculation, and reports its result back over the channel
- the client just listens on the channel and collects results as they arrive, with no polling

This would avoid any timeout issues that would very likely arise from time to time due to tasks not executing as fast as you like, or some other reason.
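Untested, but roughly what I mean; the handler names, the batch_id-as-client-id trick, and the compute() stub below are just placeholders, not a definitive implementation:

```python
from google.appengine.api import channel, taskqueue
import json
import uuid
import webapp2


def compute(index):
    # Placeholder for the real query + calculation for this index.
    return float(index)


class StartHandler(webapp2.RequestHandler):
    def get(self):
        # One channel per request; the batch id doubles as the channel's
        # client id.
        batch_id = uuid.uuid4().hex
        token = channel.create_channel(batch_id)
        for i in range(1000):
            taskqueue.add(url='/worker',
                          params={'batch_id': batch_id, 'index': i})
        # The client opens the channel with this token (goog.appengine.Channel
        # in the JS client) and collects messages until all 1000 have arrived.
        self.response.write(json.dumps({'token': token}))


class WorkerHandler(webapp2.RequestHandler):
    def post(self):
        batch_id = self.request.get('batch_id')
        index = int(self.request.get('index'))
        # Push each result straight to the waiting client: no temporary
        # entities and no polling loop on the original request.
        channel.send_message(batch_id,
                             json.dumps({'index': index,
                                         'value': compute(index)}))


app = webapp2.WSGIApplication([('/start', StartHandler),
                               ('/worker', WorkerHandler)])
```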
The Task Queue doesn't provide firm guarantees on when a task will execute - the ETA (which defaults to the current time) is the earliest time at which it will execute, but if the queue is backed up, or there are no instances available to execute the task, it could execute much later.
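A quick sketch of what that means in practice (the /worker URL is just a placeholder):

```python
from google.appengine.api import taskqueue


def enqueue_examples():
    # eta defaults to "now", so this task is merely *eligible* to run
    # immediately.
    taskqueue.add(url='/worker')
    # countdown/eta only set the earliest run time; either task can still
    # execute much later if the queue is backed up or no instance is free.
    taskqueue.add(url='/worker', countdown=60)
```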
One option would be to use Datastore Plus / NDB, which allows you to execute queries in parallel. 1000 queries is going to be very expensive, however, no matter how you execute them.
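For example, NDB's async API lets you kick off all the queries at once and then wait on the futures; this is an untested sketch, and the Record model, its bucket/score properties, and the per-query sum are invented purely for illustration:

```python
from google.appengine.ext import ndb


class Record(ndb.Model):
    bucket = ndb.IntegerProperty()
    score = ndb.FloatProperty()


def run_parallel(num_queries=1000):
    # Each fetch_async() returns a Future immediately, so the underlying
    # RPCs run concurrently instead of one after another.
    futures = [Record.query(Record.bucket == i).fetch_async(limit=100)
               for i in range(num_queries)]
    ndb.Future.wait_all(futures)
    # Do the per-query calculation once each result set is back.
    return [sum(r.score for r in f.get_result()) for f in futures]
```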
Another option, as @Chris suggests, is to use the task queue with the Channel API, so you can notify the user asynchronously when the queries complete.