Task Queue or Multithreading on Google App Engine
I have my server on Google App Engine.
One of my jobs is to match a huge set of records against another.
This takes very long if I have to match 10,000 records against 100.
What's the best way of implementing this?
I'm using the Web2py stack and deployed my application on Google App Engine.
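For concreteness, the job described above amounts to a join of the two record sets. A naive in-request version (with a hypothetical `matches` predicate standing in for the real rule) looks like this; at 10,000 × 100 comparisons it is easy to see why a single request runs long:

```python
# Naive nested-loop match: 10,000 x 100 comparisons in one request.
# `matches` is a hypothetical stand-in for the real matching rule; on
# App Engine this approach risks exceeding the request deadline once
# each comparison involves datastore work.
def matches(a, b):
    return a % 100 == b  # placeholder predicate

big = list(range(10000))
small = list(range(100))

pairs = [(a, b) for a in big for b in small if matches(a, b)]
print(len(pairs))
```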
Maybe I'm misunderstanding something, but this sounds like a perfect match for a task queue, and I can't see how multithreading would help: as I understand it, multithreading only means you can serve many responses simultaneously; it won't help if a single response takes longer than the 30-second limit.
With a task you can add it, process until the time limit, then, if you haven't finished the job by the time limit, re-create another task with the remainder of the work.
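The chaining pattern this answer describes can be sketched as follows. On App Engine the re-enqueue step would be a `taskqueue.add()` call; here it is simulated with a plain callback so the control flow can run anywhere, and the names `process_batch` and `match_one` are hypothetical:

```python
# Sketch of task chaining: process until a time budget runs out, then
# hand the remaining records to `enqueue` (taskqueue.add on App Engine).
import time

BATCH_SECONDS = 0.01  # stand-in for the request deadline

def match_one(record):
    # placeholder for the real matching work
    return record * 2

def process_batch(records, results, enqueue):
    deadline = time.time() + BATCH_SECONDS
    i = 0
    while i < len(records) and time.time() < deadline:
        results.append(match_one(records[i]))
        i += 1
    if i < len(records):
        enqueue(records[i:])  # re-create a task with the remainder

results = []
pending = [list(range(100))]   # the initial task's payload
while pending:                 # the queue draining, simulated
    process_batch(pending.pop(), results, pending.append)

print(len(results))
```

Each chained task picks up exactly where the previous one stopped, so the whole set is eventually processed regardless of how many batches it takes.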
Multithreading your code is not supported on GAE, so you cannot use it explicitly.
GAE itself can be multithreaded, which means one frontend instance can handle multiple HTTP requests simultaneously.
In your case, the best way to achieve parallel task execution is the Task Queue.
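A minimal sketch of fanning the work out with the Task Queue API (`google.appengine.api.taskqueue` on the Python runtime). Outside App Engine the import fails, so a stub is substituted here just to keep the sketch self-contained; `/match_worker` is a hypothetical handler URL:

```python
# Enqueue one task per 1,000-record slice; each task runs in its own
# request, so slices are processed in parallel up to the queue's rate.
try:
    from google.appengine.api import taskqueue
except ImportError:
    class taskqueue(object):  # stub so the sketch runs outside GAE
        queued = []
        @classmethod
        def add(cls, url, params):
            cls.queued.append((url, params))

record_count = 10000
for start in range(0, record_count, 1000):
    taskqueue.add(url='/match_worker',
                  params={'start': start, 'count': 1000})
```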
The basic structure for what you're doing is to have the cron job be responsible for dividing the work into smaller units, and executing each unit with the task queue. The payload for each task would be information that identifies the entities in the first set (such as a set of keys). Each task would perform whatever queries are necessary to join the entities in the first set with the entities in the second set, and store intermediate (or perhaps final) results. You can tweak the payload size and task queue rate until it performs the way you desire.
If the results of each task need to be aggregated, you can have each task record its completion and test whether all tasks are complete, or just have another job that polls the completion records to fire off the aggregation. When the MapReduce feature is more widely available, that will be a framework for performing this kind of work.
http://www.youtube.com/watch?v=EIxelKcyCC0
http://code.google.com/p/appengine-mapreduce/
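The structure described above can be sketched end to end. The App Engine pieces (the cron handler, `taskqueue.add`, datastore writes) are replaced by plain data structures so the control flow can be tested anywhere; `chunk_keys`, `worker`, and `aggregate` are hypothetical names:

```python
# Fan-out sketch: a cron job splits the first set into task-sized key
# payloads, each task joins its slice against the second set and stores
# an intermediate result, and aggregation fires once all tasks report done.
CHUNK = 500

def chunk_keys(keys, size=CHUNK):
    """The cron job's role: divide the work into task payloads."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]

def worker(payload, second_set, partials, done):
    """One task: join its slice with the second set, store an
    intermediate result, and record its completion."""
    partials.append(sum(1 for k in payload if k % 7 in second_set))
    done.append(True)

def aggregate(partials):
    return sum(partials)

keys = list(range(10000))
second_set = {0, 1, 2}      # stand-in for the 100-record set
partials, done = [], []

tasks = chunk_keys(keys)    # cron: one task per chunk
for payload in tasks:       # the task queue draining, simulated
    worker(payload, second_set, partials, done)

total = None
if len(done) == len(tasks):  # all tasks complete -> fire aggregation
    total = aggregate(partials)
print(total)
```

Tuning `CHUNK` against the queue rate corresponds to the "tweak the payload size and task queue rate" step in the answer.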