Asynchronous execution of tasks for a web application
A web application I am developing needs to perform tasks that take too long to run within the HTTP request/response cycle. Typically, the user makes a request, and the server takes it and, among other things, runs some scripts to generate data (for example, rendering images with povray).
Of course, these tasks can take a long time, so the server should not block waiting for the scripts to finish before sending the response to the client. I therefore need to run the scripts asynchronously, give the client a "the resource is here, but not ready" reply, and probably tell it an AJAX endpoint to poll, so it can retrieve and display the resource when it is ready.
Now, my question is not about the design (although I would very much welcome any hints on that as well). My question is: does a system that solves this problem already exist, so that I do not reinvent the square wheel? If I had to, I would use a process queue manager to submit the task and expose an HTTP endpoint that reports the status ("pending", "aborted", "completed") to the AJAX client, but if something already exists specifically for this job, I would much prefer to use it.
I am working in python+django.
Edit: Please note that the main issue here is not how the server and the client must negotiate and exchange information about the status of the task.
The issue is how the server handles the submission and queuing of very long tasks. In other words, I need a better system than having my server submit scripts to LSF. Not that it would not work, but I think it's a bit too much...
Edit 2: I added a bounty to see if I can get some other answers. I checked pyprocessing, but with it I cannot submit a job and reconnect to the queue at a later stage.
Comments (7)
I don't know of a system that does it, but it would be fairly easy to implement one's own system:
There are a few edge cases to take care of, so an existing framework would clearly be better, as you say.
First, you need a separate "worker" service, started independently at boot, which communicates with the HTTP request handlers via some local IPC such as a UNIX socket (fast) or a database (simple).
While handling a request, the CGI handler asks the worker for its state or other data and relays the answer to the client.
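As a rough illustration of the "database (simple)" variant, here is a minimal sketch that uses a SQLite table as the channel between the request handler and the worker. The table layout, the function names (`init_db`, `submit_job`, `job_status`, `run_next_job`) and the status strings are assumptions made for this example, not part of any existing framework.

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: one row per submitted job.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending', "
        "result TEXT)"
    )
    conn.commit()


def submit_job(conn, payload):
    """Request-handler side: enqueue a job and return its id."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def job_status(conn, job_id):
    """Request-handler side: return (status, result), or None if unknown."""
    return conn.execute(
        "SELECT status, result FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()


def run_next_job(conn, handler):
    """Worker side: run the oldest pending job; return False if none is queued."""
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    job_id, payload = row
    conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
    conn.commit()
    try:
        result = handler(payload)
        conn.execute(
            "UPDATE jobs SET status = 'completed', result = ? WHERE id = ?",
            (result, job_id),
        )
    except Exception as exc:
        # Record the failure so the client can see why the job stopped.
        conn.execute(
            "UPDATE jobs SET status = 'aborted', result = ? WHERE id = ?",
            (str(exc), job_id),
        )
    conn.commit()
    return True
```

The worker process would call `run_next_job` in a loop, sleeping briefly whenever it returns `False`. This sketch assumes a single worker; with several workers, claiming a job would have to be made atomic so that two workers never pick up the same row.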
You can signal that a resource is being "worked on" by replying with a 202 HTTP code: the client will have to retry later to get the completed resource. Depending on the case, you might have to issue a "request id" in order to match a request with a response.
Alternatively, you could have a look at existing Comet libraries, which might meet your needs more "out of the box". I am not sure if there are any that match your current Django design, though.
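The 202-plus-request-id exchange can be sketched independently of any web framework. In the following example, the in-memory `_jobs` dictionary, the function names and the `/status/<id>` URL shape are all assumptions for illustration; a real Django view would wrap the returned status code and body in an HTTP response.

```python
import uuid

# Hypothetical in-memory job table: request id -> job record.
_jobs = {}


def submit(task_args):
    """Accept a task and answer 202 with a request id the client can poll."""
    request_id = uuid.uuid4().hex
    _jobs[request_id] = {"status": "pending", "result": None, "args": task_args}
    return 202, {"request_id": request_id, "poll": "/status/" + request_id}


def poll(request_id):
    """Answer 202 while the job is unfinished and 200 once the result exists."""
    job = _jobs.get(request_id)
    if job is None:
        return 404, {"error": "unknown request id"}
    if job["status"] != "completed":
        return 202, {"status": job["status"]}
    return 200, {"status": "completed", "result": job["result"]}


def mark_completed(request_id, result):
    """Called by whatever runs the task once the resource is ready."""
    _jobs[request_id].update(status="completed", result=result)
```

The client keeps calling the poll endpoint for as long as it gets 202, and displays the resource once it receives 200.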
Probably not a great answer for the python/django solution you are working with, but we use Microsoft Message Queue for things just like this. It basically runs like this
That's the gist of it anyway. It's been quite reliable for us and really straightforward to scale and manage.
-al
Another good option for Python and Django is Celery.
And if you think that Celery is too heavy for your needs, then you might want to look at simple distributed taskqueue.
You should avoid re-inventing the wheel here.
Check out gearman. It has libraries in a lot of languages (including Python) and is fairly popular. I'm not sure whether anyone has an out-of-the-box way to connect Django to gearman and AJAX calls, but it shouldn't be too complicated to do that part yourself.
The basic idea is that you run the gearman job server (or multiple job servers) and have your web request queue up a job (like 'resize_photo') with some arguments (like '{photo_id: 1234}'). You queue this as a background task, and you get a handle back. Your AJAX request then polls on that handle value until the job is marked as complete.
Then you have a worker (or probably many), which is a separate Python process that connects to this job server, registers itself for 'resize_photo' jobs, does the work and then marks the job as complete.
I also found this blog post that does a pretty good job of summarizing its usage.
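The flow described above (queue a background job, get a handle back, poll the handle, let a worker mark the job complete) can be imitated with the standard library alone. The sketch below is not the gearman API: the `JobServer` class and its methods are hypothetical stand-ins that only mirror the pattern, with a thread playing the role of the separate worker process.

```python
import queue
import threading
import uuid


class JobServer:
    """Toy stand-in for a job server: holds queued jobs and their status."""

    def __init__(self):
        self._queue = queue.Queue()
        self._status = {}
        self._lock = threading.Lock()

    def submit_background(self, task_name, args):
        """Web-request side: queue a job and return a handle to poll on."""
        handle = uuid.uuid4().hex
        with self._lock:
            self._status[handle] = "pending"
        self._queue.put((handle, task_name, args))
        return handle

    def status(self, handle):
        """AJAX-poll side: report the current state of a handle."""
        with self._lock:
            return self._status.get(handle, "unknown")

    def work_one(self, handlers, timeout=1.0):
        """Worker side: run one queued job; return False if none arrived in time."""
        try:
            handle, task_name, args = self._queue.get(timeout=timeout)
        except queue.Empty:
            return False
        try:
            # `handlers` maps the task names this worker registered for
            # to the functions that do the work.
            handlers[task_name](args)
            outcome = "completed"
        except Exception:
            outcome = "aborted"
        with self._lock:
            self._status[handle] = outcome
        return True
```

A worker would call `work_one({"resize_photo": resize_photo})` in a loop, where `resize_photo` is whatever function does the real work. With a real gearman setup, the job server and the workers run as separate processes, so the queue and the job status survive independently of the web server.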
You can try two approaches:
"loading"
and it needs to collect new information every time a new piece of data is received. About the second option, you can learn more by reading about Comet; using ASP.NET, you can do something similar by implementing the System.Web.IHttpAsyncHandler interface.