Asynchronous execution of tasks in a web application

Posted on 2024-08-11 15:56:34


A web application I am developing needs to perform tasks that are too long to be executed during the http request/response cycle. Typically, the user will perform the request, the server will take this request and, among other things, run some scripts to generate data (for example, render images with povray).

Of course, these tasks can take a long time, so the server should not hang waiting for the scripts to finish before sending the response to the client. I therefore need to execute the scripts asynchronously, give the client a "the resource is here, but not ready yet" response, and probably tell it an ajax endpoint to poll, so it can retrieve and display the resource when ready.

Now, my question is not about the design (although I would very much enjoy any hints in this regard as well). My question is: does a system that solves this issue already exist, so that I do not reinvent the square wheel? If I had to, I would use a process queue manager to submit the task and put up an HTTP endpoint that reports the status, something like "pending", "aborted", "completed", to the ajax client, but if something similar already exists specifically for this task, I would much prefer it.

I am working in python+django.

Edit: Please note that the main issue here is not how the server and the client must negotiate and exchange information about the status of the task.

The issue is how the server handles the submission and enqueue of very long tasks. In other words, I need a better system than having my server submit scripts on LSF. Not that it would not work, but I think it's a bit too much...

Edit 2: I added a bounty to see if I can get some other answers. I checked pyprocessing, but it does not let me submit a job and reconnect to the queue at a later stage.

Comments (7)

情场扛把子 2024-08-18 15:56:35


I don't know of a system that does it, but it would be fairly easy to implement one's own system:

  • create a database table with jobid, jobparameters, jobresult
    • jobresult is a string that will hold a pickle of the result
    • jobparameters is a pickled list of input arguments
  • when the server starts working on a job, it creates a new row in the table, and spawns a new process to handle it, passing that process the jobid
  • the task handler process updates the jobresult in the table when it has finished
  • a webpage (xmlrpc or whatever you are using) contains a method 'getResult(jobid)' that will check the table for a jobresult
    • if it finds a result, it returns the result, and deletes the row from the table
    • otherwise it returns an empty list, or None, or your preferred return value to signal that the job is not finished yet

There are a few edge cases to take care of, so an existing framework would clearly be better, as you say.
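The scheme above can be sketched in a few lines of Python. Everything here is illustrative, not an existing library: the `jobs` table columns follow the answer, `sum()` stands in for the long-running script, and the temp-file database path and `fork` start method assume a Unix host.

```python
import multiprocessing
import os
import pickle
import sqlite3
import tempfile

DB_PATH = os.path.join(tempfile.gettempdir(), "jobs_demo.db")
ctx = multiprocessing.get_context("fork")  # assumes a Unix host

db = sqlite3.connect(DB_PATH)
db.execute("DROP TABLE IF EXISTS jobs")
db.execute("CREATE TABLE jobs (jobid INTEGER PRIMARY KEY, jobparameters BLOB, jobresult BLOB)")
db.commit()

def worker(jobid):
    # Runs in a separate process; each process opens its own connection.
    conn = sqlite3.connect(DB_PATH)
    (params_blob,) = conn.execute(
        "SELECT jobparameters FROM jobs WHERE jobid = ?", (jobid,)).fetchone()
    result = sum(pickle.loads(params_blob))  # pretend this takes minutes
    conn.execute("UPDATE jobs SET jobresult = ? WHERE jobid = ?",
                 (pickle.dumps(result), jobid))
    conn.commit()
    conn.close()

def submit(params):
    # Create the row, then spawn a process and hand it the jobid.
    cur = db.execute("INSERT INTO jobs (jobparameters) VALUES (?)",
                     (pickle.dumps(params),))
    db.commit()
    proc = ctx.Process(target=worker, args=(cur.lastrowid,))
    proc.start()
    return cur.lastrowid, proc

def get_result(jobid):
    # Return the unpickled result and delete the row, or None if not done.
    row = db.execute("SELECT jobresult FROM jobs WHERE jobid = ?", (jobid,)).fetchone()
    if row is None or row[0] is None:
        return None
    db.execute("DELETE FROM jobs WHERE jobid = ?", (jobid,))
    db.commit()
    return pickle.loads(row[0])

jobid, proc = submit([1, 2, 3])
proc.join()               # demo only; a real client polls get_result instead
print(get_result(jobid))  # → 6
```

The `proc.join()` is only there so the demo finishes deterministically; in the web app the request handler returns immediately and the client polls `getResult(jobid)`.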

晨曦慕雪 2024-08-18 15:56:35


First, you need a separate "worker" service, started on its own at boot, which communicates with the http request handlers via some local IPC such as a UNIX socket (fast) or a database (simple).

While handling a request, the web process asks the worker for the job's state or other data and relays it to the client.
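A toy version of this worker-service idea, using a UNIX socket for the IPC. The protocol (send a job id, get back a status string) and the status table are assumptions for illustration; a thread stands in for what would really be a separate process started at boot.

```python
import os
import socket
import tempfile
import threading

SOCK_PATH = os.path.join(tempfile.mkdtemp(), "worker.sock")
status = {"42": "pending"}  # state the worker keeps about its jobs

srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(SOCK_PATH)
srv.listen(1)

def worker_service():
    # The long-lived service: answer status queries from request handlers.
    while True:
        conn, _ = srv.accept()
        jobid = conn.recv(64).decode()
        conn.sendall(status.get(jobid, "unknown").encode())
        conn.close()

threading.Thread(target=worker_service, daemon=True).start()

def ask_worker(jobid):
    # What an http request handler would call while building its response.
    c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    c.connect(SOCK_PATH)
    c.sendall(jobid.encode())
    reply = c.recv(64).decode()
    c.close()
    return reply

print(ask_worker("42"))  # → pending
```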

∝单色的世界 2024-08-18 15:56:35


You can signal that a resource is being "worked on" by replying with a 202 HTTP code: the client will have to retry later to get the completed resource. Depending on the case, you might have to issue a "request id" in order to match a request with a response.

Alternatively, you could have a look at existing COMET libraries which might fill your needs more "out of the box". I am not sure if there are any that match your current Django design though.
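A sketch of the 202-plus-request-id idea with the stdlib `http.server`. The `/render` path and the in-memory job store are made up for illustration; a Django view would do the same thing with `HttpResponse(status=202)`.

```python
import json
import threading
import urllib.request
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

jobs = {}  # request_id -> status; stand-in for a real job store

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Accept the work, answer 202 plus an id the client polls with.
        request_id = str(uuid.uuid4())
        jobs[request_id] = "pending"
        body = json.dumps({"request_id": request_id}).encode()
        self.send_response(202)  # 202 Accepted: not done yet, retry later
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/render") as resp:
    print(resp.status)  # → 202
    request_id = json.loads(resp.read())["request_id"]
```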

陌路黄昏 2024-08-18 15:56:35


Probably not a great answer for the python/django solution you are working with, but we use Microsoft Message Queue for things just like this. It basically runs like this:

  1. Website updates a database row somewhere with a "Processing" status
  2. Website sends a message to the MSMQ (this is a non-blocking call, so it returns control back to the website right away)
  3. Windows service (could be any program really) is "watching" the MSMQ and gets the message
  4. Windows service updates the database row with a "Finished" status.

That's the gist of it, anyway. It's been quite reliable for us and really straightforward to scale and manage.

-al
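The four steps above can be modeled with stdlib pieces: a `dict` stands in for the database row and `queue.Queue` for MSMQ. All names are illustrative; this only shows the shape of the pattern, not MSMQ itself.

```python
import queue
import threading

db = {}                # row_id -> status column
msgq = queue.Queue()   # the "MSMQ": hand-off to the background service

def website_submit(row_id):
    db[row_id] = "Processing"  # step 1: mark the row
    msgq.put(row_id)           # step 2: non-blocking, returns immediately

def watching_service():
    # Steps 3-4: the service watching the queue does the work and
    # updates the row when finished.
    while True:
        row_id = msgq.get()
        # ... the long-running work (rendering, resizing, ...) goes here ...
        db[row_id] = "Finished"
        msgq.task_done()

threading.Thread(target=watching_service, daemon=True).start()

website_submit("photo-1234")
msgq.join()              # demo only; the website never waits like this
print(db["photo-1234"])  # → Finished
```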

婴鹅 2024-08-18 15:56:35


Another good option for python and django is Celery.

And if you think that Celery is too heavy for your needs then you might want to look at simple distributed taskqueue.

雨轻弹 2024-08-18 15:56:34


You should avoid re-inventing the wheel here.

Check out gearman. It has libraries in a lot of languages (including python) and is fairly popular. Not sure if anyone has any out-of-the-box way to easily connect django to gearman and ajax calls, but it shouldn't be too complicated to do that part yourself.

The basic idea is that you run the gearman job server (or multiple job servers), have your web request queue up a job (like 'resize_photo') with some arguments (like '{photo_id: 1234}'). You queue this as a background task. You get a handle back. Your ajax request is then going to poll on that handle value until it's marked as complete.

Then you have a worker (or probably many), a separate Python process that connects to this job server, registers itself for 'resize_photo' jobs, does the work, and then marks the job as complete.

I also found this blog post that does a pretty good job of summarizing its usage.

冷…雨湿花 2024-08-18 15:56:34


You can try two approaches:

  • Call the web server every n seconds, passing a job id; the server processes the request and returns some information about the current execution of that task
  • Implement a long-running page that sends data every n seconds; to the client, that HTTP request will "always" be "loading", and it collects new information every time a new piece of data arrives.

About the second option, you can learn more by reading about Comet; using ASP.NET, you can do something similar by implementing the System.Web.IHttpAsyncHandler interface.
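The first approach (client-side polling) can be sketched as below. `get_status` stands in for the HTTP call to the server, and the fake server reports "pending" twice before completing; all names are illustrative.

```python
import time

def poll_until_done(get_status, jobid, interval=0.01, timeout=2.0):
    # Re-ask every `interval` seconds until a terminal state or timeout.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_status(jobid)
        if state in ("completed", "aborted"):
            return state
        time.sleep(interval)  # wait n seconds before asking again
    return "timeout"

# Fake server side: reports "pending" twice, then "completed".
calls = {"n": 0}
def fake_status(jobid):
    calls["n"] += 1
    return "completed" if calls["n"] >= 3 else "pending"

print(poll_until_done(fake_status, "job-1"))  # → completed
```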
