Django: Should I start a separate process?

Posted 2024-10-05 06:02:10


I'm writing an app that will allow the user to upload data in a file; the app will process this data, and email the results to the user. Processing may take some time, so I would like to handle this separately in a Python script rather than wait in the view for it to complete. The Python script and view don't need to communicate as the script will pick up the data from a file written by the view. The view will just put up a message like "Thanks for uploading your data - the results will be emailed to you"

What's the best way to do this in Django? Spawn off a separate process? Put something on a queue?

Some example code would be greatly appreciated. Thanks.


4 Answers

尽揽少女心 2024-10-12 06:02:11


The simplest possible solution is to write a custom management command that searches for all the unprocessed files, processes them, and then emails the user. Management commands run inside the Django framework, so they have access to all models, database connections, etc., but you can call them from anywhere, for example from a crontab.
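As a sketch of what such a command's `handle()` could loop over — the `*.csv` pattern and the `.done`-marker convention here are my own assumptions, not something from the answer; the real command would subclass `django.core.management.base.BaseCommand`:

```python
# Scanning logic a management command's handle() could call.
# Convention (assumed): an upload is "processed" once a
# matching "<name>.csv.done" marker file exists beside it.
from pathlib import Path

def find_unprocessed(upload_dir):
    """Return uploaded files that have no '.done' marker yet."""
    uploads = Path(upload_dir)
    return sorted(
        p for p in uploads.glob("*.csv")
        if not p.with_suffix(p.suffix + ".done").exists()
    )

def mark_processed(path):
    """Drop a marker file so the next run skips this upload."""
    Path(str(path) + ".done").touch()
```

The command would then process each file from `find_unprocessed()`, email the user, and call `mark_processed()`, so a crontab entry can invoke it repeatedly without double-processing.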

If you care about the delay between when a file is uploaded and when processing starts, you could use a framework like Celery, which is basically a helper library for working with a message queue and running workers that listen on the queue. This would give pretty low latency, but on the other hand, simplicity might be more important to you.
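The queue-and-worker pattern Celery wraps is itself tiny; here is a stdlib-only sketch of the idea (the real thing would use a broker such as RabbitMQ or Redis and Celery task definitions — this just illustrates the shape):

```python
# Minimal queue + worker pattern: the view enqueues a job and
# returns immediately; a background worker drains the queue.
# Stdlib only — no broker, no retries, no persistence.
import queue
import threading

jobs = queue.Queue()
results = []  # stand-in for "email the user the result"

def worker():
    while True:
        path = jobs.get()
        results.append(f"processed {path}")  # real work goes here
        jobs.task_done()

def enqueue_upload(path):
    """What the view would do: hand off the job and return at once."""
    jobs.put(path)

threading.Thread(target=worker, daemon=True).start()
```

With Celery the `worker()` loop becomes a separate worker process, so the jobs survive outside the web server — which is exactly why it avoids the problems described below.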

I would strongly advise against starting threads or spawning processes in your views, as the threads would run inside the Django process and could take down your web server (depending on your configuration). A child process would inherit everything from the Django process, which you probably don't want. It is better to keep this stuff separate.

2024-10-12 06:02:11


I currently have a project with similar requirements (just more complicated^^).

Never spawn a subprocess or thread from your Django view. You have no control over the Django process, and it could be killed, paused, etc. before the end of the task. It is controlled by the web server (e.g. Apache via WSGI).

What I would do is an external script, which would run in a separate process. I see a few options:

  • A process that is always running and crawling the directory where you put your files. It would for example check the directory every ten seconds and process the files
  • Same as above, but run by cron every x minutes (cron's granularity is one minute). This basically has the same effect
  • Use Celery to create worker processes and add jobs to the queue with your Django application. Then you will need to get the results back by one of the means available with Celery.

Now you probably need to access the information in Django models to email the user in the end. Here you have several solutions:

  • Import your modules (models etc) from the external script
  • Implement the external script as a custom command (as knutin suggested)
  • Communicate the results to the Django application via a POST request for example. Then you would do the email sending and status changes etc in a normal Django view.

I would go for an external process and either import the modules or use POST requests. This way it is much more flexible. You could, for example, make use of the multiprocessing module to process several files at the same time (thus using multi-core machines efficiently).
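The multiprocessing idea from that paragraph, sketched; `process_file` is a placeholder for the real (CPU-heavy) work, and the `(path, outcome)` return shape is my own choice:

```python
# Process several uploaded files in parallel with a worker pool.
# Note: the worker function must be defined at module top level
# so it can be pickled and sent to the pool processes.
from multiprocessing import Pool

def process_file(path):
    # ... parse the file, crunch the numbers ...
    return (path, "ok")  # (file, outcome) to report back

def process_batch(paths, workers=4):
    """Fan the files out over a pool and collect results in order."""
    with Pool(processes=workers) as pool:
        return pool.map(process_file, paths)
```

Each `(file, outcome)` pair is what you would then email to the user or POST back to the Django application.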

A basic workflow would be:

  1. Check the directory for new files
  2. For each file (can be parallelized):
    1. Process
    2. Send email or notify your Django application
  3. Sleep for a while
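That workflow as a runnable sketch — one pass over the directory is factored out so it can be tested without looping forever, and the notify step is just a callback placeholder:

```python
# Steps 1-3 of the workflow above: scan, process + notify each
# new file, then sleep and repeat.
import time
from pathlib import Path

def one_pass(incoming, done, notify):
    """Steps 1-2: find files not yet handled, process and notify."""
    handled = []
    for path in sorted(Path(incoming).glob("*")):
        if path.name in done:
            continue
        # step 2.1: process the file (placeholder)
        # step 2.2: send email or notify the Django app
        notify(path.name)
        done.add(path.name)
        handled.append(path.name)
    return handled

def watch(incoming, notify, interval=10):
    """Step 3: run forever, sleeping between passes."""
    done = set()
    while True:
        one_pass(incoming, done, notify)
        time.sleep(interval)
```

In a real deployment `done` would have to be persisted (marker files, a database flag), since an in-memory set is lost when the script restarts.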

My project contains really CPU-demanding processing. I currently use an external process that gives processing jobs to a pool of worker processes (that's basically what Celery could do for you) and reports the progress and results back to the Django application via POST requests. It works really well and is relatively scalable, but I will soon change it to use Celery on a cluster.

是你 2024-10-12 06:02:11


You could spawn a thread to do the processing. It wouldn't really have much to do with Django; the view function would need to kick off the worker thread and that's it.

If you really want a separate process, you'll need the subprocess module. But do you really need to redirect standard I/O or allow external process control?
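If you do go the subprocess route, a fire-and-forget sketch (the `process_upload.py` worker script name is hypothetical):

```python
# Start a separate worker process from the view and return
# immediately. "process_upload.py" is a hypothetical script
# that would read the uploaded file and email the results.
import subprocess
import sys

def kick_off(data_path):
    """Launch the worker without redirecting or controlling its I/O."""
    return subprocess.Popen(
        [sys.executable, "process_upload.py", data_path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

The parent never calls `communicate()` or reads the pipes, matching the point above: if you don't need I/O redirection or process control, a plain thread may be simpler.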

A threading example:

from threading import Thread
from MySlowThing import SlowProcessingFunction # or whatever you call it

# ...

Thread(target=SlowProcessingFunction, args=(), kwargs={}).start()

I haven't done a program where I didn't want to track the threads' progress, so I don't know if this works without storing the Thread object somewhere. If you need to do that, it's pretty simple:

allThreads = []

# ...

# inside the view (no `global` needed, since we only mutate the list):
thread = Thread(target=SlowProcessingFunction, args=(), kwargs={})
thread.start()
allThreads.append(thread)

You can remove threads from the list when thread.is_alive() returns False:

def cull_threads():
    global allThreads
    allThreads = [thread for thread in allThreads if thread.is_alive()]

凉宸 2024-10-12 06:02:11


You could use multiprocessing. http://docs.python.org/library/multiprocessing.html

Essentially,

from django.http import HttpResponseRedirect
from django.shortcuts import render_to_response  # removed in modern Django; use render()

def _pony_express(objs, action, user, foo=None):
    # unleash the beasts

def bulk_action(request, t):

    ...
    objs = model.objects.filter(pk__in=pks)

    if request.method == 'POST':
        objs.update(is_processing=True)

        from multiprocessing import Process
        p = Process(target=_pony_express, args=(objs, action, request.user), kwargs={'foo': foo})
        p.start()

        return HttpResponseRedirect(next_url)

    context = {'t': t, 'action': action, 'objs': objs, 'model': model}
    return render_to_response(...)