Django: Should I start a separate process?

Posted 2024-10-05 06:02:10


I'm writing an app that will allow the user to upload data in a file; the app will process this data, and email the results to the user. Processing may take some time, so I would like to handle this separately in a Python script rather than wait in the view for it to complete. The Python script and view don't need to communicate as the script will pick up the data from a file written by the view. The view will just put up a message like "Thanks for uploading your data - the results will be emailed to you"

What's the best way to do this in Django? Spawn off a separate process? Put something on a queue?

Some example code would be greatly appreciated. Thanks.


4 Answers

尽揽少女心 2024-10-12 06:02:11


The simplest possible solution is to write a custom management command that searches for all the unprocessed files, processes them, and then emails the user. Management commands run inside the Django framework, so they have access to all models, database connections, etc., but you can call them from anywhere, for example from a crontab.
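As a sketch of what such a command's `handle()` could loop over — the `*.csv` pattern and the `.done`-marker convention here are my own assumptions, not something from the answer; the real command would subclass `django.core.management.base.BaseCommand`:

```python
# Scanning logic a management command's handle() could call.
# Convention (assumed): an upload is "processed" once a
# matching "<name>.csv.done" marker file exists beside it.
from pathlib import Path

def find_unprocessed(upload_dir):
    """Return uploaded files that have no '.done' marker yet."""
    uploads = Path(upload_dir)
    return sorted(
        p for p in uploads.glob("*.csv")
        if not p.with_suffix(p.suffix + ".done").exists()
    )

def mark_processed(path):
    """Drop a marker file so the next run skips this upload."""
    Path(str(path) + ".done").touch()
```

The command would then process each file from `find_unprocessed()`, email the user, and call `mark_processed()`, so a crontab entry can invoke it repeatedly without double-processing.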

If you care about the delay between when a file is uploaded and when processing starts, you could use a framework like Celery, which is basically a helper library for working with a message queue and running workers that listen on the queue. This would give pretty low latency, but on the other hand, simplicity might be more important to you.
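The queue-and-worker pattern Celery wraps is itself tiny; here is a stdlib-only sketch of the idea (the real thing would use a broker such as RabbitMQ or Redis and Celery task definitions — this just illustrates the shape):

```python
# Minimal queue + worker pattern: the view enqueues a job and
# returns immediately; a background worker drains the queue.
# Stdlib only — no broker, no retries, no persistence.
import queue
import threading

jobs = queue.Queue()
results = []  # stand-in for "email the user the result"

def worker():
    while True:
        path = jobs.get()
        results.append(f"processed {path}")  # real work goes here
        jobs.task_done()

def enqueue_upload(path):
    """What the view would do: hand off the job and return at once."""
    jobs.put(path)

threading.Thread(target=worker, daemon=True).start()
```

With Celery the `worker()` loop becomes a separate worker process, so the jobs survive outside the web server — which is exactly why it avoids the problems described below.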

I would strongly advise against starting threads or spawning processes in your views, as the threads would run inside the Django process and could take down your web server (depending on your configuration). A child process would inherit everything from the Django process, which you probably don't want. It is better to keep this stuff separate.

2024-10-12 06:02:11


I currently have a project with similar requirements (just more complicated^^).

Never spawn a subprocess or thread from your Django view. You have no control over the Django process, and it could be killed, paused, etc. before the end of the task. It is controlled by the web server (e.g. Apache via WSGI).

What I would do is an external script, which would run in a separate process. I see a few options:

  • A process that is always running and crawling the directory where you put your files. It would for example check the directory every ten seconds and process the files
  • Same as above, but run by cron every x minutes (cron's granularity is one minute). This basically has the same effect
  • Use Celery to create worker processes and add jobs to the queue with your Django application. Then you will need to get the results back by one of the means available with Celery.

Now you probably need to access the information in Django models to email the user in the end. Here you have several solutions:

  • Import your modules (models etc) from the external script
  • Implement the external script as a custom command (as knutin suggested)
  • Communicate the results to the Django application via a POST request for example. Then you would do the email sending and status changes etc in a normal Django view.

I would go for an external process and either import the modules or use POST requests. This way it is much more flexible. You could, for example, make use of the multiprocessing module to process several files at the same time (thus using multi-core machines efficiently).
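The multiprocessing idea from that paragraph, sketched; `process_file` is a placeholder for the real (CPU-heavy) work, and the `(path, outcome)` return shape is my own choice:

```python
# Process several uploaded files in parallel with a worker pool.
# Note: the worker function must be defined at module top level
# so it can be pickled and sent to the pool processes.
from multiprocessing import Pool

def process_file(path):
    # ... parse the file, crunch the numbers ...
    return (path, "ok")  # (file, outcome) to report back

def process_batch(paths, workers=4):
    """Fan the files out over a pool and collect results in order."""
    with Pool(processes=workers) as pool:
        return pool.map(process_file, paths)
```

Each `(file, outcome)` pair is what you would then email to the user or POST back to the Django application.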

A basic workflow would be:

  1. Check the directory for new files
  2. For each file (can be parallelized):
    1. Process
    2. Send email or notify your Django application
  3. Sleep for a while
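That workflow as a runnable sketch — one pass over the directory is factored out so it can be tested without looping forever, and the notify step is just a callback placeholder:

```python
# Steps 1-3 of the workflow above: scan, process + notify each
# new file, then sleep and repeat.
import time
from pathlib import Path

def one_pass(incoming, done, notify):
    """Steps 1-2: find files not yet handled, process and notify."""
    handled = []
    for path in sorted(Path(incoming).glob("*")):
        if path.name in done:
            continue
        # step 2.1: process the file (placeholder)
        # step 2.2: send email or notify the Django app
        notify(path.name)
        done.add(path.name)
        handled.append(path.name)
    return handled

def watch(incoming, notify, interval=10):
    """Step 3: run forever, sleeping between passes."""
    done = set()
    while True:
        one_pass(incoming, done, notify)
        time.sleep(interval)
```

In a real deployment `done` would have to be persisted (marker files, a database flag), since an in-memory set is lost when the script restarts.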

My project contains really CPU-demanding processing. I currently use an external process that gives processing jobs to a pool of worker processes (that's basically what Celery could do for you) and reports the progress and results back to the Django application via POST requests. It works really well and is relatively scalable, but I will soon change it to use Celery on a cluster.

是你 2024-10-12 06:02:11


You could spawn a thread to do the processing. It wouldn't really have much to do with Django; the view function would need to kick off the worker thread and that's it.

If you really want a separate process, you'll need the subprocess module. But do you really need to redirect standard I/O or allow external process control?
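If you do go the subprocess route, a fire-and-forget sketch (the `process_upload.py` worker script name is hypothetical):

```python
# Start a separate worker process from the view and return
# immediately. "process_upload.py" is a hypothetical script
# that would read the uploaded file and email the results.
import subprocess
import sys

def kick_off(data_path):
    """Launch the worker without redirecting or controlling its I/O."""
    return subprocess.Popen(
        [sys.executable, "process_upload.py", data_path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

The parent never calls `communicate()` or reads the pipes, matching the point above: if you don't need I/O redirection or process control, a plain thread may be simpler.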

A threading example:

from threading import Thread
from MySlowThing import SlowProcessingFunction # or whatever you call it

# ...

Thread(target=SlowProcessingFunction, args=(), kwargs={}).start()

I haven't done a program where I didn't want to track the threads' progress, so I don't know if this works without storing the Thread object somewhere. If you need to do that, it's pretty simple:

allThreads = []

# ...

# inside the view (no `global` needed, since we only mutate the list):
thread = Thread(target=SlowProcessingFunction, args=(), kwargs={})
thread.start()
allThreads.append(thread)

You can remove threads from the list when thread.is_alive() returns False:

def cull_threads():
    global allThreads
    allThreads = [thread for thread in allThreads if thread.is_alive()]

凉宸 2024-10-12 06:02:11


You could use multiprocessing. http://docs.python.org/library/multiprocessing.html

Essentially,

from django.http import HttpResponseRedirect
from django.shortcuts import render_to_response  # removed in modern Django; use render()

def _pony_express(objs, action, user, foo=None):
    # unleash the beasts

def bulk_action(request, t):

    ...
    objs = model.objects.filter(pk__in=pks)

    if request.method == 'POST':
        objs.update(is_processing=True)

        from multiprocessing import Process
        p = Process(target=_pony_express, args=(objs, action, request.user), kwargs={'foo': foo})
        p.start()

        return HttpResponseRedirect(next_url)

    context = {'t': t, 'action': action, 'objs': objs, 'model': model}
    return render_to_response(...)