在Django中，如何调用启动时间较慢的子进程

发布于 2024-08-04 15:36:11 字数 3916 浏览 6 评论 0原文

假设您在 Linux 上运行 Django，并且您有一个视图，并且您希望该视图从对文件进行操作的名为 cmd 的子进程返回数据视图创建的内容，例如如下所示：

 def call_subprocess(request):
     response = HttpResponse()

     with tempfile.NamedTemporaryFile("W") as f:
         f.write(request.GET['data']) # i.e. some data

     # cmd operates on fname and returns output
     p = subprocess.Popen(["cmd", f.name], 
                   stdout=subprocess.PIPE, 
                   stderr=subprocess.PIPE)

     out, err = p.communicate()

     response.write(p.out) # would be text/plain...
     return response

现在，假设 cmd 的启动时间非常慢，但运行时间非常快，并且它本身没有守护进程模式。我想改进此视图的响应时间。

我希望通过在工作池中启动多个 cmd 实例，让它们等待输入，并让 < strong>call_process 要求这些工作池进程之一处理数据。

这实际上分为 2 部分：

第 1 部分。调用 cmd 和 cmd 的函数等待输入。这可以通过管道来完成，即

def _run_subcmd():
    p = subprocess.Popen(["cmd", fname], 
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    out, err = p.communicate()
    # write 'out' to a tmp file
    o = open("out.txt", "W")
    o.write(out)
    o.close()
    p.close()
    exit()

def _run_cmd(data):
    f = tempfile.NamedTemporaryFile("W")
    pipe = os.mkfifo(f.name)

    if os.fork() == 0:
        _run_subcmd(fname)
    else:
        f.write(data)

    r = open("out.txt", "r")
    out = r.read()
    # read 'out' from a tmp file
    return out

def call_process(request):
    response = HttpResponse()

    out = _run_cmd(request.GET['data'])

    response.write(out) # would be text/plain...
    return response

第 2 部分。一组在后台运行、等待数据的工作人员。即我们希望扩展上述内容，以便子进程已经在运行，例如，当 Django 实例初始化时，或者第一次调用此 call_process 时，将创建一组这些工作人员

WORKER_COUNT = 6
WORKERS = []

class Worker(object):
    def __init__(index):
        self.tmp_file = tempfile.NamedTemporaryFile("W") # get a tmp file name
        os.mkfifo(self.tmp_file.name)
        self.p = subprocess.Popen(["cmd", self.tmp_file], 
            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        self.index = index

    def run(out_filename, data):
        WORKERS[self.index] = Null # qua-mutex??
        self.tmp_file.write(data)
        if (os.fork() == 0): # does the child have access to self.p??
            out, err = self.p.communicate()
            o = open(out_filename, "w")
            o.write(out)
            exit()

        self.p.close()
        self.o.close()
        self.tmp_file.close()
        WORKERS[self.index] = Worker(index) # replace this one
        return out_file

    @classmethod
    def get_worker() # get the next worker
    # ... static, incrementing index

应该在某个地方对工作人员进行一些初始化，如下所示：

def init_workers(): # create WORKERS_COUNT workers
    for i in xrange(0, WORKERS_COUNT):
        tmp_file = tempfile.NamedTemporaryFile()
        WORKERS.push(Worker(i))

现在，我上面的内容变成了这样：

def _run_cmd(data):
     Worker.get_worker() # this needs to be atomic & lock worker at Worker.index

     fifo = open(tempfile.NamedTemporaryFile("r")) # this stores output of cmd

     Worker.run(fifo.name, data)
     # please ignore the fact that everything will be
     # appended to out.txt ... these will be tmp files, too, but named elsewhere.

     out = fifo.read()
     # read 'out' from a tmp file
     return out


def call_process(request):
     response = HttpResponse()

     out = _run_cmd(request.GET['data'])

     response.write(out) # would be text/plain...
     return response

现在，问题是：

这会起作用吗？（我刚刚将其输入 StackOverflow，所以我确信存在问题，但从概念上讲，它会起作用）
需要寻找哪些问题？
有更好的替代方案吗？例如，线程也可以工作吗（Debian Lenny Linux）？是否有任何库可以处理这样的并行进程工作池？
是否有我应该注意的与 Django 的交互？

感谢您的阅读！我希望你和我一样觉得这个问题很有趣。

布莱恩

原文

Suppose you're running Django on Linux, and you've got a view, and you want that view to return the data from a subprocess called cmd that operates on a file that the view creates, for example likeso:

 def call_subprocess(request):
     response = HttpResponse()

     with tempfile.NamedTemporaryFile("W") as f:
         f.write(request.GET['data']) # i.e. some data

     # cmd operates on fname and returns output
     p = subprocess.Popen(["cmd", f.name], 
                   stdout=subprocess.PIPE, 
                   stderr=subprocess.PIPE)

     out, err = p.communicate()

     response.write(p.out) # would be text/plain...
     return response

Now, suppose cmd has a very slow start-up time, but a very fast operating time, and it does not natively have a daemon mode. I would like to improve the response-time of this view.

I would like to make the whole system would run much faster by starting up a number of instances of cmd in a worker-pool, have them wait for input, and having call_process ask one of those worker pool processes handle the data.

This is really 2 parts:

Part 1. A function that calls cmd and cmd waits for input. This could be done with pipes, i.e.

def _run_subcmd():
    p = subprocess.Popen(["cmd", fname], 
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    out, err = p.communicate()
    # write 'out' to a tmp file
    o = open("out.txt", "W")
    o.write(out)
    o.close()
    p.close()
    exit()

def _run_cmd(data):
    f = tempfile.NamedTemporaryFile("W")
    pipe = os.mkfifo(f.name)

    if os.fork() == 0:
        _run_subcmd(fname)
    else:
        f.write(data)

    r = open("out.txt", "r")
    out = r.read()
    # read 'out' from a tmp file
    return out

def call_process(request):
    response = HttpResponse()

    out = _run_cmd(request.GET['data'])

    response.write(out) # would be text/plain...
    return response

Part 2. A set of workers running in the background that are waiting on the data. i.e. We want to extend the above so that the subprocess is already running, e.g. when the Django instance initializes, or this call_process is first called, a set of these workers is created

WORKER_COUNT = 6
WORKERS = []

class Worker(object):
    def __init__(index):
        self.tmp_file = tempfile.NamedTemporaryFile("W") # get a tmp file name
        os.mkfifo(self.tmp_file.name)
        self.p = subprocess.Popen(["cmd", self.tmp_file], 
            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        self.index = index

    def run(out_filename, data):
        WORKERS[self.index] = Null # qua-mutex??
        self.tmp_file.write(data)
        if (os.fork() == 0): # does the child have access to self.p??
            out, err = self.p.communicate()
            o = open(out_filename, "w")
            o.write(out)
            exit()

        self.p.close()
        self.o.close()
        self.tmp_file.close()
        WORKERS[self.index] = Worker(index) # replace this one
        return out_file

    @classmethod
    def get_worker() # get the next worker
    # ... static, incrementing index

There should be some initialization of workers somewhere, like this:

def init_workers(): # create WORKERS_COUNT workers
    for i in xrange(0, WORKERS_COUNT):
        tmp_file = tempfile.NamedTemporaryFile()
        WORKERS.push(Worker(i))

Now, what I have above becomes something likeso:

def _run_cmd(data):
     Worker.get_worker() # this needs to be atomic & lock worker at Worker.index

     fifo = open(tempfile.NamedTemporaryFile("r")) # this stores output of cmd

     Worker.run(fifo.name, data)
     # please ignore the fact that everything will be
     # appended to out.txt ... these will be tmp files, too, but named elsewhere.

     out = fifo.read()
     # read 'out' from a tmp file
     return out


def call_process(request):
     response = HttpResponse()

     out = _run_cmd(request.GET['data'])

     response.write(out) # would be text/plain...
     return response

Now, the questions:

Will this work? (I've just typed this off the top of my head into StackOverflow, so I'm sure there are problems, but conceptually, will it work)
What are the problems to look for?
Are there better alternatives to this? e.g. Could threads work just as well (it's Debian Lenny Linux)? Are there any libraries that handle parallel process worker-pools like this?
Are there interactions with Django that I ought to be conscious of?

Thanks for reading! I hope you find this as interesting a problem as I do.

Brian

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

淑女气质 2024-08-11 15:36:11

看起来我好像在押注这个产品，因为这是我第二次回应推荐这个产品。

但似乎您需要消息队列服务，特别是分布式消息队列。

它的工作原理如下：

您的 Django 应用程序请求 CMD
CMD 被添加到队列
CMD 被推送到多个工作
它被执行并返回上游结果

大多数代码都存在，并且您不必构建自己的系统。

看一下 Celery，它最初是用 Django 构建的。

http://www.celeryq.org/
http://robertpogorzelski.com/blog/2009/09 /10/rabbitmq-celery-and-django/

回复收藏 0 原文

演出会有结束 2024-08-11 15:36:11

Issy 已经提到了 Celery，但由于评论效果不佳
使用代码示例，我将回复作为答案。

您应该尝试将 Celery 与 AMQP 结果存储同步使用。
您可以将实际执行分发到另一个进程甚至另一台机器。在 celery 中同步执行很容易，例如：

>>> from celery.task import Task
>>> from celery.registry import tasks

>>> class MyTask(Task):
...
...     def run(self, x, y):
...         return x * y 
>>> tasks.register(MyTask)

>>> async_result = MyTask.delay(2, 2)
>>> retval = async_result.get() # Now synchronous
>>> retval 4

AMQP 结果存储使得发回结果非常快，
但它仅在当前开发版本中可用（在代码冻结中成为
0.8.0)

Issy already mentioned Celery, but since comments doesn't work well
with code samples, I'll reply as an answer instead.

You should try to use Celery synchronously with the AMQP result store.
You could distribute the actual execution to another process or even another machine. Executing synchronously in celery is easy, e.g.:

>>> from celery.task import Task
>>> from celery.registry import tasks

>>> class MyTask(Task):
...
...     def run(self, x, y):
...         return x * y 
>>> tasks.register(MyTask)

>>> async_result = MyTask.delay(2, 2)
>>> retval = async_result.get() # Now synchronous
>>> retval 4

The AMQP result store makes sending back the result very fast,
but it's only available in the current development version (in code-freeze to become
0.8.0)

回复收藏 0 原文