在Django中,如何调用启动时间较慢的子进程
假设您在 Linux 上运行 Django,并且您有一个视图,并且您希望该视图从对文件进行操作的名为 cmd 的子进程返回数据视图创建的内容,例如如下所示:
def call_subprocess(request):
response = HttpResponse()
with tempfile.NamedTemporaryFile("W") as f:
f.write(request.GET['data']) # i.e. some data
# cmd operates on fname and returns output
p = subprocess.Popen(["cmd", f.name],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
out, err = p.communicate()
response.write(p.out) # would be text/plain...
return response
现在,假设 cmd 的启动时间非常慢,但运行时间非常快,并且它本身没有守护进程模式。我想改进此视图的响应时间。
我希望通过在工作池中启动多个 cmd 实例,让它们等待输入,并让 < strong>call_process 要求这些工作池进程之一处理数据。
这实际上分为 2 部分:
第 1 部分。调用 cmd 和 cmd 的函数等待输入。这可以通过管道来完成,即
def _run_subcmd():
p = subprocess.Popen(["cmd", fname],
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
# write 'out' to a tmp file
o = open("out.txt", "W")
o.write(out)
o.close()
p.close()
exit()
def _run_cmd(data):
f = tempfile.NamedTemporaryFile("W")
pipe = os.mkfifo(f.name)
if os.fork() == 0:
_run_subcmd(fname)
else:
f.write(data)
r = open("out.txt", "r")
out = r.read()
# read 'out' from a tmp file
return out
def call_process(request):
response = HttpResponse()
out = _run_cmd(request.GET['data'])
response.write(out) # would be text/plain...
return response
第 2 部分。一组在后台运行、等待数据的工作人员。即我们希望扩展上述内容,以便子进程已经在运行,例如,当 Django 实例初始化时,或者第一次调用此 call_process 时,将创建一组这些工作人员
WORKER_COUNT = 6
WORKERS = []
class Worker(object):
def __init__(index):
self.tmp_file = tempfile.NamedTemporaryFile("W") # get a tmp file name
os.mkfifo(self.tmp_file.name)
self.p = subprocess.Popen(["cmd", self.tmp_file],
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
self.index = index
def run(out_filename, data):
WORKERS[self.index] = Null # qua-mutex??
self.tmp_file.write(data)
if (os.fork() == 0): # does the child have access to self.p??
out, err = self.p.communicate()
o = open(out_filename, "w")
o.write(out)
exit()
self.p.close()
self.o.close()
self.tmp_file.close()
WORKERS[self.index] = Worker(index) # replace this one
return out_file
@classmethod
def get_worker() # get the next worker
# ... static, incrementing index
应该在某个地方对工作人员进行一些初始化,如下所示:
def init_workers(): # create WORKERS_COUNT workers
for i in xrange(0, WORKERS_COUNT):
tmp_file = tempfile.NamedTemporaryFile()
WORKERS.push(Worker(i))
现在,我上面的内容变成了这样:
def _run_cmd(data):
Worker.get_worker() # this needs to be atomic & lock worker at Worker.index
fifo = open(tempfile.NamedTemporaryFile("r")) # this stores output of cmd
Worker.run(fifo.name, data)
# please ignore the fact that everything will be
# appended to out.txt ... these will be tmp files, too, but named elsewhere.
out = fifo.read()
# read 'out' from a tmp file
return out
def call_process(request):
response = HttpResponse()
out = _run_cmd(request.GET['data'])
response.write(out) # would be text/plain...
return response
现在,问题是:
这会起作用吗? (我刚刚将其输入 StackOverflow,所以我确信存在问题,但从概念上讲,它会起作用)
需要寻找哪些问题?
有更好的替代方案吗?例如,线程也可以工作吗(Debian Lenny Linux)?是否有任何库可以处理这样的并行进程工作池?
是否有我应该注意的与 Django 的交互?
感谢您的阅读!我希望你和我一样觉得这个问题很有趣。
布莱恩
Suppose you're running Django on Linux, and you've got a view, and you want that view to return the data from a subprocess called cmd that operates on a file that the view creates, for example likeso:
def call_subprocess(request):
response = HttpResponse()
with tempfile.NamedTemporaryFile("W") as f:
f.write(request.GET['data']) # i.e. some data
# cmd operates on fname and returns output
p = subprocess.Popen(["cmd", f.name],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
out, err = p.communicate()
response.write(p.out) # would be text/plain...
return response
Now, suppose cmd has a very slow start-up time, but a very fast operating time, and it does not natively have a daemon mode. I would like to improve the response-time of this view.
I would like to make the whole system would run much faster by starting up a number of instances of cmd in a worker-pool, have them wait for input, and having call_process ask one of those worker pool processes handle the data.
This is really 2 parts:
Part 1. A function that calls cmd and cmd waits for input. This could be done with pipes, i.e.
def _run_subcmd():
p = subprocess.Popen(["cmd", fname],
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
# write 'out' to a tmp file
o = open("out.txt", "W")
o.write(out)
o.close()
p.close()
exit()
def _run_cmd(data):
f = tempfile.NamedTemporaryFile("W")
pipe = os.mkfifo(f.name)
if os.fork() == 0:
_run_subcmd(fname)
else:
f.write(data)
r = open("out.txt", "r")
out = r.read()
# read 'out' from a tmp file
return out
def call_process(request):
response = HttpResponse()
out = _run_cmd(request.GET['data'])
response.write(out) # would be text/plain...
return response
Part 2. A set of workers running in the background that are waiting on the data. i.e. We want to extend the above so that the subprocess is already running, e.g. when the Django instance initializes, or this call_process is first called, a set of these workers is created
WORKER_COUNT = 6
WORKERS = []
class Worker(object):
def __init__(index):
self.tmp_file = tempfile.NamedTemporaryFile("W") # get a tmp file name
os.mkfifo(self.tmp_file.name)
self.p = subprocess.Popen(["cmd", self.tmp_file],
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
self.index = index
def run(out_filename, data):
WORKERS[self.index] = Null # qua-mutex??
self.tmp_file.write(data)
if (os.fork() == 0): # does the child have access to self.p??
out, err = self.p.communicate()
o = open(out_filename, "w")
o.write(out)
exit()
self.p.close()
self.o.close()
self.tmp_file.close()
WORKERS[self.index] = Worker(index) # replace this one
return out_file
@classmethod
def get_worker() # get the next worker
# ... static, incrementing index
There should be some initialization of workers somewhere, like this:
def init_workers(): # create WORKERS_COUNT workers
for i in xrange(0, WORKERS_COUNT):
tmp_file = tempfile.NamedTemporaryFile()
WORKERS.push(Worker(i))
Now, what I have above becomes something likeso:
def _run_cmd(data):
Worker.get_worker() # this needs to be atomic & lock worker at Worker.index
fifo = open(tempfile.NamedTemporaryFile("r")) # this stores output of cmd
Worker.run(fifo.name, data)
# please ignore the fact that everything will be
# appended to out.txt ... these will be tmp files, too, but named elsewhere.
out = fifo.read()
# read 'out' from a tmp file
return out
def call_process(request):
response = HttpResponse()
out = _run_cmd(request.GET['data'])
response.write(out) # would be text/plain...
return response
Now, the questions:
Will this work? (I've just typed this off the top of my head into StackOverflow, so I'm sure there are problems, but conceptually, will it work)
What are the problems to look for?
Are there better alternatives to this? e.g. Could threads work just as well (it's Debian Lenny Linux)? Are there any libraries that handle parallel process worker-pools like this?
Are there interactions with Django that I ought to be conscious of?
Thanks for reading! I hope you find this as interesting a problem as I do.
Brian
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
看起来我好像在押注这个产品,因为这是我第二次回应推荐这个产品。
但似乎您需要消息队列服务,特别是分布式消息队列。
它的工作原理如下:
大多数代码都存在,并且您不必构建自己的系统。
看一下 Celery,它最初是用 Django 构建的。
http://www.celeryq.org/
http://robertpogorzelski.com/blog/2009/09 /10/rabbitmq-celery-and-django/
It may seem like i am punting this product as this is the second time i have responded with a recommendation of this.
But it seems like you need a Message Queing service, in particular a distributed message queue.
ere is how it will work:
Most of this code exists, and you dont have to go about building your own system.
Have a look at Celery which was initially built with Django.
http://www.celeryq.org/
http://robertpogorzelski.com/blog/2009/09/10/rabbitmq-celery-and-django/
Issy 已经提到了 Celery,但由于评论效果不佳
使用代码示例,我将回复作为答案。
您应该尝试将 Celery 与 AMQP 结果存储同步使用。
您可以将实际执行分发到另一个进程甚至另一台机器。在 celery 中同步执行很容易,例如:
AMQP 结果存储使得发回结果非常快,
但它仅在当前开发版本中可用(在代码冻结中成为
0.8.0)
Issy already mentioned Celery, but since comments doesn't work well
with code samples, I'll reply as an answer instead.
You should try to use Celery synchronously with the AMQP result store.
You could distribute the actual execution to another process or even another machine. Executing synchronously in celery is easy, e.g.:
The AMQP result store makes sending back the result very fast,
but it's only available in the current development version (in code-freeze to become
0.8.0)
如何使用 python-daemon 或其后继者“守护”子进程调用,< a href="http://www.clapper.org/software/python/grizzled/" rel="nofollow noreferrer">灰白。
How about "daemonizing" the subprocess call using python-daemon or its successor, grizzled.