Multiprocessing Pool hangs and I can't break out of the application
I'm sure this is a rookie mistake, but I can't figure out what I'm doing wrong with multiprocessing. I have this code (which just sits there and does nothing):
from multiprocessing import Pool

if __name__ == '__main__':
    pool = Pool(processes=4)
    for i, x in enumerate(data):
        pool.apply_async(new_awesome_function, (i, x))
    pool.close()
    pool.join()
data is a list ([1, 2, 3, 4, 5]), and I'm trying to send each item in the list off to be processed on multiple CPUs. But when I wrap my working command in a function and run it through the code above, nothing happens (when I call the function by itself, without the code above, it works fine). So I think I'm using multiprocessing wrong (although I took the examples from sites). Any suggestions?
Update: I noticed that when it freezes I can't even break out of it with Ctrl-C, which always works to get out of my buggy programs. I looked at python2.5 multiprocessing Pool, tried to follow the advice there, and added the import inside my if statement, but no luck.
Update 2: I'm sorry; I just realized, thanks to the answer below, that the command works, but it doesn't seem to terminate the program or let me force quit.
Multiprocessing isn't threading.
You're probably doing something sort of like this:
After you run the script, data has not changed. This is because multiprocessing uses copies of your program. Your functions are being run, but they are run in copies of your program and thus have no effect on your original program.
In order to make use of multiprocessing you need to explicitly communicate from one process to another. With threading everything is shared, but with multiprocessing nothing is shared unless you explicitly share it.
The simplest way is to use return values:
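The example that followed here is also missing; a sketch of the return-value approach, assuming a made-up worker function, could be:

```python
from multiprocessing import Pool

def add_one(x):
    return x + 1  # the return value travels back to the parent process

if __name__ == '__main__':
    pool = Pool(processes=4)
    # apply_async hands back AsyncResult objects; .get() retrieves each value
    async_results = [pool.apply_async(add_one, (i,)) for i in [1, 2, 3, 4, 5]]
    pool.close()
    pool.join()
    print([r.get() for r in async_results])  # [2, 3, 4, 5, 6]
```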
See the Python documentation (http://docs.python.org/library/multiprocessing.html) for other methods such as Queues, Pipes, and Managers. What you can't do is change your program's state and expect that to work.
I don't know what database you are using, but chances are you can't share database connections between your processes like that.
On Linux, fork() is used, which makes a copy of everything in memory when you start the subprocess. However, things like sockets, open files, and database connections won't work properly unless specifically designed to do so. On Windows, fork() is unavailable, so the subprocess re-runs your script. In your case that would be really bad, because it would drop everything again; you prevent that by putting that code inside the if __name__ == '__main__': bit. You should be able to reopen the database connections inside my_awesome_function and thus interact with the database successfully.
Truth be told, you aren't going to gain any speed doing this. In fact, I expect it to be slower. Databases are really, really slow, so your process is going to spend most of its time waiting on the database. Now you just have multiple processes waiting on the database, and that really will not improve the situation.
But databases are for storing things. As long as you are doing processing, you should really do it in your code before hitting the database. You are basically using the database as a set, and your code would be much nicer using a Python set. If you really need to put that stuff in a database, do it at the end of your program.
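The question never names its database or shows my_awesome_function, so as a sketch of "reopen the connection inside the worker," here is a hypothetical version using sqlite3 as a stand-in:

```python
import sqlite3
from multiprocessing import Pool

DB_PATH = 'example.db'  # hypothetical path; the question never names its database

def new_awesome_function(i, x):
    # Open a fresh connection inside the worker process: connections
    # inherited across fork() (or re-created on Windows) must not be shared.
    conn = sqlite3.connect(DB_PATH)
    with conn:  # commits on success
        conn.execute('INSERT INTO results (idx, value) VALUES (?, ?)', (i, x))
    conn.close()

if __name__ == '__main__':
    # One-time setup in the parent, before any workers start
    conn = sqlite3.connect(DB_PATH)
    conn.execute('CREATE TABLE IF NOT EXISTS results (idx INTEGER, value INTEGER)')
    conn.close()

    data = [1, 2, 3, 4, 5]
    pool = Pool(processes=4)
    for i, x in enumerate(data):
        pool.apply_async(new_awesome_function, (i, x))
    pool.close()
    pool.join()
```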
Your code seems to work for me:
gave me:
What makes you think it doesn't work?
Edit: Try to run this and look at the output:
Mine is:
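This answer's code and output blocks did not survive extraction. A minimal sketch of an equivalent demonstration (new_awesome_function is a stand-in for the asker's unshown function) would be:

```python
import os
from multiprocessing import Pool

def new_awesome_function(i, x):
    # Print from the worker so each task's process is visible in the output
    print('task %d ran in process %d with %r' % (i, os.getpid(), x))
    return x * 2

if __name__ == '__main__':
    data = [1, 2, 3, 4, 5]
    pool = Pool(processes=4)
    results = [pool.apply_async(new_awesome_function, (i, x))
               for i, x in enumerate(data)]
    pool.close()
    pool.join()
    print([r.get() for r in results])  # [2, 4, 6, 8, 10]
```

The per-task print lines show the work being spread across worker processes, and the final list shows the results collected back in the parent.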