Python - Calling Linux commands from a thread pool is not working
Hi guys,
I have about 4000 files (1-50 MB each) to sort.
I was thinking of having Python call the Linux sort command. Since I expect this to be somewhat I/O bound, I would use the threading library.
Here's what I have, but when I run it and watch the system monitor I don't see 25 sort tasks pop up. It seems to be running one at a time. What am I doing wrong?
    ...
    print "starting sort"

    def sort_unique(file_path):
        """Run linux sort -ug on a file"""
        out = commands.getoutput('sort -ug -o "%s" "%s"' % (file_path, file_path))
        assert not out

    pool = ThreadPool(25)
    for fn in os.listdir(target_dir):
        fp = os.path.join(target_dir, fn)
        pool.add_task(sort_unique, fp)
    pool.wait_completion()
Here's where ThreadPool comes from, perhaps that is broken?
Comments (3)
You're doing everything correctly.
Python has something called the GIL (Global Interpreter Lock), which allows only one thread to execute Python bytecode at a time.
Choose subprocess instead :), since Python threads do not run Python code in parallel.
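A minimal sketch of the subprocess route, assuming Python 3 and a `sort` binary on the PATH. The names `sort_directory` and `workers` are illustrative and not from the original post, and the standard library's `ThreadPoolExecutor` stands in for the poster's `ThreadPool` class. Each thread here only waits on a child process, and waiting on a child releases the GIL, so several `sort` processes can run at once.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def sort_unique(file_path):
    """Run `sort -ug` in place on one file; raises CalledProcessError on failure."""
    # Passing an argument list avoids shell quoting problems with odd file names.
    subprocess.run(["sort", "-ug", "-o", file_path, file_path], check=True)
    return file_path


def sort_directory(target_dir, workers=25):
    """Sort every regular file in target_dir, with up to `workers` sorts at once."""
    paths = [
        os.path.join(target_dir, name)
        for name in sorted(os.listdir(target_dir))
    ]
    paths = [p for p in paths if os.path.isfile(p)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() consumes every result, so a failure in any worker surfaces here.
        return list(pool.map(sort_unique, paths))
```

With `check=True`, a failing `sort` raises an exception instead of relying on `assert not out`, which would be skipped when Python runs with `-O`.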
Actually this does seem to be working. I spoke too soon. I'm not sure if you guys want to delete this or what? Sorry about that.
Normally people do this by spawning multiple processes. The multiprocessing module makes this easy to do.
On the other hand, Python is pretty good at sorting, so why not just read the file into a list of strings with file.readlines() and then sort it in Python? You would have to write a key function to use with list.sort() to emulate the -g option, and you would also have to remove duplicates to emulate the -u option. The easiest way (and a fast way) to remove duplicates is to do list(set(unsorted_lines)) before you do the sort.
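The pure-Python approach described above might look like the following sketch, assuming Python 3. All function names are hypothetical. The key function only approximates `sort -g`: it assumes the first whitespace-separated field of each line is something `float()` can parse and places every other line first. Also note that `set()` drops only textually identical lines, whereas `sort -ug` treats lines as duplicates when they compare numerically equal, so results can differ on inputs such as `1` and `1.0`.

```python
import math
import multiprocessing
import os


def general_numeric_key(line):
    """Approximate `sort -g`: non-numeric lines first, then ascending by value."""
    fields = line.split()
    try:
        value = float(fields[0])
    except (IndexError, ValueError):
        return (0, 0.0, line)
    if math.isnan(value):
        # NaN does not order consistently, so group it with non-numeric lines.
        return (0, 0.0, line)
    # The line itself breaks ties so the output order is deterministic.
    return (1, value, line)


def sort_unique_lines(lines):
    """Drop exact duplicate lines, then sort by the numeric key."""
    return sorted(set(lines), key=general_numeric_key)


def sort_file_in_place(file_path):
    """Read one file, sort and de-duplicate its lines, and write it back."""
    with open(file_path) as f:
        lines = f.read().splitlines()
    with open(file_path, "w") as f:
        f.writelines(line + "\n" for line in sort_unique_lines(lines))
    return file_path


def sort_directory_in_python(target_dir, processes=4):
    """Sort every regular file in target_dir using a pool of worker processes."""
    paths = [
        os.path.join(target_dir, name)
        for name in sorted(os.listdir(target_dir))
    ]
    paths = [p for p in paths if os.path.isfile(p)]
    with multiprocessing.Pool(processes) as pool:
        return pool.map(sort_file_in_place, paths)
```

Because the sorting here is CPU work done in Python, worker processes rather than threads are what let it run in parallel. When calling `sort_directory_in_python` from a script, do so under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.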