Python: multiprocessing quirks (or: how do I coordinate these threads?)
I have been challenged.
I am unsure how to use multiprocessing without jython or cython (or some other IronPython whatsahoosie), and have opted to use Threads for my multicore CentOS program.
It reads a set of text files and outputs to a dictionary (defined by hfreq={} on the outside of the defined functions). If I have it sleep, it runs (terribly slowly, seemingly on one core) and works fine.
Additionally, I do not know how to make it wait until both threads are done before actually writing the output file (other than the time.sleep call, which completely defeats the purpose of speed).
EXAMPLE:
from threading import Thread
import time

hfreq = {}
# [INSERT TEXT FILE ARRAYS HERE, RESPECTIVELY filenames0[] and filenames1[]]

def count():
    pass  # some code here that writes frequency to hfreq

def count1():
    pass  # some code here that writes frequency to hfreq as well, but using filenames1

t1 = Thread(target=count, args=())
t2 = Thread(target=count1, args=())
t1.start()
t2.start()
time.sleep(15)  # No other known way to prevent the following from running immediately
items = sorted(hfreq.items())
output = open('Freq.txt', 'w')
# [for statement that writes to file]
output.close()
And that is where it ends. If I run the program without any threading (on its own), it takes about 10-14 seconds. If I try the threading approach (splitting the original file list in half between the two threads), BOTH THREADS run for 14 seconds, instead of the multi-core speedup I expected.
Thank you for reading this wall of text. Please tell me if I can clarify.
Comments (1)
If you want to take advantage of multiple cores with CPython, you should use the multiprocessing module: it has many caveats, but this is the sort of problem it's a relatively good fit for. To wait until a thread is done, use t.join().
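A minimal sketch of both suggestions, not the asker's actual program. It assumes "frequency" means whitespace-separated word counts. The helper names (count_file, count_with_threads, count_with_processes, write_frequencies) are hypothetical; only the filenames0/filenames1 lists and Freq.txt come from the question. The thread version shows join() replacing the sleep. The process version is the one that can actually spread the work across cores on a standard CPython build.

```python
from collections import Counter
from multiprocessing import Pool
from threading import Thread


def count_file(path):
    """Return word frequencies for one text file (assumed counting rule)."""
    with open(path, encoding="utf-8") as handle:
        return Counter(handle.read().split())


def count_with_threads(filenames0, filenames1):
    """Two threads plus join(): correct results, but the GIL on standard
    CPython builds keeps CPU-bound counting effectively on one core."""
    results = [Counter(), Counter()]  # one private counter per thread

    def worker(index, paths):
        for path in paths:
            results[index] += count_file(path)

    threads = [
        Thread(target=worker, args=(0, filenames0)),
        Thread(target=worker, args=(1, filenames1)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # blocks until this thread finishes; no sleep needed
    return results[0] + results[1]


def count_with_processes(filenames, workers=2):
    """One task per file across worker processes; map() only returns
    once every file has been counted."""
    with Pool(processes=workers) as pool:
        partial_counts = pool.map(count_file, filenames)
    return sum(partial_counts, Counter())


def write_frequencies(hfreq, out_path="Freq.txt"):
    """Write 'word count' lines sorted by word."""
    with open(out_path, "w", encoding="utf-8") as output:
        for word, freq in sorted(hfreq.items()):
            output.write(f"{word} {freq}\n")
```

To use the process version, call `write_frequencies(count_with_processes(filenames0 + filenames1))` from inside an `if __name__ == "__main__":` guard, since worker processes may re-import the main module. Each worker returns its own Counter and the parent merges them, because a module-level dictionary like hfreq is not shared between processes.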