Implementing a basic queue/thread process in Python
I'm looking for some eyeballs to verify that the following chunk of pseudo-Python makes sense. I want to spawn a number of threads to run some in-process functions as fast as possible. The idea is to spawn the threads in the master loop, so that the app runs them concurrently.
Chunk of code:
- get the filenames from a dir
- write each filename to a queue
- spawn a thread for each filename, where each thread waits for and reads a value from the queue
- the threadParse function then handles the actual processing, based on the file that's included via the "execfile" function
# System modules (Python 2: the Queue module and execfile are used below)
import os
import time
from Queue import Queue
from threading import Thread

# Local modules
#import feedparser

# Set up some global variables
appqueue = Queue()

# more than the app will need
# this matches the number of files that will ever be in the
# urldir
num_fetch_threads = 200


def threadParse(q):
    # decompose the packet to get the various elements
    # (decompose is a helper that is not shown here)
    line = q.get()
    college, level, packet = decompose(line)

    # build name of included file
    fname = college + "_" + level + "_Parse.py"
    execfile(fname)
    q.task_done()


# set up the master loop
while True:
    time.sleep(2)
    # get the files from the dir
    # set up threads
    filelist = os.listdir("/urldir")
    if filelist:
        for file_ in filelist:
            worker = Thread(target=threadParse, args=(appqueue,))
            worker.start()

        # again, get the files from the dir
        # set up the queue
        filelist = os.listdir("/urldir")
        for file_ in filelist:
            # stuff the filename in the queue
            appqueue.put(file_)

# Now wait for the queue to be empty, indicating that we have
# processed all of the downloads.
# don't care about this part
#print '*** Main thread waiting'
#appqueue.join()
#print '*** Done'
Thoughts, comments, and pointers are appreciated.
Thanks.
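For reference, the flow described in the question (filenames go into a queue, threads pull them off and parse them) can be sketched in Python 3 roughly as follows. This is not the asker's code. It assumes a fixed pool of worker threads that each loop on the queue, rather than one thread per file that reads a single item, and `parse_file`, `worker`, `process_all`, and the `None` sentinel are hypothetical names introduced only for illustration.

```python
import queue
import threading


def parse_file(filename):
    """Hypothetical stand-in for the real per-file parsing step."""
    return filename.upper()


def worker(tasks, results):
    """Pull filenames off the queue until a None sentinel arrives."""
    while True:
        filename = tasks.get()
        try:
            if filename is None:  # sentinel: no more work for this worker
                return
            results.put((filename, parse_file(filename)))
        except Exception as exc:
            # Record the failure instead of letting it kill the worker thread.
            results.put((filename, exc))
        finally:
            tasks.task_done()


def process_all(filenames, num_workers=4):
    """Process every filename with a fixed pool of worker threads."""
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for name in filenames:
        tasks.put(name)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    parsed = {}
    while not results.empty():
        name, value = results.get()
        parsed[name] = value
    return parsed


if __name__ == "__main__":
    print(process_all(["a_1_Parse.py", "b_2_Parse.py"]))
```

Looping inside each worker means the number of threads no longer has to match the number of files, and the sentinels give the workers a clean way to exit.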
Comments (1)
If I understand this right, you spawn lots of threads to get things done faster.
This only works if most of the work done in each thread happens without holding the GIL (CPython's global interpreter lock). So if the threads spend a lot of time waiting for data from the network, disk, or something similar, it might be a good idea.
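To illustrate the waiting case, here is a minimal Python 3 sketch. It assumes `time.sleep` as a stand-in for a network or disk wait, and `fake_download` and `timed` are made-up helpers, not part of any library. Because a sleeping thread does not hold the GIL, the threaded run should finish in roughly the time of one wait instead of the sum of all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name, delay=0.1):
    """Hypothetical I/O-bound task: the sleep stands in for a blocking wait."""
    time.sleep(delay)  # the GIL is released while the thread sleeps
    return name


def timed(func):
    """Return (result, elapsed_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def run_threaded():
    # One worker per task, so all the waits can overlap.
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(fake_download, names))


names = ["file%d" % i for i in range(8)]

sequential, seq_time = timed(lambda: [fake_download(n) for n in names])
threaded, thr_time = timed(run_threaded)

print("sequential: %.2fs, threaded: %.2fs" % (seq_time, thr_time))
```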
If each task uses a lot of CPU, this will run pretty much as it would on a single-core machine, and you might as well run the tasks in sequence.
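A rough way to check this on your own machine is to time the same CPU-bound function sequentially and in threads. The sketch below assumes a pure-Python busy loop (`count_down`, a made-up function) as the workload. On a standard CPython build the two timings are typically close, but the exact numbers depend on the interpreter and hardware, so no output is shown.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n):
    """Pure-Python busy loop: CPU-bound, so it needs the GIL to make progress."""
    while n > 0:
        n -= 1
    return n


def run_sequential(jobs):
    return [count_down(n) for n in jobs]


def run_threaded(jobs):
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(count_down, jobs))


jobs = [2_000_000] * 4

for label, runner in (("sequential", run_sequential), ("threaded", run_threaded)):
    start = time.perf_counter()
    runner(jobs)
    print("%s: %.2fs" % (label, time.perf_counter() - start))
```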
I should add that what I wrote is true for CPython, but not necessarily for Jython/IronPython.
Also, I should add that if you need to utilize more CPUs/cores, there's the multiprocessing module that might help.
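As a minimal sketch of that suggestion, the following Python 3 example spreads a CPU-bound job over worker processes with `multiprocessing.Pool`. The workload (`count_primes`) and the limits are invented for illustration. Each worker process has its own interpreter and its own GIL, so the jobs can run on separate cores.

```python
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def main():
    limits = [10_000, 20_000, 30_000, 40_000]
    # pool.map hands the limits out to the worker processes and
    # returns the counts in the same order as the inputs.
    with Pool(processes=4) as pool:
        counts = pool.map(count_primes, limits)
    for limit, count in zip(limits, counts):
        print("primes below %d: %d" % (limit, count))


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers by spawning a fresh interpreter, the main module is imported again in each worker, and the guard keeps the pool from being created recursively.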