Implementing a basic queue/thread process in Python

Posted 2024-09-12 13:18:30


Looking for some eyeballs to verify that the following chunk of pseudo-Python makes sense. I'm looking to spawn a number of threads to implement some in-proc functions as fast as possible. The idea is to spawn the threads in the master loop, so the app will run the threads simultaneously in a parallel/concurrent manner.

chunk of code
 -get the filenames from a dir
 -write each filename to a queue
 -spawn a thread for each filename, where each thread 
  waits/reads value/data from the queue
 -the threadParse function then handles the actual processing 
  based on the file that's included via the "execfile" function...


# System modules
import os
import time
from Queue import Queue       # the module is named "queue" in Python 3
from threading import Thread

# Local modules
#import feedparser

# Set up some global variables
appqueue = Queue()

# more than the app will need
# this matches the maximum number of files that will ever be in
# /urldir
#
num_fetch_threads = 200


def threadParse(q):
    # decompose the packet to get the various elements
    # (decompose() is assumed to be defined elsewhere)
    line = q.get()
    college, level, packet = decompose(line)

    # build the name of the included file and run it
    fname = college + "_" + level + "_Parse.py"
    execfile(fname)
    q.task_done()


# set up the master loop
while True:
    time.sleep(2)
    # get the filenames from the dir
    filelist = os.listdir("/urldir")
    if filelist:
        # set up one thread per filename
        for file_ in filelist:
            worker = Thread(target=threadParse, args=(appqueue,))
            worker.start()

        # stuff each filename into the queue; reusing the same listing
        # keeps the thread count and the queue contents in sync
        for file_ in filelist:
            appqueue.put(file_)

        # Now wait for the queue to be empty, indicating that we have
        # processed all of the downloads.

        #don't care about this part

        #print '*** Main thread waiting'
        #appqueue.join()
        #print '*** Done'
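A structural note on the block above: the more common form of this pattern spawns a fixed pool of long-lived workers once, outside the master loop, and lets each worker loop on q.get() rather than handle a single item and exit. A minimal sketch of that variant, matching the Python 2 era of the post (handle() is a hypothetical stand-in for the decompose()/execfile() step):

from Queue import Queue          # the module is named "queue" in Python 3
from threading import Thread
import os
import time

NUM_WORKERS = 8                  # sized to the workload, not to the file count

def handle(fname):
    # hypothetical stand-in for the real decompose()/execfile() work
    print("parsing %s" % fname)

def worker(q):
    # long-lived worker: keeps pulling filenames until the process exits
    while True:
        fname = q.get()
        try:
            handle(fname)
        finally:
            q.task_done()

appqueue = Queue()
for _ in range(NUM_WORKERS):
    t = Thread(target=worker, args=(appqueue,))
    t.daemon = True              # don't block interpreter exit on these threads
    t.start()

while True:
    for fname in os.listdir("/urldir"):
        appqueue.put(fname)
    appqueue.join()              # wait until every queued file has been handled
    time.sleep(2)

With this shape the thread count stays bounded by NUM_WORKERS no matter how many files appear, and appqueue.join() provides the "wait for the queue to be empty" step that the commented-out lines sketch.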

Thoughts/comments/pointers are appreciated...

thanks


Comments (1)

绿萝 2024-09-19 13:18:30


If I understand this right: You spawn lots of threads to get things done faster.

This only works if the main part of the job done in each thread is done without holding the GIL. So if there is a lot of waiting for data from network, disk or something like that, it might be a good idea.
If each of the tasks uses a lot of CPU, this will run pretty much like on a single-core, 1-CPU machine, and you might as well do them in sequence.

I should add that what I wrote is true for CPython, but not necessarily for Jython/IronPython.
Also, I should add that if you need to utilize more CPUs/cores, there's the multiprocessing module that might help.
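To make that last pointer concrete, a minimal multiprocessing.Pool sketch (the names and the dummy workload here are hypothetical, not from the post): each worker is a separate process with its own interpreter and its own GIL, so CPU-bound parsing can actually use multiple cores.

from multiprocessing import Pool

def parse_file(fname):
    # stand-in for the real CPU-bound parse step; each call runs in a
    # separate worker process with its own interpreter and its own GIL
    return sum(ord(c) for c in fname)           # dummy CPU work

if __name__ == "__main__":
    filelist = ["a_Parse.py", "b_Parse.py"]     # stand-in for os.listdir("/urldir")
    pool = Pool(processes=4)                    # roughly one worker per core
    results = pool.map(parse_file, filelist)    # blocks until every file is done
    pool.close()
    pool.join()
    print(results)

pool.map() partitions the file list across the workers and blocks until every call returns, which replaces the queue/join bookkeeping from the threaded version.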
