Multithreading not improving results in Python?

Posted on 2025-01-28 03:47:20

I am applying multithreading to a Python script to improve its performance, but I don't understand why the execution time does not improve.

Here is the relevant part of my implementation:

from queue import Queue
from threading import Thread
import time


class WP_TITLE_DOWNLOADER(Thread):
    def __init__(self, queue, name):
        super().__init__(name=name)
        self.queue = queue

    def download_link(self, linkss):
        # Some test function; later, CPU-bound processing
        # will be done on this list.
        for idx, link in enumerate(linkss):
            # time.sleep(0.01)
            test.append(idx)

        for i in testv:
            i.append(2)

    def run(self):
        while True:
            # Get the work from the queue
            linkss = self.queue.get()
            try:
                self.download_link(linkss)
            finally:
                self.queue.task_done()


###### with threading

testv = [[i for i in range(5000)] for j in range(20)]
links_list = [[i for i in range(100000)] for j in range(20)]
test = []
start_time = time.time()
queue = Queue()
thread_count = 8
for x in range(thread_count):
    worker = WP_TITLE_DOWNLOADER(queue, str(x))
    # Daemon threads let the main thread exit even though the workers block forever on queue.get()
    worker.daemon = True
    worker.start()

# Fill up the queue for the threads
for links in links_list:
    queue.put(links)

# Wait until every queued list has been processed
queue.join()
t_time = time.time() - start_time
print("With threading time={}".format(t_time))


###### without threading
# The following function is the same as the one used by the threads.
test = []


def download_link(links1):
    for idx, link in enumerate(links1):
        # time.sleep(0.01)
        test.append(idx)

    for i in testv:
        i.append(2)


start_time = time.time()
for links in links_list:
    download_link(links)

t_time = time.time() - start_time
print("without threading time={}".format(t_time))

With threading time=0.564049482345581
without threading time=0.13332700729370117

NOTE: When I uncomment time.sleep, the threaded version is faster than the unthreaded one.
My test case is a list of lists, each with more than 10,000 elements. The idea of using multithreading is that several lists can be processed simultaneously instead of one at a time, which should reduce the overall time. But the results are not as expected.

Comments (3)

濫情▎り 2025-02-04 03:47:20

As a general rule (there will always be exceptions), multithreading is best suited to I/O-bound processing (this includes networking). Multiprocessing is well suited to CPU-intensive activities.

Your testing is therefore flawed.

Your intention is clearly to do some kind of web crawling, but that isn't happening in your test code, which means your test is CPU-intensive and therefore not suited to multithreading. However, once you've added your networking code, you may find that things improve, provided you've used suitable techniques.
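To make the I/O-bound versus CPU-bound distinction concrete, here is a minimal sketch (not from the original answer) that times the same tasks serially and on a thread pool. The I/O-bound task is simulated with time.sleep, which releases the GIL just like the commented-out sleep in the question; the CPU-bound task is a pure-Python loop. The function names and task sizes are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def io_task(_):
    # Simulated I/O (e.g. waiting on the network); sleeping releases the GIL.
    time.sleep(0.05)
    return 1


def cpu_task(_):
    # Pure-Python arithmetic: the loop needs the GIL to run, so on a
    # standard CPython build threads can only take turns executing it.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def timed(task, n_tasks, workers=None):
    """Run task n_tasks times, serially if workers is None, else on a thread pool.

    Returns (elapsed_seconds, results).
    """
    start = time.perf_counter()
    if workers is None:
        results = [task(i) for i in range(n_tasks)]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(task, range(n_tasks)))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    for name, task in (("I/O-bound", io_task), ("CPU-bound", cpu_task)):
        serial, _ = timed(task, 8)
        threaded, _ = timed(task, 8, workers=8)
        print(f"{name}: serial={serial:.3f}s threaded={threaded:.3f}s")
```

On a standard CPython build, the I/O-bound line should show a clear speedup while the CPU-bound line shows little or none; exact timings depend on the machine.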

Take a look at ThreadPoolExecutor in concurrent.futures. You may find it particularly useful because you can switch to multiprocessing simply by replacing ThreadPoolExecutor with ProcessPoolExecutor, which makes your experiments easier to quantify.
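A minimal sketch of that swap, assuming the per-list work is a hypothetical process_list function (a stand-in for the question's download_link) that returns its result instead of appending to a shared global list, since globals are not shared between processes. Only the executor class changes between the two runs.

```python
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def process_list(items):
    # Hypothetical CPU-bound stand-in for download_link: it returns a result
    # instead of mutating a global, so it works with both executor types.
    return sum(idx * idx for idx, _ in enumerate(items))


def run_batches(executor_cls: type[Executor], batches, max_workers=8):
    """Process every batch with the given executor class; returns (seconds, results)."""
    start = time.perf_counter()
    with executor_cls(max_workers=max_workers) as executor:
        results = list(executor.map(process_list, batches))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    # Same shape as links_list in the question: 20 lists of 100,000 items.
    links_list = [list(range(100_000)) for _ in range(20)]
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        elapsed, _ = run_batches(cls, links_list)
        print(f"{cls.__name__}: {elapsed:.3f}s")
```

The if __name__ == "__main__" guard matters for ProcessPoolExecutor: on platforms that start workers by re-importing the main module, unguarded top-level code would run again in every worker.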

幸福%小乖 2025-02-04 03:47:20

CPython has a mechanism called the GIL (Global Interpreter Lock). This lock ensures that only one thread executes Python bytecode at a time. Therefore, even if you spawn multiple threads to process multiple lists, only one of them is actually processing at any moment. You can try multiprocessing for parallel execution.
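A minimal sketch of the question's per-list work moved to multiprocessing.Pool. The helper names indices_for and run_parallel are hypothetical; each worker returns the indices the question appends to test, and the parent concatenates them, because each worker process has its own copy of module globals. The testv mutation from the question is left out for the same reason: changes made in a worker would not be visible to the parent.

```python
import time
from multiprocessing import Pool


def indices_for(links):
    # Same per-list loop as the question's download_link, but the indices
    # are returned rather than appended to a global list.
    return [idx for idx, _ in enumerate(links)]


def run_parallel(links_list, processes=8):
    with Pool(processes=processes) as pool:
        chunks = pool.map(indices_for, links_list)
    # pool.map preserves input order, so this matches the serial result.
    return [idx for chunk in chunks for idx in chunk]


if __name__ == "__main__":
    links_list = [list(range(100_000)) for _ in range(20)]
    start = time.perf_counter()
    test = run_parallel(links_list)
    print(f"multiprocessing time={time.perf_counter() - start:.3f}s, {len(test)} items")
```

For work this cheap, pickling each list to a worker and the result back can cost more than the parallelism saves; the benefit appears once the real per-list processing is heavy, so it is worth measuring with the actual workload.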

茶色山野 2025-02-04 03:47:20

Threading is awkward in Python because of the GIL (Global Interpreter Lock). Threads have to compete for the GIL before they can run Python code. Threading in Python is only beneficial when the code inside the thread does not need to hold the GIL, i.e. when offloading computations to a hardware accelerator, when doing I/O-bound work, or when calling a non-Python library that releases the GIL. For true parallelism in Python, use multiprocessing instead. It's a bit more cumbersome, as you have to explicitly share your variables or duplicate them, and you often have to serialize your communications.
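As an illustration of the "non-Python library" case: the hashlib documentation states that CPython releases the GIL while hashing data larger than 2047 bytes, so several threads can hash large buffers at the same time. A minimal sketch; the helper names and buffer sizes are arbitrary choices for the example.

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor


def digest(buf: bytes) -> str:
    # hashlib's C implementation releases the GIL for large inputs,
    # so these calls can overlap across threads.
    return hashlib.sha256(buf).hexdigest()


def hash_all(buffers, workers=None):
    """Hash every buffer, serially if workers is None, else on a thread pool."""
    if workers is None:
        return [digest(b) for b in buffers]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, buffers))


if __name__ == "__main__":
    buffers = [os.urandom(16 * 1024 * 1024) for _ in range(4)]  # 4 x 16 MiB
    for workers in (None, 4):
        start = time.perf_counter()
        hash_all(buffers, workers)
        label = "serial" if workers is None else f"{workers} threads"
        print(f"{label}: {time.perf_counter() - start:.3f}s")
```

On a multi-core machine the threaded run can be noticeably faster than the serial one, unlike the pure-Python loops in the question.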
