How to fetch URLs without blocking in Python

Published 2024-07-24 07:45:11

I am writing a GUI app in Pyglet that has to display tens to hundreds of thumbnails from the Internet. Right now, I am using urllib.urlretrieve to grab them, but this blocks each time until they are finished, and only grabs one at a time.

I would prefer to download them in parallel and have each one display as soon as it's finished, without blocking the GUI at any point. What is the best way to do this?

I don't know much about threads, but it looks like the threading module might help? Or perhaps there is some easy way I've overlooked.


Comments (6)

江城子 2024-07-31 07:45:11

You'll probably benefit from the threading or multiprocessing modules. You don't actually need to create all those Thread-based classes yourself; there is a simpler method using Pool.map:

from multiprocessing import Pool

def fetch_url(url):
    # Fetch the URL contents and save it anywhere you need and
    # return something meaningful (like filename or error code),
    # if you wish.
    ...

pool = Pool(processes=4)
result = pool.map(fetch_url, image_url_list)

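
A thread pool gives the same Pool.map pattern without process startup or pickling overhead, which may suit IO-bound downloads like these better. A rough sketch with a filled-in fetch_url; the data: URLs are placeholders so the example runs offline, so substitute your real thumbnail URLs:

```python
from multiprocessing.pool import ThreadPool
from urllib.request import urlopen

def fetch_url(url):
    """Fetch one URL; return (url, bytes) on success or (url, error) on failure."""
    try:
        with urlopen(url, timeout=10) as resp:
            return url, resp.read()
    except OSError as exc:
        return url, exc

# Placeholder data: URLs so the sketch runs without network access.
urls = ["data:text/plain,pong-1", "data:text/plain,pong-2"]

pool = ThreadPool(processes=4)       # 4 worker threads sharing memory
results = pool.map(fetch_url, urls)  # blocks until every fetch completes
pool.close()
pool.join()
```

Because the workers are threads, fetch_url can write results straight into shared data structures, which a process pool would not allow.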
猫烠⑼条掵仅有一顆心 2024-07-31 07:45:11

As you suspected, this is a perfect situation for threading. Here is a short guide I found immensely helpful when doing my own first bit of threading in Python.

孤君无依 2024-07-31 07:45:11

As you rightly indicated, you could create a number of threads, each of which is responsible for performing urlretrieve operations. This allows the main thread to continue uninterrupted.

Here is a tutorial on threading in Python:
http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf
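
To keep the main thread free for the GUI, each worker can push its finished download onto a queue.Queue that the GUI loop drains on each update. A minimal sketch along those lines; the data: URLs and the download helper are illustrative, not from the original answer:

```python
import queue
import threading
from urllib.request import urlopen

def download(url, done_queue):
    """Worker: fetch one URL and hand the result back through the queue."""
    try:
        with urlopen(url, timeout=10) as resp:
            done_queue.put((url, resp.read()))
    except OSError as exc:
        done_queue.put((url, exc))

done = queue.Queue()
urls = ["data:text/plain,thumb-a", "data:text/plain,thumb-b"]
for url in urls:
    threading.Thread(target=download, args=(url, done), daemon=True).start()

# A GUI would poll done.get_nowait() from its update callback instead of
# blocking; here we simply collect both results.
results = dict(done.get() for _ in urls)
```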

软的没边 2024-07-31 07:45:11

Here's an example of how to use threading.Thread. Just replace the class name with your own and the run function with your own. Note that threading is great for IO-bound applications like yours and can really speed things up. Using Python threads strictly for computation in standard Python doesn't help, because only one thread can compute at a time.

import threading, time

class Ping(threading.Thread):
    def __init__(self, multiple):
        threading.Thread.__init__(self)
        self.multiple = multiple
    def run(self):
        # sleeps 3 seconds then prints 'pong' x times
        time.sleep(3)
        print('pong' * self.multiple)

pingInstance = Ping(3)
pingInstance.start()  # your run function will be called by the start function
print("pingInstance is alive? : %d" % pingInstance.is_alive())  # True, i.e. 1
print("Number of threads alive: %d" % threading.active_count())
# main thread + class instance
time.sleep(3.5)
print("Number of threads alive: %d" % threading.active_count())
print("pingInstance is alive?: %d" % pingInstance.is_alive())
# is_alive returns False when your thread reaches the end of its run function.
# only main thread now

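
Adapted to the question, the same subclass pattern might look like the sketch below, with run doing one download; the data: URL stands in for a real thumbnail URL:

```python
import threading
from urllib.request import urlopen

class Fetcher(threading.Thread):
    """Same pattern as Ping: run() performs one download."""
    def __init__(self, url):
        threading.Thread.__init__(self)
        self.url = url
        self.data = None  # filled in by run()

    def run(self):
        with urlopen(self.url, timeout=10) as resp:
            self.data = resp.read()

fetcher = Fetcher("data:text/plain,pong")
fetcher.start()  # run() executes in the new thread
fetcher.join()   # wait here; a GUI would poll is_alive() instead
```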
森林迷了鹿 2024-07-31 07:45:11

You have these choices:

  • Threads: easiest but doesn't scale well
  • Twisted: medium difficulty, scales well but shares CPU due to GIL and being single threaded.
  • Multiprocessing: hardest. Scales well if you know how to write your own event loop.

I recommend just using threads unless you need an industrial scale fetcher.
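
On recent Pythons the thread route is even simpler with concurrent.futures, and as_completed yields each download the moment it finishes, which matches the asker's goal of displaying thumbnails as they arrive. A sketch, with data: URLs as offline placeholders:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen

def fetch(url):
    """Fetch one URL and return (url, bytes)."""
    with urlopen(url, timeout=10) as resp:
        return url, resp.read()

urls = ["data:text/plain,a", "data:text/plain,b", "data:text/plain,c"]
finished = []
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch, u) for u in urls]
    for fut in as_completed(futures):  # yields futures as they complete
        finished.append(fut.result())  # a GUI would display the thumbnail here
```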

长亭外,古道边 2024-07-31 07:45:11

You either need to use threads, or an asynchronous networking library such as Twisted. I suspect that using threads might be simpler in your particular use case.
