Using Django ORM in threads and avoiding the "too many clients" exception by using BoundedSemaphore

Posted on 2024-09-14 03:13:14

I am working on a manage.py command that creates about 200 threads to check remote hosts. My database setup allows 120 connections, so I need some kind of pooling. I tried using a separate thread, like this:

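import threading
from threading import Thread

class Pool(Thread):
    def __init__(self):
        Thread.__init__(self)
        # Allow at most 10 callers into give() at the same time
        self.semaphore = threading.BoundedSemaphore(10)

    def give(self, trackers):
        # "with" releases the semaphore even if the query raises
        with self.semaphore:
            data = ...  # some ORM query (not lazy, query triggered here)
        return data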

I pass an instance of this object to every check thread, but I still get "OperationalError: FATAL: sorry, too many clients already" inside the Pool object after initializing 120 threads.
I expected that only 10 database connections would be opened and that the other threads would wait for a free semaphore slot. I can confirm that the semaphore works by commenting out "release()": in that case only 10 threads do any work and the rest wait until the app terminates.

As far as I understand, every thread opens a new connection to the database even though the actual call is inside a different thread, but why? Is there any way to perform all database queries inside only one thread?


Comments (1)

眷恋的温暖 2024-09-21 03:13:14

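Django's ORM manages database connections in thread-local variables, so each thread that accesses the ORM creates its own connection. You can see this in the first few lines of django/db/backends/__init__.py. The semaphore in the question therefore only limits how many threads run a query at the same time; it does not limit how many connections stay open.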
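The effect can be reproduced without Django. The following is a minimal sketch, not Django's actual code: FakeConnection, get_connection and run are hypothetical names, and threading.local stands in for the thread-local storage described above. Every thread that passes through the semaphore still opens its own connection, so the total follows the number of threads rather than the semaphore size.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for a database connection; counts how many are opened."""
    opened = 0
    _lock = threading.Lock()

    def __init__(self):
        with FakeConnection._lock:
            FakeConnection.opened += 1


# Each thread sees its own, initially empty, attribute namespace on this object.
_local = threading.local()


def get_connection():
    # Lazily create one connection per thread, as a thread-local store would.
    if not hasattr(_local, "conn"):
        _local.conn = FakeConnection()
    return _local.conn


semaphore = threading.BoundedSemaphore(10)


def worker():
    # At most 10 threads are inside this block at once...
    with semaphore:
        get_connection()
    # ...but the connection created above belongs to this thread, not to the semaphore.


def run(n_threads=50):
    """Start n_threads workers and return how many connections they opened."""
    before = FakeConnection.opened
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return FakeConnection.opened - before
```

Calling run(50) reports one connection per thread, even though no more than 10 threads were ever inside the semaphore at the same moment.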

If you want to limit the number of database connections, you must limit the number of different threads that actually access the ORM. One solution is to implement a service that delegates ORM requests to a pool of dedicated ORM threads. To pass the requests and their results between those threads and the others, you will have to implement some sort of message-passing mechanism. Since this is a typical producer/consumer problem, the Python docs on threading should give some hints on how to achieve this.
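One possible shape for such a service is sketched below. It uses concurrent.futures.ThreadPoolExecutor from the standard library as the message-passing mechanism instead of a hand-written queue. The names db_pool, load_tracker_data, give and run_checks are hypothetical, the pool size of 10 is an assumption taken from the question, and load_tracker_data only simulates an ORM query so that the example runs without Django.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_DB_THREADS = 10  # assumption: same limit the question tried to enforce

# Only the worker threads of this executor ever run ORM code.
db_pool = ThreadPoolExecutor(max_workers=MAX_DB_THREADS, thread_name_prefix="orm")

# Bookkeeping for the demo: which threads actually ran a "query".
db_threads_seen = set()
_seen_lock = threading.Lock()


def load_tracker_data(tracker_id):
    """Hypothetical stand-in for an eager ORM query; runs inside an ORM thread."""
    with _seen_lock:
        db_threads_seen.add(threading.current_thread().name)
    return {"tracker": tracker_id}


def give(tracker_id):
    """Called from any check thread; blocks until an ORM thread returns the result."""
    return db_pool.submit(load_tracker_data, tracker_id).result()


def run_checks(n_threads=200):
    """Start n_threads check threads that each request data through give()."""
    results = [None] * n_threads

    def check(i):
        results[i] = give(i)

    threads = [threading.Thread(target=check, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With real ORM code in place of load_tracker_data, the function should return fully evaluated data (for example a list) rather than a lazy QuerySet; otherwise the query would run in the calling check thread and open a connection there after all.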

Edit: I just searched for "django connection pooling". Many people complain that Django does not provide a proper connection pool, and some of them have managed to integrate a separate pooling package. For PostgreSQL, I would take a look at the pgpool middleware.
