Using the Django ORM in threads and avoiding the "too many clients" exception by using BoundedSemaphore
I'm working on a manage.py command that creates about 200 threads to check remote hosts. My database setup allows 120 connections, so I need some kind of pooling. I tried using a separate thread, like this:
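    import threading
    from threading import Thread

    class Pool(Thread):
        def __init__(self):
            Thread.__init__(self)
            # Intended to cap concurrent ORM access at 10 callers.
            self.semaphore = threading.BoundedSemaphore(10)

        def give(self, trackers):
            self.semaphore.acquire()
            try:
                data = ... some ORM (not lazy, query triggered here) ...
            finally:
                # Release the slot even if the query raises.
                self.semaphore.release()
            return data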
I pass an instance of this object to every check thread, but I still get "OperationalError: FATAL: sorry, too many clients already" inside the Pool object once 120 threads have been initialized.
I expected that only 10 database connections would be opened and that the threads would wait for a free semaphore slot. I can confirm that the semaphore works by commenting out "release()": in that case only 10 threads do any work and the others wait until the app terminates.
As far as I understand, every thread opens a new connection to the database even if the actual call is inside a different thread, but why? Is there any way to perform all database queries inside only one thread?
Comments (1)
Django's ORM manages database connections in thread-local variables, so each thread that accesses the ORM creates its own connection. You can see that in the first few lines of django/db/backends/__init__.py.
If you want to limit the number of database connections, you must limit the number of different threads that actually access the ORM. One solution is to implement a service that delegates ORM requests to a pool of dedicated ORM threads. To pass the requests and their results between those threads and the others, you will have to implement some sort of message-passing mechanism. Since this is a typical producer/consumer problem, the Python docs on threading should give some hints on how to achieve this.
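As a rough sketch of that idea (not a drop-in solution), the standard library's concurrent.futures.ThreadPoolExecutor can play the role of the dedicated ORM thread pool, with its futures acting as the message-passing mechanism. A method called on a Thread object runs in the caller's thread, which is why the semaphore in the question limits concurrency but not the number of connections; here the query function itself is handed over to a worker thread. The names MAX_DB_THREADS, run_in_db_thread, fake_query and check_host are made up for this illustration, and fake_query is only a stand-in for a real ORM call so that the example runs without Django.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_DB_THREADS = 10  # assumed cap, mirroring the semaphore size in the question

# Only these worker threads ever run ORM code, so at most
# MAX_DB_THREADS thread-local connections should be created.
db_executor = ThreadPoolExecutor(max_workers=MAX_DB_THREADS)


def run_in_db_thread(func, *args, **kwargs):
    """Run func in one of the dedicated DB threads and block until it returns."""
    return db_executor.submit(func, *args, **kwargs).result()


def fake_query(host):
    # Hypothetical stand-in for a non-lazy ORM call.
    # It reports which thread actually executed it.
    return host, threading.current_thread().name


def check_host(host, results):
    # Runs in one of the many check threads; it never touches the ORM itself.
    results.append(run_in_db_thread(fake_query, host))


results = []
checkers = [
    threading.Thread(target=check_host, args=("host%d" % i, results))
    for i in range(200)
]
for t in checkers:
    t.start()
for t in checkers:
    t.join()

# Names of the threads that really executed the "queries".
db_thread_names = {name for _, name in results}
print(len(results), "queries ran on", len(db_thread_names), "DB threads")
db_executor.shutdown()
```

All 200 check threads get their result back, but the queries only ever execute on the executor's worker threads, so the number of threads that would open a connection stays at or below MAX_DB_THREADS.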
Edit: I've just googled for "django connection pooling". There are many people who complain that Django does not provide a proper connection pool. Some of them managed to integrate a separate pooling package. For PostgreSQL, I would take a look at the pgpool middleware.