Pymongo, connection pooling and asynchronous tasks via Celery
I'm using pymongo to access mongodb in an application that also uses Celery to perform many asynchronous tasks. I know pymongo's connection pooling does not support asynchronous workers (based on the docs).
To access collections I've got a Collection class that wraps certain logic specific to my application. I'm trying to make sense of some code I inherited along with this wrapper:
Each collection at the moment creates its own Connection instance. Based on what I'm reading this is wrong: I should really have a single Connection instance (in settings.py or such) and import it into my Collection instances. That bit is clear. Is there a guideline on the recommended maximum number of connections? The current code surely creates a LOT of connections/sockets, as it's not really making use of the pooling facilities.
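Something like this minimal sketch is what I have in mind (the module, database, and pool-size values are made up; newer PyMongo names the client class MongoClient, while older releases used Connection):

    # db.py -- one shared, pooled client for the whole application
    from pymongo import MongoClient

    # One client per process; maxPoolSize here is illustrative, not a recommendation.
    client = MongoClient("localhost", 27017, maxPoolSize=10)
    db = client["my_app_db"]

    # A Collection wrapper then imports the shared client instead of building its own:
    #     from db import db
    #     users = db["users"]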
However, as some code is called from asynchronous celery tasks as well as being run synchronously, I'm not sure how to handle this. My thought is to instantiate new Connection instances for the tasks and use the single shared one for the synchronous calls (calling end_request, of course, after each activity is done). Is this the right direction?
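Roughly, the per-task version I'm picturing would look like this sketch (the broker URL, task, and collection names are made up):

    # tasks.py -- each task builds and closes its own client, so sockets are
    # never shared between concurrent workers.
    from celery import Celery
    from pymongo import MongoClient

    app = Celery("my_app", broker="redis://localhost:6379/0")

    @app.task
    def update_user(user_id, fields):
        client = MongoClient("localhost", 27017)  # per-task client
        try:
            client["my_app_db"]["users"].update_one({"_id": user_id}, {"$set": fields})
        finally:
            client.close()  # hand the sockets back when the task finishes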
Thanks!
Harel
From pymongo's docs: "PyMongo is thread-safe and even provides built-in connection pooling for threaded applications."
In your situation, the word "asynchronous" really translates into how much "inconsistency" your application can tolerate.
A read-modify-write statement like "x += 1" will never be consistent across concurrent workers in your app. If you can afford that, there is no problem. If you have "critical" operations, you must somehow implement some locking for synchronization.
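For the counter case specifically, one way to sidestep application-side locking is to push the increment into the database with MongoDB's atomic $inc operator. A sketch, with made-up collection and field names:

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    counters = client["my_app_db"]["counters"]

    # The server applies $inc atomically, so concurrent tasks cannot clobber
    # each other the way a Python-side "x += 1" read-modify-write can.
    counters.update_one({"_id": "page_views"}, {"$inc": {"count": 1}}, upsert=True)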
As for the maximum number of connections, I don't know of exact figures, so test and proceed.
Also take a look at Redis and this example if speed and memory efficiency are required. In some benchmarks I made, the Redis Python driver is at least 2x faster than pymongo for reads/writes.
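For reference, the kind of plain read/write that benchmark compares looks like this with redis-py (host, port, and key are made up):

    import redis

    r = redis.Redis(host="localhost", port=6379, db=0)
    r.set("user:42:score", 100)     # write
    score = r.get("user:42:score")  # read; returns bytes, e.g. b"100"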