Asynchronous daemon processing / ORM interaction with Django
I'm looking for a way to do asynchronous data processing with a daemon that uses the Django ORM. However, the ORM isn't thread-safe: retrieving or modifying Django objects from within threads is not safe. So I'm wondering what the correct way to achieve asynchrony is?
Basically, what I need to do is take a list of users in the db, query a third-party API, and then update the user-profile rows for those users, as a daemon or background process. Doing this in series per user is easy, but it takes too long to be at all scalable. If the daemon is retrieving and updating the users through the ORM, how do I process 10-20 users at a time? I would use a standard threading / queue system for this, but you can't thread interactions like
models.User.objects.get(id=foo) ...
Django itself is an asynchronous processing system which makes asynchronous ORM calls(?) for each request, so there should be a way to do it? I haven't found anything in the documentation so far.
Cheers
Comments (2)
Have a look at Celery. I guess that would solve your problem. It uses the multiprocessing module. It needs a (very) little setup, but it helps a lot with scaling.
If your asynchronous processing is being done in its own process, then thread safety is not an issue, because separate processes do not share an address space and so can't interfere with each other. They would each have their own copy of model objects. Concurrency will be controlled by the database with transactions. So you're fine.
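To make the process-per-worker idea concrete, here is a minimal sketch using only the standard library. It is not Django code: plain `sqlite3` stands in for the ORM and for the real database, and the `profile` table, `WORKER_SOURCE`, `create_db`, `update_in_processes`, and `read_statuses` are all hypothetical names made up for illustration. The point it tries to show is that each worker process opens its own connection and wraps its update in a transaction, so the database is the only thing coordinating the writers.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

# Code run by each worker process (hypothetical). It opens its own
# connection, so nothing is shared with the parent or sibling workers.
WORKER_SOURCE = """
import sqlite3, sys
db_path, user_id = sys.argv[1], int(sys.argv[2])
conn = sqlite3.connect(db_path, timeout=30)  # wait if another writer holds the lock
with conn:  # one transaction per update; commits on success
    conn.execute(
        "UPDATE profile SET status = ? WHERE user_id = ?",
        ("synced-%d" % user_id, user_id),
    )
conn.close()
"""


def create_db(path, user_ids):
    """Create a toy profile table with one 'stale' row per user."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute(
            "CREATE TABLE profile (user_id INTEGER PRIMARY KEY, status TEXT)"
        )
        conn.executemany(
            "INSERT INTO profile (user_id, status) VALUES (?, 'stale')",
            [(uid,) for uid in user_ids],
        )
    conn.close()


def update_in_processes(path, user_ids):
    """Start one OS process per user, then wait for all of them."""
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER_SOURCE, path, str(uid)])
        for uid in user_ids
    ]
    return [p.wait() for p in procs]  # exit codes, 0 means success


def read_statuses(path):
    """Return {user_id: status} as seen by a fresh connection."""
    conn = sqlite3.connect(path)
    rows = conn.execute(
        "SELECT user_id, status FROM profile ORDER BY user_id"
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
    create_db(db_path, [1, 2, 3, 4])
    update_in_processes(db_path, [1, 2, 3, 4])
    print(read_statuses(db_path))
```

In a real daemon the worker body would presumably do the third-party API call and the ORM update instead of the toy `UPDATE`, and a process pool or a task queue would replace the one-process-per-user loop.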
If you're going to spawn a thread inside one of the web server's processes to do your asynchronous business, then you need to lock all API calls that are not thread safe.
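The locking approach can be sketched roughly like this. Again this is not Django code: `ProfileStore` is a hypothetical stand-in for whatever API is not thread safe, and `fetch_remote`, `process_user`, and `process_all` are illustrative names. The slow remote call runs outside the lock so up to ten of them can overlap, while every call into the unsafe API is serialized by a single `threading.Lock`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ProfileStore:
    """Hypothetical stand-in for an API that is not thread safe."""

    def __init__(self):
        self.rows = {}
        self.in_progress = 0
        self.max_overlap = 0  # highest number of simultaneous save() calls seen

    def save(self, user_id, data):
        self.in_progress += 1
        self.max_overlap = max(self.max_overlap, self.in_progress)
        self.rows[user_id] = data
        self.in_progress -= 1


# One lock guarding every call into the non-thread-safe API.
store_lock = threading.Lock()


def fetch_remote(user_id):
    # Placeholder for the third-party API call. It runs outside the lock,
    # so several of these can be in flight at once.
    return {"user_id": user_id, "score": user_id * 10}


def process_user(store, user_id):
    data = fetch_remote(user_id)
    with store_lock:  # only one thread at a time touches the store
        store.save(user_id, data)


def process_all(store, user_ids, workers=10):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so worker exceptions surface here.
        list(pool.map(lambda uid: process_user(store, uid), user_ids))


if __name__ == "__main__":
    store = ProfileStore()
    process_all(store, range(1, 21))
    print(len(store.rows), store.max_overlap)
```

The trade-off is that the locked section becomes a bottleneck, so this only helps when most of the time per user is spent outside the lock, for example waiting on the remote API.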
Apache uses multiple processes via the fork() system call to handle concurrent web requests. This is why Django's ORM APIs don't need to be thread safe. I believe Apache may be able to use threads instead of processes, but I think that feature has to be disabled in order to use Django.
http://groups.google.com/group/django-developers/browse_thread/thread/905f79e350525c95
Btw, do you understand the difference between a thread and a process? It's kind of important.