Thread safety of asynchronous tasks and Redis in Django
I have a Django application that calls an asynchronous task on a queryset (using Celery). The task takes the queryset and performs a whole bunch of operations that could potentially take a very long time, depending on the objects therein. Objects can be shared across querysets, so a user could submit a task on a queryset that contains objects that are already running, and that new task should only execute on the objects that aren't yet running, but wait for all objects to complete before it returns.
My explanation is a bit confusing, so imagine the following code:
from time import sleep
import redis
from celery.task import Task
from someapp.models import InterestingModel
from someapp.longtime import i_take_a_while

class LongRunningTask(Task):
    def run(self, process_id, *args, **kwargs):
        _queryset = InterestingModel.objects.filter(process__id=process_id)
        r = redis.Redis()
        p = r.pipeline()
        run_check_sets = ('run_check', 'objects_already_running')
        # There must be a better way to do this:
        for o in _queryset.values_list('pk', flat=True):
            p.sadd('run_check', o)
        p.sdiff(run_check_sets)      # Objects that need to be run
        p.sunion(run_check_sets)     # Objects that we need to wait for
        p.sunionstore('objects_already_running', run_check_sets)
        p.delete('run_check')
        redis_result = p.execute()
        # The pipeline returns one result per SADD, followed by the SDIFF,
        # SUNION, SUNIONSTORE and DELETE results, so index from the end:
        objects_to_run = redis_result[-4]
        objects_to_wait_for = redis_result[-3]
        if objects_to_run:
            i_take_a_while(objects_to_run)
            p = r.pipeline()
            for o in objects_to_run:
                p.srem('objects_already_running', o)
            p.execute()
        while objects_to_wait_for:
            p = r.pipeline()
            for o in objects_to_wait_for:
                p.sismember('objects_already_running', o)
            redis_result = p.execute()
            objects_to_wait_for = [objects_to_wait_for[i] for i, member in enumerate(redis_result) if member]
            # Probably need to add some sort of timeout here or in Redis
            sleep(30)
I am extremely new to Redis, so my main question is whether there is a more efficient way to manipulate Redis to achieve the same result. More broadly, I wonder if Redis is necessary/the right approach to dealing with this problem. It seems like there should be a better way to make Django models interact with Redis. Finally, I wonder if this code is, in fact, thread safe. Can anyone punch any holes in my logic?
Any commentary is appreciated.
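For what it's worth, the per-pk loop flagged by the "There must be a better way to do this" comment can usually be collapsed into a single command: Redis 2.4+ (with a matching redis-py) accepts multiple members in one SADD. A minimal sketch, reusing the key names from the code above and assuming the queryset is non-empty:

pks = list(_queryset.values_list('pk', flat=True))
p = r.pipeline()
p.sadd('run_check', *pks)                          # one command instead of one SADD per pk
p.sdiff('run_check', 'objects_already_running')    # objects that need to be run
p.sunion('run_check', 'objects_already_running')   # objects that we need to wait for
p.sunionstore('objects_already_running', 'run_check', 'objects_already_running')
p.delete('run_check')
_, objects_to_run, objects_to_wait_for, _, _ = p.execute()

Note that redis-py pipelines run as a single MULTI/EXEC transaction by default, so the check-and-mark block executes atomically with respect to other clients.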
Comments (1)
Is it possible for you to architect this slightly differently? Specifically, I would kick off a task for each object and then store information about the long-running job somewhere (e.g., a database, a cache, etc.). When each individual object finished, it would update the long-running job info and check to see whether all of the jobs had returned. If so, then you can run whatever code needs to be run when the long-running task is complete.
This has the advantage of not tying up a thread on your server while you wait for other things to happen. On the client side, you could check the status of the long-running job periodically and even use the number of objects completed to update a progress meter if you want.
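A minimal sketch of that design, assuming Redis as the shared store and Celery's task decorator; the start_job/process_object/finalize_job names and the job:<id>:remaining key are illustrative, not part of the original suggestion:

import redis
from celery.task import task        # decorator API; adjust to your Celery version

from someapp.longtime import i_take_a_while

r = redis.Redis()

def start_job(job_id, queryset):
    # Kick off one task per object and record how many are outstanding.
    pks = list(queryset.values_list('pk', flat=True))
    r.set('job:%s:remaining' % job_id, len(pks))
    for pk in pks:
        process_object.delay(job_id, pk)

@task
def process_object(job_id, pk):
    i_take_a_while([pk])                             # the per-object long-running work
    remaining = r.decr('job:%s:remaining' % job_id)  # atomically record completion
    if remaining == 0:
        finalize_job.delay(job_id)                   # last object finished: run the completion step

@task
def finalize_job(job_id):
    # Whatever should happen once every object in the job has returned.
    r.delete('job:%s:remaining' % job_id)

The client (or a view) can read job:<id>:remaining at any point to drive the progress meter mentioned above.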