Verification needed: a clearable Python Queue class

Posted 2024-10-22 09:36:19


Since I'm not an expert in Python or multi-threaded programming, I'd like to ask whether my implementation is correct.

My aim is to extend the Queue class so that it can be cleared, with the removed items being returned. That's all. My implementation is:

import Queue

class ClearableQueue(Queue.Queue):

    def __init__(self, maxsize):
        Queue.Queue.__init__(self, maxsize)

    def clear(self):
        self.mutex.acquire()

        copyOfRemovedEntries = list(self.queue)
        self.queue.clear()
        self.unfinished_tasks = 0
        self.all_tasks_done.notifyAll()
        self.not_full.notifyAll()

        self.mutex.release()

        return copyOfRemovedEntries

Is it correct?
Thank you.

Update: Unfortunately, this implementation is still insufficient, since task_done() can raise a ValueError after clear() has been called.

More precisely: the queue is meant to be used in a multi-threaded environment. Assume one producer thread and one worker thread (though more threads are possible). Normally, when a worker thread calls get(), it should call task_done() once it has finished the work. With this pattern, the producer thread may call clear() for some reason right after the worker has called get() but before it has called task_done(). That by itself works; however, when the worker then calls task_done(), the exception is raised. This is because task_done() checks the number of unfinished tasks via the Queue class's unfinished_tasks counter, which clear() has already reset to 0, so decrementing it would make it negative.
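The interleaving described above can be reproduced deterministically in a single thread. The sketch below is a Python 3 port of the question's clear() (Python 3 renamed the `Queue` module to `queue`); it touches `mutex`, `queue`, `unfinished_tasks`, `all_tasks_done` and `not_full`, which are CPython implementation details rather than documented API. The function name `reproduce_race` is hypothetical.

```python
import queue


class ClearableQueue(queue.Queue):
    """Python 3 port of the question's clear(), which zeroes unfinished_tasks."""

    def clear(self):
        with self.mutex:
            removed = list(self.queue)
            self.queue.clear()
            self.unfinished_tasks = 0  # also forgets items already handed out by get()
            self.all_tasks_done.notify_all()
            self.not_full.notify_all()
        return removed


def reproduce_race():
    q = ClearableQueue()
    q.put("a")
    q.put("b")
    item = q.get()       # worker takes "a"; unfinished_tasks is still 2
    removed = q.clear()  # producer clears "b"; unfinished_tasks becomes 0
    try:
        q.task_done()    # worker acknowledges "a": 0 - 1 < 0
    except ValueError as exc:
        return item, removed, str(exc)
    return item, removed, None


print(reproduce_race())
# prints ('a', ['b'], 'task_done() called too many times')
```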

It would be interesting to know whether this issue can be handled solely by the ClearableQueue class, so that clear() can be called without worries, or whether something else has to coordinate the method calls.

Actually, in my concrete case I don't use the join() method, so I don't need to call task_done(). Still, I'd like to make this feature complete; it could be useful for other people as well.


执手闯天涯 2024-10-29 09:36:19


If you look at the source, you'll see that the standard way to access the mutex wraps the mutating code in a try: finally block in case something goes wrong:

import Queue

class ClearableQueue(Queue.Queue):

    def __init__(self, maxsize):
        Queue.Queue.__init__(self, maxsize)

    def clear(self):
        self.mutex.acquire()

        copyOfRemovedEntries = None
        try:
            copyOfRemovedEntries = list(self.queue)
            self.queue.clear()
            self.unfinished_tasks = 0
            self.all_tasks_done.notifyAll()
            self.not_full.notifyAll()
        finally:
            self.mutex.release()

        return copyOfRemovedEntries
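In Python 3 the module is named `queue`, and a `threading.Lock` can be used as a context manager: `with self.mutex:` acquires the lock and releases it when the block exits, even if the body raises, which is the same guarantee as the acquire()/try/finally/release() pattern above. A minimal Python 3 sketch of the same clear(), still relying on the undocumented CPython attributes `mutex`, `queue`, `unfinished_tasks`, `all_tasks_done` and `not_full`; the usage lines also show why `not_full` is notified, by unblocking a producer stuck in put() on a full queue.

```python
import queue
import threading


class ClearableQueue(queue.Queue):
    def clear(self):
        # "with self.mutex:" acquires the lock and releases it on exit, even if
        # the body raises: the same guarantee as acquire() + try/finally: release()
        with self.mutex:
            removed = list(self.queue)
            self.queue.clear()
            self.unfinished_tasks = 0
            self.all_tasks_done.notify_all()
            self.not_full.notify_all()  # wake producers blocked in put()
        return removed


q = ClearableQueue(maxsize=1)
q.put(1)
producer = threading.Thread(target=q.put, args=(2,))
producer.start()      # its put(2) blocks while the queue is full
removed = q.clear()   # frees space and wakes the blocked put()
producer.join(timeout=1)
print(removed, q.get_nowait())  # prints [1] 2
```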

Edit 1

If you're worried about a second thread raising exceptions when you call get() and then task_done(), why not just wrap task_done() in a try/except block? All that exception tells you is that you've acknowledged too many items, but if your clear function already took care of them, where's the issue?

This would hide that exception if it bothers you, make the intention of the functions more obvious and remove the double list-assignment in my previous example:

import Queue

class ClearableQueue(Queue.Queue):

    def __init__(self, maxsize):
        Queue.Queue.__init__(self, maxsize)

    def get_all(self):
        # remove and return every queued item while holding the queue's lock
        self.mutex.acquire()

        try:
            copyOfRemovedEntries = list(self.queue)
            self.queue.clear()
            self.unfinished_tasks = 0
            self.all_tasks_done.notifyAll()
            self.not_full.notifyAll()
        finally:
            self.mutex.release()

        return copyOfRemovedEntries

    def clear(self):
        # return the removed items, as the question asks
        return self.get_all()

    def task_done(self):
        # ignore the "called too many times" error left behind by clear()
        try:
            Queue.Queue.task_done(self)
        except ValueError:
            pass

Edit 2

How about this as an even more effective solution which doesn't hide anything:

import Queue

class ClearableQueue(Queue.Queue):

    def __init__(self, maxsize):
        Queue.Queue.__init__(self, maxsize)
        self.tasks_cleared = 0

    def get_all(self):
        self.mutex.acquire()

        try:
            copyOfRemovedEntries = list(self.queue)
            self.queue.clear()
            self.unfinished_tasks = 0
            self.all_tasks_done.notifyAll()
            self.not_full.notifyAll()
            # remember how many items were removed by clearing
            self.tasks_cleared += len(copyOfRemovedEntries)
        finally:
            self.mutex.release()

        return copyOfRemovedEntries

    def clear(self):
        return self.get_all()

    def task_done(self):
        # same structure as Queue.task_done, offset by the cleared count
        self.all_tasks_done.acquire()
        try:
            unfinished = self.unfinished_tasks + self.tasks_cleared - 1
            if unfinished <= 0:
                if unfinished < 0:
                    raise ValueError('task_done() called too many times')
                self.all_tasks_done.notify_all()
            self.unfinished_tasks = unfinished - self.tasks_cleared
            self.tasks_cleared = 0
        finally:
            self.all_tasks_done.release()

I think this should avoid the exception but still behave the way the original class was expected to.
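An alternative bookkeeping scheme, offered as a sketch rather than as the answer's own method: in CPython, `unfinished_tasks` counts both items still in the queue and items already returned by get() but not yet acknowledged with task_done(). If clear() subtracts only the number of items it actually removed, tasks that are in flight keep their count, so their later task_done() calls stay valid and join() still waits for them. For comparison, tracing put("a"), put("b"), get(), clear(), task_done() through the Edit 2 code leaves `unfinished_tasks` at -1, so a later put()/get()/task_done() cycle raises ValueError again. Written for Python 3's `queue` module; it relies on the undocumented CPython attributes `mutex`, `queue`, `unfinished_tasks`, `all_tasks_done` and `not_full`.

```python
import queue
import threading


class ClearableQueue(queue.Queue):
    def clear(self):
        """Remove and return all queued items, keeping in-flight tasks counted."""
        with self.mutex:
            removed = list(self.queue)
            self.queue.clear()
            # Only the removed items will never see task_done(); items already
            # handed out by get() still owe one, so leave them in the count.
            self.unfinished_tasks -= len(removed)
            if self.unfinished_tasks == 0:
                self.all_tasks_done.notify_all()  # release join() waiters
            self.not_full.notify_all()
        return removed


q = ClearableQueue()
for item in ("a", "b", "c"):
    q.put(item)
current = q.get()    # worker holds "a"
removed = q.clear()  # producer drops "b" and "c"
print(removed, q.unfinished_tasks)  # prints ['b', 'c'] 1

joiner = threading.Thread(target=q.join)
joiner.start()       # join() blocks until "a" is acknowledged
q.task_done()        # worker finishes "a": no ValueError, count reaches 0
joiner.join(timeout=1)
print(joiner.is_alive(), q.unfinished_tasks)  # prints False 0
```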


百善笑为先 2024-10-29 09:36:19


You appear to be suffering from some sort of race condition. If I understand it, the current situation is that you sometimes get:

T1: |----->|------------->|-------------->|
    | get  |    some_opp  | task_done     |
T2: |---------->|------>|---------------->|
    | other_opp | clear | yet_another_opp |

where clear runs on T2 between T1's get and its task_done. This causes a crash. As I understand it, you need some way to do this:

T1: |----->|------------->|-------------->|
    | get  |    some_opp  | task_done     |
T2: |---------->|------------------------>|------>|
    | other_opp | wait_for_task_done      | clear |

If this is correct, you may need a second lock, set by get and released by task_done, which says 'this queue can't be cleared'. You might then need a version of get and task_done that skips this, for special cases where you really know what you're doing.
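The "wait_for_task_done, then clear" option can be sketched in Python 3 under stated assumptions. Instead of a literal second lock, this sketch keeps a hypothetical `_in_flight` counter (incremented in the `_get()` hook that get() calls while holding the queue's lock, decremented in task_done()) plus a `threading.Condition` sharing the queue's own `mutex`, so clear() waits until every handed-out item has been acknowledged. It assumes each get() is eventually paired with a task_done(), which the question says is not the case in the asker's own code; without that pairing clear() would block forever. `_get`, `mutex`, `queue`, `unfinished_tasks`, `all_tasks_done` and `not_full` are CPython implementation details, and the class name is hypothetical.

```python
import queue
import threading


class WaitingClearQueue(queue.Queue):
    """Hypothetical queue whose clear() waits until no task is in progress."""

    def __init__(self, maxsize=0):
        super().__init__(maxsize)
        self._in_flight = 0  # items returned by get() whose task_done() is pending
        self._idle = threading.Condition(self.mutex)  # shares the queue's lock

    def _get(self):
        # Called by get() while it already holds self.mutex
        item = super()._get()
        self._in_flight += 1
        return item

    def task_done(self):
        super().task_done()  # decrement unfinished_tasks first...
        with self._idle:     # ...then mark this task as no longer in flight
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def clear(self):
        with self._idle:
            # Assumes every get() is eventually followed by task_done()
            while self._in_flight:
                self._idle.wait()
            removed = list(self.queue)
            self.queue.clear()
            # Nothing is in flight, so only the removed items were counted
            self.unfinished_tasks = 0
            self.all_tasks_done.notify_all()
            self.not_full.notify_all()
        return removed


q = WaitingClearQueue()
q.put("a")
q.put("b")
current = q.get()  # worker holds "a"
result = []
clearer = threading.Thread(target=lambda: result.extend(q.clear()))
clearer.start()
clearer.join(timeout=0.2)
print(clearer.is_alive())  # prints True: clear() is waiting for task_done()
q.task_done()              # worker finishes "a"; no ValueError
clearer.join(timeout=1)
print(result, q.unfinished_tasks)  # prints ['b'] 0
```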

An alternative to this is to have a more atomic lock which allows you to do this:

T1: |----->|------------------->|-------------->|------------->|
    | get  |    some_opp        | task_done     | finish_clear |
T2: |---------->|-------------->|---------------->|
    | other_opp | partial_clear | yet_another_opp |

where you say 'I'm not done with this task, but you can clear the rest', and the partial clear then tells task_done that this task was caught by a clear attempt, so task_done should do the remaining work afterwards. This is starting to get fairly complex, though.

