What is the advantage of message queues over shared data in thread communication?
I read an article about multithreaded program design, http://drdobbs.com/architecture-and-design/215900465, which says it is a best practice to "replace shared data with asynchronous messages. As much as possible, prefer to keep each thread's data isolated (unshared), and let threads instead communicate via asynchronous messages that pass copies of data".
What confuses me is that I don't see the difference between using shared data and using message queues. I am currently working on a non-GUI project on Windows, so let's use Windows message queues, and take the traditional producer-consumer problem as an example.
Using shared data, there would be a shared container, and a lock guarding the container, between the producer thread and the consumer thread. When the producer outputs a product, it first waits for the lock, then writes something into the container, and then releases the lock.
Using a message queue, the producer can simply call PostThreadMessage without blocking, and that is the advantage of asynchronous messages. But I think there must be some lock guarding the message queue between the two threads, otherwise the data would surely get corrupted; the PostThreadMessage call just hides the details. I don't know whether my guess is right, but if it is true, the advantage seems to disappear, since both methods do the same thing, and the only difference is that the system hides the details when you use a message queue.
P.S. Maybe the message queue uses a non-blocking container, but I could use a concurrent container in the former approach too. I want to know how the message queue is implemented, and whether there is any performance difference between the two approaches.
Update:
I still don't get the concept of asynchronous messages if the message queue operations still block somewhere else. Correct me if my guess is wrong: when we use shared containers and locks, we block in our own thread; but when using message queues, my own thread returns immediately, and the blocking work is left to some system thread.
7 Answers
Message passing is useful for exchanging smaller amounts of data, because no conflicts need to be avoided. It is also much easier to implement than shared memory for inter-computer communication. And, as you've already noticed, message passing has the advantage that application developers don't need to worry about protection details the way they do with shared memory.
Shared memory allows maximum speed and convenience of communication, since within a single computer it works at memory speed. Shared memory is usually faster than message passing, as message passing is typically implemented using system calls and thus requires the more time-consuming task of kernel intervention. In contrast, in shared-memory systems, system calls are required only to establish the shared-memory regions. Once established, all accesses are treated as normal memory accesses, with no extra assistance from the kernel.
Edit: One case where you might want to implement your own queue is when a lot of messages are produced and consumed, e.g. a logging system. With the implementation of PostThreadMessage, the queue capacity is fixed. Messages will most likely get lost if that capacity is exceeded.
Imagine you have one thread producing data and four threads processing that data (presumably to make use of a multi-core machine). If you have one big global pool of data, you are likely to have to lock it whenever any of the threads needs access, potentially blocking the three other threads. As you add more processing threads, you increase both the chance of a lock having to wait and the number of things that might have to wait. Eventually, adding more threads achieves nothing because all you do is spend more time blocking.
If instead you have one thread sending messages into message queues, one queue per consumer thread, then the consumers can't block each other. You still have to lock each queue between the producer and its consumer thread, but since every thread has a separate queue, you have a separate lock, and no thread can block all the others while waiting for data.
If you suddenly get a 32-core machine, you can add 20 more processing threads (and queues) and expect performance to scale fairly linearly, unlike the first case, where the new threads would just run into each other all the time.
I have used a shared-memory model where pointers into the shared memory are managed through a message queue with careful locking. In a sense, this is a hybrid between a message queue and shared memory. It is very useful when large quantities of data must be passed between threads while retaining the safety of the message queue.
The entire queue can be packaged in a single C++ class with appropriate locking and the like. The key is that the queue owns the shared storage and takes care of the locking. The producer acquires a lock for input to the queue, receives a pointer to the next available storage chunk (usually an object of some sort), populates it, and releases it. The consumer blocks until the next shared object has been released by the producer. It can then acquire a lock on the storage, process the data, and release it back to the pool. A suitably designed queue can perform multiple-producer/multiple-consumer operations with great efficiency. Think of Java's thread-safe java.util.concurrent.BlockingQueue semantics, but for pointers to storage.
Of course there is "shared data" when you pass messages; after all, the message itself is some sort of data. The important distinction, however, is that when you pass a message, the consumer receives a copy.
Yes, it does, but being a WINAPI call, you can be reasonably sure that it does it right.
The advantage is more safety. You have a locking mechanism that is systematically enforced when you are passing a message. You don't even need to think about it, you can't forget to lock. Given that multi-thread bugs are some of the nastiest ones (think of race conditions), this is very important. Message passing is a higher level of abstraction built on locks.
The disadvantage is that passing large amounts of data would probably be slow. In that case, you need to use shared memory.
For passing state (i.e. worker thread reporting progress to the GUI) the messages are the way to go.
It's quite simple (I'm amazed others wrote such lengthy responses!):
Using a message queue system instead of 'raw' shared data means that you have to get the synchronization (locking/unlocking of resources) right only once, in a central place.
With a message-based system, you can think in higher terms of "messages" without having to worry about synchronization issues anymore. For what it's worth, it's perfectly possible that a message queue is implemented using shared data internally.
I think this is the key piece of info there: "As much as possible, prefer to keep each thread’s data isolated (unshared), and let threads instead communicate via asynchronous messages that pass copies of data". I.e. use producer-consumer :)
You can do your own message passing or use something provided by the OS. That's an implementation detail (it needs to be done right, of course). The key is to avoid shared data, as in having the same region of memory modified by multiple threads. That can cause hard-to-find bugs, and even if the code is perfect, it will eat performance because of all the locking.
I had exactly the same question. After reading the answers, I feel that:
In the most typical use cases, queue = async and shared memory (locks) = sync. Indeed, you can build an async version on top of shared memory, but that means more code, similar to reinventing the message-passing wheel.
Less code = fewer bugs and more time to focus on other things.
The pros and cons are already mentioned in previous answers, so I will not repeat them.