How do I translate the following threading model from C++ to Go?
In my C++ project, I have a large binary file on disk (gigabytes in size) that I read into memory for read-only calculations.
My current C++ implementation reads the entire chunk into memory once and then spawns threads that read from the chunk to do various calculations (mutex-free and fast). Technically, each thread only needs a small part of the file at a time, so in the future I may change this implementation to use mmap(), especially if the file gets too big. I've noticed this gommap lib, so I think I should be covered going forward.
What approach should I take to translate my current C++ threading model (one large chunk of read-only memory) into a Go threading model, keeping run-time efficiency in mind?
Goroutines? Alternatives?
Comments (2)
I'm sure this answer will cop a lot of heat, but here goes:
You won't get reduced running time by switching to Go, especially if your code is already mutex-free. Go doesn't guarantee efficient balancing of goroutines, and will not currently make the best use of the available cores. The generated code is slower than C++. Go's current strengths are in clean abstractions and concurrency, not parallelism.
Reading the entire file up front isn't particularly efficient if you then have to go and backtrack through memory. Parts of the file you won't use again until much later will be dropped from the cache, only to be reloaded later. You should consider memory mapping if your platform allows it, so that pages are loaded from disk as they're required.
If there is any intense inter-routine communication, or there are dependencies between the data, you should try to make the algorithm single-threaded. It's difficult to say without knowing more about the routines you're applying to the data, but it does sound possible that you've pulled out threads prematurely in the hope of getting a magic performance boost.
If you're unable to rely on memory mapping due to file size, or other platform constraints, you should consider making use of the pread call, thereby reusing a single file descriptor, and only reading as required.
As always, the following rule applies to optimization: you must profile. You must check that the changes you make to a working solution are actually improving things. Very often you'll find that memory mapping, threading and other shenanigans have no noticeable effect on performance whatsoever. It's also an uphill battle if you're switching away from C or C++.
Also, you should spawn goroutines to handle each part of the file, and reduce the results of the calculations through a channel. Make sure to set GOMAXPROCS to an appropriate value.
This program sums all the bytes in a file in multiple goroutines (without worrying about overflow).
You'll want to reimplement processChunk and aggregateResults for your case. You may also want to change the channel type of the results channel. Depending on what you're doing, you may not even need to aggregate the results. The chunk size and the channel's buffer size are other knobs you can tweak.
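The program this answer refers to is not reproduced on this page, so what follows is only a sketch of the pattern it describes, not the original code. It assumes the data is already in memory as a read-only byte slice, as in the question. The names processChunk and aggregateResults come from the answer. The driver name sumBytes, the chunk size, and the buffer size of 16 are made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// processChunk sums the bytes of one chunk. Replace with the real
// per-chunk calculation.
func processChunk(chunk []byte) uint64 {
	var sum uint64
	for _, b := range chunk {
		sum += uint64(b)
	}
	return sum
}

// aggregateResults folds the partial results arriving on the channel into
// one total. It returns once the channel is closed.
func aggregateResults(results <-chan uint64) uint64 {
	var total uint64
	for r := range results {
		total += r
	}
	return total
}

// sumBytes is a hypothetical driver: it starts one goroutine per chunk of
// the shared read-only slice and reduces the results through a channel.
func sumBytes(data []byte, chunkSize int) uint64 {
	if chunkSize <= 0 {
		chunkSize = 1 // guard against an endless loop below
	}
	results := make(chan uint64, 16) // buffer size is a tunable knob

	var wg sync.WaitGroup
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		wg.Add(1)
		// Sub-slices share the backing array, so no bytes are copied.
		go func(chunk []byte) {
			defer wg.Done()
			results <- processChunk(chunk)
		}(data[start:end])
	}

	// Close the channel once every worker has sent its result, so that
	// aggregateResults knows when to stop.
	go func() {
		wg.Wait()
		close(results)
	}()

	return aggregateResults(results)
}

func main() {
	data := make([]byte, 1000)
	for i := range data {
		data[i] = 1
	}
	fmt.Println(sumBytes(data, 64)) // prints 1000
}
```

Because the goroutines only read from the slice, no locking is needed. Since Go 1.5 GOMAXPROCS defaults to the number of logical CPUs, so setting it by hand is mainly relevant on older releases or when you want to limit parallelism.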