Would multithreading speed up reading a file from disk into memory?
If I had code which read every word of a file into an ArrayList or HashSet, would it be any faster to split the code into multiple worker threads and assign each a chunk of the file to work on (assuming multiple cores)? My gut says no, since the I/O would usually be the bottleneck rather than the CPU in a case like this.
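For concreteness, the single-threaded version I have in mind looks roughly like the sketch below (the class name and file path are just placeholders for illustration):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashSet;
    import java.util.Set;

    public class WordLoader {

        // Read every whitespace-separated word of the file into a HashSet.
        static Set<String> loadWords(Path file) throws IOException {
            Set<String> words = new HashSet<>();
            for (String line : Files.readAllLines(file)) {
                for (String word : line.split("\\s+")) {
                    if (!word.isEmpty()) {
                        words.add(word);
                    }
                }
            }
            return words;
        }

        public static void main(String[] args) throws IOException {
            Set<String> words = loadWords(Path.of("words.txt")); // placeholder file
            System.out.println("Distinct words: " + words.size());
        }
    }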
2 Answers
The I/O channel off a regular drive is usually much faster than what the physical media itself can provide, so the channel won't be the bottleneck. With magnetic media (i.e. a standard hard drive), you'd make the disk thrash like crazy as the heads seek to the various places you're reading from. Performance would be abysmal, the equivalent of a shopping cart rolling down an empty six-lane freeway.
Solid-state drives don't suffer from the seek penalty, but they aren't widespread (or affordable) enough yet to count for much.
It depends. Your line of thinking that the I/O is going to be the bottleneck could be correct, since a lot of disks work in a serial fashion. But what if that disk were something special, like an SSD or a RAID array that really does support concurrent access? Also, if there were a significant amount of CPU-bound post-processing to be done on the data, you could get that going concurrently while another batch of data is being read. Don't write off the concurrent options so quickly!
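To illustrate that last point, here is a rough sketch of overlapping the sequential read with CPU-bound work: the main thread streams lines from disk while a small thread pool processes each batch. The class name, file path, batch size, and the expensiveWork stand-in are all assumptions for illustration, not anything prescribed by the answer.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.stream.Stream;

    public class OverlappedProcessing {

        // Stand-in for whatever CPU-heavy post-processing follows the read.
        static void expensiveWork(String line) {
            line.chars().sorted().count();
        }

        public static void main(String[] args) throws Exception {
            Path file = Path.of("big-input.txt"); // placeholder path
            int workers = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(workers);

            // The main thread keeps the disk reads sequential; finished batches
            // are handed to the pool so CPU-bound work overlaps with the I/O.
            try (Stream<String> lines = Files.lines(file)) {
                List<String> batch = new ArrayList<>();
                for (String line : (Iterable<String>) lines::iterator) {
                    batch.add(line);
                    if (batch.size() == 1_000) {
                        List<String> toProcess = List.copyOf(batch);
                        pool.submit(() -> toProcess.forEach(OverlappedProcessing::expensiveWork));
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    List<String> toProcess = List.copyOf(batch);
                    pool.submit(() -> toProcess.forEach(OverlappedProcessing::expensiveWork));
                }
            }

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }

Whether this actually helps still depends on how expensive the per-batch work is relative to the read itself; if the processing is trivial, the single reader thread remains the limiting factor.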