Simultaneous or sequential write operations: does it matter in terms of speed?
With a multicore processor, does it make sense to parallelize all the file write operations using multiple threads, just to get a speed boost? Of course, all those write operations are independent.
Comments (6)
Generally, no.
As of now, the physical write to disk is the bottleneck by several orders of magnitude, and in most scenarios it is largely sequential. By parallelizing writes, you have a good chance of worsening performance by incurring seeks. Sequential reads and writes will greatly outperform interleaved ones in most cases.
Per-disk parallelization (TCQ and NCQ) mainly works by reducing the seeks that are naturally required when different clients concurrently request data from different sections of the disk. If you can avoid these seeks in the first place, you are better off.
In some scenarios (RAID 1, JBOD, or when different streams of data arrive rather slowly), the right scheduling can improve your throughput, but that requires intimate knowledge of the hardware at hand, and no other processes spoiling your fun.
At best, you can leave that as a decision to the end user (e.g. give an option to turn it off), and provide performance measurements to guide them. (You might even prove me wrong ;))
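To make the "option to turn it off" and the "performance measurements" concrete, here is a minimal Python sketch. The names `write_file`, `write_all`, and `make_jobs`, the worker count, and the payload sizes are all assumptions made for illustration; nothing here comes from the original question. It times the same set of independent file writes with and without a thread pool, calling `os.fsync` so that the page cache does not hide the cost of the physical write.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def write_file(path, payload):
    """Write payload to path and push it to the device so the timing is honest."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    return len(payload)


def write_all(jobs, parallel=False, workers=4):
    """Write every (path, payload) job; return (bytes_written, elapsed_seconds)."""
    start = time.perf_counter()
    if parallel:
        # Hypothetical user-facing switch: parallel=True uses a thread pool.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            total = sum(pool.map(lambda job: write_file(*job), jobs))
    else:
        total = sum(write_file(path, payload) for path, payload in jobs)
    return total, time.perf_counter() - start


def make_jobs(directory, prefix, count=8, size=256 * 1024):
    """Build a list of independent (path, payload) write jobs."""
    payload = os.urandom(size)
    return [(os.path.join(directory, f"{prefix}_{i}.bin"), payload)
            for i in range(count)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for mode in (False, True):
            jobs = make_jobs(tmp, "par" if mode else "seq")
            total, elapsed = write_all(jobs, parallel=mode)
            print(f"parallel={mode}: {total} bytes in {elapsed:.3f}s")
```

The numbers will differ widely between a single spinning disk, an SSD, and a RAID array, which is the reason to measure on the target hardware rather than assume either mode wins.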
That depends on the disks and their controller. Do they support TCQ/NCQ? Is it a RAID?
If so, that might make some sense. With one regular SATA disk without NCQ, it won't.
Write the simplest code first, and see whether it performs well enough in the target environment. (Different disks, operating system versions, CPUs, drivers, etc. may well affect the result significantly.)
If the simplest correct code isn't fast enough, then it makes sense to try to work out faster ways of performing IO. At a guess, it might make sense to parallelize the write operations if you're writing to different disks, but possibly not otherwise. That's only a complete guess though.
Purely by coincidence, I'm planning to benchmark a related situation soon. I have a blog post describing the tests I intend to perform, and will update the entry with a link to results when I've got some. It's not quite the same as what you're describing, but close enough to perhaps be of interest.
Technically, you can mmap a file and have multiple threads write to it, but the disk will probably still be the bottleneck.
If you need to maximize I/O throughput, a starting point would be to investigate the asynchronous I/O your environment supports.
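The mmap idea above can be sketched in Python as follows. This is only an illustration under stated assumptions: `fill_region` and `mmap_parallel_write` are hypothetical names, the chunks are laid out back to back, and each thread writes to its own non-overlapping region. In CPython the copies themselves are largely serialized by the interpreter lock, so the sketch shows the structure rather than a guaranteed speedup, and the device remains the limiting factor either way.

```python
import mmap
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def fill_region(mapping, offset, chunk):
    """Copy chunk into the shared mapping at offset; regions must not overlap."""
    mapping[offset:offset + len(chunk)] = chunk


def mmap_parallel_write(path, chunks, workers=4):
    """Lay chunks out back to back in path, each written by a pool thread."""
    total = sum(len(c) for c in chunks)
    if total == 0:
        raise ValueError("cannot map an empty file")
    # Precompute where each chunk starts so threads never touch the same bytes.
    offsets, position = [], 0
    for chunk in chunks:
        offsets.append(position)
        position += len(chunk)
    with open(path, "wb+") as f:
        f.truncate(total)  # the file needs its final size before it is mapped
        with mmap.mmap(f.fileno(), total) as mapping:
            with ThreadPoolExecutor(max_workers=workers) as pool:
                # list() forces completion and surfaces any worker exception.
                list(pool.map(lambda args: fill_region(mapping, *args),
                              zip(offsets, chunks)))
            mapping.flush()  # ask the OS to write dirty pages back to the file
    return total
```

For example, `mmap_parallel_write("out.bin", [b"aaaa", b"bb", b"cccccc"])` produces a 12-byte file containing the three chunks in order.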
This is a simple question, but the answer can be really, really complicated. Let's try to narrow down the scenario with some assumptions: the OS is Windows, and you have a relatively large number of writes that are truly independent.
Worst case, this won't be any slower than a single plain old everyday disk on a parallel ATA controller: it will be slow.
Best case, the OS can schedule the writes very efficiently. This would be true in the case of a storage system with lots of spindles, or with a disk that supports NCQ.
The key thing to remember here is that disk I/O (in general) isn't CPU bound, so going out of your way to use multi-core won't help you; it will just make life complex.
Note, you can help things if you order the writes so they are sequential in a file (overall) or sequential on the disk by sorting them by their extent.
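As a rough illustration of ordering writes so they are sequential within a file, the Python sketch below sorts pending writes by offset and merges those that are exactly adjacent before issuing them. `write_sorted` is a hypothetical helper, and it assumes the writes do not overlap (which matches the "truly independent" assumption above). It covers only the file-offset case; sorting by on-disk extent would need file system specific information that is not shown here.

```python
import os
import tempfile


def write_sorted(path, writes):
    """Apply (offset, data) writes in ascending offset order, merging adjacent ones.

    Returns the number of write calls actually issued.
    """
    merged = []
    for offset, data in sorted(writes, key=lambda w: w[0]):
        if merged and merged[-1][0] + len(merged[-1][1]) == offset:
            # This write starts exactly where the previous one ends: coalesce.
            merged[-1] = (merged[-1][0], merged[-1][1] + data)
        else:
            merged.append((offset, data))
    # "r+b" needs an existing file; "ab" creates it without truncating.
    open(path, "ab").close()
    with open(path, "r+b") as f:
        for offset, data in merged:
            f.seek(offset)
            f.write(data)
    return len(merged)
```

Calling `write_sorted(path, [(8, b"CC"), (0, b"AAAA"), (4, b"BBBB")])` on a new file issues a single 10-byte write instead of three scattered ones.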
If you are talking about writing to one file, the answer is generally no. You typically can't speed up writing to one file by parallelizing it, since every process or thread usually has to acquire a lock on the file from the OS in order to write.
Otherwise, this depends on the hardware controllers and type of storage, the OS kernel, and the filesystem implementation.