Check whether your disks are using their built-in write cache. It can make a considerable difference. On Linux, you can toggle the behaviour with hdparm.
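For example (the device name /dev/sda is a placeholder — substitute your own drive):

```shell
# Query the drive's current write-cache setting
hdparm -W /dev/sda

# Enable the on-drive write cache (faster, less safe)
hdparm -W1 /dev/sda

# Disable it (slower, safer across power loss)
hdparm -W0 /dev/sda
```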
Obviously, if write caching is enabled, then there is the potential for data loss or corruption if your system shuts down uncleanly (e.g., a power cut).
In terms of software, the Linux kernel uses two main numbers to parameterize the write behaviour.
Modern defaults are to write more frequently, to avoid huge write spikes. You could try tuning these to suit your needs. Here is an excellent discussion of the available parameters and how you might try adjusting them.
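The two numbers in question are, presumably, the dirty-page writeback thresholds exposed under /proc/sys/vm. A sketch of inspecting and adjusting them — the values shown are illustrative, not recommendations:

```shell
# Show the current writeback thresholds (percent of reclaimable memory)
sysctl vm.dirty_background_ratio vm.dirty_ratio

# Let more dirty data accumulate before background writeback kicks in,
# and before writers are forced to block (example values only)
sysctl -w vm.dirty_background_ratio=20
sysctl -w vm.dirty_ratio=40
```

Settings made with sysctl -w do not survive a reboot; persist them in /etc/sysctl.conf (or a drop-in under /etc/sysctl.d/) if they work out.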
You could create a RAM disk and RAID 1 it with a physical partition. Look at the --write-mostly and --write-behind options. You can use those to make the physical disk one which is not to be read from (only written to), and to set the number of outstanding write operations, respectively.
Alternatively, look at the documentation for pdflush. Beyond what ire_and_curses mentioned, you'll probably want to crank swappiness up to 100 to favor disk cache over swap.
But it'd be worthwhile to learn how it all works, and tune it to your specific application. Linux is already tuned for the general case, and only you know how your specific situation differs. :)
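The RAM-disk idea above could be sketched roughly as follows. Device names, the RAM-disk size, and the write-behind depth are all placeholders; note that --write-behind requires a write-intent bitmap, and that the RAM half of the mirror evaporates on reboot, so the array must be reassembled at boot:

```shell
# Provide a RAM-backed block device (~1 GiB here) as /dev/ram0
modprobe brd rd_size=1048576

# Mirror it with a physical partition; the physical disk is flagged
# write-mostly so reads are served from RAM, with up to 256
# outstanding write-behind operations trickling to the disk
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal /dev/ram0 \
      --write-mostly --write-behind=256 /dev/sdb1

mkfs.ext4 /dev/md0
mount /dev/md0 /mnt
```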
The question here really is how much durability do you require?
Normally Linux will happily use as much RAM as there is to cache files for a while, and then write the changes back. This is normally what you want, so you will lose some, but not too much, data in the event of a crash.
Applications can of course force a write back with (for example) fdatasync() and fsync().
To get better performance, you could, for example, call fdatasync() less often, sacrificing some durability.
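A minimal sketch of the forced-writeback calls in Python, whose os module wraps the same syscalls (the file name is arbitrary):

```python
import os
import tempfile

# Write a record and force it to stable storage. fdatasync() flushes the
# data plus only the metadata needed to read it back; fsync() also
# flushes things like timestamps, so it can be slightly more expensive.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"important record\n")
    os.fdatasync(fd)  # durable once this returns (modulo the disk's own cache)
finally:
    os.close(fd)

with open(path, "rb") as f:
    print(f.read())  # b'important record\n'
os.remove(path)
```

Calling this after every record gives maximum durability; batching several records per sync is the performance trade-off described above.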
By default, Linux will use free RAM (almost all of it) to cache disk accesses, and will delay writes. The heuristics the kernel uses to decide the caching strategy are not perfect, but beating them in a specific situation is not easy. Also, on journalling filesystems (i.e. all the default filesystems nowadays), actual writes to the disk are performed in a way which is resilient to crashes; this implies a bit of overhead. You may want to fiddle with filesystem options. E.g., for ext3, try mounting with data=writeback or even async (these options may improve filesystem performance, at the expense of reduced resilience to crashes). Also, use noatime to reduce filesystem activity.
Programmatically, you might also want to perform disk accesses through memory mappings (with mmap). This is a bit more hands-on, but it gives more control over data management and optimization.
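A small sketch of the mmap approach in Python, whose mmap module wraps the same syscall (file size and contents are arbitrary):

```python
import mmap
import os
import tempfile

# Map a file into memory and write through the mapping; flush() issues
# msync(), pushing the dirty pages to disk on our schedule rather than
# the kernel's.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)          # the file must be non-empty to map it
with mmap.mmap(fd, 4096) as m:
    m[:5] = b"hello"            # plain memory write, no write() syscall
    m.flush()                   # msync: force dirty pages out now
os.close(fd)

with open(path, "rb") as f:
    print(f.read(5))            # b'hello'
os.remove(path)
```

Skipping the flush() leaves writeback entirely to the kernel's heuristics, which is the default trade-off the answer describes.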