The flash in a modern SSD is usually(!) structured as follows: pages of 2K or 4K that can be written, grouped into 256K erase blocks. A page cannot be overwritten without being erased first, but the erase operation only works on whole erase blocks. Each erase operation takes a long time (compared to other IO operations) and slowly wears out the SSD.
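To make the geometry concrete, here is a minimal sketch using the figures above (4K pages, 256K erase blocks — these numbers vary by drive, so treat them as an assumption, not a spec):

```python
# Hypothetical flash geometry; real values differ per SSD model.
PAGE_SIZE = 4 * 1024           # smallest unit that can be written
ERASE_BLOCK_SIZE = 256 * 1024  # smallest unit that can be erased

# Overwriting even one page forces the controller to deal with
# the whole erase block it lives in.
pages_per_block = ERASE_BLOCK_SIZE // PAGE_SIZE
print(pages_per_block)  # 64 pages share the fate of one erase block
```

So a single 4K rewrite can, in the worst case, involve 64 pages' worth of data behind the scenes.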
A component of the SSD controller called the FTL (Flash Translation Layer) provides the illusion of an HDD-like block device on top of these flash semantics. An SSD can be used like an HDD, but to get the most out of it (and to keep doing so for a long time), a software IO design that incorporates knowledge of the storage works best.
However, the SSD controller logic is usually not public, so behavior might differ from SSD to SSD. Still, here are a few rules of thumb:
If possible, I would align my IO pattern and file sizes to full erase blocks (or a multiple of them). Writing a 256K file then uses exactly one erase block without any internal fragmentation. A smaller file, such as 64K, would use only a portion of a block, and writing data into the rest of that block might trigger a read-modify-write cycle: the complete block is read, modified, and then written to another location. Very expensive.
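A simple way to apply this rule is to round write sizes up to the next erase-block multiple. A sketch, again assuming the 256K erase block from above:

```python
ERASE_BLOCK = 256 * 1024  # assumed erase block size; drive-specific

def align_up(size: int, block: int = ERASE_BLOCK) -> int:
    """Round size up to the next multiple of the erase block."""
    return -(-size // block) * block  # ceiling division

print(align_up(64 * 1024))   # a 64K file still claims one full 256K block
print(align_up(256 * 1024))  # 256K fits exactly: one block, no waste
print(align_up(300 * 1024))  # 300K spills into a second block
```

The padding looks wasteful, but it buys the FTL the ability to erase and reuse whole blocks without read-modify-write.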
This is not a problem while the SSD is fairly empty (the controller has enough unused blocks), but it may become an issue when the SSD is full and heavily used, or when the IO pattern consists mostly of very small writes and the SSD becomes fragmented, so that the FTL has a harder time finding consecutive free flash pages.
As a side note: the system administrator should align the filesystem to the SSD's erase block boundaries; it really is important.
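The alignment check itself is just arithmetic: the partition's start offset (start sector times sector size) must be a multiple of the erase block. A sketch with the assumed 256K block and 512-byte sectors:

```python
ERASE_BLOCK = 256 * 1024  # assumed erase block size
SECTOR = 512              # logical sector size reported by the drive

def partition_aligned(start_sector: int) -> bool:
    """True if the partition's byte offset falls on an erase block boundary."""
    return (start_sector * SECTOR) % ERASE_BLOCK == 0

print(partition_aligned(2048))  # True: 2048 * 512 = 1 MiB, a 256K multiple
print(partition_aligned(63))    # False: the legacy 63-sector offset misaligns everything
```

This is why modern partitioning tools default to a 1 MiB starting offset: it is a multiple of every plausible erase block size, whereas the old DOS-era sector 63 guarantees misalignment.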
This is made even worse since the system's view of any modern disk does not match actual locations on the physical device. Modern disks, both SSDs and spinning ones, put sectors where they desire.
Since SSDs have wear leveling, sector 27 might not be anywhere near sector 28, and even if two sectors started out 'close' together, they might not stay close after a bit of writing. Besides, 'close' is an odd concept for an SSD anyway, since there is no seek time.
I would shy away from any design with loads and loads of files if the design is just as simple with fewer, bigger files. If, on the other hand, you find yourself writing what amounts to a file system of your own to map blocks within one big file, then unless your problem has very specific features, it is probably better to take advantage of all the time and thought that has gone into existing file system designs.