Algorithms optimized for fast disk storage (SSDs)?

Published 2024-07-24 07:24:51

Given that Solid State Disks (SSDs) are decreasing in price and soon will become more prevalent as system drives, and given that their access rates are significantly higher than rotating magnetic media, what standard algorithms will gain in performance from the use of SSDs for local storage? For example, the high random read speed of SSDs makes something like a disk-based hashtable viable for large hashtables; 4GB of disk space is readily available, which makes hashing to the entire range of a 32-bit integer viable (more for lookup than population, though, which would still take a long time); while a hashtable of this size would be prohibitive to work with on rotating media due to the access speed, it shouldn't be as much of an issue with SSDs.

Are there any other areas where the impending transition to SSDs will provide potential gains in algorithmic performance? I'd rather see reasoning as to how one thing will work rather than opinion; I don't want this to turn contentious.
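To make the hashtable idea concrete, here is a minimal sketch (my own illustration, not part of the original question) of a disk-backed table covering 32-bit keys; the file layout, hash function, and 8-byte slot format are all assumptions. On an SSD each probe is a cheap random read, whereas on a rotating disk each probe costs a full seek.

```python
# Minimal sketch: a disk-backed table of 32-bit keys -> 32-bit values,
# one 8-byte slot per bucket. 2**29 buckets of 8 bytes is a 4 GiB file.
import struct

SLOT_SIZE = 8            # assumed layout: 4-byte key + 4-byte value
NUM_BUCKETS = 1 << 29    # ~536M buckets -> 4 GiB file

def bucket_of(key: int) -> int:
    # Trivial multiplicative hash; a real table would use something stronger.
    return (key * 2654435761) % NUM_BUCKETS

def lookup(f, key: int):
    """Linear probing; each probe is one random 8-byte read from the file."""
    b = bucket_of(key)
    for _ in range(64):                     # cap the probe chain length
        f.seek(b * SLOT_SIZE)
        k, v = struct.unpack("<II", f.read(SLOT_SIZE))
        if k == key:
            return v
        if k == 0:                          # empty slot (key 0 reserved): absent
            return None
        b = (b + 1) % NUM_BUCKETS           # collision: try the next slot
    return None

# Usage (assuming a pre-populated 4 GiB file named "table.bin"):
# with open("table.bin", "rb") as f:
#     print(lookup(f, 123456789))
```

As the question notes, populating such a file is the slow part; lookups are only a handful of random reads each.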

5 Answers

ㄖ落Θ余辉 2024-07-31 07:24:51

Your example of hashtables is indeed the key database structure that will benefit. Instead of having to load a whole 4GB or more file into memory to probe for values, the SSD can be probed directly. The SSD is still slower than RAM by orders of magnitude, but it's quite reasonable to have a 50GB hash table on disk, whereas keeping it in RAM isn't, unless you pay big money for big iron.

An example is chess position databases. I have over 50GB of hashed positions. There is complex code to try to group related positions near each other in the hash, so I can page in 10MB of the table at a time and hope to reuse some of it for multiple similar position queries. There's a ton of code and complexity to make this efficient.

After switching to an SSD, I was able to drop all the complexity of the clustering and just use really dumb randomized hashes. I also got an increase in performance, since I only fetch the data I need from the disk, not big 10 MB chunks. Latency per lookup is indeed higher, but the net speedup is significant, and the super-clean code (20 lines instead of 800+) is perhaps even nicer.
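As a rough illustration of the "dumb randomized hash" approach described here (a hypothetical sketch; the file name, record size, and layout are invented, not the answerer's actual code):

```python
# Instead of paging in 10 MB clusters of related positions, just compute the
# hash and read the one record you need. On an SSD this single random read
# costs microseconds rather than a large page-in.
import os

RECORD_SIZE = 16                            # assumed fixed-size position record
NUM_RECORDS = (50 * 10**9) // RECORD_SIZE   # ~50 GB of hashed positions

def probe(fd, position_hash: int) -> bytes:
    """One random read per lookup, directly at the hashed offset."""
    offset = (position_hash % NUM_RECORDS) * RECORD_SIZE
    return os.pread(fd, RECORD_SIZE, offset)

# fd = os.open("positions.db", os.O_RDONLY)   # hypothetical 50 GB hash file
# record = probe(fd, some_position_hash)
```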

东走西顾 2024-07-31 07:24:51

SSDs are only dramatically faster for random access; for sequential access to disk they are only about twice as fast as mainstream rotational drives. Many SSDs also have poorer performance in certain scenarios, as described here.

While SSDs do move the needle considerably, they are still much slower than CPU operations and physical memory. For your 4GB hash table example, you might be able to sustain 250+ MB/s off an SSD when accessing random hash table buckets. For a rotational drive, you'd be lucky to break single-digit MB/s. If you can keep that 4 GB hash table in memory, you could access it on the order of gigabytes per second, much faster than even a very swift SSD.
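To put rough numbers on that comparison, here is a quick back-of-the-envelope calculation; the latencies below are order-of-magnitude assumptions, not measurements:

```python
# Estimated random lookups/second when each lookup touches one block at random.
access_time = {
    "rotating disk": 10e-3,   # ~10 ms seek + rotational delay
    "SSD":           100e-6,  # ~0.1 ms random read
    "RAM":           100e-9,  # ~100 ns access
}
for medium, t in access_time.items():
    print(f"{medium:14s} ~{1 / t:>12,.0f} random lookups/sec")
# rotating disk: ~100/sec, SSD: ~10,000/sec, RAM: ~10,000,000/sec
```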

The referenced article lists several changes Microsoft made for Windows 7 when running on SSDs, which can give you an idea of the sort of changes you could consider making. First, SuperFetch, which prefetches data off the disk, is disabled: it is designed to work around slow random access times on disk, which SSDs alleviate. Defrag is also disabled, because having files scattered across the disk isn't a performance hit for SSDs.

诺曦 2024-07-31 07:24:51

Ipso facto, any algorithm you can think of that requires lots of random disk I/O stands to gain (random being the key word, since it throws the principle of locality to the birds and eliminates the usefulness of a lot of the caching that goes on).

I could see certain database systems gaining from this too, such as MySQL with the MyISAM storage engine (where data records are basically glorified CSVs). However, I think very large hashtables are going to be your best bet for good examples.

雨轻弹 2024-07-31 07:24:51

SSDs are a lot faster for random reads, a bit faster for sequential reads, and probably slower for writes (random or not).

So a disk-based hashtable is probably not that useful on an SSD, since it still takes significant time to update, but searching the disk becomes very cheap (compared to a normal HDD).

旧情别恋 2024-07-31 07:24:51

Don't kid yourself. SSDs are still a whole lot slower than system memory. Any algorithm that chooses to use system memory over the hard disk is still going to be much faster, all other things being equal.
