Are random updates mostly disk-bound in standard and append-only databases?

Asked 2024-10-16 03:07:04


If I have a large dataset and do random updates, then I think the updates are mostly disk-bound (in the case of append-only databases it is not about seeks but about bandwidth, I think). When I update a record slightly, one data page must be updated, so if my disk can write 10 MB/s of data and the page size is 16 KB, then I can do at most 640 random updates per second. In append-only databases it is about 320 per second, because one update can touch two pages - index and data. In other databases, because of random seeks to update pages in place, it can be even worse, like 100 updates per second.

I assume that one page in the cache receives only one update before it is written out (random updates). The same will hold for random inserts spread across all data pages (for example, UUIDs that are not time-ordered), or even worse.

I am referring to the situation where dirty pages (after an update) must be flushed to disk and synced (they can no longer stay in the cache). Is the number of updates per second in this situation bounded by disk bandwidth? Is a figure like 320 updates per second plausible? Maybe I am missing something?
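For reference, here is a minimal sketch of the back-of-envelope arithmetic above; the 10 MB/s bandwidth, 16 KB page size, and pages-per-update counts are the assumed figures from the question, not measurements:

```python
# Upper bound on random updates/sec when every dirty page must be flushed
# and the disk sustains a fixed write bandwidth.
def max_updates_per_sec(bandwidth_bytes_per_sec, page_size_bytes, pages_per_update):
    return bandwidth_bytes_per_sec // (page_size_bytes * pages_per_update)

BANDWIDTH = 10 * 1024 * 1024  # assumed 10 MB/s of write bandwidth
PAGE_SIZE = 16 * 1024         # assumed 16 KB pages

print(max_updates_per_sec(BANDWIDTH, PAGE_SIZE, pages_per_update=1))  # 640: data page only
print(max_updates_per_sec(BANDWIDTH, PAGE_SIZE, pages_per_update=2))  # 320: data page + index page
```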


Answered by 最近可好 on 2024-10-23 03:07:04


"It depends."

To be complete, there are other things to consider.

First, the only thing distinguishing a random update from an append is the head seek involved. A random update will have the head dancing all over the platter, whereas an append will ideally just track like a record player. This also assumes that each disk write is a complete write and completely independent of all other writes.

Of course, that's in a perfect world.

With most modern databases, each update will typically involve, at a minimum, 2 writes. One for the actual data, the other for the log.

In a typical scenario, if you update a row, the database will make the change in memory. If you commit that row, the database will acknowledge that by making a note in the log, while keeping the actual dirty page in memory. Later, when the database checkpoints, it will write the dirty pages to the disk. But when it does this, it will sort the blocks and write them as sequentially as it can. Then it will write a checkpoint to the log.

During recovery, when the DB crashed and could not checkpoint, the database reads the log from the last checkpoint forward, "rolls it forward" and applies those changes to the actual disk pages, marks the final checkpoint, then makes the system available for service.

The log write is sequential, the data writes are mostly sequential.
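To make the commit / checkpoint / roll-forward cycle concrete, here is a minimal toy sketch; the ToyWAL class, its method names, and the JSON log format are illustrative assumptions, not any particular database's implementation:

```python
import json

class ToyWAL:
    """Toy write-ahead-log scheme: a commit appends to a sequential log and
    leaves the dirty page in memory; a checkpoint flushes dirty pages,
    sorted, and records that fact in the log."""

    def __init__(self, log_path, pages):
        self.log_path = log_path  # sequential, append-only log file
        self.pages = pages        # stand-in for the data file: page_id -> str
        self.dirty = {}           # dirty pages not yet checkpointed

    def commit(self, page_id, data):
        # 1. Acknowledge the commit with a sequential log append
        #    (a real database would also sync the log here).
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"page": page_id, "data": data}) + "\n")
        # 2. Keep the dirty page in memory; no random data-file write yet.
        self.dirty[page_id] = data

    def checkpoint(self):
        # Flush dirty pages sorted by page id, as sequentially as possible,
        # then note the checkpoint in the log.
        for page_id in sorted(self.dirty):
            self.pages[page_id] = self.dirty[page_id]
        self.dirty.clear()
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"checkpoint": True}) + "\n")

    def recover(self):
        # Roll forward: find the last checkpoint, then reapply every commit
        # logged after it to the data pages.
        with open(self.log_path) as log:
            records = [json.loads(line) for line in log]
        last_cp = max((i for i, r in enumerate(records) if "checkpoint" in r),
                      default=-1)
        for r in records[last_cp + 1:]:
            self.pages[r["page"]] = r["data"]
```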

Now, if the log is part of a normal file (typical today) then you write the log record, which appends to the disk file. The file system will then (likely) append to ITS log the change you just made, so that it can update its local file system structures. Later, the file system will also commit its dirty pages and make its metadata changes permanent.

So, you can see that even a simple append can invoke multiple writes to the disk.
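As a hedged illustration of where the application-level write ends and the file system's own bookkeeping begins, a database appending to its own log typically does something like the sketch below (the function name and path are hypothetical; os.open, os.write, and os.fsync are standard calls):

```python
import os

def append_and_sync(path, record: bytes):
    """Append a log record and force it to stable media. Even this 'one'
    append can trigger extra file-system journal writes behind the scenes,
    as described above."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, record)
        # fsync flushes both data and file metadata; os.fdatasync (where
        # available) can skip some metadata and save a journal write.
        os.fsync(fd)
    finally:
        os.close(fd)
```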

Now consider an "append only" design like CouchDB. When you make a simple write, Couch does not have a separate log; the file is its own log. CouchDB files grow without end and need compaction during maintenance. But when it does the write, it writes not just the data page, but any indexes affected. And when indexes are affected, Couch will rewrite the entire BRANCH of the index that changed, from root to leaf. So, a simple write in this case can be more expensive than you would first think.
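A rough way to picture that write amplification, assuming illustrative node and page sizes (this is a sketch of a copy-on-write, append-only index in general, not CouchDB's actual on-disk format):

```python
# Bytes appended for one update in a copy-on-write, append-only B-tree:
# the changed leaf plus every interior node above it is rewritten at the
# end of the file, along with the data itself (illustrative numbers).
def appended_bytes_per_update(tree_height, node_size_bytes, data_page_bytes):
    return tree_height * node_size_bytes + data_page_bytes

# e.g. a 4-level index with 4 KB nodes and a 16 KB data page:
print(appended_bytes_per_update(4, 4 * 1024, 16 * 1024))  # 32768 bytes per "simple" write
```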

Now, of course, you throw in all of the random reads to disrupt your random writes, and it all gets quite complicated quite quickly. What I've learned, though, is that while streaming bandwidth is an important aspect of IO operations, overall operations per second are even more important. You can have 2 disks with the same bandwidth, but the one with the slower platter and/or head speed will have fewer ops/sec, just from head travel time and rotational delay.
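A quick worked comparison of the two limits, using assumed figures that are merely typical for a spinning disk, not measurements of any particular drive:

```python
# Two bounds on random 16 KB page writes/sec for a spinning disk.
PAGE = 16 * 1024
BANDWIDTH = 100 * 1024 * 1024   # assumed 100 MB/s sequential write bandwidth
SEEK_PLUS_ROTATION_MS = 10      # assumed ~10 ms average seek + rotational delay

bandwidth_bound = BANDWIDTH // PAGE         # 6400 pages/s if writes were sequential
seek_bound = 1000 // SEEK_PLUS_ROTATION_MS  # ~100 random ops/s

# For truly random page writes, the seek bound dominates by a wide margin.
print(bandwidth_bound, seek_bound)
```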

Ideally, your DB uses dedicated raw storage rather than a file system for storage, but most do not do that today. The operational advantages of file-system-based stores typically outweigh the performance benefits of raw storage.

If you're on a file system, then preallocated, sequential files are a benefit so that your "append only" isn't simply skipping around other files on the file system, thus becoming similar to random updates. Also, by using preallocated files, your updates are simply updating DB data structures during writes rather than DB AND file system data structures as the file expands.
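A hedged sketch of preallocating a segment file up front, so later writes touch only DB structures rather than file-system allocation metadata (os.posix_fallocate is available on Linux and most POSIX systems; the file name and size here are arbitrary):

```python
import os

def preallocate(path, size_bytes):
    """Reserve disk space up front so subsequent appends do not have to
    extend the file and update file-system allocation structures."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size_bytes)  # POSIX/Linux only
    finally:
        os.close(fd)

preallocate("data.segment", 1 * 1024 * 1024 * 1024)  # reserve 1 GiB, hypothetical file
```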

Putting logs, indexes, and data on separate disks allows multiple drives to work simultaneously with less interference. Your log, for example, can be truly append-only rather than fighting with random data reads or index updates.

So, all of those things factor into throughput on DBs.
