Do I need to care about R/W on block boundaries when using NIO?

Posted 2025-01-02 10:21:42


Background

A lot of work has gone into optimizing database design, especially in the realm of the most efficient ways to read and write data from disks (both spindle and SSD).

The knowledge that has come out of that work suggests that reading and writing on block boundaries, matching the block size of the filesystem you are running on, is the optimal approach.

Question

Say I am operating in a relatively low-memory environment and want to use a small 32MB memory-mapped file to read and write the contents of a huge 500GB file.

If I were using Java's NIO mechanisms, specifically MappedByteBuffer (Java's memory-mapped file mechanism), would I need to take care to execute READ and WRITE operations on block boundaries (e.g. 4KB) into memory before paring out the data I needed, or could I just issue R/W ops at any location I want and allow the operating system, VM paging logic, filesystem and storage firmware to handle the optimization of the operations and the culling, as needed, of any additional block data I didn't ask for?
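For concreteness, here is a minimal sketch of the setup I mean: mapping a 32MB window of a (hypothetical) large file with FileChannel.map and issuing an unaligned read and write through the MappedByteBuffer. The file name, window position and sizes are illustrative assumptions, not a recommendation:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WindowedMap {
    static final long WINDOW_SIZE = 32L * 1024 * 1024; // 32MB view into the big file

    public static void main(String[] args) throws IOException {
        Path bigFile = Path.of("huge-500gb.dat");       // hypothetical file, assumed to exist
        try (FileChannel ch = FileChannel.open(bigFile,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {

            long windowStart = 0;                       // map the first 32MB window
            MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_WRITE,
                                             windowStart, WINDOW_SIZE);

            // Unaligned read: 17,631 bytes starting at offset 71 within the window.
            byte[] dst = new byte[17_631];
            window.position(71);
            window.get(dst);

            // Unaligned write back at the same spot; the OS pages whole 4KB pages
            // in and out underneath, but the buffer API doesn't expose that.
            window.position(71);
            window.put(dst);
        }
    }
}
```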

Additional Detail

The reason for the question is that in database design, I see this obsessive focus on block-optimization, to the point that there doesn't seem to exist a world where you would ever just read and write data without the concept of a block.

What confuses me is that the filesystem is the one enforcing the block units of operation, so why would my higher-level app need to worry about this? If I want the 17,631 bytes at offset 71, can't I just grab them and read them in, or is it really faster for me to figure out that the read operation starts at block 0 and falls across the boundaries of blocks 0, 1 and 2... read all 3 of those blocks into an internal byte[], then cull out the 17,631 bytes I wanted in the first place?
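To make the comparison concrete, here is a rough sketch of the two approaches as I understand them: a direct unaligned get() versus rounding the range out to block boundaries, reading whole blocks, and copying out the slice. The 4KB block size is just the assumed filesystem block size from above:

```java
import java.nio.MappedByteBuffer;

public class BlockAlignedVsDirect {
    static final int BLOCK_SIZE = 4096; // assumed filesystem block size

    // Approach 1: just read the bytes you want, wherever they happen to start.
    static byte[] readDirect(MappedByteBuffer map, int offset, int length) {
        byte[] out = new byte[length];
        map.position(offset);
        map.get(out);
        return out;
    }

    // Approach 2: round the range out to block boundaries, read whole blocks,
    // then cull out the slice that was actually wanted.
    static byte[] readBlockAligned(MappedByteBuffer map, int offset, int length) {
        int firstBlock   = offset / BLOCK_SIZE;
        int lastBlock    = (offset + length - 1) / BLOCK_SIZE;
        int alignedStart = firstBlock * BLOCK_SIZE;
        int alignedLen   = (lastBlock - firstBlock + 1) * BLOCK_SIZE;

        byte[] blocks = new byte[alignedLen];
        map.position(alignedStart);
        map.get(blocks);

        byte[] out = new byte[length];
        System.arraycopy(blocks, offset - alignedStart, out, 0, length);
        return out;
    }
}
```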

If the literature on DB design wasn't so religious about this block idea, the question would have never come up in my mind, but because it is, I am wondering if I am missing a critical detail here WRT filesystems and optimal block device I/O.

Thank you for reading.


Answers (2)

荒芜了季节 2025-01-09 10:21:42


I think part of the reason databases have awareness of a block size (which may not be exactly the same as the fs block size, but of course should align) is not just to perform block-aligned I/O, but also to manage how the disk data is cached in memory rather than just relying on the OS caching. Some databases bypass the OS filesystem cache completely, in fact. Having the database manage the cache sometimes allows greater intelligence as to how that cache is utilised, that the OS might not be able to provide.

An RDBMS will typically take account of the number of blocks that could be read or written during a query in order to compare different execution plans; the possibility of all the data being fetched from the same block can be a useful optimisation to take note of.

Most databases I'm familiar with have the concept of a block cache/buffer where some portion of the working set of the database lives. A cache made up entirely of arbitrary extents could potentially be quite a bit harder to manage. Also, many databases actually arrange their stored data as a sequence of blocks, so the I/O pattern grows out of that. Of course, this might simply be a legacy of databases originally written for platforms that didn't have rich OS caching facilities...
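As a toy illustration of why fixed-size blocks make the cache easy to manage (not how any particular database does it), a block cache can be little more than a bounded map from block number to a block-sized byte array with LRU eviction:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy block cache: block number -> BLOCK_SIZE bytes, with LRU eviction.
// Arbitrary-length extents would need overlap and splitting logic on top of this.
public class BlockCache extends LinkedHashMap<Long, byte[]> {
    static final int BLOCK_SIZE = 8192;   // assumed database block size
    private final int maxBlocks;

    public BlockCache(int maxBlocks) {
        super(16, 0.75f, true);           // access-order iteration gives LRU behaviour
        this.maxBlocks = maxBlocks;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > maxBlocks;        // evict the least-recently-used block
    }
}
```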

Trying to conclude this ramble with some sort of answer to your question... my feeling would be that reading from arbitrary extents within the mapped file and letting the OS deal with the extra slop should be fine. Performance-wise, it's probably more important to try to let the OS do read-ahead: e.g. using the "advise" calls so the OS can start reading the next extent from disk while you process the current one, and, of course, advising the OS to uncache extents you've finished with.
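(In Java specifically, standard NIO doesn't expose madvise/fadvise directly; the nearest hint a MappedByteBuffer gives you is load(), which touches the mapped pages so the OS faults them in. Below is a rough sketch of that kind of manual read-ahead, warming the next 32MB window on a background thread while the current one is processed; the file name and window size are just assumptions.)

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReadAheadSketch {
    static final long WINDOW = 32L * 1024 * 1024;   // 32MB window, as in the question

    public static void main(String[] args) throws IOException {
        ExecutorService prefetcher = Executors.newSingleThreadExecutor();
        try (FileChannel ch = FileChannel.open(Path.of("huge-500gb.dat"), // hypothetical file
                                               StandardOpenOption.READ)) {
            for (long pos = 0; pos + WINDOW <= ch.size(); pos += WINDOW) {
                MappedByteBuffer current = ch.map(FileChannel.MapMode.READ_ONLY, pos, WINDOW);

                // Warm the *next* window while this one is being processed.
                long nextPos = pos + WINDOW;
                if (nextPos + WINDOW <= ch.size()) {
                    MappedByteBuffer next = ch.map(FileChannel.MapMode.READ_ONLY, nextPos, WINDOW);
                    prefetcher.submit(() -> { next.load(); }); // fault the pages in ahead of use
                }

                process(current);
            }
        } finally {
            prefetcher.shutdown();
        }
    }

    static void process(MappedByteBuffer buf) { /* consume the mapped window */ }
}
```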

谢绝鈎搭 2025-01-09 10:21:42


4KB blocks are important because it's typically the granularity of the MMU and hence the OS virtual memory manager. When items are frequently used together, it's important to design your database layout so that these items end up in the same page. This way, a page fault will page in all the items in the page.
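A small sketch of what that layout concern looks like in practice (record and page sizes are illustrative assumptions): fixed-size records are packed so that none straddles a 4KB page, and items used together get adjacent slots so a single page fault brings them all in.

```java
// Page-aware layout sketch: pack fixed-size records so that no record
// straddles a 4KB page boundary; co-accessed items placed in adjacent
// slots then share a page and are faulted in together.
public class PageLayout {
    static final int PAGE_SIZE   = 4096;                    // typical MMU/VM page size
    static final int RECORD_SIZE = 128;                     // assumed fixed record size
    static final int PER_PAGE    = PAGE_SIZE / RECORD_SIZE; // 32 records per page

    // Byte offset of a record, skipping to the next page instead of straddling one.
    static long offsetOf(long recordIndex) {
        long page = recordIndex / PER_PAGE;
        long slot = recordIndex % PER_PAGE;
        return page * PAGE_SIZE + slot * RECORD_SIZE;
    }

    // Which 4KB page a byte offset lives in; one fault loads everything on that page.
    static long pageOf(long byteOffset) {
        return byteOffset / PAGE_SIZE;
    }
}
```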
