Best memory layout for read-only and read/write memory segments
Suppose I have two memory segments of equal size (approximately 1 KB each): one is read-only (after initialization), and the other is read/write.
What is the best memory layout for such segments in terms of memory performance: one allocation with contiguous segments, or two allocations (in general not contiguous)? My primary architecture is Linux on Intel 64-bit.
My feeling is that the former (cache-friendlier) case is better.
Are there circumstances where the second layout is preferred?
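For concreteness, the two layouts being compared might look like this in C. This is a minimal sketch: `SEG_SIZE`, `struct segments`, `alloc_contiguous` and `alloc_separate` are hypothetical names, and the 1024-byte size is an assumption standing in for "approximately 1 KB".

```c
#include <stdlib.h>

/* Assumed segment size (the question says "approximately 1 KB"). */
#define SEG_SIZE 1024

struct segments {
    unsigned char *ro; /* read-only after initialization */
    unsigned char *rw; /* read/write */
};

/* Layout 1: one allocation; the read/write segment immediately follows
   the read-only one. Release both with free(s->ro). */
int alloc_contiguous(struct segments *s) {
    unsigned char *base = malloc(2 * SEG_SIZE);
    if (base == NULL)
        return -1;
    s->ro = base;
    s->rw = base + SEG_SIZE;
    return 0;
}

/* Layout 2: two allocations; their relative placement is up to malloc.
   Release with free(s->ro) and free(s->rw). */
int alloc_separate(struct segments *s) {
    s->ro = malloc(SEG_SIZE);
    s->rw = malloc(SEG_SIZE);
    if (s->ro == NULL || s->rw == NULL) {
        free(s->ro); /* free(NULL) is a no-op */
        free(s->rw);
        return -1;
    }
    return 0;
}
```

With the contiguous layout a single `free(s.ro)` releases both segments; with the separate layout each pointer must be freed on its own.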
Comments (4)
I would put the 2 KB of data in the middle of a 4 KB page, to avoid interference from reads and writes close to the page boundary. Similarly, keeping the write data separate is also a good idea, for the same reason.
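A minimal sketch of the placement described above, assuming 4 KB pages and two 1 KB segments: one page-aligned page is allocated with C11 `aligned_alloc`, and the 2 KB of data starts 1024 bytes in, leaving equal slack on both sides. `alloc_mid_page` and the macro names are hypothetical.

```c
#include <stdlib.h>

#define PAGE_SIZE_ASSUMED 4096 /* assumption: 4 KB pages */
#define DATA_SIZE 2048         /* two 1 KB segments */

/* Returns a pointer to DATA_SIZE bytes centred in a freshly allocated,
   page-aligned 4 KB page; *page receives the pointer to pass to free(). */
unsigned char *alloc_mid_page(void **page) {
    /* aligned_alloc (C11): the size here is a multiple of the alignment. */
    unsigned char *p = aligned_alloc(PAGE_SIZE_ASSUMED, PAGE_SIZE_ASSUMED);
    if (p == NULL)
        return NULL;
    *page = p;
    /* (4096 - 2048) / 2 = 1024 bytes of slack on each side. */
    return p + (PAGE_SIZE_ASSUMED - DATA_SIZE) / 2;
}
```

The read-only segment would then sit at the returned pointer and the read/write segment 1024 bytes after it. The trade-off is that half of the page is deliberately left unused.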
Having contiguous read/write blocks may be less efficient than keeping them separate. For example, a cache holding data for code that is interested only in the read-only portion may have its lines invalidated by a write from another CPU. The cache line is invalidated and refetched even though the code wasn't reading the writable data. By keeping the blocks separate, you avoid this case: writes to the writable block invalidate only the cache lines of the writable block and do not interfere with the cache lines of the read-only block.
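One way to keep the two blocks from sharing a cache line is to start each on its own line with C11 `_Alignas`. This sketch assumes 64-byte lines (typical for x86-64) and uses hypothetical 1000-byte blocks, because when the sizes are not a multiple of 64 the last line of the read-only block would otherwise also hold the start of the writable one. `struct shared_state` and the macro names are illustrative.

```c
#include <stddef.h>

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */
#define RO_SIZE 1000  /* hypothetical sizes, not multiples of 64 */
#define RW_SIZE 1000

/* Each member starts on its own cache line, so a write to rw never touches
   a line that also holds bytes of ro. Static or automatic objects get this
   alignment; malloc does not guarantee it, so heap instances would need
   aligned_alloc(CACHE_LINE, sizeof(struct shared_state)). */
struct shared_state {
    _Alignas(CACHE_LINE) unsigned char ro[RO_SIZE]; /* read-only after init */
    _Alignas(CACHE_LINE) unsigned char rw[RW_SIZE]; /* written, possibly by other CPUs */
};
```

Here `rw` begins at offset 1024 rather than 1000; the 24 bytes of padding are the price of keeping the boundary line-aligned.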
Note that this is only a concern at the boundary between the readable and writable blocks. If your blocks were much larger than a cache line, this would be a peripheral problem, but since your blocks are small, requiring just a few cache lines each (16 lines of 64 bytes for 1 KB), the cost of invalidated lines could be significant.
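The arithmetic behind this point can be checked with two small helpers (hypothetical names, assuming 64-byte lines): a 1 KB block touches 16 lines when it is line-aligned and 17 when it is not, and two adjacent blocks share exactly one line whenever the boundary between them is not a multiple of the line size.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Number of cache lines touched by the byte range [addr, addr + size). */
size_t lines_spanned(uintptr_t addr, size_t size) {
    if (size == 0)
        return 0;
    uintptr_t first = addr / CACHE_LINE;
    uintptr_t last = (addr + size - 1) / CACHE_LINE;
    return (size_t)(last - first + 1);
}

/* Nonzero if a block ending at boundary - 1 and a block starting at
   boundary fall on the same cache line, i.e. boundary is not line-aligned. */
int boundary_shares_line(uintptr_t boundary) {
    return boundary % CACHE_LINE != 0;
}
```

For example, `lines_spanned(32, 1024)` is 17, and a read-only block of 1000 bytes followed directly by the writable block gives `boundary_shares_line(1000)` true.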
With that little data, it really shouldn't matter much. Both arrays will fit comfortably into any level of cache.
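To put "fits into any level of cache" in numbers: the two segments total 2 KB, while L1 data caches on Intel x86-64 cores are typically 32 KB or 48 KB. The sketch below queries the size on Linux through glibc's `sysconf(_SC_LEVEL1_DCACHE_SIZE)` extension; the 32 KB fallback for systems where that constant is missing or the query returns 0 is an assumption, and `l1d_size` and `fits_in_l1` are hypothetical names.

```c
#include <stddef.h>
#include <unistd.h>

/* Assumed fallback when the size cannot be queried. */
#define L1D_FALLBACK (32 * 1024)

/* L1 data cache size in bytes. _SC_LEVEL1_DCACHE_SIZE is a glibc
   extension; it may be undefined, or return 0, on other systems. */
long l1d_size(void) {
#ifdef _SC_LEVEL1_DCACHE_SIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_SIZE);
    if (n > 0)
        return n;
#endif
    return L1D_FALLBACK;
}

/* Nonzero if a working set of `bytes` is no larger than L1d. */
int fits_in_l1(size_t bytes) {
    return bytes <= (size_t)l1d_size();
}
```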
It'll depend on what you're doing with the memory. I'm fairly certain that contiguous (and page-aligned!) segments would never be slower than two randomly placed segments, but they won't necessarily be any faster.
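If both segments have sizes known at compile time, one way to get a contiguous, page-aligned pair without a heap allocation is a single static array declared with C11 `_Alignas`. This is a sketch assuming 4 KB pages and 1 KB segments; `segments`, `ro_segment`, `rw_segment` and `is_page_aligned` are hypothetical names. A heap-based equivalent would use `aligned_alloc` or `posix_memalign` with the page size.

```c
#include <stddef.h>
#include <stdint.h>

#define SEG_SIZE 1024   /* assumed segment size */
#define PAGE_ALIGN 4096 /* assumption: 4 KB pages */

/* Both segments in one page-aligned object: contiguous, and since
   2 * 1024 <= 4096 they also share a single page. */
static _Alignas(PAGE_ALIGN) unsigned char segments[2 * SEG_SIZE];

unsigned char *ro_segment(void) { return segments; }
unsigned char *rw_segment(void) { return segments + SEG_SIZE; }

/* Nonzero if p lies on a PAGE_ALIGN boundary. */
int is_page_aligned(const void *p) {
    return (uintptr_t)p % PAGE_ALIGN == 0;
}
```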
Given that it's an Intel processor, you probably only need to ensure that the addresses are not exactly a multiple of 64 KB apart. If they are, loads from the two sections that map to the same address modulo 64 KB will collide in L1 and cause L1 misses. There's also a 4 MB aliasing issue, but I'd be surprised if you ran into that.
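A small check for the condition described above, as a sketch: it tests whether two addresses are an exact multiple of 64 KB apart (the stride named in this answer; `aliases_64k` and `ALIAS_STRIDE` are hypothetical names). In the contiguous layout the segments are only 1 KB apart, so this can only come up with separate allocations.

```c
#include <stdint.h>

#define ALIAS_STRIDE (64u * 1024u) /* 64 KB, the stride named in the answer */

/* Nonzero if a and b are an exact multiple of 64 KB apart (or equal). */
int aliases_64k(const void *a, const void *b) {
    uintptr_t x = (uintptr_t)a;
    uintptr_t y = (uintptr_t)b;
    uintptr_t diff = x > y ? x - y : y - x;
    return diff % ALIAS_STRIDE == 0;
}
```

If the check fires for the two segment base addresses, shifting one segment by a cache line or so is enough, under the model described in the answer, to break the exact 64 KB spacing.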