Invalidating TLB entries in the Linux kernel


In the Linux kernel, I wrote code that resembles copy_page_range (mm/memory.c) to copy memory from one process to another with the COW optimization. The destination and source addresses can be offset by PAGE_SIZE and COW still works. I noticed, however, that in a user program, when I copy from the same source address to different destination addresses, the TLB does not seem to be properly flushed. At a high level, my user-level code does the following (I copy exactly one page, 0x1000 bytes on my machine, at a time):

SRC=0x20000000

  1. Write to SRC (call the associated page page1).
  2. Syscall to copy SRC into 0x30000000 in the destination process. Now, source process address 0x20000000 and destination process address 0x30000000 point to the same page (page1).
  3. Write something different to SRC (this should trigger a page fault to handle the COW). Assume the source address now points to page2.
  4. Syscall to copy SRC into 0x30001000 in the destination process.
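
In code form, the test is roughly the following sketch; the syscall number, its argument order, and the destination pid are hypothetical stand-ins for my custom syscall:

    #include <string.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define COPY_PAGE_NR 451  /* hypothetical number of my copy syscall */

    int main(void)
    {
        /* Map one page at the fixed source address used above. */
        char *src = mmap((void *)0x20000000, 0x1000, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        pid_t dst_pid = 1234;  /* hypothetical pid of the destination process */

        if (src == MAP_FAILED)
            return 1;

        strcpy(src, "first");                               /* step 1: faults in page1 */
        syscall(COPY_PAGE_NR, dst_pid, 0x30000000UL, src);  /* step 2: share page1 */
        strcpy(src, "second");                              /* step 3: should COW to page2 */
        syscall(COPY_PAGE_NR, dst_pid, 0x30001000UL, src);  /* step 4: share page2 */
        return 0;
    }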

At this point, two separate pages should exist:
SRC 0x20000000 page2
DST 0x30000000 page1
DST 0x30001000 page2

I find that at step 3, when I write something different into SRC (0x20000000), no page fault is generated. Upon inspection, the actual page mappings are:
SRC 0x20000000 page1
DST 0x30000000 page1
DST 0x30001000 page1

In my code, if I call flush_tlb_page and pass the source address, the user code works as expected with the proper page mappings. So I am convinced I am not maintaining the TLB correctly. In copy_page_range, the kernel calls mmu_notifier_invalidate_range_start/end before and after it alters page tables. I am doing the exact same thing and have double-checked that I am indeed passing the correct struct mm_struct and addresses to mmu_notifier_invalidate_range_start/end. Does this function not handle flushing the TLB?

OK, so literally as I finished typing this, I checked dup_mmap and realized that the primary caller of copy_page_range, dup_mmap (kernel/fork.c), calls flush_tlb_mm. I am guessing I should call flush_cache_range before and flush_tlb_range after my page-table changes. Is this correct? What exactly does mmu_notifier_invalidate_range_start/end do?
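
In sketch form, here is the ordering I now think my syscall needs. The function and its arguments are stand-ins for my code, and the notifier calls use the older three-argument signatures, which vary across kernel versions:

    #include <linux/mm.h>
    #include <linux/mmu_notifier.h>
    #include <asm/cacheflush.h>
    #include <asm/tlbflush.h>

    /* Sketch only: copy_one_page_cow stands in for my syscall's internals. */
    static void copy_one_page_cow(struct vm_area_struct *src_vma,
                                  struct mm_struct *src_mm,
                                  struct mm_struct *dst_mm,
                                  unsigned long src_addr, unsigned long dst_addr)
    {
        unsigned long end = src_addr + PAGE_SIZE;

        flush_cache_range(src_vma, src_addr, end);        /* cache hook, before */
        mmu_notifier_invalidate_range_start(src_mm, src_addr, end);

        /* write-protect the source PTE here and map the shared page into
         * dst_mm at dst_addr */

        mmu_notifier_invalidate_range_end(src_mm, src_addr, end);
        flush_tlb_range(src_vma, src_addr, end);          /* actual TLB shootdown, after */
    }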


淡淡離愁欲言轉身 2024-12-26 10:26:54


Yes, if you are doing something that changes page tables, you need to make sure that the TLB is invalidated as required.

mmu_notifier_invalidate_range_start/end just call MMU notifier hooks; these hooks only exist so that other kernel code can be told when TLB invalidation is happening. The only places that set up MMU notifiers are:

  • KVM (hardware-assisted virtualization) uses them to handle swapping out pages; it needs to know about host TLB invalidations to keep the virtualized guest MMU in sync with the host.
  • GRU (driver for specialized hardware in huge SGI systems) uses MMU notifiers to keep the mapping tables in the GRU hardware in sync with the CPU MMU.
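
As a rough illustration of what such a consumer looks like (all names here are made up, and the ops signatures follow older kernels), a subsystem registers a notifier like this:

    #include <linux/mm_types.h>
    #include <linux/mmu_notifier.h>

    /* Hypothetical consumer mirroring invalidations into a secondary MMU. */
    static void my_invalidate_range_start(struct mmu_notifier *mn,
                                          struct mm_struct *mm,
                                          unsigned long start, unsigned long end)
    {
        /* tear down secondary-MMU mappings covering [start, end) */
    }

    static const struct mmu_notifier_ops my_notifier_ops = {
        .invalidate_range_start = my_invalidate_range_start,
    };

    static struct mmu_notifier my_notifier = { .ops = &my_notifier_ops };

    static int attach_my_notifier(struct mm_struct *mm)
    {
        /* after this, every mmu_notifier_invalidate_range_start/end on mm
         * calls back into my_notifier_ops */
        return mmu_notifier_register(&my_notifier, mm);
    }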

But pretty much any place that you are calling MMU notifier hooks, you should also be calling TLB shootdown functions if the kernel is not already doing it for you.
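
In condensed form, the pattern dup_mmap follows is roughly this (a sketch, not verbatim kernel source; the VMA list walk and the copy_page_range signature match older kernels):

    #include <linux/mm.h>
    #include <asm/cacheflush.h>
    #include <asm/tlbflush.h>

    /* Condensed sketch of dup_mmap's ordering around copy_page_range. */
    static int dup_mm_page_tables(struct mm_struct *mm, struct mm_struct *oldmm)
    {
        struct vm_area_struct *vma;
        int retval = 0;

        flush_cache_mm(oldmm);                 /* cache maintenance, before */
        for (vma = oldmm->mmap; vma; vma = vma->vm_next) {
            retval = copy_page_range(mm, oldmm, vma);  /* fires the notifier hooks */
            if (retval)
                break;
        }
        flush_tlb_mm(oldmm);                   /* the TLB shootdown itself, after */
        return retval;
    }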
