How are shadow page tables implemented in virtual machine monitors such as VMware's ESXi Server?

Posted on 2024-08-29 14:51:54 · 327 words · 15 views · 0 comments

My understanding is that VMMs such as VMware's ESXi Server maintain shadow page tables to map virtual page addresses of guest operating systems directly to machine (hardware) addresses. I've been told that shadow page tables are then used directly by the processor's paging hardware to allow memory accesses in the VM to execute without translation overhead.

I would like to understand a bit more about how the shadow page table mechanism works in a VMM. Is my high level understanding above correct? If so,

  • What kind of data structures are used in the implementation of shadow page tables?

  • What is the flow of control from the guest operating system to the hardware?

  • Short of straight up reading the source code of an open source VMM, what resources can I look into to learn more about hardware virtualization?

Comments (3)

终陌 2024-09-05 14:51:54

Here is what I can tell you. Please correct me if I am wrong.
The shadow page table is created and maintained by the hypervisor/VMM. It is a table that maps guest virtual addresses to machine physical addresses. Without a shadow page table, getting to a machine physical address would mean taking the virtual address, walking the guest OS page table to get the guest physical address, and then converting that guest physical address into a machine physical address. Here is how a guest virtual address gets translated into a machine physical address when a shadow page table is used:

  1. First, the physical processor sees the guest virtual address and needs the machine physical address. The first thing
    it does is look in the TLB (translation lookaside buffer). If the
    entry is in the TLB, we get the machine address right away. This is the
    simplest case, which we call a TLB hit. There is no performance
    issue at all; the access runs at native speed.

    What happens if there is no entry in the TLB (a TLB miss)?

  2. If there is no entry in the TLB, the processor does a page table walk in the shadow page table. Assuming there is a
    corresponding mapping (guest VA to machine physical address), the
    processor inserts the value into the TLB, restarts the access,
    and we are good to go. This is the other good case. The walk is done
    in hardware and is cheap (roughly tens of cycles when the page-table entries are in cache), so
    performance-wise we don't have to worry much.

    What happens if there is no entry in the shadow page table?

  3. The processor walks the shadow page table and cannot find the entry. In this case it raises a page fault, and because the VMM owns the real fault handling, the fault traps to the VMM (virtual machine monitor) rather than to the guest. The VMM then walks the guest page table to resolve the issue. This case is a little complicated. When the VMM walks the guest page table there are two possibilities.

    1. The lookup finds the entry:
      Walking the guest page table only gives us the guest physical address,
      but our target is the machine physical address. How
      do we get there? The monitor takes the guest physical address and looks it
      up in its PMap table (or structure). If it finds the entry, it inserts the mapping
      (basically guest virtual address to machine physical address) into the shadow page table.
      Now that the entry is in the shadow page table we are good to go: when the processor
      restarts the instruction, it gets the mapping from the shadow page table.

      I forgot to mention: this case is called a hidden page fault, because the monitor resolves
      it on its own, using the PMap (or PhysMap) to get the corresponding machine physical address, and the guest never sees the fault.

    2. The lookup does not find the entry:
      The monitor (VMM) injects a virtual page fault into the guest. The guest now sees
      a page fault, and the guest OS steps in to resolve it. This can take thousands
      to hundreds of thousands of cycles, or more if the page was swapped out to disk by the
      guest. Once the guest OS has resolved the issue, the access is retried and we go through step 3.1.
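
As a rough illustration of the cases above, here is a minimal Python sketch. It models every table as a dictionary keyed by page number. The class name `ShadowMMU`, the `log` list, and the `GuestPageFault` exception are made up for this example and are not taken from any real VMM, and it assumes the PMap already holds an entry for every guest physical page.

```python
from dataclasses import dataclass, field


class GuestPageFault(Exception):
    """Hypothetical: a genuine fault that the VMM must inject into the guest."""


@dataclass
class ShadowMMU:
    guest_pt: dict  # guest virtual page -> guest physical page (owned by the guest OS)
    pmap: dict      # guest physical page -> machine page (owned by the VMM)
    shadow: dict = field(default_factory=dict)  # guest virtual page -> machine page
    tlb: dict = field(default_factory=dict)     # stands in for the hardware TLB
    log: list = field(default_factory=list)     # records which path each access took

    def translate(self, vpn):
        # Case 1: TLB hit, no walk at all.
        if vpn in self.tlb:
            self.log.append("tlb_hit")
            return self.tlb[vpn]
        # Case 2: TLB miss, the walk of the shadow table succeeds.
        if vpn in self.shadow:
            self.log.append("shadow_walk")
            self.tlb[vpn] = self.shadow[vpn]
            return self.tlb[vpn]
        # Case 3: shadow miss, the fault traps to the VMM.
        self.handle_shadow_fault(vpn)
        return self.translate(vpn)  # restart the faulting access

    def handle_shadow_fault(self, vpn):
        gpn = self.guest_pt.get(vpn)
        if gpn is None:
            # Case 3.2: the guest has no mapping either, so this is a true guest fault.
            self.log.append("guest_fault")
            raise GuestPageFault(vpn)
        # Case 3.1: hidden fault, fill the shadow entry from guest page table + PMap.
        self.log.append("hidden_fault")
        self.shadow[vpn] = self.pmap[gpn]
```

With `ShadowMMU(guest_pt={0x10: 0x2}, pmap={0x2: 0x7F})`, the first `translate(0x10)` takes the hidden-fault path followed by a shadow walk, and the second one is a TLB hit.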

Well, the whole flow is a little complicated. I hope you can follow the process.

Note:
Shadow page tables are implemented in software by hypervisors such as those from VMware and Microsoft. They are needed whenever the hardware has no nested paging, which includes binary translation (BT) mode. With nested page tables we don't need a shadow page table at all.

There are some issues with shadow page tables.

  • We rely on the guest to invalidate the TLB.
    We want to keep the guest page table and the shadow page table consistent.
    Imagine what happens when the guest updates a page table entry, or when
    it switches processes and therefore has to switch page tables. In these cases the guest
    has to tell the hardware "I updated an entry in the page table, invalidate it", and the VMM has to intercept that so it can update the shadow page table too.

  • Aggressive shadow page table caching is necessary:
    We need to cache shadow page tables. Consider a guest with a lot of processes doing a
    context switch. It has to tell the hardware to change its page table pointer, so the VMM
    has to change the shadow page table pointer, and every switch flushes the TLB. Traditionally we keep a
    shadow page table for every running process, but we cannot keep as many shadow page tables
    as there are guest processes with their own tables.

  • Write-protect the guest page tables (also called tracing) so that we find out when the guest changes them.
    For example, if the operating system locks a page for some reason, we have to be informed.
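
To make the caching and tracing points a bit more concrete, here is a small Python sketch under simplifying assumptions. The class `ShadowCache` and its methods are hypothetical names for illustration only: a shadow table is cached per guest page-table root, and a write to a traced (write-protected) guest page-table page simply drops the shadow entries derived from it.

```python
class ShadowCache:
    """Hypothetical cache: one shadow table per guest page-table root."""

    def __init__(self):
        self.tables = {}   # guest page-table root -> {guest virtual page: machine page}
        self.traced = {}   # write-protected guest page-table page -> {(root, vpn), ...}
        self.active_root = None

    def switch_to(self, root):
        # Guest context switch: reuse the cached shadow table if one exists.
        reused = root in self.tables
        self.tables.setdefault(root, {})
        self.active_root = root
        return reused

    def install(self, vpn, mpn, pt_page):
        # Fill a shadow entry and remember which guest page-table page it came from.
        self.tables[self.active_root][vpn] = mpn
        self.traced.setdefault(pt_page, set()).add((self.active_root, vpn))

    def on_traced_write(self, pt_page):
        # The guest wrote to a traced page-table page: drop the entries derived from it.
        for root, vpn in self.traced.pop(pt_page, set()):
            self.tables[root].pop(vpn, None)
```

Dropping the stale entries is the simplest policy; the next access to such a page just takes the hidden-fault path again and refills the entry from the updated guest page table.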

岁月染过的梦 2024-09-05 14:51:54

Over 50 views and nothing?

I spoke to Vidar Holen over IRC on ##linux on freenode.net. He suggested that I take a look at this AMD technical report. It has proven to be a great resource. Anyone else have other suggestions?

硬不硬你别怂 2024-09-05 14:51:54

Basically, the guest OS translates virtual addresses to physical addresses, but these seemingly physical addresses are not real physical addresses: they are handed out by the VMM/hypervisor, and they need not be contiguous in machine memory the way they would be for a regular OS running without a VM.
So one more translation is required to map these guest physical addresses to real machine addresses. To accomplish this, the VMM/hypervisor keeps shadow page tables that fold both steps together, mapping guest virtual addresses directly to machine physical addresses.
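
A small worked example of the two translations composed into one, assuming 4 KiB pages and single-level tables modelled as Python dictionaries (the function names and the page numbers are illustrative only):

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def split(addr):
    """Split an address into (page number, offset within the page)."""
    return addr >> PAGE_SHIFT, addr & PAGE_MASK


def guest_virtual_to_machine(gva, guest_pt, pmap):
    """Compose guest virtual -> guest physical -> machine physical."""
    vpn, offset = split(gva)
    gpn = guest_pt[vpn]   # first translation, defined by the guest OS
    mpn = pmap[gpn]       # second translation, defined by the VMM
    return (mpn << PAGE_SHIFT) | offset


# Guest physical pages 0x2 and 0x3 are adjacent from the guest's point of view,
# but the VMM has backed them with unrelated machine pages.
guest_pt = {0x10: 0x2, 0x11: 0x3}
pmap = {0x2: 0x7F, 0x3: 0x12}
```

Here `guest_virtual_to_machine(0x10ABC, guest_pt, pmap)` gives `0x7FABC` and `guest_virtual_to_machine(0x11004, guest_pt, pmap)` gives `0x12004`. A shadow page table entry stores the composed result (0x10 to 0x7F, 0x11 to 0x12), so the hardware can do the whole translation in a single walk.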

In addition, hardware avoids page table walks by providing a TLB. As you can imagine, the TLB the guest believes it is managing cannot simply be the real hardware TLB, so the VMM/hypervisor has to emulate it somehow as well. Shadow page tables can serve as that TLB for the guest.

So that is the basic idea of shadow page tables, but it is probably the most complicated piece of hardware virtualization. I have left out a lot of details and catches that I do not completely understand myself.

The following link talks about some of the issues with simplified shadow page tables and how KVM tries to avoid them.

http://lwn.net/Articles/216794/

One more thing: there is also hardware support for this mechanism, called EPT (Intel) and NPT (AMD).

HTH.
