Hypervisor

Latest revision as of 04:27, 30 October 2024

The PS5 uses a presumably custom hypervisor (HV) to protect the non-secure kernel. The hypervisor is a virtual machine monitor (VMM): the kernel and usermode applications such as games run inside a guest OS. Hardware, as well as the x86 kernel, talks to the hypervisor through "hypercalls" and memory-mapped I/O (MMIO). It can be considered the highest privilege level for x86 on the system.

The more conventional use of a hypervisor is to run separate virtual machines on a host machine, isolated from each other and each able to run its own guest operating system. On the PS5, it is used primarily for Virtualization-Based Security (VBS) to protect kernel integrity.

The PS5 hypervisor protects the integrity of the Control Registers (CRs), which by extension covers Write Protect (WP) and other protections such as Supervisor Mode Access Prevention and Supervisor Mode Execution Prevention (SMAP/SMEP). It also protects the kernel page table entries through nested paging, also known as Second Level Address Translation (SLAT). Judging by the hypercalls, Sony also seems to have moved management of the I/O memory management unit (IOMMU) from the kernel to the hypervisor.

Hypercalls

The PS5 hypervisor has very few hypercalls compared to the PS3's. Little is known about them so far beyond the call names.

vmmcalls (VMMCALL_HV_*)
ID    Name
0x0   GET_MESSAGE_CONF
0x1   GET_MESSAGE_COUNT
0x2   START_LOADING_SELF
0x3   FINISH_LOADING_SELF
0x4   SET_CPUID_PS4
0x5   SET_CPUID_PPR
0x6   IOMMU_SET_GUEST_BUFFERS
0x7   IOMMU_ENABLE_DEVICE
0x8   IOMMU_BIND_PASID
0x9   IOMMU_UNBIND_PASID
0xa   IOMMU_CHECK_CMD_COMPLETION
0xb   IOMMU_CHECK_EVLOG_REGS
0xc   IOMMU_READ_DEVICE_TABLE
0xd   GET_TMR_VIOLATION_ERROR
0xe   VMCLOSURE_INVOCATION (03.00.00.33 and above)
0xf   STARTUP_MP (03.00.00.33 and above)
0x10  DISABLE_STARTUP_MP (03.00.00.33 and above)
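
For tooling that annotates disassembly or logs, the table above can be kept as a small lookup. This is only a sketch: the numbers and names come from the table, while the `HvCall` and `describe` names are made up for illustration, and nothing here says anything about arguments or calling convention, which are not known.

```python
from enum import IntEnum


class HvCall(IntEnum):
    """Hypercall numbers from the table above (VMMCALL_HV_ prefix dropped)."""
    GET_MESSAGE_CONF = 0x0
    GET_MESSAGE_COUNT = 0x1
    START_LOADING_SELF = 0x2
    FINISH_LOADING_SELF = 0x3
    SET_CPUID_PS4 = 0x4
    SET_CPUID_PPR = 0x5
    IOMMU_SET_GUEST_BUFFERS = 0x6
    IOMMU_ENABLE_DEVICE = 0x7
    IOMMU_BIND_PASID = 0x8
    IOMMU_UNBIND_PASID = 0x9
    IOMMU_CHECK_CMD_COMPLETION = 0xA
    IOMMU_CHECK_EVLOG_REGS = 0xB
    IOMMU_READ_DEVICE_TABLE = 0xC
    GET_TMR_VIOLATION_ERROR = 0xD
    VMCLOSURE_INVOCATION = 0xE
    STARTUP_MP = 0xF
    DISABLE_STARTUP_MP = 0x10


# Calls the table marks as "03.00.00.33 and above".
ADDED_IN_03_00_00_33 = frozenset(
    {HvCall.VMCLOSURE_INVOCATION, HvCall.STARTUP_MP, HvCall.DISABLE_STARTUP_MP}
)


def describe(number: int) -> str:
    """Return a printable name for a hypercall number (hypothetical helper)."""
    try:
        call = HvCall(number)
    except ValueError:
        return f"unknown hypercall {number:#x}"
    suffix = " (03.00.00.33 and above)" if call in ADDED_IN_03_00_00_33 else ""
    return f"VMMCALL_HV_{call.name}{suffix}"
```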

In-Kernel Hypervisor (<= 2.70)

On PS5 System Software 2.70 and lower, the Hypervisor is integrated as part of the kernel binary. Later versions have the Hypervisor as a separately loaded component.

The hypervisor's main goals are to protect kernel code integrity and to enforce xotext (also known as eXecute Only Memory or "XOM") on the kernel. To accomplish this, Sony takes advantage of several features provided by AMD Secure Virtual Machine (SVM): Nested Page Tables (NPT), Guest Mode Execute Trap (GMET), and interception of reads/writes to Control Registers (CRs) and Model-Specific Registers (MSRs). Furthermore, xotext seems to be hardware-backed, apparently the result of a collaboration with AMD, and is named "nda feature". The hypervisor also manages the I/O Memory Management Unit (IOMMU), as hinted by the fact that it exposes various hypercalls for configuring it.

It is worth noting the hypervisor is very small, especially when compared to that of the PS3. It only supports a handful of hypercalls and mainly exists to protect the kernel. It does not run multiple VMs or use nested virtualization. It only virtualizes the kernel/usermode, which Sony calls "GameOS".

Page Tables

Hypervisor Page Tables

On boot, the hypervisor sets up two sets of page tables. It first sets up its own tables, which essentially involves copying the kernel page tables constructed by FreeBSD and re-mapping kernel pages as read/write. In these tables, kernel text pages are not mapped with the xotext bit set, because the hypervisor needs to be able to read kernel .text pages in specific intercept handlers.

Nested Page Tables

The other set of page tables built at boot is the nested page tables (NPT) for the guest kernel. This mechanism is also known as Second-Level Address Translation (SLAT). The physical addresses that the kernel "sees" (guest-physical addresses) are translated a second time through the NPT, which makes the hypervisor the ultimate authority on how physical memory is mapped and what the page permissions are.
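
The two-stage lookup can be illustrated with a toy model. This is a sketch only: real page tables are multi-level structures walked by the CPU (and the guest's own table walk is itself subject to nested translation), whereas here each stage is a flat dictionary from page number to frame number. All names are made up for the example.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def translate(addr: int, table: dict[int, int]) -> int:
    """Map an address through a flat {page number: frame number} table."""
    frame = table[addr >> PAGE_SHIFT]  # a KeyError stands in for a page fault
    return (frame << PAGE_SHIFT) | (addr & PAGE_MASK)


def guest_virtual_to_system_physical(
    gva: int, guest_pt: dict[int, int], nested_pt: dict[int, int]
) -> int:
    """Apply the guest page tables, then the nested page tables."""
    gpa = translate(gva, guest_pt)    # stage 1: tables owned by the guest kernel
    return translate(gpa, nested_pt)  # stage 2: tables owned by the hypervisor
```

Because the second stage is applied to every guest-physical address, remapping a page in the guest tables cannot reach memory that the nested tables do not map.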

The NPT are stored in a data segment accessible only to the hypervisor, so the guest kernel cannot edit nested Page Table Entries (PTEs). Unlike in the hypervisor's own page tables, kernel text pages have the xotext bit (bit 58) set in most cases in the NPT PTEs.
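
When inspecting dumped nested PTE values, the xotext flag can be tested with a single shift. A minimal sketch, assuming only what the text states (the flag is bit 58 of the 64-bit entry); the helper names are hypothetical.

```python
XOTEXT_BIT = 58  # bit position given in the text


def is_xotext(npt_pte: int) -> bool:
    """True if the xotext bit is set in a 64-bit nested PTE value."""
    return bool((npt_pte >> XOTEXT_BIT) & 1)


def set_xotext(npt_pte: int) -> int:
    """Return the PTE value with the xotext bit set."""
    return npt_pte | (1 << XOTEXT_BIT)
```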

The hypervisor also enables the GMET feature. At its core, this feature prevents the CPU from executing code from lower-privileged pages in a higher-privileged context. In other words, if the kernel (CPL0) tries to execute a user-mapped code page, a Nested Page Fault (#NPF) is raised and the system crashes.
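
The GMET rule can be written as a one-line predicate. This sketch assumes the behaviour described in AMD's SVM documentation, where the user/supervisor bit (bit 2) of the nested PTE marks a page as user and supervisor-mode instruction fetches from such pages are trapped; it models only that check, not the rest of the permission logic, and the names are illustrative.

```python
NPT_USER = 1 << 2  # U/S bit of an x86-64 page table entry


def gmet_blocks_execution(cpl: int, npt_pte: int) -> bool:
    """True if an instruction fetch at this CPL would raise #NPF under GMET."""
    supervisor = cpl < 3
    return supervisor and bool(npt_pte & NPT_USER)
```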

Control Register Protection

One of the most important tasks of the hypervisor is protecting the integrity of sensitive control register bits, especially in CR0 and CR4. Bits such as Write Protect (WP), Protection Enable (PE), and Supervisor Mode Access/Execution Prevention (SMAP/SMEP) are very useful targets when attacking the kernel, so writes to these registers are intercepted and checked.

Attempts to modify the following CR0 bits are filtered out and result in a #GP fault being injected into the guest:

Filtered CR0 Bits
Bit  Mnemonic  Description
31   PG        Paging
16   WP        Write Protect
5    NE        Numeric Error
0    PE        Protection Enable

Similarly, the following CR4 bits are filtered:

Filtered CR4 Bits
Bit  Mnemonic  Description
21   SMAP      Supervisor Mode Access Prevention
20   SMEP      Supervisor Mode Execution Prevention
0    VME       Virtual 8086 Mode Extensions
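
The two tables above translate directly into bit masks. The sketch below models the filter as "any change to a filtered bit faults"; whether the real handler compares against the current register value or against a fixed expected value is not known, so treat the comparison and the function name as assumptions.

```python
CR0_FILTERED = (1 << 31) | (1 << 16) | (1 << 5) | (1 << 0)  # PG, WP, NE, PE
CR4_FILTERED = (1 << 21) | (1 << 20) | (1 << 0)             # SMAP, SMEP, VME


def write_faults(current: int, requested: int, filtered: int) -> bool:
    """True if the write would change a filtered bit (modelled as a #GP)."""
    return bool((current ^ requested) & filtered)
```

For example, with a CR0 value of 0x80050033, clearing WP (bit 16) changes a filtered bit and is rejected, while setting TS (bit 3) is not covered by the mask and passes this check.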

Model-Specific Register Protection

MSRs are another attack vector that the hypervisor mitigates. It does so by constructing an MSR Permissions Map (MSRPM), which is essentially a bitmap that indicates, for each MSR it covers, whether reads and/or writes are protected. A listing of protected MSRs, dumped by a script, is linked below.
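
A lookup into an MSRPM dump might look like the following. This is a sketch based on the standard layout in AMD's architecture manual (two bits per MSR, read intercept in the low bit and write intercept in the high bit, with three 2 KB regions covering the MSR ranges starting at 0x00000000, 0xC0000000 and 0xC0010000). It assumes the PS5 hypervisor uses that standard layout and is not the script linked from this page; the function name is made up.

```python
MSRPM_SIZE = 0x2000      # 8 KB map
MSRS_PER_RANGE = 0x2000  # each 2 KB region covers 0x2000 MSRs at 2 bits each
# (first MSR of the range, byte offset of its region inside the map)
MSR_RANGES = ((0x0000_0000, 0x0000), (0xC000_0000, 0x0800), (0xC001_0000, 0x1000))


def msr_intercepts(msrpm: bytes, msr: int) -> tuple[bool, bool]:
    """Return (read_intercepted, write_intercepted) for one MSR."""
    for base, offset in MSR_RANGES:
        if base <= msr < base + MSRS_PER_RANGE:
            bit = (msr - base) * 2
            pair = (msrpm[offset + bit // 8] >> (bit % 8)) & 0b11
            return bool(pair & 1), bool(pair & 2)
    # Assumption: MSRs outside the mapped ranges are always intercepted.
    return True, True
```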

For most protected MSRs, violating the protection results in a #GP fault being injected into the guest. One exception is the Extended Feature Enable Register (EFER), for which some writes are allowed but masked. Attempts to change the following EFER bits are silently dropped and do not take effect:

Masked EFER Bits
Bit  Mnemonic  Description
16   nda       xotext (XOM)
12   SVME      Secure Virtual Machine Enable
11   NXE       No-Execute Enable
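
Masking an EFER write so that these three bits keep their current value can be expressed in one line. A minimal sketch, assuming the handler simply merges the guest's requested value with the preserved bits; the function name is hypothetical.

```python
EFER_MASKED = (1 << 16) | (1 << 12) | (1 << 11)  # nda (xotext), SVME, NXE


def apply_efer_write(current: int, requested: int) -> int:
    """Keep the masked bits from `current`; take all other bits from `requested`."""
    return (current & EFER_MASKED) | (requested & ~EFER_MASKED)
```

A guest that tries to clear NXE while also changing an unmasked bit would see only the unmasked change take effect.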

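The parsed MSR protection map can be found here: https://gist.github.com/Cryptogenic/83235b4cf4315500cb3146ed6d978ad0
The script that parsed it is here: https://gist.github.com/Cryptogenic/aab893a2c608f2aeb117983fd97822d8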

Other intercepts

Beyond CR accesses, MSR accesses, and hypercalls, the hypervisor also handles a few other intercepts (exit codes). They are listed below:

  • VMEXIT_CPUID (presumably for PS4 emulation)
  • VMEXIT_RDPRU (always injects a #GP exception into guest)