Hypervisor: Difference between revisions

From PS5 Developer wiki

Revision as of 14:13, 22 June 2024

Hypercalls

vmmcalls (VMMCALL_HV_*)
0 GET_MESSAGE_CONF
1 GET_MESSAGE_COUNT
2 START_LOADING_SELF
3 FINISH_LOADING_SELF
4 SET_CPUID_PS4
5 SET_CPUID_PPR
6 IOMMU_SET_GUEST_BUFFERS
7 IOMMU_ENABLE_DEVICE
8 IOMMU_BIND_PASID
9 IOMMU_UNBIND_PASID
0xa IOMMU_CHECK_CMD_COMPLETION
0xb IOMMU_CHECK_EVLOG_REGS
0xc IOMMU_READ_DEVICE_TABLE
0xd GET_TMR_VIOLATION_ERROR
0xe VMCLOSURE_INVOCATION (03.00.00.33 and above)
0xf STARTUP_MP (03.00.00.33 and above)
0x10 DISABLE_STARTUP_MP (03.00.00.33 and above)
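
For tooling that logs or decodes hypercalls, the table above can be captured as an enumeration. This is only a sketch: the numbers and names are taken from the table, while `VmmcallHv`, `ADDED_IN_03_00_00_33`, and `describe_hypercall` are hypothetical names introduced here for illustration.

```python
from enum import IntEnum


class VmmcallHv(IntEnum):
    """Hypercall numbers from the table above (VMMCALL_HV_ prefix dropped)."""
    GET_MESSAGE_CONF = 0x0
    GET_MESSAGE_COUNT = 0x1
    START_LOADING_SELF = 0x2
    FINISH_LOADING_SELF = 0x3
    SET_CPUID_PS4 = 0x4
    SET_CPUID_PPR = 0x5
    IOMMU_SET_GUEST_BUFFERS = 0x6
    IOMMU_ENABLE_DEVICE = 0x7
    IOMMU_BIND_PASID = 0x8
    IOMMU_UNBIND_PASID = 0x9
    IOMMU_CHECK_CMD_COMPLETION = 0xA
    IOMMU_CHECK_EVLOG_REGS = 0xB
    IOMMU_READ_DEVICE_TABLE = 0xC
    GET_TMR_VIOLATION_ERROR = 0xD
    VMCLOSURE_INVOCATION = 0xE
    STARTUP_MP = 0xF
    DISABLE_STARTUP_MP = 0x10


# Hypercalls the table marks as "03.00.00.33 and above".
ADDED_IN_03_00_00_33 = frozenset({
    VmmcallHv.VMCLOSURE_INVOCATION,
    VmmcallHv.STARTUP_MP,
    VmmcallHv.DISABLE_STARTUP_MP,
})


def describe_hypercall(number: int) -> str:
    """Return a printable name for a hypercall number (hypothetical helper)."""
    try:
        call = VmmcallHv(number)
    except ValueError:
        return f"unknown hypercall {number:#x}"
    suffix = " (03.00.00.33 and above)" if call in ADDED_IN_03_00_00_33 else ""
    return f"VMMCALL_HV_{call.name}{suffix}"
```

Calling `describe_hypercall` with a number taken from a trace gives the symbolic name, plus the firmware note for the three newer calls.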

In-Kernel Hypervisor (<= 2.50)

On 2.50 and lower, the hypervisor is integrated into the kernel binary. This is the "first iteration" of the hypervisor; later versions load it as a separate component. The hypervisor's main goals are to protect kernel code integrity and to enforce xotext (a.k.a. eXecute Only Memory, or "XOM") on the kernel.

To accomplish this, Sony takes advantage of various features provided by AMD Secure Virtual Machine (SVM), such as Nested Page Tables (NPT), Guest Mode Execute Trap (GMET), and intercepts on reads/writes of Control Registers (CRs) and Model-Specific Registers (MSRs). Furthermore, xotext seems to be hardware-backed as a collaboration with AMD, named "nda feature". The hypervisor also manages the I/O Memory Management Unit (IOMMU), as hinted by the various hypercalls it exposes for configuring it.

It's worth noting that the hypervisor is very small, especially when compared to that of the PS3. It only supports a handful of hypercalls and mainly exists to protect the kernel. It doesn't run multiple VMs or use nested virtualization; it only virtualizes the kernel/userspace, which Sony calls "GameOS".

Page Tables

Hypervisor Page Tables

On boot, the hypervisor sets up two sets of page tables. It first sets up its own, which essentially involves copying the kernel page tables constructed by FreeBSD and re-mapping kernel pages as read/write. In these tables, kernel text pages are mapped without the xotext bit, as the hypervisor needs to be able to read kernel .text pages in specific intercept handlers.

Nested Page Tables

The second set is the nested page tables for the guest kernel, a mechanism also known as Second-Level Address Translation (SLAT). The physical addresses that the kernel "sees" (guest physical addresses) are translated again through the NPT, which makes the hypervisor the ultimate authority on how physical memory is mapped and what the page permissions are.

Of course, the NPT are stored in a data segment accessible only to the hypervisor, so the guest kernel cannot edit nested Page Table Entries (PTEs). Unlike in the hypervisor's own page tables, the NPT PTEs for kernel text pages have the xotext bit (bit 58) set in most cases.

Also noteworthy is the fact that the hypervisor enables the GMET feature. At its core, this feature prevents the CPU from executing code from lower-privileged pages in a higher-privileged context. In other words, if you try to execute a user-mapped code page as kernel (CPL0), a Nested Page Fault or #NPF is thrown and the system will crash.
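
The permission bits discussed above can be made concrete with a small decoder. This is a minimal sketch: the present, writable, user, and NX bit positions are the standard x86-64 PTE layout, bit 58 as xotext comes from this page, and `gmet_blocks_execute` is a deliberately simplified model of GMET (it treats any CPL below 3 as supervisor, which is an assumption here). All function names are hypothetical.

```python
# Standard x86-64 page table entry bits.
PTE_PRESENT = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_USER = 1 << 2
PTE_NX = 1 << 63
# xotext bit as described on this page (not an architecturally documented bit).
PTE_XOTEXT = 1 << 58


def decode_npt_pte(pte: int) -> dict:
    """Split a raw nested PTE value into the flags discussed above."""
    return {
        "present": bool(pte & PTE_PRESENT),
        "writable": bool(pte & PTE_WRITABLE),
        "user": bool(pte & PTE_USER),
        "no_execute": bool(pte & PTE_NX),
        "xotext": bool(pte & PTE_XOTEXT),
    }


def gmet_blocks_execute(pte: int, cpl: int) -> bool:
    """Simplified GMET model: a supervisor-mode instruction fetch from a page
    marked as user in the NPT is refused (the guest would take an #NPF)."""
    return cpl < 3 and bool(pte & PTE_USER)
```

For example, a kernel text entry in the NPT would be expected to decode with `xotext` set, while a user code page fetched at CPL0 would be refused by the GMET model.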

Control Register Protection

One of the most important tasks of the hypervisor is protecting the integrity of sensitive control register bits, especially in CR0 and CR4. Clearing bits such as the Write Protect (WP) bit, the Protection Enable (PE) bit, or the Supervisor Mode Access Protection/Execution Prevention (SMAP/SMEP) bits is very useful when attacking the kernel, so writes to these registers are intercepted and checked.

Attempts to change the following CR0 bits are filtered out and result in a #GP fault being injected into the guest:

Filtered CR0 Bits
Bit Mnemonic Description
31 PG Paging
16 WP Write Protect
5 NE Numeric Error
0 PE Protection Enable

Similarly, the following CR4 bits are filtered:

Filtered CR4 Bits
Bit Mnemonic Description
21 SMAP Supervisor Mode Access Protection
20 SMEP Supervisor Mode Execution Prevention
0 VME Virtual 8086 Mode Extensions
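
The filtering described by the two tables can be sketched as a mask comparison. The masks are built from the bit numbers in the tables; the check itself (comparing the requested value against the current one and faulting if a filtered bit differs) is an assumption about how the intercept handler behaves, and `GuestGP` and `check_cr_write` are hypothetical names.

```python
# Filtered bits, taken from the two tables above.
CR0_FILTERED = (1 << 31) | (1 << 16) | (1 << 5) | (1 << 0)  # PG, WP, NE, PE
CR4_FILTERED = (1 << 21) | (1 << 20) | (1 << 0)             # SMAP, SMEP, VME


class GuestGP(Exception):
    """Stands in for a #GP fault injected into the guest."""


def check_cr_write(current: int, requested: int, filtered_mask: int) -> int:
    """Return the value to load, or raise GuestGP if a filtered bit would change."""
    changed = (current ^ requested) & filtered_mask
    if changed:
        raise GuestGP(f"attempt to change filtered bits {changed:#x}")
    return requested
```

Under this model, a write that only toggles unfiltered bits is passed through unchanged, while a write that clears CR0.WP or CR4.SMEP raises `GuestGP`.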

Model-Specific Register Protection

MSRs are another attack vector that the hypervisor mitigates. This is done by constructing an MSR Permissions Map (MSRPM), which is essentially a bitmap covering all MSRs that indicates whether each one is protected from reads and/or writes. A listing of the protected MSRs, dumped by a script, is provided in a paste link below.
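
A lookup in such a bitmap can be sketched as follows. The layout used here is the one documented for AMD SVM (an 8 KiB map, two bits per MSR with the read bit first, three MSR ranges); whether Sony's map follows it exactly is an assumption, and `msrpm_lookup` is a hypothetical helper, not the script referenced on this page.

```python
from typing import Tuple

MSRPM_SIZE = 0x2000        # 8 KiB map
_MSRS_PER_RANGE = 0x2000   # each range covers 0x2000 MSRs (2 bits each = 0x800 bytes)

# (first MSR in range, byte offset of that range inside the map)
_MSRPM_RANGES = (
    (0x0000_0000, 0x0000),
    (0xC000_0000, 0x0800),
    (0xC001_0000, 0x1000),
)


def msrpm_lookup(msrpm: bytes, msr: int) -> Tuple[bool, bool]:
    """Return (read_intercepted, write_intercepted) for one MSR."""
    for base, byte_offset in _MSRPM_RANGES:
        if base <= msr < base + _MSRS_PER_RANGE:
            bit = (msr - base) * 2            # even bit: read, odd bit: write
            byte = msrpm[byte_offset + bit // 8]
            shift = bit % 8
            return bool((byte >> shift) & 1), bool((byte >> (shift + 1)) & 1)
    # MSRs outside the covered ranges are treated as always intercepted.
    return True, True
```

For instance, EFER (MSR `0xC0000080`) falls in the second range, so its two bits live in the byte at offset `0x820` of the map.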

For most protected MSRs, violating the protection results in a #GP fault being injected into the guest. One exception to this rule is the Extended Feature Enable Register (EFER), for which writes are allowed but masked. Attempts to change the following EFER bits are silently dropped and do not take effect:

Masked EFER Bits
Bit Mnemonic Description
16 nda xotext (XOM)
12 SVME Secure Virtual Machine Enable
11 NXE No-Execute Enable
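
The masked write can be illustrated with a few lines of bit arithmetic. The mask comes from the table above; the merge formula (keep the current value of masked bits, take everything else from the guest's request) is an assumption consistent with the description, and `apply_efer_write` is a hypothetical name.

```python
# Masked bits from the table above: xotext ("nda"), SVME, NXE.
EFER_MASKED = (1 << 16) | (1 << 12) | (1 << 11)


def apply_efer_write(current: int, requested: int) -> int:
    """Merge a guest EFER write: masked bits keep their current value,
    all other bits are taken from the requested value."""
    return (current & EFER_MASKED) | (requested & ~EFER_MASKED)
```

With this model, a guest write that tries to clear NXE leaves NXE set, while changes to unmasked bits in the same write still take effect.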

The parsed MSR protection map can be found here, and the script that parsed it here.

Other intercepts

Beyond CR accesses, MSR accesses, and hypercalls, the hypervisor also handles a few other intercepts (exit codes). They are listed below:

  • VMEXIT_CPUID (presumably for PS4 emulation)
  • VMEXIT_RDPRU (always injects a #GP exception into guest)