Hypervisor: Difference between revisions
CelesteBlue (talk | contribs) (Undo revision 1210 by 205.174.147.66 (talk)) Tag: Undo
(Add technical info on hypervisor for 2.xx and lower fw)
Revision as of 21:40, 28 June 2023
Hypercalls
vmmcalls (VMMCALL_HV_*)
ID | Name |
---|---|
0 | GET_MESSAGE_CONF |
1 | GET_MESSAGE_COUNT |
2 | START_LOADING_SELF |
3 | FINISH_LOADING_SELF |
4 | SET_CPUID_PS4 |
5 | SET_CPUID_PPR |
6 | IOMMU_SET_GUEST_BUFFERS |
7 | IOMMU_ENABLE_DEVICE |
8 | IOMMU_BIND_PASID |
9 | IOMMU_UNBIND_PASID |
0xa | IOMMU_CHECK_CMD_COMPLETION |
0xb | IOMMU_CHECK_EVLOG_REGS |
0xc | IOMMU_READ_DEVICE_TABLE |
0xd | GET_TMR_VIOLATION_ERROR |
0xe | VMCLOSURE_INVOCATION (03.00.00.33 and above) |
0xf | STARTUP_MP (03.00.00.33 and above) |
0x10 | DISABLE_STARTUP_MP (03.00.00.33 and above) |
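For tooling that annotates disassembly or logs, the table above can be mirrored as an enumeration. The following is only a sketch: the class name `VmmcallHv`, the set `REQUIRES_03_00_00_33`, and the helper `describe` are hypothetical names chosen for illustration, while the numbers and names are taken directly from the table.

```python
from enum import IntEnum


class VmmcallHv(IntEnum):
    """Hypercall numbers from the table (names omit the VMMCALL_HV_ prefix)."""
    GET_MESSAGE_CONF = 0x0
    GET_MESSAGE_COUNT = 0x1
    START_LOADING_SELF = 0x2
    FINISH_LOADING_SELF = 0x3
    SET_CPUID_PS4 = 0x4
    SET_CPUID_PPR = 0x5
    IOMMU_SET_GUEST_BUFFERS = 0x6
    IOMMU_ENABLE_DEVICE = 0x7
    IOMMU_BIND_PASID = 0x8
    IOMMU_UNBIND_PASID = 0x9
    IOMMU_CHECK_CMD_COMPLETION = 0xA
    IOMMU_CHECK_EVLOG_REGS = 0xB
    IOMMU_READ_DEVICE_TABLE = 0xC
    GET_TMR_VIOLATION_ERROR = 0xD
    VMCLOSURE_INVOCATION = 0xE
    STARTUP_MP = 0xF
    DISABLE_STARTUP_MP = 0x10


# Hypercalls the table marks as "03.00.00.33 and above".
REQUIRES_03_00_00_33 = frozenset({
    VmmcallHv.VMCLOSURE_INVOCATION,
    VmmcallHv.STARTUP_MP,
    VmmcallHv.DISABLE_STARTUP_MP,
})


def describe(number: int) -> str:
    """Return a printable name for a hypercall number (hypothetical helper)."""
    try:
        call = VmmcallHv(number)
    except ValueError:
        return f"unknown hypercall {number:#x}"
    suffix = " (03.00.00.33 and above)" if call in REQUIRES_03_00_00_33 else ""
    return f"VMMCALL_HV_{call.name}{suffix}"
```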
In-Kernel Hypervisor (<= 2.50)
On 2.50 and lower, the hypervisor is integrated as part of the kernel binary. This is the "first iteration" of the hypervisor; later versions have the hypervisor as a separately loaded component. The hypervisor's main goals are to protect kernel code integrity and enforce xotext (also known as eXecute Only Memory or "XOM") on the kernel.
To accomplish this, Sony takes advantage of various features provided by AMD Secure Virtual Machine (SVM), such as Nested Page Tables (NPT), Guest Mode Execute Trap (GMET), and the interception of reads and writes to Control Registers (CRs) and Model-Specific Registers (MSRs). Furthermore, xotext seems to be a hardware-backed feature developed in collaboration with AMD, referred to as the "nda feature". The hypervisor also manages the I/O Memory Management Unit (IOMMU), as suggested by the various hypercalls it exposes for configuring it.
It's worth noting that the hypervisor is very small, especially when compared to that of the PS3. It only supports a handful of hypercalls and mainly exists to protect the kernel. It doesn't run multiple VMs or use nested virtualization; it only virtualizes the kernel/userspace, which Sony calls "GameOS".
Page Tables
Hypervisor Page Tables
On boot, the hypervisor sets up two sets of page tables. It first sets up its own tables, which essentially involves copying the kernel page tables constructed by FreeBSD and re-mapping kernel pages as read/write. Kernel text pages are not mapped with the xotext bit set in these tables, as the hypervisor needs to be able to read kernel .text pages in specific intercept handlers.
Nested Page Tables
The other set of page tables built at boot is the nested page tables for the guest kernel. This mechanism is also known as Second-Level Address Translation (SLAT). The physical addresses that the kernel "sees" (guest physical addresses) are translated a second time through the NPT, which makes the hypervisor the ultimate authority on how physical memory is mapped and what the page permissions are.
Of course, the NPT are stored in a data segment accessible only to the hypervisor, so the guest kernel cannot edit nested Page Table Entries (PTEs). As opposed to the hypervisor's own page tables, kernel text pages have the xotext bit (bit 58) set in most cases for NPT PTEs.
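As a rough illustration of what the xotext bit means for a nested PTE, the sketch below decodes a raw 64-bit entry. Only bit 58 comes from the text above; the present, writable, user, and NX bit positions are the standard x86-64 ones, and the permission helpers are a simplified model of execute-only memory (an assumption, not the hypervisor's actual logic).

```python
# Standard x86-64 page table entry bits.
PTE_PRESENT = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_USER = 1 << 2
PTE_NX = 1 << 63
# xotext ("nda feature") bit position as stated in the text above.
PTE_XOTEXT = 1 << 58


def describe_npt_pte(pte: int) -> dict:
    """Decode the flag bits of a raw nested PTE into a dictionary."""
    return {
        "present": bool(pte & PTE_PRESENT),
        "writable": bool(pte & PTE_WRITABLE),
        "user": bool(pte & PTE_USER),
        "xotext": bool(pte & PTE_XOTEXT),
        "nx": bool(pte & PTE_NX),
    }


def guest_can_read(pte: int) -> bool:
    """Simplified model (assumption): an xotext page cannot be read as data."""
    return bool(pte & PTE_PRESENT) and not (pte & PTE_XOTEXT)


def guest_can_execute(pte: int) -> bool:
    """A present page without NX can be fetched from, xotext or not."""
    return bool(pte & PTE_PRESENT) and not (pte & PTE_NX)
```

Under this model, a kernel text entry such as `PTE_PRESENT | PTE_XOTEXT` is executable but not readable, which is the property that stops the guest from dumping its own code.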
Also noteworthy is the fact that the hypervisor enables the GMET feature. At its core, this feature prevents the CPU from executing code from lower-privileged pages in a higher-privileged context. In other words, if you try to execute a user-mapped code page as kernel (CPL0), a Nested Page Fault or #NPF is thrown and the system will crash.
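The GMET rule described above can be reduced to a single predicate. This is a minimal sketch with a hypothetical function name; it models only the check itself and assumes the "user" attribute is the one the hardware consults for the page being fetched from.

```python
def gmet_allows_fetch(cpl: int, page_is_user: bool) -> bool:
    """Return False when an instruction fetch would raise #NPF under GMET.

    cpl is the current privilege level (0 = kernel, 3 = userland).
    page_is_user says whether the page is mapped as user-accessible.
    """
    supervisor = cpl < 3
    # Supervisor-mode execution from a user-mapped page is the trapped case.
    return not (supervisor and page_is_user)
```

For example, `gmet_allows_fetch(0, True)` is `False`: jumping from the kernel into a userland payload is exactly the case GMET is meant to stop.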
Control Register Protection
One of the most important tasks of the hypervisor is protecting the integrity of sensitive control register bits, especially those in CR0 and CR4. Bits such as the Write Protect (WP) bit, Protection Enable (PE) bit, and Supervisor Mode Access/Execution Prevention (SMAP/SMEP) bits are very useful for attacking the kernel, and so writes to these registers are intercepted and checked.
Attempts to change the following CR0 bits are filtered out and result in a #GP fault being injected into the guest:
Bit | Mnemonic | Description |
---|---|---|
31 | PG | Paging |
16 | WP | Write Protect |
5 | NE | Numeric Error |
0 | PE | Protection Enable |
Similarly, the following CR4 bits are filtered:
Bit | Mnemonic | Description |
---|---|---|
21 | SMAP | Supervisor Mode Access Prevention |
20 | SMEP | Supervisor Mode Execution Prevention |
0 | VME | Virtual 8086 Mode Extensions |
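The two tables above translate into bitmasks as follows. The check is a sketch under one assumption: that a write is rejected when it would change any filtered bit relative to the current register value. The function name and return strings are hypothetical.

```python
# Filtered CR0 bits: PG (31), WP (16), NE (5), PE (0).
CR0_FILTERED = (1 << 31) | (1 << 16) | (1 << 5) | (1 << 0)
# Filtered CR4 bits: SMAP (21), SMEP (20), VME (0).
CR4_FILTERED = (1 << 21) | (1 << 20) | (1 << 0)


def filter_cr_write(current: int, requested: int, filtered_mask: int) -> str:
    """Model the intercept: "#GP" if a filtered bit would change, else "allow"."""
    changed = current ^ requested
    if changed & filtered_mask:
        return "#GP"
    return "allow"
```

With this model, clearing CR0.WP from a value that has it set yields `"#GP"`, while toggling an unlisted bit such as CR0.TS (bit 3) is allowed.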
Model-Specific Register Protection
MSRs are another attack vector that the hypervisor mitigates. This is done by constructing an MSR Permissions Map (MSRPM), which is essentially a bitmap covering the MSRs that indicates whether each one is protected from reads and/or writes. A listing of protected MSRs, dumped by a script, is linked below.
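To make the bitmap concrete, the sketch below looks up one MSR in a raw MSRPM image. It assumes the layout documented for AMD SVM: two bits per MSR (low bit intercepts reads, high bit intercepts writes) and three 2 KiB regions covering the MSR ranges starting at 0x00000000, 0xC0000000, and 0xC0010000. It is not the script linked from this page, and `msrpm_lookup` is a hypothetical name.

```python
MSRPM_SIZE = 0x2000  # two 4 KiB pages

# (first MSR in range, byte offset of its region inside the MSRPM)
_RANGES = (
    (0x0000_0000, 0x0000),
    (0xC000_0000, 0x0800),
    (0xC001_0000, 0x1000),
)
_MSRS_PER_RANGE = 0x2000  # each 2 KiB region covers 0x2000 MSRs


def msrpm_lookup(msrpm: bytes, msr: int) -> tuple:
    """Return (read_intercepted, write_intercepted) for one MSR number."""
    for base, byte_offset in _RANGES:
        if base <= msr < base + _MSRS_PER_RANGE:
            bit = (msr - base) * 2  # two bits per MSR
            value = msrpm[byte_offset + bit // 8] >> (bit % 8)
            return bool(value & 1), bool(value & 2)
    # MSRs outside the covered ranges are always intercepted.
    return True, True
```

As a usage example, EFER (MSR 0xC0000080) lands at byte 0x820, bits 0 and 1, so setting bit 1 of that byte marks EFER writes as intercepted.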
For most protected MSRs, violating this protection results in a #GP fault being injected into the guest. One exception to this rule is the Extended Feature Enable Register (EFER), for which writes are allowed but masked. Attempts to change the following EFER bits are silently dropped and do not take effect:
Bit | Mnemonic | Description |
---|---|---|
16 | nda | xotext (XOM) |
12 | SVME | Secure Virtual Machine Enable |
11 | NXE | No-Execute Enable |
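The masking behaviour can be sketched as a merge of the old and requested values: masked bits keep their current state and every other bit takes the requested state. The function name is hypothetical, and the exact way the hypervisor computes the result is an assumption; only the three bit positions come from the table.

```python
# Masked EFER bits: xotext / "nda" (16), SVME (12), NXE (11).
EFER_MASKED = (1 << 16) | (1 << 12) | (1 << 11)


def apply_efer_write(current: int, requested: int) -> int:
    """Return the EFER value that takes effect after a guest write."""
    kept = current & EFER_MASKED        # protected bits stay as they were
    updated = requested & ~EFER_MASKED  # all other bits follow the request
    return kept | updated
```

A guest that writes 0 to EFER in an attempt to clear NXE and xotext therefore ends up with those bits still set.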
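The parsed MSR protection map can be found here (https://gist.github.com/Cryptogenic/83235b4cf4315500cb3146ed6d978ad0), and the script that parsed it here (https://gist.github.com/Cryptogenic/aab893a2c608f2aeb117983fd97822d8).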
Other intercepts
Beyond CR accesses, MSR accesses, and hypercalls, the hypervisor also handles a few other intercepts and exit codes. They are listed below:
* VMEXIT_CPUID (presumably for PS4 emulation)
* VMEXIT_RDPRU (always injects a #GP exception into the guest)
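A dispatcher for these two exits might look like the sketch below. The numeric exit codes are the ones defined by AMD SVM (0x72 for CPUID, 0x8E for RDPRU); the function name and the returned action strings are placeholders, since the actual handler bodies are not described here.

```python
VMEXIT_CPUID = 0x72
VMEXIT_RDPRU = 0x8E


def handle_other_exit(exit_code: int) -> str:
    """Map an exit code to the action described in the list above."""
    if exit_code == VMEXIT_CPUID:
        # Presumably returns adjusted CPUID values for PS4 emulation.
        return "emulate CPUID"
    if exit_code == VMEXIT_RDPRU:
        # RDPRU is never allowed: the guest always receives #GP.
        return "inject #GP"
    # Placeholder: behaviour for other exit codes is not covered here.
    return "unhandled"
```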