Talk:PS2 Emulation
Revision as of 13:39, 5 January 2023
TODO: Please remove unneeded uppercase letters not at the start of sentences.
- This Is Not Elon Musk Here :P - Roxanne
Regs
The VF regs you (Scalerize) described are VU0/COP2 only. Right after the VF regs you can find the VI regs (210+). VI regs are only 32 regs x 32 bit (vi00 to vi15, and 16 control/special regs). Edit: mapped by 0x10 though. You can find a similar array of regs for VU1 at 1040000000 or 1050000000; I don't know exactly where. This is virtual mapping, and I don't own a PS4 to really test it. --Kozarovv (talk) 16:54, 2 January 2023 (UTC)
Will work on this stuff when I get the time! Thank you so much! -- Scalerize Edit: I do not know what the registers for VU1 are since PCSX2 does not use them, so here are dumps that I hope will help you figure it out! It's a dump from Rayman M during the language select screen. Why Rayman M? Well, because the values in the registers do not change at this screen, so they are the same values for both of us!
- Thanks. PCSX2 does use VU1 regs; you just can't see them in the debugger because for VU1 that would be pointless. :) From your dumps:
- 1040000000: VU1 regs, mapped like on VU0.
- 1050000000: VU1 micro data memory (1100C000 on a real PS2 and in the PCSX2 debugger), size 0x4000.
- 1050004000: VU1 micro data memory mirror (1100C000 on a real PS2 and in the PCSX2 debugger), size 0x4000. Likely mirrored 2 more times, at 8000 and C000.
- 104000C000: the emulator places VU1 constants used in common operations here (EATAN/EEXP constants, masks for clamping, etc.). Similar arrays can be found in PCSX2 (mVU_Globals), DobieStation (atan_const, etc.) and Play! (GenerateEATAN, etc.).
- 1030004000: the emulator places VU0 constants used in common operations here. Same idea as above (VU0 doesn't have an EFU, so placing EFU constants for EATAN/EEXP there is pointless, but there they are).
--Kozarovv (talk) 09:37, 5 January 2023 (UTC)
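The dump notes above imply a simple translation from the emulator's VU1 data memory window back to PS2 addresses. A minimal Python sketch of that, assuming four copies of the 16 KiB memory (only the mirror at +0x4000 is confirmed; +0x8000 and +0xC000 are the "likely" ones). The constant and function names are hypothetical.

```python
# Hypothetical names; addresses come from the dump notes above.
VU1_DATA_BASE = 0x1050000000   # emulator-side VU1 micro data memory
VU1_DATA_SIZE = 0x4000         # 16 KiB, same as on a real PS2
VU1_WINDOW = 0x10000           # assumption: base copy + mirrors at 0x4000, 0x8000, 0xC000
PS2_VU1_DATA = 0x1100C000      # VU1 data memory on a real PS2 / in the PCSX2 debugger


def vu1_data_to_ps2(addr: int) -> int:
    """Map an emulator VU1 data memory address (or a mirror of it) to its PS2 address."""
    if not VU1_DATA_BASE <= addr < VU1_DATA_BASE + VU1_WINDOW:
        raise ValueError(f"{addr:#x} is outside the VU1 data memory window")
    # Every mirror folds back onto the same 16 KiB copy.
    offset = (addr - VU1_DATA_BASE) & (VU1_DATA_SIZE - 1)
    return PS2_VU1_DATA + offset
```

For example, 0x1050004010 (inside the first mirror) maps to 0x1100C010.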
Misc info
Some data that eventually needs to be posted on the main emulation page. All data posted here is obtained from the Jak TPL (so-called v1) emulator. All data is confirmed in the code itself, no guessing (unless stated otherwise). Time to start releasing that old work to the public.
- Settings ignored by the emulator (there are more than these): https://pastebin.com/Hm9bfnF6
- Settings which use a bool as value (0/1; the emu accepts true/false and on/off too): https://pastebin.com/iaLLAXHn
- Settings which use a double float as value: https://pastebin.com/cZvxCb6K (unk max values are likely DBL_MAX)
- Default VU1 settings used by Jak TPL emu: https://pastebin.com/tDsTNWFH
- Default VU0 settings used by Jak TPL emu: https://pastebin.com/iSEngpJh
- Default VU settings used by Jak TPL emu: https://pastebin.com/NL8Vae1b
- Default IOP settings used by Jak TPL emu: https://pastebin.com/9K4dk6vb
- Default FPU settings used by Jak TPL emu: https://pastebin.com/YrF7fBT5
- Default EE settings used by Jak TPL emu: https://pastebin.com/SBXimZhc (awesome formatting pastebin, good job)
- Default COP2 settings used by Jak TPL emu: https://pastebin.com/aG0LDryy
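The value types listed above can be sketched as two small parsers. This is not the emulator's code: case-insensitive matching and raising on unknown input are assumptions, and the function names are hypothetical.

```python
def parse_bool_setting(value: str) -> bool:
    """Accept 0/1, true/false and on/off, as listed above (case-insensitivity assumed)."""
    v = value.strip().lower()
    if v in ("1", "true", "on"):
        return True
    if v in ("0", "false", "off"):
        return False
    raise ValueError(f"not a boolean setting value: {value!r}")


def parse_double_setting(value: str) -> float:
    """Double-valued settings; Python's float is an IEEE double, like C's double."""
    return float(value.strip())
```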
Misc misc info
- Each of these pairs of settings does the same thing:
--external-hdd-fix --cdvd-determinism
--ee-kernel-hle --ee-injection-kernel
- Setting whose value is ignored:
--ee-cache-breaks-block: no matter which value is used, 1 is set.
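A tool that compares settings files could normalize these quirks before diffing. A minimal sketch, where picking the second name of each pair as the "canonical" one is an arbitrary assumption and the names `ALIASES`, `FORCED_VALUES` and `normalize` are hypothetical:

```python
# Arbitrary choice: map the first name of each pair onto the second.
ALIASES = {
    "external-hdd-fix": "cdvd-determinism",
    "ee-kernel-hle": "ee-injection-kernel",
}
# Settings whose given value is ignored by the emulator.
FORCED_VALUES = {"ee-cache-breaks-block": "1"}


def normalize(options: dict[str, str]) -> dict[str, str]:
    """Rename aliased settings and apply forced values; later duplicates win."""
    out: dict[str, str] = {}
    for name, value in options.items():
        name = ALIASES.get(name, name)
        out[name] = FORCED_VALUES.get(name, value)
    return out
```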
A few common misunderstandings
- vu-xgkick-delay takes an integer between 0 and 31 (confirmed on both the emu and compiler side), not a float (0.5 is invalid and will probably be truncated to 0).
- COP2 rounding in PCSX2 is governed by the "EE/FPU" rounding setting, not by VU or VU0.
- COP2 clamping is hardcoded in PCSX2 as far as I know; if not, it is likely also governed by the EE/FPU setting, not VU/VU0.
- The xx-no-clamping setting is not really the "no clamping" known from PCSX2. It is a special mode which can be used regardless of other clamp commands. For comparison, PCSX2 has a similar mode only for the FPU (Full); to fully mimic that mode we also need fpu-to-double enabled.
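The vu-xgkick-delay note can be expressed as a validator. Truncating a fractional value toward zero follows the "probably truncated to 0" remark and is unconfirmed; rejecting values outside 0-31 is also an assumption about what a sane tool should do, not observed emulator behaviour. The function name is hypothetical.

```python
def parse_xgkick_delay(value: str) -> int:
    """vu-xgkick-delay is an integer in 0..31; "0.5" is not a valid delay."""
    delay = int(float(value))  # int() truncates toward zero: "0.5" -> 0 (assumed behaviour)
    if not 0 <= delay <= 31:
        raise ValueError(f"vu-xgkick-delay must be 0-31, got {value!r}")
    return delay
```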
ee-native-function
The emulator has a set of predefined functions used in popular PS2 SDK libraries. Those functions are highly optimized to run natively on x64.
--ee-native-function=name,address: under the hood this hooks the selected address and replaces it with a jump to the predefined function. Functions available in the Jak TPL emu:
memset | fptoui | ieee754_sinf
memcpy | fptodp | ieee754_cosf
strlen | dptofp | ieee754_sqrtf
strcmp | fabs | asinf
strcasecmp | fabsf | acosf
litodp | ieee754_atan2f | sinf
dptoli | ieee754_asinf | cosf
floatdidf | ieee754_acosf | sqrtf
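A sketch of checking an --ee-native-function value against the list above before passing it to the emulator. Writing the address in hex (with an optional 0x prefix) is an assumption, as is the 4-byte alignment check (MIPS instructions are 4 bytes, so a function entry should be aligned). The helper names are hypothetical.

```python
# The 24 native functions available in the Jak TPL emu, per the table above.
NATIVE_FUNCTIONS = frozenset({
    "memset", "memcpy", "strlen", "strcmp", "strcasecmp", "litodp", "dptoli", "floatdidf",
    "fptoui", "fptodp", "dptofp", "fabs", "fabsf",
    "ieee754_atan2f", "ieee754_asinf", "ieee754_acosf",
    "ieee754_sinf", "ieee754_cosf", "ieee754_sqrtf",
    "asinf", "acosf", "sinf", "cosf", "sqrtf",
})


def parse_native_function(value: str) -> tuple[str, int]:
    """Split "name,address" and validate both parts."""
    name, sep, addr_text = value.partition(",")
    name = name.strip()
    if not sep or name not in NATIVE_FUNCTIONS:
        raise ValueError(f"bad --ee-native-function value: {value!r}")
    address = int(addr_text.strip(), 16)  # assumption: hex address; int() accepts "0x" with base 16
    if address % 4:
        raise ValueError(f"{address:#x} is not 4-byte aligned")
    return name, address
```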
Hooking drastically reduces the emitted code size for the selected function. Additionally, there is no need to recompile it at all: the emulator just emits a jump to a label, and that's all. The emulator also advances the delta clock to compensate for the cycles the original function would normally have taken.
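The hook-plus-clock idea can be modelled with a toy dispatcher: a hooked address runs a native function instead of recompiled code, and the clock is advanced by the cost the original MIPS routine would have had. Everything here is illustrative; `ToyEE`, `NativeHook` and the cycle counts are hypothetical and not taken from the emulator.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class NativeHook:
    func: Callable[[int], int]  # native replacement for the MIPS routine
    cycles: int                 # hypothetical cost of the original routine


@dataclass
class ToyEE:
    hooks: dict[int, NativeHook] = field(default_factory=dict)
    clock: int = 0

    def call(self, address: int, a0: int) -> int:
        hook = self.hooks.get(address)
        if hook is None:
            raise NotImplementedError("a real emulator would run recompiled MIPS here")
        # Charge the cycles the original function would have taken, then run natively.
        self.clock += hook.cycles
        return hook.func(a0)
```

Usage: `ee.hooks[0x100000] = NativeHook(lambda x: x * 2, 120)` followed by `ee.call(0x100000, 21)` returns 42 and leaves `ee.clock` at 120.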
Example: ee_native_floatdidf
vcvtsi2sd xmm0, xmm0, rdi   ; signed 64-bit integer in rdi -> double in xmm0
vmovq rax, xmm0             ; copy the raw 64-bit double bits into rax
retn
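What the native stub computes can be modelled in a few lines: reinterpret the 64-bit register as a signed integer, convert to double, and return the double's bit pattern. This assumes the default MXCSR rounding mode (round to nearest even), which matches how Python rounds int-to-float conversions; the function name is hypothetical.

```python
import struct


def native_floatdidf(rdi: int) -> int:
    """Model of the stub: int64 in rdi -> IEEE double bit pattern in rax."""
    rdi &= (1 << 64) - 1
    value = rdi - (1 << 64) if rdi & (1 << 63) else rdi  # two's complement -> signed
    # float() rounds to nearest even, like vcvtsi2sd under the default MXCSR mode.
    (bits,) = struct.unpack("<Q", struct.pack("<d", float(value)))
    return bits
```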
The real floatdidf originally looks like this in PS2 MIPS; you can imagine that the recompiled x64 code will be much longer, since every single instruction is translated/recompiled separately.
addiu $sp, -0x30
sd $s0, 0x20+saved_s0($sp)
move $s0, $a0                # $s0 = 64-bit input
sd $s1, 0x20+saved_s1($sp)
li $s1, 0x81E0
dsll32 $s1, 15               # $s1 = 0x40F0000000000000 = 65536.0
dsra32 $a0, $s0, 0           # $a0 = high word, sign-extended
sd $ra, 0x20+saved_ra($sp)
jal litodp
nop
move $a1, $s1
jal dpmul                    # high * 65536.0
move $a0, $v0                # (delay slot)
move $a1, $s1
jal dpmul                    # high * 65536.0 * 65536.0 = high * 2^32
move $a0, $v0
move $s1, $v0
lui $v0, 0xFFFF
dsrl32 $v0, 0                # $v0 = 0x00000000FFFFFFFF
and $s0, $v0
dsll32 $s0, 0
dsra32 $s0, 0                # $s0 = low word, sign-extended
jal litodp
move $a0, $s0
bgez $s0, loc_2F3734
move $a0, $s1
li $a1, 0x83E0
dsll32 $a1, 15               # $a1 = 0x41F0000000000000 = 2^32
jal dpadd                    # low word was negative: add 2^32 to treat it as unsigned
move $a0, $v0
move $a0, $s1
jal dpadd                    # high * 2^32 + low
move $a1, $v0
ld $ra, 0x20+saved_ra($sp)
ld $s1, 0x20+saved_s1($sp)
ld $s0, 0x20+saved_s0($sp)
jr $ra
addiu $sp, 0x30
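The MIPS routine above boils down to high * 2^32 + unsigned(low). A step-by-step Python transcription, assuming the soft-float helpers litodp, dpmul and dpadd are correctly rounded IEEE operations (modelled here with Python float arithmetic); the function names are hypothetical. Since high * 2^32 is exact and only the final add rounds, the result matches a direct int64-to-double conversion.

```python
TWO_16 = 65536.0       # 0x81E0 << 47 as double bits: 0x40F0000000000000
TWO_32 = 4294967296.0  # 0x83E0 << 47 as double bits: 0x41F0000000000000


def sign_extend_32(v: int) -> int:
    """Interpret the low 32 bits of v as a signed integer (dsll32 + dsra32)."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def mips_floatdidf(a0: int) -> float:
    x = a0 & 0xFFFF_FFFF_FFFF_FFFF
    hi = sign_extend_32(x >> 32)               # dsra32 $a0, $s0, 0
    high_part = float(hi) * TWO_16 * TWO_16    # litodp, dpmul, dpmul: exact hi * 2^32
    lo = sign_extend_32(x)                     # and with 0xFFFFFFFF, then sign-extend
    low_part = float(lo)                       # litodp
    if lo < 0:                                 # bgez not taken
        low_part = low_part + TWO_32           # dpadd: low word as unsigned
    return high_part + low_part                # final dpadd, the only rounding step
```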
floatdidf is a corner-case example, as it converts a 64-bit signed integer to an IEEE double, and PS2 developers generally had no reason to use doubles (the FPU/VUs operate on 32-bit floats). But you can see that the whole conversion is practically done in 1 opcode, while the PS2 needs a massive function to do it. Other functions usually gain less, but it is still really worth it.