Editing Syscon Error Codes



Latest revision Your text
Line 267: Line 267:
**[[CELL BE|CELL]] voltage regulators
**[[CELL BE|CELL]] voltage regulators
**[[Thermal#Temperature_Monitors|Temperature Monitors]]
**[[Thermal#Temperature_Monitors|Temperature Monitors]]
Thermal monitor "No Command Tag Specified" error, raised while the thermal monitor was communicating with the SYSCON, or a Thermal Shutdown (Hardware Initialized). Possibly caused by a failing thermal monitor or poor connections to it. Has been seen with CPU or GPU overheats (1200 or 1201). A0801103/A0902203 has been seen with CPU trace damage (failed delid). Sometimes seen with a 1001/1002 combo associated with bad NEC/TOKINs (rare), or when the PS3 was powered on while hot (after heatgun).


Some PS3 motherboards ([[TMU-520]], [[COK-001]], [[COK-002]]) have a temperature monitor located somewhere in the CELL power block. The other retail PS3 motherboard models don't measure the temperature of the CELL VR.
Some PS3 motherboards ([[TMU-520]], [[COK-001]], [[COK-002]]) have a temperature monitor located somewhere in the CELL power block. The other retail PS3 motherboard models don't measure the temperature of the CELL VR.
Line 294: Line 292:


In all 3 cases the CPU was damaged or heated in some way.
In all 3 cases the CPU was damaged or heated in some way.
401001/401301/402120 occurred when IC6002 pins 17/18 were accidentally shorted. It blew out C6025 & PS6001. Dead CPU possible. Double check CELL_PLL voltage and clock generators. Rare code.
Occurred while overvolting a 40nm CXD5300 RSX to achieve a higher overclock on a CECH-2501A. An A0801002/A0801301 occurred when the voltage regulation module exceeded its design specifications. SB UART reported that a busy loop was detected, suggesting that poor filtering under these extreme conditions can cause a PLL unlock. It did not occur on every YLOD. Note that the 25xx models do not have NEC/TOKINs.


==== 14FF (Check Stop) ====
==== 14FF (Check Stop) ====
Line 348: Line 342:
==== 1802 ([[RSX]] Initialization) ====
==== 1802 ([[RSX]] Initialization) ====


A0801802 occurs after the console has booted (step# 80) and is accompanied by the BE Attention (1701) alarm, which is raised when a Checkstop error (14FF) occurs. Likely the 1802 was the hardware failure that caused the checkstop error. That causes BE ATTENTION to be driven High and the SYSCON shuts the console down with A0801802, A08014FF, and A0801701. That makes sense because the CPU couldn't continue with its process when the RSX interrupt occurred. These errors have been seen in consoles that were repaired by an RSX reball/replacement.
A0201802 is the error the SYSCON will return when there is no RSX installed at all! Step# 20 is when the RSX is first Initialized. So if it's not responding that early in the Power On Sequence, then it's Dead-Dead or completely missing!


1802 is confirmation that the RSX was involved, if there's any doubt about what's causing a 3034.
A0801802 occurs after the console has booted (step# 80) and is accompanied by the BE Attention (1701) alarm, which is raised when a Checkstop error (14FF) occurs. Likely the 1802 was the hardware failure that caused the checkstop error. That causes BE ATTENTION to be driven High and the SYSCON shuts the console down with A0801802, A08014FF, and A0801701. That makes sense because the CPU couldn't continue with its process when the RSX interrupt occurred. These errors have been seen in consoles that were repaired by an RSX reball/replacement.
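
The codes quoted on this page can be split mechanically. Below is a minimal Python sketch, assuming every logged value is eight hex digits laid out as a two-digit prefix (A0 in every example here), a two-digit step number, and the four-digit error code, as in A0801802 (step 80, code 1802) and A0201802 (step 20). The names SysconError and parse_syscon_error are hypothetical helpers for illustration, not part of any SYSCON tool.

```python
import re
from typing import NamedTuple


class SysconError(NamedTuple):
    """One errorlog entry split into its parts (hypothetical helper type)."""
    prefix: str  # leading two hex digits, "A0" in the examples on this page
    step: str    # power-on sequence step number, e.g. "80" or "20"
    code: str    # four-digit error code, e.g. "1802"


# Assumed layout: 2 hex digits + 2 hex digits + 4 hex digits.
_PATTERN = re.compile(r"^([0-9A-F]{2})([0-9A-F]{2})([0-9A-F]{4})$")


def parse_syscon_error(raw: str) -> SysconError:
    """Split a value such as 'A0801802' into prefix, step number and code."""
    match = _PATTERN.match(raw.strip().upper())
    if match is None:
        raise ValueError(f"not an 8-digit hex error value: {raw!r}")
    return SysconError(*match.groups())


if __name__ == "__main__":
    for value in ("A0801802", "A0201802", "A08014FF"):
        entry = parse_syscon_error(value)
        print(value, "-> step", entry.step, "code", entry.code)
```

Splitting the step number from the code makes it easier to compare entries such as A0201802 and A0801802, which share a code but occur at different points in the power on sequence.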


==== 1900 (RTC Voltage) ====
==== 1900 (RTC Voltage) ====
Line 360: Line 354:
==== 1902 (RTC Access) ====
==== 1902 (RTC Access) ====
RTC access
RTC access
'''1b01 ([[CELL]] Initialization)'''
CPU Thermal Sense Error. Thermal Monitor (IC1101) external sense line. Check C1103, R1106/7 & replace IC1101 (COK-00x) before reballing CPU. If all else fails the CPU's thermal diode is dead.
This error tends to occur at step number 20, during core initialization.
It is suspected that removing R1106/7 or the CPU itself will cause A0201b01/A0A02030 errors. This hasn't been confirmed through sabotage testing.


==== 1b02 ([[RSX]] Initialization) ====
==== 1b02 ([[RSX]] Initialization) ====
RSX Thermal Sense Error. Thermal Monitor (IC2101) external sense line. Check C2103, R2101/2 & replace IC2101 (COK-00x) before reballing/Replacing RSX. If all else fails the GPU's thermal diode is dead.
Probably same as 1802
 
This error tends to occur at step number 20, during core initialization.
 
Confirmed that removing R2101/2 or the GPU itself causes A0201b02/A0A02031 errors.


----
----
Line 403: Line 385:
==== 2013 (Clock CELL, RSX, South Bridge) ====
==== 2013 (Clock CELL, RSX, South Bridge) ====
Clock Generator Error (IC5004)
Clock Generator Error (IC5004)
'''2014 (Unknown)'''
Bad GPU NEC/TOKINs if associated with 1002. Was reported in a DIA-002 with other errors: A0091002/A0102014, A0101002/A0102113, and also A0101002 + 10x A0202120s. Presumably caused by failed RSX TOKINs.


==== 2020 (HDMI) ====
==== 2020 (HDMI) ====
HDMI Error (IC2502)
HDMI Error (IC2502)
This code is not diagnostic on its own. When coinciding with 1601/1701, 14FF, 1301, and 3034 it usually means a GPU issue. When coinciding with a 1002 it's usually NEC/TOKIN Proadlizers. When they occur in bunches AND without more diagnostic codes, all in the same power on, it may be the MultiAV or HDMI Transmitter ICs. The presence of other codes gives you context for their meaning.
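
The rule of thumb above can be written down as a small lookup. Below is a rough Python sketch, assuming the input is the list of four-digit codes logged during one power on. The function name, the threshold of two for "bunches", the restriction to 2020/2024, and the order in which the rules are checked are all assumptions made for illustration. The result is a hint for further troubleshooting, not a diagnosis.

```python
from collections import Counter
from typing import Iterable

# Codes this sketch treats as "not diagnostic on their own" (assumption:
# only the two codes whose sections state the rule).
AV_CODES = {"2020", "2024"}
# Companion codes named in the text.
GPU_CONTEXT = {"1601", "1701", "14FF", "1301", "3034"}
FILTER_CONTEXT = {"1002"}
# Hypothetical cut-off for "occur in bunches"; the text gives no number.
BUNCH_THRESHOLD = 2


def interpret_av_codes(codes: Iterable[str]) -> str:
    """Return a hint for the 2020/2024 codes seen in a single power on."""
    counts = Counter(code.upper() for code in codes)
    av_count = sum(counts[code] for code in AV_CODES)
    if av_count == 0:
        return "no 2020/2024 codes present"
    others = set(counts) - AV_CODES
    # Assumption: GPU-related companions are checked before 1002.
    if others & GPU_CONTEXT:
        return "GPU issue (usually)"
    if others & FILTER_CONTEXT:
        return "NEC/TOKIN Proadlizers (usually)"
    if not others and av_count >= BUNCH_THRESHOLD:
        return "possibly MultiAV or HDMI transmitter IC"
    return "not diagnostic on its own"


if __name__ == "__main__":
    print(interpret_av_codes(["2020", "3034"]))
    print(interpret_av_codes(["2024", "1002"]))
    print(interpret_av_codes(["2020", "2020", "2024"]))
```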
'''2021 (Unknown)'''
Rare code. Occurred in a CECHBxx model with 10x A0202121 occurring throughout the power on sequencing before starting the bootloader. It did not prevent boot before the console overheated. The error log shows two 10x 2121 + 1200 error combos, one of which also coincided with A0802021. The log shows the console originally had a NEC/TOKIN issue (A0801002) before this started. While this is an unknown error combo, it may be similar to 2120. On another console, timestamps showed A0101002 occurred first and then 10x 2120s occurred over the next 10 seconds. Replacing the TOKINs fixed that console. It's possible this is a similar situation, but with a new code.


==== 2022 (DVE) ====
==== 2022 (DVE) ====
Line 421: Line 393:
DVE Error (IC2406, CXM4024R MultiAV controller for analog out)
DVE Error (IC2406, CXM4024R MultiAV controller for analog out)


This error may be normal in an otherwise working console. It has been observed in the error logs of perfectly operational units and can occur naturally from AV issues.
This error occurs when you see no video out using HDMI on any Samsung Smart TV.


This error has been observed with no video out using HDMI on a Samsung Smart TV. The error was reproduced by making the TV detect another console first (a PS4), turning off the TV, swapping the HDMI cable from the PS4 to the PS3, and turning the TV back on.
You can intentionally produce this error code by making the TV detect another console first (a PS4), turning off the TV, swapping the HDMI cable from the PS4 to the PS3, and turning the TV back on. You can fix this error code by replugging the cable while the TV is on.


This error is also present when the console produces graphical artifacts on the screen. The console freezes and cannot be used, forcing the user to turn off the console. This produces the 2022 error code and is an early sign of GLOD.
This error is also present when the console produces graphical artifacts on the screen. The console freezes and cannot be used, forcing the user to turn off the console. This produces the 2022 error code and is an early sign of GLOD.
It is often seen coinciding with 1601/1701, 14FF, 1301, and 3034 in case of Bad GPU (Common). DVE or HDMI Transmitter possible. If so, multiple errors at the same timestamp allow you to distinguish between causes.
This error can also appear when opening and closing a PS2 emulated game on a CFW console, both in Evilnat and Rebug. The errors appear in pairs. If this is the case, there is no cause for concern.


==== 2024 (AV) ====
==== 2024 (AV) ====
This code is not diagnostic on its own. When coinciding with 1601/1701, 14FF, 1301, and 3034 it usually means a GPU issue. When coinciding with a 1002 it's usually NEC/TOKIN Proadlizers. When they occur in bunches AND without more diagnostic codes, all in the same power on, it may be the MultiAV or HDMI Transmitter ICs. The presence of other codes gives you context for their meaning.
This error tends to cause a delayed Yellow Light Of Death (10s - 1min). Sometimes described as a Green Light Of Death (GLOD) or Red Light Of Death (RLOD).  
This error tends to cause a delayed Yellow Light Of Death (10s - 1min). Sometimes described as a Green Light Of Death (GLOD) or Red Light Of Death (RLOD).  


2124 and 2024 errors occurring in random bunches, registering several per power on attempt, have been fixed by replacing both the AV and HDMI encoders. One user reported 2024/2124 errors resolved by replacing the HDMI encoder. Another removed the HDMI encoder and tested the console without it. That console primarily filled the error log with 2124 errors, but a few 2024s as well. So it is unclear if 2124 is specific to the HDMI Encoder or AV Encoder. It seems it could be either.
2124 and 2024 errors have been fixed by replacing both the AV and HDMI encoders. One user reported 2024/2124 errors resolved by replacing the HDMI encoder. Another removed the HDMI encoder and tested the console without it. That console primarily filled the error log with 2124 errors, but a few 2024s as well. So it is unclear if 2124 is specific to the HDMI Encoder or AV Encoder. It seems it could be either.
 
A0A02024 occurred in a KTE-001 with a failed Bluetooth/Wi-Fi module step-up voltage converter. A0002024/A0002124/A0003001 occurred when attempting to power on without 12V connected. A0A02024 was also recorded. When 12V was connected the same codes would occur at step no. 09 instead of 00.


==== 2030 (Thermal Sensor, CELL) ====
==== 2030 (Thermal Sensor, CELL) ====
Line 444: Line 408:
**[[CELL BE|CELL]]
**[[CELL BE|CELL]]
**[[CELL BE|CELL]] [[Thermal#Temperature_Monitors|Temperature Monitor]] (IC1101 on [[COK-001]])
**[[CELL BE|CELL]] [[Thermal#Temperature_Monitors|Temperature Monitor]] (IC1101 on [[COK-001]])
Thermal Monitor (IC1101) external sense line. Check C1103, R1006/7 & replace IC1101 before reballing CPU. If all else fails the CPU's thermal diode is dead. Was seen in a PS3 that was destroyed by a heatgun. Also had A0A02031/2033 & A0902031.


Speculation: 2030-33 errors reported in case of dodgy PWR/EJT daughter board.
Speculation: 2030-33 errors reported in case of dodgy PWR/EJT daughter board.
Line 453: Line 415:
**[[RSX]]
**[[RSX]]
**[[RSX]] [[Thermal#Temperature_Monitors|Temperature Monitor]] (IC2101 on [[COK-001]])
**[[RSX]] [[Thermal#Temperature_Monitors|Temperature Monitor]] (IC2101 on [[COK-001]])
GPU Thermal Monitor (IC1002) external sense line. Check C2103, R2101/2 & replace IC2101 before reballing/Replacing RSX. If all else fails the GPU's thermal diode is dead. Confirmed when the RSX is removed, you'll get 1b02/2031 at step number 20.  Was seen in a PS3 that was destroyed by a heatgun, which also had A0A02030/2033 & A0902031. Once reported to be caused by a checksum mismatch at address 3dfe.


==== 2033 (Thermal Sensor, South Bridge) ====
==== 2033 (Thermal Sensor, South Bridge) ====
Line 459: Line 420:
**[[South Bridge]]
**[[South Bridge]]
**[[South Bridge]] [[Thermal#Temperature_Monitors|Temperature Monitor]] (IC3101 on [[COK-001]])
**[[South Bridge]] [[Thermal#Temperature_Monitors|Temperature Monitor]] (IC3101 on [[COK-001]])
Typically a dead SB Thermal Monitor IC. Check nearby SMDs & traces. Was seen in a PS3 that was destroyed by a heatgun. Also had A0A02030/2031 & A0902031.


==== 2040 ====
==== 2040 ====
Found during sabotage testing on a KTE-001 board that removing F6300 caused an A0012040 error. This fuse appears to be on the 12V line.
I found during sabotage testing on a KTE-001 board that removing F6300 caused an A0012040 error. This fuse appears to be on the 12V line.
 
For Super Slim: reflow or reball the CPU.


==== 2044 (Super Slim short circuit - BT/Wi-Fi and 5Volt) ====  
==== 2044 (Super Slim short circuit - BT/Wi-Fi and 5Volt) ====  
Line 470: Line 428:
==== 2101 (CELL) ====
==== 2101 (CELL) ====
[[CELL BE|CELL]] (IC1001)
[[CELL BE|CELL]] (IC1001)
Often coincides with A0403034, usually indicating the GPU needs to be replaced. Delidding can cause this; look for trace damage. In one case, errors A0402101 / A0403034 occurred because RSX TX1 was shorted to ground by a nicked RSX trace during the delid. TX is the transmit line, so the CPU didn't receive data from it, and noted the error (BitTraining BE:RRAC:BX0:BX:FLEXIO_ID).


==== 2102 (RSX) ====
==== 2102 (RSX) ====
[[RSX]] (IC2001)
[[RSX]] (IC2001)
I detected a short in the CELL. After removing one of the NEC/TOKINs, the error changed.


In several reports IC6301 replacement fixed it. In one case, a 10x 2120 / 1x 2102 combo was fixed by replacing the RSX_VDDIO voltage controller (IC6317). RSX_FBVDDQ (VRAM voltage) implicated. In most cases, it's an RSX failure, sometimes coinciding with A0403034 or other codes indicating GPU failure, often after a reflow attempt.
I applied slight pressure to the CELL using thermal pads (five small 1 mm pads and two larger 2 mm pads, the same size and depth as the ones used on the South Bridge chip). The console now boots and runs without any issues.


==== 2103 (South Bridge) ====
==== 2103 (South Bridge) ====
Line 496: Line 453:
==== 2111 (Clock CELL) ====
==== 2111 (Clock CELL) ====
Clock Generator Error (IC5003)
Clock Generator Error (IC5003)
Once reported in a console with a bad RSX thermal monitor. Had mostly 2031 errors at various step numbers. The 2111 was a rare occurrence. SYSCON reported it as an "Unrecoverable FATAL ERROR by thermal." Check C2103, R2101/2 & replace IC2101 before reballing/replacing RSX. If all else fails the GPU's thermal diode is dead.


==== 2112 (Clock CELL) ====
==== 2112 (Clock CELL) ====
Clock Generator Error (IC5002)  
Clock Generator Error (IC5002)


==== 2113 (Clock CELL, RSX, South Bridge) ====
==== 2113 (Clock CELL, RSX, South Bridge) ====
Line 508: Line 463:


SW_1_B enables control Pin 5 on IC6013, which generates +2.5V_LREG_XCG_500_MEM. If that fails it generates A0092113.
SW_1_B enables control Pin 5 on IC6013, which generates +2.5V_LREG_XCG_500_MEM. If that fails it generates A0092113.
Reportedly fixed by replacing IC5001. One person tried replacing X5301, but shorted C5142 (2.5V to GND). This killed power to IC5004 (RSX/CELL/SB Clock Generator for FlexIO) and caused error A0092113. IC5004 relies on the +1.2V_YC_RC_VDDIO reference voltage to carry the signals. That can be affected by RSX/CPU faults. Another possibility is F6302 or nearby SMDs, which supply 1.7V_MISC to IC6303 to generate +1.2V_YC_RC_VDDIO, among other voltages required to start CPU/SB/GPU.


'''2114 (Unknown)'''​
'''2114 (Unknown)'''​


Bad GPU NEC/TOKINs if associated with 1002 and/or 3004. Has been reported in VER-001 and DYN-001 motherboard revisions. Related codes have been reported in a DIA-002, with A0091002/A0102014, A0101002/A0102113. That was presumably caused by failed RSX TOKINs.
Failures generate A0092114 and A0092014.
 
A lone 2114 or one associated with 2124, 3020, 1301, and/or 3034 may be GPU/BGA related. Possibly the HDMI encoder (MN864709) or the Texas Instruments 88J9LKK C5714 G4 clock generator, but evidence for both of those cases is weak. However, given its similarity to error code A0092113, which is related to the clock generators, a connection is suspected.


==== 2120 (HDMI I/O Error) ====
==== 2120 (HDMI I/O Error) ====
Line 542: Line 493:
**[[CELL BE|CELL]]
**[[CELL BE|CELL]]
** [[CELL BE|CELL]] [[Thermal#Temperature_Monitors|Temperature Monitor]] (IC1101 on [[COK-001]])
** [[CELL BE|CELL]] [[Thermal#Temperature_Monitors|Temperature Monitor]] (IC1101 on [[COK-001]])
CPU Thermal Monitor (IC1101) external sense line. Check C1002, R1003/4 & replace IC1002 before reballing CPU. If all else fails the CPU's thermal diode is dead.


====2131 (Thermal Sensor, RSX)====
====2131 (Thermal Sensor, RSX)====
Line 548: Line 498:
** [[RSX]]
** [[RSX]]
**[[RSX]] [[Thermal#Temperature_Monitors|Temperature Monitor]] (IC2101 on [[COK-001]])
**[[RSX]] [[Thermal#Temperature_Monitors|Temperature Monitor]] (IC2101 on [[COK-001]])
GPU Thermal Monitor (IC1002). Check C2103, R2101/2 & replace IC2101 before reballing/Replacing RSX. If all else fails the GPU's thermal diode is dead.


====2133 (Thermal Sensor, South Bridge)====
====2133 (Thermal Sensor, South Bridge)====
Line 559: Line 508:
From sabotage tests it was found that disabling +2.5V_SB_PLL_VDDC
From sabotage tests it was found that disabling +2.5V_SB_PLL_VDDC
produced four A0802203 errors.​ Also, disabling +1.2V_SB_VDDR produced A0302203 & A0403034.
produced four A0802203 errors.​ Also, disabling +1.2V_SB_VDDR produced A0302203 & A0403034.
Sometimes seen with a "SB Counter Error -  Explicit Bug" in the bringup log. Often accompanies CPU (1200) or GPU (1201) overheats. Once occurred in a GLOD; after holding power, the log showed "SB (FATAL) XDR Link not initilized."


====2310====
====2310====
Line 583: Line 530:
====3003 ([[CELL BE|CELL]] Core Power Failure)====
====3003 ([[CELL BE|CELL]] Core Power Failure)====


This error will occur in the case of a PWR failure on the main core voltage of the CPU (VDDC). Possible causes include the CPU bulk filter caps (e.g. NEC/TOKIN) or any SMD in the feedback and compensation network of the voltage regulation module (VRM), including the buck converters (AKA IOR Power Blocks).
This error will occur in the case of a PWR failure on the main core voltage of the CPU (VDDC). For example, if the filtering capacitors (NEC/TOKINs) are severely damaged. There are other SMDs in that filter, so it could be related to them as well.


A shorted Blu-Ray drive can cause this error as well. Be sure that your drive is working properly before doing anything on your console.
A shorted Blu-Ray drive can cause this error as well. Be sure that your drive is working properly before doing anything on your console.


====3004 ([[RSX]] Core Power Failure)====
====3004 ([[RSX]] Core Power Failure)====


This error will occur in the case of a PWR failure on the main core voltage of the GPU (VDDC). Possible causes include the bulk filter caps (e.g. NEC/TOKIN) or any SMD in the feedback and compensation network of the voltage regulation module (VRM), including the buck converters (AKA IOR Power Blocks).
This error will occur in the case of a PWR failure on the main core voltage of the GPU (VDDC). For example, if the filtering capacitors (NEC/TOKINs) are severely damaged. There are other SMDs in that filter, so it could be related to them as well.


====3005====
====3005====
Line 606: Line 553:


This problem may be related to the PLL signal generator circuit, open resistors, the crystal oscillator, or even the integrated circuit itself (CDC735/CDC736/4227ANLG).
This problem may be related to the PLL signal generator circuit, open resistors, the crystal oscillator, or even the integrated circuit itself (CDC735/CDC736/4227ANLG).
RSX FBVDDQ shorts, BE thermal/PLL VDDA open line, PWM signal disruption to CPU Buck Converters at startup have all been known to cause A0203010 errors. Seen in consoles that also had or developed 3034/4412.


====3011====
====3011====
Line 632: Line 577:
====3020====
====3020====
[[CELL BE|CELL]]
[[CELL BE|CELL]]
A0233020 occurred during the readiness check after VDDC is formed. It suggests a voltage instability or error preventing the CPU from reporting power good back to the SYSCON. Has occurred in a console where every chip was heatgunned. Associated errors in the log were A0A02031/A0201802 (RSX thermal monitor and interrupt). Before the heatgun it had A0801301/A0802120 (BE_PLL & VDDIO error). In another console it coincided with A0231002 (RSX VDDC filtering). That console had A0003001, A0002120/A0221002, A0221002/A0222120, A0231002/A0233020, indicating a more serious issue with the PSU, fuses or core voltage ICs.


====3030====
====3030====
[[CELL BE|CELL]]
[[CELL BE|CELL]]
Reportedly, a CPU BGA defect caused by delid. No trace damage or knocked SMDs observed.


====3031====
====3031====
[[CELL BE|CELL]] XGC REF Voltage Error
[[CELL BE|CELL]]
 
Error during CPU initialization. This error appears to be a CPU BGA defect. In one PS3, it was caused by an "eraser mod," which puts pressure underneath the CPU (bad idea). In another, after delidding GPU/CPU, an A0313032 was reported after knocking R5167 off. R5167 carries the +1.2V_YC_RC_VDDIO reference voltage for the CPU's Redwood FlexIO ADC differential reference clock pair (BE_RC_REFCLK_P), so this was an open line fault. He replaced the resistor and got A0402101 / A0403034 because RSX TX1 was shorted to ground by a nicked RSX trace during the delid. TX is the transmit line, so the CPU didn't receive data from it, and noted the error (BitTraining BE:RRAC:BX0:BX:FLEXIO_ID). He messed with the nick and the error changed to A0313031.


====3032====
====3032====
[[CELL BE|CELL]] BE XGC REF Voltage Error
[[CELL BE|CELL]] Error
 
Error during CPU initialization. This error appears to be a CPU BGA defect. In one PS3, A0313031 was caused by an "eraser mod," which puts pressure underneath the CPU (bad idea). In another, after delidding GPU/CPU, an A0313032 was reported after knocking R5167 off. R5167 carries the +1.2V_YC_RC_VDDIO reference voltage for the CPU's Redwood FlexIO ADC differential reference clock pair (BE_RC_REFCLK_P), so this was an open line fault. He replaced the resistor and got A0402101 / A0403034 because RSX TX1 was shorted to ground by a nicked RSX trace during the delid. TX is the transmit line, so the CPU didn't receive data from it, and noted the error (BitTraining BE:RRAC:BX0:BX:FLEXIO_ID). He messed with the nick and the error changed to A0313031.


It was discovered through sabotage testing that disabling +1.5V_YC_RC_VDDA caused error A0313032
It was discovered through sabotage testing that disabling +1.5V_YC_RC_VDDA caused error A0313032
Line 655: Line 592:
[[CELL BE|CELL]]
[[CELL BE|CELL]]


This error has been triggered when pad N12 (RSXVRM_VID0) was damaged, preventing RSX VDDC voltage from being set correctly. SYSCON sets the CPU VID just before the Config ring data is loaded. Apparently, SYSCON sets RSX VID on IC6201 (Buck Controller) at step number 32, which is just after. These voltages must be stable before the FlexIO can be calibrated (BitTraining at Step No. 40 & ByteTraining at 50 & 51).
====3034====
[[CELL BE|CELL]] / [[RSX]] Communication Error


====3034 ====
This error occurs when Bit Training fails. Bit Training, also known as bit calibration, is a critical process during the power-on-reset (POR) sequence of the CELL BE processor. It fine-tunes the behavior of individual bits within the 8-bit-wide Rambus channels. This adjustment accounts for variations in circuitry, wiring, and loading delays. Bit training plays a pivotal role in optimizing signal quality by calibrating the signal driver current, driver impedance, and ensuring that the timing of each of the eight data bits aligns with clock edges, effectively centering the data "eye" allowing for more accurate and reliable data transmission. '''Remember: IT'S NOT ALWAYS a bad connection between the CPU and GPU.''' A0403034 was seen in PHAT consoles (VER-001) with a bad South Bridge or a cold joint underneath it. By putting pressure on the southbridge the console would boot fine. Always look at the data errors before trying to replace the GPU. 3034 means a bad connection between CELL and the other components connected directly to it. '''IT DOESN'T MEAN BAD RSX'''. Look at the data error and other information of the console before trying to replace the RSX. Not every 90nm RSX will fail.
[[CELL BE|CELL]] / [[RSX]] / [[South Bridge]] error during Bit-Training


This error occurs when Bit Training fails. Bit Training, also known as bit calibration, is a critical process during the power-on-reset (POR) sequence of the CELL BE processor. It fine-tunes the behavior of individual bits within the 8-bit-wide Rambus channels. This adjustment accounts for variations in circuitry, wiring, and loading delays. Bit training plays a pivotal role in optimizing signal quality by calibrating the signal driver current, driver impedance, and ensuring that the timing of each of the eight data bits aligns with clock edges, effectively centering the data "eye" allowing for more accurate and reliable data transmission.
This is the most common error seen in early Phat model PS3s with the 90nm [[RSX]]. It is the hallmark of solder fatigue (such as a cracked solder ball or bump defect) which affects the Flex IO interface that allows the CPU, GPU, and SB to communicate. It is by no means limited to the early models, however. These errors have been seen in every model of PS3 with varying frequency. However, it's most common in the earliest models, likely due to a manufacturing defect in the 90nm RSX material set, namely a CTE mismatch between underfill and bump material that leads to premature solder fatigue and GPU failure. Dubbed "BumpGate," this is a well-known failure modality among GPUs manufactured from 2005-2008. Although it has not been proven unequivocally that the 90nm RSX is affected by Bumpgate, members of the community have shown the 90nm RSX has an increased failure rate, similar material set, and exhibits similar symptoms to known Bumpgate-affected chipsets, such as black screens (GLOD) and graphical artifacts like lines, double images, color splotches and pixelation.
 
'''Remember: IT'S NOT ALWAYS a bad connection between the CPU and GPU.''' Bit training calibrates the connection between the GPU, CPU AND South Bridge. For example, A0403034 occurred on a VER-001 with a probable BGA defect. By putting pressure on the southbridge the console would boot. Look at the data error and other information of the console before assuming a bad GPU.
 
This is the most common error seen in early Phat model PS3s with the 90nm [[RSX]]. It is the hallmark of solder fatigue (such as a cracked solder ball or bump defect) which affects the Flex IO interface that allows the CPU, GPU, and SB to communicate. It is by no means limited to the early models, however. These errors have been seen in every model of PS3 with varying frequency. However, it's most common in the earliest models, likely due to a manufacturing defect in the 90nm RSX material set, namely a CTE mismatch between underfill and bump material that leads to premature solder fatigue and GPU failure. Dubbed "BumpGate," this is a well-known failure modality among GPUs manufactured from 2005-2008. Although it has not been proven unequivocally that the 90nm RSX is affected by Bumpgate, members of the community have shown the 90nm RSX has an increased failure rate, similar material set, and exhibits similar symptoms to known Bumpgate-affected chipsets, such as black screens (GLOD) and graphical artifacts like lines, double images, color splotches and pixelation.


While Bumpgate is a plausible explanation, it's not the only one. The materials used to construct the motherboard and processors have different coefficients of thermal expansion (CTE). This means they will expand and contract at different rates as the chip heats up and cools down, which applies force to solder connections. Over many thermal cycles this deforms the solder and causes a defect. That may affect the Bumps, which attach the silicon die to the interposer (sometimes referred to as substrate) or the Ball-Grid Array (BGA) which connects the interposer to the Motherboard.
While Bumpgate is a plausible explanation, it's not the only one. The materials used to construct the motherboard and processors have different coefficients of thermal expansion (CTE). This means they will expand and contract at different rates as the chip heats up and cools down, which applies force to solder connections. Over many thermal cycles this deforms the solder and causes a defect. That may affect the Bumps, which attach the silicon die to the interposer (sometimes referred to as substrate) or the Ball-Grid Array (BGA) which connects the interposer to the Motherboard.
Line 670: Line 603:
3034 is triggered when Bit calibration, also known as BitTraining, cannot complete correctly. So it is not limited to a singular cause. BGA defects from thermal cycling, drop damage, pulling force from separating the heat sink from the processors while disassembling, or delidding can occur. The bumps on CPU, GPU, or SB can fail, Flex IO traces that connect them can be broken/scratched, or accumulated damage from wear and tear (electromigration) can also cause BitTraining to fail. Anything that can disrupt the impedance of the FlexIO can cause BitTraining to fail. A skilled technician will need to use deductive reasoning to diagnose the cause and choose the appropriate repair.
3034 is triggered when Bit calibration, also known as BitTraining, cannot complete correctly. So it is not limited to a singular cause. BGA defects from thermal cycling, drop damage, pulling force from separating the heat sink from the processors while disassembling, or delidding can occur. The bumps on CPU, GPU, or SB can fail, Flex IO traces that connect them can be broken/scratched, or accumulated damage from wear and tear (electromigration) can also cause BitTraining to fail. Anything that can disrupt the impedance of the FlexIO can cause BitTraining to fail. A skilled technician will need to use deductive reasoning to diagnose the cause and choose the appropriate repair.


A qualitative test known as a "pressure test" may be used to help make a diagnosis. Applying slight pressure, within reason (not your body weight or clamping force, which could cause a BGA defect), to the processor flexes the motherboard beneath the BGA and "may" temporarily reconnect a solder ball with its pad, like holding 2 wires together. This can cause flickering on screen, a console to power on when it couldn't before, etc. If the console or error responds differently when pressure is applied, this may be taken as evidence of a BGA defect. It is not definitive, but tips the odds in favor of that diagnosis. A reball in that case may be successful. However, if it does not respond to pressure, the BGA is not likely to be the cause, and another explanation, such as the bumps, is more likely. It should be noted that bumps can be affected by force as well, but because the underfill supports them, it generally requires more force to reconnect them using this method. This is what the "Bolt mod," commonly performed on the XBOX 360, did. That much force permanently deforms the motherboard and causes irreparable damage. DO NOT DO THIS! But it illustrates the point. You don't need much force to see if the BGA is affected, and if it responds to light pressure, it's unlikely to be the bumps. Therefore, taken together with other clues, it can be helpful to a skilled technician gathering evidence for a diagnosis.
A qualitative test known as a "pressure test" may be used to help make a diagnosis. Applying slight pressure, within reason (not your body weight or clamping force, which could cause a BGA defect), to the processor flexes the motherboard beneath the BGA and "may" temporarily reconnect a solder ball with its pad, like holding 2 wires together. This can cause flickering on screen, a console to power on when it couldn't before, etc. If the console or error responds differently when pressure is applied, this may be taken as evidence of a BGA defect. It is not definitive, but tips the odds in favor of that diagnosis. A reball in that case may be successful. However, if it does not respond to pressure, the BGA is not likely to be the cause, and another explanation, such as the bumps, is more likely. It should be noted that bumps can be affected by force as well, but because the underfill supports them, it generally requires more force to reconnect them using this method. This is what the "Bolt mod," commonly performed on the XBOX 360, did. That much force permanently deforms the motherboard and causes irreparable damage. DO NOT DO THIS! But it illustrates the point. You don't need much force to see if the BGA is affected, and if it responds to light pressure, it's unlikely to be the bumps. Therefore, taken together with other clues, it can be helpful to a skilled technician gathering evidence for a diagnosis.


In consoles with a 90nm [[RSX]] (CECH-Axx/Bxx/Cxx/Exx/Gxx/Hxx, M03 and Q00 models) the most likely cause of a 3034 is the GPU itself. It can be replaced with another 90nm RSX without modification. However, it can also be replaced with a more reliable 65nm or 40nm model, using a process nicknamed a "Frankenstein Mod." SONY service technicians performed this modification in some officially refurbished consoles, and the PS3 community has developed a method as well. Since the 90nm RSX's reliability is in question and both a reball and a Frankenstein mod require the 90nm chip to be desoldered, it is advisable to replace the 90nm GPU with a more reliable model instead of risking another 90nm GPU. Rework is hard on the motherboard and surrounding components, so choosing the repair with the fewest uncertainties is wise.


In models without the 90nm RSX, 3034 is still possible, but far less likely to be caused by the GPU. CPU BGA defects are common in dropped consoles, those that have been delidded or have trace damage to the area around the processors. So troubleshooting is necessary to make a diagnosis.


====3035====
[[CELL BE|CELL]] and [[RSX]] error during Byte-Training
 
Failing GPU. RSX BGA or bump defect. A gradual decline in the solder connection affected Byte Calibration, but it managed to pass bit calibration first. A0403034 is soon to follow. As electromigration wears down the RSX core, A0801601/A0801701 become A0501802/A0503037, then A0503035, and finally A0403034.


====3036====
====3037====
[[CELL BE|CELL]] and [[RSX]]
An RSX BGA or bump defect has caused A0503037/1802. A gradual decline in the solder connection affected Byte Calibration, but it managed to pass bit calibration first. A0403034 is soon to follow.


====3038====
====3039====
[[CELL BE|CELL]] and [[RSX]]
Occurred in a CECHL04, coinciding with a check stop error (14FF) during IO initialization at step #52, which is after Byte-Training but before the flash firmware sequence at step #60. It may be StarShip 2 related, or it could be CPU/GPU related. Unknown.


====3040====
Flash


A0603040 is known to be caused by not soldering the flash (NAND/NOR) back on properly. It happens when the flash is not powered. Step #60 is when the StarShip 2 flash controller and NAND/NOR are initialized, kicking off the firmware sequence that loads the Operating System. Check their voltages and be sure the FW is not corrupt. If you have a backup, you could try replacing the Flash to see if a module failed.


====3041====
A0523041 has only been reported once. Steps #50-60 are when the Southbridge peripherals are initialized. Step 52 is the last step before 60, when the flash and its controller (SS2) are initialized and where A0603040 occurs. Speculation: perhaps 3041 is related to the SS2 or another SB peripheral, or perhaps to a flash solder connection or corruption issue. We don't know. Too few reports.
Flash (eMMC)
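
The codes quoted in this section appear to share one layout: a two-character prefix, a two-digit step number, and the four-digit error code (A0603040 is reported at step 60, A0523041 at step 52). The sketch below splits a code along those lines. The layout is inferred only from the examples on this page, and the names <code>SysconError</code> and <code>split_error</code> are hypothetical helpers, not part of any SYSCON tool.

```python
import re
from typing import NamedTuple


class SysconError(NamedTuple):
    """Fields of an 8-character code, as inferred from this page's examples."""
    prefix: str  # first two characters, e.g. "A0"
    step: str    # next two characters, the step number, e.g. "60"
    code: str    # last four characters, the error code, e.g. "3040"


# Assumption: every code is exactly 8 hex characters (2 + 2 + 4).
_LAYOUT = re.compile(r"^([0-9A-F]{2})([0-9A-F]{2})([0-9A-F]{4})$")


def split_error(raw: str) -> SysconError:
    """Split a code such as 'A0603040' into prefix, step and error code."""
    match = _LAYOUT.match(raw.strip().upper())
    if match is None:
        raise ValueError(f"not an 8-character hex code: {raw!r}")
    return SysconError(*match.groups())


if __name__ == "__main__":
    for raw in ("A0403034", "A0523041", "A0603040"):
        err = split_error(raw)
        print(f"{raw}: step {err.step}, code {err.code}")
```

Splitting the step from the code makes it easier to group log entries by the point in the power-on sequence at which they occurred, for example everything between steps 50 and 60.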


===Data Errors===
====5FFF====
[[CELL BE|CELL]] or [[RSX]]
In recent times, this error has been attributed to the CPU (CELL), but it is actually due to an error in the NOR of the PlayStation 3 Slim/Super Slim. A failure while performing the exploit can leave the console bricked; to recover it, use a hardware flasher such as an E3 Flasher or a Teensy.
On 3XXX and 4XXX consoles, a brick can be fixed with a Teensy. On 4XXX consoles, keep in mind that the flash may be eMMC (12GB) instead of NOR, in which case it cannot be fixed this way (for now...).
For Super Slim consoles, reflow or reball the RSX.


{{Hardware Modification}}<noinclude>
[[Category:Main]]
</noinclude>