Syscon Error Codes
Revision as of 15:46, 28 November 2021
Description
Syscon memory contains a 0x100-byte table for storing error codes. Each entry consists of a 4-byte error code plus a 4-byte timestamp, so the table can hold 32 errors in total. When the table is full and a new error needs to be stored, syscon deletes the oldest entry.
The timestamps are in UTC format (number of elapsed seconds since 2000)
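The capacity arithmetic, the "delete the oldest" behaviour, and the timestamp conversion described above can be sketched in Python. This is only an illustration, not syscon firmware: the names `ErrorLog`, `MAX_ENTRIES`, and `timestamp_to_utc` are hypothetical, and the epoch is assumed to be 2000-01-01 00:00:00 UTC, which the text implies but does not state exactly.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

TABLE_SIZE = 0x100                      # bytes reserved for the error table
ENTRY_SIZE = 4 + 4                      # 4-byte error code + 4-byte timestamp
MAX_ENTRIES = TABLE_SIZE // ENTRY_SIZE  # number of errors the table can hold

# Assumption: the counter starts at midnight UTC on 1 January 2000.
EPOCH_2000 = datetime(2000, 1, 1, tzinfo=timezone.utc)


def timestamp_to_utc(seconds):
    """Convert a timestamp (seconds elapsed since 2000) to a UTC datetime."""
    return EPOCH_2000 + timedelta(seconds=seconds)


class ErrorLog:
    """Toy model of the table: once full, storing drops the oldest entry."""

    def __init__(self):
        # A deque with maxlen discards items from the left when it overflows.
        self.entries = deque(maxlen=MAX_ENTRIES)

    def store(self, code, timestamp):
        self.entries.append((code, timestamp))
```

The sketch models only the logical behaviour; it says nothing about how the entries are physically laid out inside the 0x100 bytes.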
How to get the syscon error log
If the PS3 still boots to the XMB and is able to install and run apps, you can use programs like the ones mentioned at the top of the Platform ID page.
If the PS3 doesn't boot, it is still possible to retrieve the syscon error log by connecting a PC to the syscon UART port using a "USB to TTL UART adapter" and running the command errlog. There is also the command clearerrlog to empty the error table (handy to prevent confusion with old error codes that may have accumulated over months or years and are not related to the actual problem).
Error code format
The error codes follow the format: A R ST C ERR, where:
- A (Fixed)
- A = This is always "A"
- R (Reserved)
- 0-E = Unknown
- F = Frequent error (For example, Motherboard Damage/Breakdown, etc.)
- ST (Step Number)
- 00-7F = Step Number of the Power On Sequence (POS). This is the Power On Self Test (POST) process. If successful, the BOOT process begins, which loads the OS.
- 80 = Static State (Power ON). The console completed the POST and was in a static state. The error happened when the PS3 was powered on. You can get an error with Step No. 80 if your error occurs in game. For example, 80 1002 errors can happen if your NEC/TOKINs are going bad.
- 90 = Static State (Power OFF). The error happened when the PS3 was powering off. For example, if a problem causes the system to hang while shutting down the console will beep before powering off. An error with step no. 90 will be recorded in the errorlog.
- A0 = Immediately after SYSCON reset. A reset pulse is sent to the console's main chipset to coordinate and synchronize them. If an error occurs immediately after SYSCON reset, it means it occurred before anything else can happen. For example, if the CPU is completely dead it will not respond to the reset pulse and an error will be generated immediately after reset.
- C (Category)
- 1 = System Error
- 2 = Fatal Error
- 3 = Boot Error
- 4 = Data Error
- ERR (Error)
- Any number in hex
Examples:
A0801002
- System Error 002 (RSX VRAM Power Fail) which occurred while the System was successfully powered On.
- 1002 errors are known to be caused by bad NEC/TOKINs, but may not be the only cause. See Error Code section below for more details.
A0403034
- Fatal Booting Error 034 (RSX/CELL Communication Error) which occurred at step no. 40, before the Power On Sequence completed.
- 3034 errors are known to be caused by BGA Defects (among other issues). See Error Code section below for more details.
While the Reserved Area and Step Number can be useful to figure out when the error occurred and how frequent it is, the last four numbers are the most important for figuring out what the error means. So the following Error Code section will only list the last 4 numbers (category + error).
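As a rough illustration of the format above, the following Python sketch splits an 8-character code into its fields. The names `SysconError`, `decode`, and `describe_step` are hypothetical helpers introduced here; the field widths (1 + 1 + 2 + 1 + 3 hex digits) are taken from the two examples, and step values outside the listed ranges are simply reported as unknown.

```python
from dataclasses import dataclass

CATEGORIES = {
    "1": "System Error",
    "2": "Fatal Error",
    "3": "Boot Error",
    "4": "Data Error",
}

HEX_DIGITS = set("0123456789ABCDEF")


@dataclass
class SysconError:
    reserved: str  # R: 0-E unknown, F frequent error
    step: int      # ST: step number as an integer
    category: str  # C: category digit
    error: str     # ERR: three hex digits

    @property
    def short_code(self):
        """Category + error, the 4-digit form used in the lists below."""
        return self.category + self.error


def describe_step(step):
    """Map a step number to the meanings listed in the format description."""
    if 0x00 <= step <= 0x7F:
        return f"Power On Sequence step 0x{step:02X}"
    return {
        0x80: "Static State (Power ON)",
        0x90: "Static State (Power OFF)",
        0xA0: "Immediately after SYSCON reset",
    }.get(step, "Unknown")


def decode(code):
    """Split a code such as 'A0801002' into its A R ST C ERR fields."""
    code = code.strip().upper()
    if len(code) != 8 or not code.startswith("A") or not set(code) <= HEX_DIGITS:
        raise ValueError(f"not a syscon error code: {code!r}")
    return SysconError(
        reserved=code[1],
        step=int(code[2:4], 16),
        category=code[4],
        error=code[5:8],
    )
```

For example, `decode("A0801002")` yields step 0x80 and the short code `1002`, matching the first example above.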
Error codes
System Errors
1001 (Power CELL)
- Components Involved:
Speculation:
1001 errors happen when the system encounters an unexpected shutdown. They often occur in testing, when the console is repeatedly switched on and off rather than shut down gracefully. They have been associated with other errors, but there doesn't appear to be any single cause.
The hypothesis that this error is associated with insufficient filtering on the CPU's core voltage (VDDC) has not been confirmed. There is a range of voltage ripple/noise that "should" cause errors before it gets so bad it causes a CELL VDDC Power Failure (3003). There are numerous SMD components involved in filtering, but the main concern is the NEC/TOKIN Proadlizers (capacitors). 1002 errors are the fingerprint of bad TOKINs on the GPU, but 1001 has not been shown to have the same association with the CPU's filter. However, a connection is strongly suspected.
1002 (Power RSX)
- Components Involved:
This error has been associated with insufficient filtering on the RSX_VDDC power line. There is a range of voltage ripple/noise that will cause this error before it gets so bad it causes an RSX_VDDC Power Failure (3004). The YLODs that accompany 1002 errors vary in how long they take to appear, from about 2 seconds after power-on to only during intense games.
There are numerous SMD components involved in filtering, but the main concern is the NEC/TOKIN Proadlizers (capacitors). 1002 errors are the fingerprint of bad TOKINs.
1004 (Power AC/DC)
- Components Involved:
1103 (Thermal)
- Components Involved:
- See: Thermal
1200 (Thermal CELL)
- Components Involved:
CPU Overheat. This is a common error. The usual culprit is failed Thermal Interface Material (TIM). As the material ages it "dries", allowing air inside. Air is a heat insulator, reducing the TIM's ability to transfer enough heat away from the processor. The system fan will steadily get louder over time until it cannot keep up. Once the processor approaches its Thermal Shutdown Temperature, a yellow LED begins flashing on the console (early Phat models). Once it reaches the Thermal Shutdown Temperature, the console will beep three times and hard shutdown, flashing red until the console is unplugged and the error state reset. Error 1200 is generated in the SYSCON errorlog.
First be sure the system fan is working. If so, apply new TIM between the Integrated Heat Spreader (IHS) and Heatsink (HS). If that does not resolve the problem, carefully remove the IHS (delid) and replace the TIM between the IHS and the processor die.
If that still doesn't work, it could be an issue with the temperature monitor chip (IC1101). Beyond that, some users have noted that dead CPUs can throw error 1200. However, that's the limit of our current understanding. The CPU could be dead, or have another unexplained issue, but usually reflowing or reballing is the last-ditch effort to revive such a console.
1201 (Thermal RSX)
- Components Involved:
GPU Overheat. This is the same as error 1200 above, except it's for the GPU. The same repair steps apply, except its Temperature Monitor Chip is IC2101.
1203 (Thermal CELL VR)
- Components Involved:
- CELL voltage regulators
Some non-retail PS3 models (cytology series, with a rack form factor case) have a thermal monitor located somewhere in the CELL power block (codenamed voltage regulator). This component doesn't exist in retail PS3 models, so it is impossible to see this error code on them.
1204 (Thermal South Bridge)
- Components Involved:
1205 (Thermal EE/GS)
- Components Involved:
- CXD2953AGB or CXD2972GB
- See also: Emotion Engine / Graphics Synthesizer
This error is specific to COK-001/CXD2953AGB (with full PS2 hardware compatibility, EE+GS) or COK-002/CXD2972GB (with partial PS2 hardware compatibility, GS only).
1301
CELL PLL
14FF
Check stop
1601
BE Livelock Detection
Speculation: If a YLOD turns into a GLOD after reball/reflow then 1601 (with or without 1701) could mean the RSX RAM was damaged. This is a loose association based on a few user reports.
1701
CELL attention
1802
RSX init
1900
RTC voltage
1901
RTC oscillator
1902
RTC access
Fatal Errors
- These error codes seem to be repeated 3 times for 3 special cases. For example, errors 2003, 2103, and 2203 are all related to the southbridge; the only thing that changes in the error code is the second digit (located immediately after the category). If at some point we find out what that digit means, we can join the wiki page sections together (with titles: "2001 & 2101", "2002 & 2102", "2003 & 2103", etc...)
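The grouping idea in the note above can be sketched as a small Python helper that ignores the second digit when comparing codes. The function names and the `2x03` notation for the shared base are hypothetical, introduced only for this illustration; the meaning of the second digit remains unknown.

```python
from collections import defaultdict


def split_variant(code):
    """Split a 4-digit code into (base, variant), e.g. '2103' -> ('2x03', '1')."""
    if len(code) != 4:
        raise ValueError(f"expected a 4-digit code, got {code!r}")
    # Keep the category digit and the last two digits; mask the second digit.
    return code[0] + "x" + code[2:], code[1]


def group_by_base(codes):
    """Group codes that differ only in the second digit."""
    groups = defaultdict(list)
    for code in codes:
        base, _variant = split_variant(code)
        groups[base].append(code)
    return dict(groups)
```

Applied to a captured error log, this shows at a glance which entries are likely variants of the same underlying fault, such as 2003, 2103, and 2203.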
2001
CELL (IC1001)
2002
RSX (IC2001)
2003
Southbridge Error (IC3001)
2010
Clock Generator Error (IC5001)
2011
Clock Generator Error (IC5003)
2012
Clock Generator Error (IC5002)
2013
Clock Generator Error (IC5004)
2020 (HDMI)
HDMI Error (IC2502)
2022 (DVE)
DVE Error (IC2406, CXM4024R MultiAV controller for analog out)
2024 (AV)
This error tends to cause a delayed Yellow Light Of Death (10s - 1min). Sometimes described as a Green Light Of Death (GLOD) or Red Light Of Death (RLOD).
2124 and 2024 errors have been fixed by replacing both the AV and HDMI encoders. One user reported 2024/2124 errors resolved by replacing the HDMI encoder. Another removed the HDMI encoder and tested the console without it. That console primarily filled the errorlog with 2124 errors, but a few 2024's as well. So it is unclear if 2124 is specific to the HDMI Encoder or AV Encoder. It seems it could be either.
2030 (Thermal Sensor, CELL)
- Components Involved:
- CELL
- CELL Temperature Monitor (IC1101 on COK-001)
Speculation: 2030-2033 errors have been reported in cases of a faulty PWR/EJT daughter board.
2031 (Thermal Sensor, RSX)
- Components Involved:
- RSX
- RSX Temperature Monitor (IC2101 on COK-001)
2033 (Thermal Sensor, South Bridge)
- Components Involved:
- South Bridge
- South Bridge Temperature Monitor (IC3101 on COK-001)
2101
CELL (IC1001)
2102
RSX (IC2001)
2103
Southbridge Error (IC3001)
2110
Clock Generator Error (IC5001)
2111
Clock Generator Error (IC5003)
2112
Clock Generator Error (IC5002)
2113
Clock Generator Error (IC5004)
2120 (HDMI)
HDMI Error (IC2502)
2122 (DVE)
DVE Error (IC2406, CXM4024R MultiAV controller for analog out)
2124 (AV)
This error tends to cause a delayed Yellow Light Of Death (10s - 1min). Sometimes described as a Green Light Of Death (GLOD) or Red Light Of Death (RLOD).
2124 and 2024 errors have been fixed by replacing both the AV and HDMI encoders. One user reported 2024/2124 errors resolved by replacing the HDMI encoder. Another removed the HDMI encoder and tested the console without it. That console primarily filled the errorlog with 2124 errors, but a few 2024's as well. So it is unclear if 2124 is specific to the HDMI Encoder or AV Encoder. It seems it could be either.
2130 (Thermal Sensor, CELL)
- Components Involved:
- CELL
- CELL Temperature Monitor (IC1101 on COK-001)
2131 (Thermal Sensor, RSX)
- Components Involved:
- RSX
- RSX Temperature Monitor (IC2101 on COK-001)
2133 (Thermal Sensor, South Bridge)
- Components Involved:
- South Bridge
- South Bridge Temperature Monitor (IC3101 on COK-001)
2203
Southbridge Error (IC3001)
Fatal Boot Errors
3000
Power Failure
3001
12v Power Failure
This is usually caused by a bad Power Supply Unit (PSU).
Alternatively, a failure on the 12v_main line can cause it. Check fuses, capacitors, resistors, and ICs on the 12v line. Measure the resistance across the large 2-prong 12v connector on the motherboard. It should read in the kilohm range if there is sufficient separation. Otherwise you may have a short somewhere on the line.
3002
Power Failure
3003
VDDC CELL Power Failure
This error will occur in the case of a power failure on the main core voltage of the CPU. For example, if the filtering capacitors (NEC/TOKINs) are severely damaged. There are other SMDs in that filter, so it could be related to them as well.
3004
VDDC RSX Power Failure
This error will occur in the case of a power failure on the main core voltage of the GPU. For example, if the filtering capacitors (NEC/TOKINs) are severely damaged. There are other SMDs in that filter, so it could be related to them as well.
3010
CELL Error
Observation: A user triggered this error by injecting 3.3V into PWRGD (power good) of IC6103 (NCP5318 CPU Buck Controller). It generated error 20 1001 and 20 3010.
3011
3012
3013
BE_SPI DI/DO ERROR
CELL is not communicating with syscon via SPI (no output on 1.2V MC2_VDDIO and 1.2V BE_VCS). Possible shorts on the line: check C4001 and trailing caps. Possible dead CPU?
Another user had one on a CPU he damaged while delidding.
3020
3030
3031
3032
CELL Error
+1.2v_YC_RC_VDDIO PWR Fail?
3033
3034
CELL / RSX Communication Error
This is the most common error seen in early Phat model PS3s with the hottest 90nm RSX and CELL processors. It is the hallmark of a BGA defect (such as a cracked solder ball). It is by no means limited to the early models, however. These errors have been seen in every model of PS3 with varying frequency. The most reliable consoles appear to be those with a CPU/GPU of smaller manufacturing process, such as the Super Slim (SS) models (42xx and later) which have a 45nm CELL and 28nm RSX. The least reliable are the PS2 Backwards Compatible A-E models, which have 90nm RSX/CELL.
The root cause is mechanical fatigue due to thermal cycling. The materials used to construct the motherboard and processors have different properties. For example, the coefficient of thermal expansion for the FR4 fiberglass used in the motherboard and processor substrate is different from that of the copper BGA pads, which is different from that of the lead-free solder used to join them. This means they will expand and contract at different rates as the chip heats up and cools down, which applies shearing force to the BGA. Over many thermal cycles this deforms the solder balls and causes a defect (such as a solder crack, a torn trace, or a ball pulling away from its pad).
3034 is triggered when the voltage or data lines connecting the CPU/GPU are broken. There is often a data error (4XXX) that also appears, but not always. The most common cause is a BGA defect on the RSX, which usually requires a reball/reflow to repair. Something about the RSX construction or workload causes it to fail more frequently, but the CPU can fail too. However, it's not always a BGA defect. The bumps on either chip can fail, Flex IO traces (the data lines that connect the CPU/GPU) can be broken/scratched, or accumulated damage from wear and tear (electromigration) can also cause this error. The true percentage of consoles with BGA defects that can be fixed with a reball/reflow is unknown. However, there is evidence to suggest that the underfill used to reinforce the CPU/GPU die and RSX RAM bumps was not as effective when the PS3 was manufactured. This could explain many of the consoles whose reball fails prematurely afterwards.
If a reflow/reball of both the CPU and GPU fails, then the chip is beyond repair and needs to be replaced. The RSX can be replaced with the same model without modification. It can be replaced with a different model using a modchip that injects the correct RSX ID during boot. This has been nicknamed a "Frankenstein Mod." Since they are married to each other, the CPU can only be replaced if the chipset (NAND/NOR and SYSCON chips) is replaced as well. Since the CPU can't be replaced as easily, a dead CPU is usually considered unrepairable.
3035
3036
3037
3038
3039
3040
Flash
Data Errors
4001
4002
4003
Southbridge
4011
4101
4102
4103
Southbridge
4111
4201
4202
4203
Southbridge
4211
4212
4221
4222
4231
4261
4301
4302
4303
Southbridge
4311
4312
4321
4322
4332
4341
4401
4402
4403
4411
4412
4421
4422
4432
4441