Syscon Error Codes

From PS3 Developer wiki
Revision as of 13:30, 28 November 2021 by Sandungas (Wikilinks)

Description

Syscon memory contains a 0x100-byte (256-byte) table for storing error codes. Each entry is 8 bytes: a 4-byte error code plus a 4-byte timestamp, so the table can hold 32 errors. When the table is full and a new error needs to be stored, syscon deletes the oldest entry.
The timestamps are in UTC, expressed as the number of seconds elapsed since 2000.
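A minimal Python sketch of reading that table, assuming you already have the raw 0x100 bytes (how to dump them is outside its scope). The byte order (big-endian), the code-then-timestamp field order, the exact epoch (2000-01-01 00:00:00 UTC) and the values treated as empty slots are all assumptions rather than documented facts; adjust them to match a real dump.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumption: "seconds since 2000" counts from 2000-01-01 00:00:00 UTC.
SYSCON_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)
TABLE_SIZE = 0x100                      # 256 bytes
ENTRY_SIZE = 8                          # 4-byte code + 4-byte timestamp
ENTRY_COUNT = TABLE_SIZE // ENTRY_SIZE  # 32 entries

# Hypothetical: code values treated as unused slots (not documented here).
EMPTY_CODES = {0x00000000, 0xFFFFFFFF}


def parse_error_table(table: bytes):
    """Split a raw error table into (code, timestamp) pairs, oldest first.

    Assumes big-endian 32-bit fields with the code stored before the timestamp.
    """
    if len(table) != TABLE_SIZE:
        raise ValueError(f"expected {TABLE_SIZE} bytes, got {len(table)}")
    entries = []
    for code, seconds in struct.iter_unpack(">II", table):
        if code in EMPTY_CODES:
            continue  # skip slots that do not hold an error
        entries.append((f"{code:08X}", SYSCON_EPOCH + timedelta(seconds=seconds)))
    # Sorting by timestamp puts the entry syscon would evict next at index 0.
    entries.sort(key=lambda entry: entry[1])
    return entries
```

Sorting by timestamp rather than by slot position avoids assuming anything about where syscon writes the next entry once the table wraps around.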

How to get the syscon error log

If the PS3 still boots to the XMB and can install and run apps, you can use programs such as the ones mentioned at the top of the Platform ID page.
If the PS3 does not boot, it is still possible to retrieve the syscon error log by connecting a PC to the syscon UART port with a "USB to TTL UART adapter" and running the command errlog. There is also the command clearerrlog, which empties the error table. This is handy to avoid confusion with old error codes that may have accumulated over months or years and are unrelated to the current problem.

Error code format

Error codes follow the format ARSSCEEE, where:

  • A (Fixed)
    • A = This is always "A"
  • R (Reserved)
    • 0-E = Unknown
    • F = Frequent error (For example, Motherboard Damage/Breakdown, etc.)
  • SS (Step Number)
    • 00-7F = Step Number of the Power On Sequence (POS). This is the Power On Self Test (POST) process. If successful, the BOOT process begins, which loads the OS.
    • 80 = Static State (Power ON). The console completed the POST and was in a static state. The error happened while the PS3 was powered on. An error that occurs in-game will have step no. 80; for example, 80 1002 errors can happen when the NEC/TOKINs are going bad.
    • 90 = Static State (Power OFF). The error happened while the PS3 was powering off. For example, if a problem causes the system to hang while shutting down, the console will beep before powering off and an error with step no. 90 will be recorded in the error log.
    • A0 = Immediately after SYSCON reset. A reset pulse is sent to the console's main chips to coordinate and synchronize them. An error immediately after SYSCON reset means it occurred before anything else could happen. For example, if the CPU is completely dead it will not respond to the reset pulse, and an error will be generated immediately after reset.
  • C (Category)
    • 1 = System Error
    • 2 = Fatal Error
    • 3 = Boot Error
    • 4 = Data Error
  • EEE (Error)
    • Any number in hex
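The field layout above can be decoded mechanically. The following Python sketch splits a code string into its fields and maps the step and category values using only the meanings listed above; the names `SysconError` and `decode` are hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass

# Meanings taken from the format description above.
CATEGORIES = {1: "System Error", 2: "Fatal Error", 3: "Boot Error", 4: "Data Error"}
STATIC_STEPS = {
    0x80: "Static State (Power ON)",
    0x90: "Static State (Power OFF)",
    0xA0: "Immediately after SYSCON reset",
}


@dataclass
class SysconError:
    reserved: int  # R: 0-E unknown, F = frequent error
    step: int      # SS: power-on step (00-7F) or a static state
    category: int  # C: 1-4
    error: int     # EEE: error number

    @property
    def is_frequent(self) -> bool:
        return self.reserved == 0xF

    @property
    def step_meaning(self) -> str:
        if self.step <= 0x7F:
            return "Power On Sequence step"
        return STATIC_STEPS.get(self.step, "Undocumented step")

    @property
    def category_name(self) -> str:
        return CATEGORIES.get(self.category, "Unknown category")


def decode(code: str) -> SysconError:
    """Split an ARSSCEEE string such as 'A0801002' into its fields."""
    code = code.strip().upper()
    if len(code) != 8 or not code.startswith("A"):
        raise ValueError(f"not an ARSSCEEE error code: {code!r}")
    return SysconError(
        reserved=int(code[1], 16),
        step=int(code[2:4], 16),
        category=int(code[4], 16),
        error=int(code[5:8], 16),
    )
```

For example, decoding A0403034 yields step 0x40 (a Power On Sequence step) in the Boot Error category.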

Examples:

A0801002

  • System Error 002 (RSX VRAM Power Fail), which occurred while the system was successfully powered on.
  • 1002 errors are known to be caused by bad NEC/TOKINs, but that may not be the only cause. See the Error codes section below for more details.

A0403034

  • Fatal Boot Error 034 (RSX/CELL Communication Error), which occurred at step no. 40, before the Power On Sequence completed.
  • 3034 errors are known to be caused by BGA defects (among other issues). See the Error codes section below for more details.

While the Reserved Area and Step Number help figure out when an error occurred and how frequent it is, the last four digits are the most important for figuring out what the error means. The Error codes section below therefore lists only the last four digits (category + error).
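Because the last four digits are the lookup key, grouping a log by them quickly shows which error dominates (for example, a log filled mostly with 2124 entries and a few 2024s). This Python sketch uses a small, hand-copied excerpt of the table below; `KNOWN_ERRORS`, `lookup` and `summarize` are illustrative names only.

```python
from collections import Counter

# Hand-copied excerpt of the "Error codes" section; extend as needed.
KNOWN_ERRORS = {
    "1001": "CELL Vram Power",
    "1002": "RSX Vram Power",
    "1200": "CELL Thermal Error",
    "1201": "RSX Thermal Error",
    "3001": "12v Power Failure",
    "3034": "CELL / RSX Communication Error",
}


def lookup(code: str):
    """Return (key, description); key is the last four digits (category + error)."""
    key = code.strip().upper()[-4:]
    return key, KNOWN_ERRORS.get(key)  # None if the key is not in the excerpt


def summarize(codes):
    """Count how often each category + error key appears in a log."""
    return Counter(code.strip().upper()[-4:] for code in codes)
```

Calling `summarize(log).most_common()` lists the keys from most to least frequent, which is usually the first thing to check in a full 32-entry log.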

Error codes

System Errors


1001

CELL Vram Power

Speculation:
1001 errors happen when the system encounters an unexpected shutdown. They often occur during testing, when the console is repeatedly powered on and off abruptly instead of being shut down gracefully. They have been associated with other errors, but there does not appear to be any single cause.

The hypothesis that this error is associated with insufficient filtering on the CPU's core voltage (VDDC) has not been confirmed. There is a range of voltage ripple/noise that "should" cause errors before it gets bad enough to cause a CELL VDDC Power Failure (3003). Numerous SMD components are involved in filtering, but the main concern is the NEC/TOKIN Proadlizers (capacitors). 1002 errors are the fingerprint of bad TOKINs on the GPU, but 1001 has not been shown to have the same association with the CPU's filter, although a connection is strongly suspected.

1002

RSX Vram Power

This error has been associated with insufficient filtering on RSX_VDDC. There is a range of voltage ripple/noise that will cause this error before it gets bad enough to cause an RSX_VDDC Power Failure (3004). YLODs caused by 1002 errors vary widely in timing, from as little as 2 seconds to occurring only during intense games.

Numerous SMD components are involved in filtering, but the main concern is the NEC/TOKIN Proadlizers (capacitors). 1002 errors are the fingerprint of bad TOKINs.

1004

PSU Power

1103

Thermal

1200

CELL Thermal Error

CPU overheat. This is a common error. The usual culprit is failed Thermal Interface Material (TIM): as the material ages it "dries out" and lets air in. Air is a thermal insulator, so the TIM can no longer transfer enough heat away from the processor. The system fan will get steadily louder over time until it cannot keep up. When the processor approaches its thermal shutdown temperature, a yellow LED begins flashing on the console (early Phat models). Once it reaches the thermal shutdown temperature, the console beeps three times and hard shuts down, flashing red until it is unplugged and the error state is reset. Error 1200 is recorded in the SYSCON error log.

First make sure the system fan is working. If it is, apply new TIM between the Integrated Heat Spreader (IHS) and the heatsink (HS). If that does not resolve the problem, carefully remove the IHS (delid) and replace the TIM between the IHS and the processor die.

If that still doesn't work, the temperature monitor chip (IC1101) could be at fault. Beyond that, some users have noted that dead CPUs can throw error 1200, but that is the limit of current understanding. The CPU could be dead or have some other unexplained issue; reflowing or reballing is usually the last-ditch effort to revive such a console.

1201

RSX Thermal Error

GPU overheat. This is the same as error 1200 above, but for the GPU. The same repair steps apply, except that its temperature monitor chip is IC2101.

1203

CELL voltage regulators thermal

1204

Southbridge thermal

1205

EE/GS thermal

1301

CELL PLL

14FF

Check stop

1601

BE Livelock Detection

Speculation: If a YLOD turns into a GLOD after a reball/reflow, then 1601 (with or without 1701) could mean the RSX RAM was damaged. This is a loose association based on a few user reports.

1701

CELL attention

1802

RSX init

1900

RTC voltage

1901

RTC oscillator

1902

RTC access


Fatal Errors


2001

CELL (IC1001)

2002

RSX (IC2001)

2003

Southbridge Error (IC3001)

2010

Clock Generator Error (IC5001)

2011

Clock Generator Error (IC5003)

2012

Clock Generator Error (IC5002)

2013

Clock Generator Error (IC5004)

2020

HDMI Error (IC2502)

2022

DVE Error (IC2406, CXM4024R MultiAV controller for analog out)

2024

This error tends to cause a delayed Yellow Light Of Death (10 s to 1 min). It is sometimes described as a Green Light Of Death (GLOD) or Red Light Of Death (RLOD).

2024 and 2124 errors have been fixed by replacing both the AV and HDMI encoders. One user reported that replacing the HDMI encoder resolved 2024/2124 errors. Another removed the HDMI encoder and tested the console without it; that console filled the error log mostly with 2124 errors, plus a few 2024s. It is therefore unclear whether 2124 is specific to the HDMI encoder or the AV encoder; it could be either.

2030

Thermal Sensor Error (IC1101, CELL Temp. Monitor)

Speculation: 2030-2033 errors have been reported in cases of a faulty PWR/EJT daughter board.

2031

Thermal Sensor Error (IC2101, RSX Temp. Monitor)

2033

Thermal Sensor Error (IC3101)

2101

CELL (IC1001)

2102

RSX (IC2001)

2103

Southbridge Error (IC3001)

2110

Clock Generator Error (IC5001)

2111

Clock Generator Error (IC5003)

2112

Clock Generator Error (IC5002)

2113

Clock Generator Error (IC5004)

2120

HDMI Error (IC2502)

2122

DVE Error (IC2406, CXM4024R MultiAV controller for analog out)

2124

This error tends to cause a delayed Yellow Light Of Death (10 s to 1 min). It is sometimes described as a Green Light Of Death (GLOD) or Red Light Of Death (RLOD).

2024 and 2124 errors have been fixed by replacing both the AV and HDMI encoders. One user reported that replacing the HDMI encoder resolved 2024/2124 errors. Another removed the HDMI encoder and tested the console without it; that console filled the error log mostly with 2124 errors, plus a few 2024s. It is therefore unclear whether 2124 is specific to the HDMI encoder or the AV encoder; it could be either.

2130

Thermal Sensor Error (IC1101, CELL Temp. Monitor)

2131

Thermal Sensor Error (IC2101, RSX Temp. Monitor)

2133

Thermal Sensor Error (IC3101)

2203

Southbridge Error (IC3001)


Fatal Boot Errors


3000

Power Failure

3001

12v Power Failure

This is usually caused by a bad Power Supply Unit (PSU).

Alternatively, a failure on the 12v_main line can cause it. Check the fuses, capacitors, resistors, and ICs on the 12v line, and measure the resistance across the large 2-prong 12v connector on the motherboard. It should read in the kilohm range if the line is properly isolated; otherwise there may be a short somewhere on the line.

3002

Power Failure

3003

VDDC CELL Power Failure

This error occurs when there is a power failure on the CPU's main core voltage, for example if the filtering capacitors (NEC/TOKINs) are severely damaged. There are other SMDs in that filter, so they could be involved as well.

3004

VDDC RSX Power Failure

This error occurs when there is a power failure on the GPU's main core voltage, for example if the filtering capacitors (NEC/TOKINs) are severely damaged. There are other SMDs in that filter, so they could be involved as well.

3010

CELL Error

Observation: A user triggered this error by injecting 3.3V into the PWRGD (power good) pin of IC6103 (NCP5318 CPU buck controller). It generated errors 20 1001 and 20 3010.

3011

CELL

3012

CELL

3013

BE_SPI DI/DO ERROR

The CELL is not communicating with syscon via SPI (no output on 1.2V MC2_VDDIO and 1.2V BE_VCS). Possible causes are shorts on the line (check C4001 and the trailing caps) or possibly a dead CPU.

Another user got this error on a CPU he damaged while delidding.

3020

CELL

3030

CELL

3031

CELL

3032

CELL Error

+1.2v_YC_RC_VDDIO PWR Fail?

3033

CELL

3034

CELL / RSX Communication Error

This is the most common error seen in early Phat model PS3s, which have the hottest 90nm RSX and CELL processors. It is the hallmark of a BGA defect (such as a cracked solder ball). It is by no means limited to the early models, however: these errors have been seen in every PS3 model with varying frequency. The most reliable consoles appear to be those whose CPU/GPU use a smaller manufacturing process, such as the Super Slim (SS) models (42xx and later), which have a 45nm CELL and 28nm RSX. The least reliable are the PS2 backwards-compatible A-E models, which have a 90nm RSX/CELL.

The root cause is mechanical fatigue due to thermal cycling. The materials used to construct the motherboard and processors have different properties. For example, the coefficient of thermal expansion (CTE) of the FR4 fiberglass used in the motherboard and processor substrate differs from that of the copper BGA pads, which in turn differs from that of the lead-free solder joining them. They therefore expand and contract at different rates as the chip heats up and cools down, which applies a shearing force to the BGA. Over many thermal cycles this deforms the solder balls and causes a defect (such as a solder crack, a torn trace, or a ball pulling away from its pad).
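As a rough illustration of why hotter chips fail sooner, a common first-order textbook approximation (not something measured on PS3 boards) estimates the shear strain a solder joint sees in one thermal cycle. Here L_D is the joint's distance from the package centre (neutral point), h is the solder joint height, ΔT is the temperature swing between cold and hot, and Δα is the CTE mismatch between the package and the board:

```latex
\gamma \approx \frac{L_D \, \Delta\alpha \, \Delta T}{h},
\qquad
\Delta\alpha = \lvert \alpha_{\text{package}} - \alpha_{\text{board}} \rvert
```

Under this approximation the strain per cycle grows linearly with the temperature swing, so the hotter 90nm parts load their BGA joints harder on every power cycle, and the outermost balls (largest L_D) are the most stressed.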

3034 is triggered when the voltage or data lines connecting the CPU and GPU are broken. A data error (4XXX) often appears alongside it, but not always. The most common cause is a BGA defect on the RSX, which usually requires a reball/reflow to repair. Something about the RSX's construction or workload makes it fail more often, but the CPU can fail too. It is not always a BGA defect, however: the bumps on either chip can fail, the Flex IO traces (the data lines that connect the CPU and GPU) can be broken or scratched, or accumulated wear (electromigration) can cause this error. The true percentage of consoles with BGA defects that can be fixed with a reball/reflow is unknown. However, there is evidence suggesting that the underfill used to reinforce the CPU/GPU die and RSX RAM bumps was not as effective when the PS3 was manufactured. This could explain why so many reballed consoles fail again prematurely.

If a reflow/reball of the CPU or GPU fails, the chip is beyond repair and needs to be replaced. The RSX can be replaced with the same model without modification, or with a different model by using a modchip that injects the correct RSX ID during boot; this has been nicknamed a "Frankenstein Mod." Because the CPU is married to the NAND/NOR and SYSCON chips, it can only be replaced together with them. Since the CPU cannot be replaced as easily, a dead CPU is usually considered unrepairable.

3035

CELL and RSX

3036

CELL and RSX

3037

CELL and RSX

3038

CELL and RSX

3039

CELL and RSX

3040

Flash


Data Errors


4001

CELL

4002

RSX

4003

Southbridge

4011

CELL

4101

CELL

4102

RSX

4103

Southbridge

4111

CELL

4201

CELL

4202

RSX

4203

Southbridge

4211

CELL

4212

RSX

4221

CELL

4222

RSX

4231

CELL

4261

CELL

4301

CELL

4302

RSX

4303

Southbridge

4311

CELL

4312

RSX

4321

CELL

4322

RSX

4332

RSX

4341

CELL

4401

CELL or RSX

4402

CELL or RSX

4403

CELL or RSX

4411

CELL or RSX

4412

CELL or RSX

4421

CELL or RSX

4422

CELL or RSX

4432

CELL or RSX

4441

CELL or RSX