Syscon Error Codes: Difference between revisions

From PS3 Developer wiki

Revision as of 04:36, 10 October 2022

Description

Syscon memory contains a 0x100-byte table intended to store error codes. Every entry is composed of a 4-byte error code plus a 4-byte timestamp, so the table can store 32 errors in total. When the table is full and a new error needs to be stored, syscon deletes the oldest one.
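As a rough illustration of the arithmetic and of the overwrite-oldest behaviour described above, the following Python sketch models the table with a bounded deque. The names TABLE_SIZE, ENTRY_SIZE, CAPACITY and log are hypothetical, and the deque only mimics the behaviour; it says nothing about how syscon actually lays the table out in memory.

```python
from collections import deque

TABLE_SIZE = 0x100   # bytes reserved for the error table
ENTRY_SIZE = 4 + 4   # 4-byte error code + 4-byte timestamp
CAPACITY = TABLE_SIZE // ENTRY_SIZE

# A deque with maxlen drops its oldest item when a new one is appended
# to a full deque, which mimics "syscon deletes the oldest one".
log = deque(maxlen=CAPACITY)
for n in range(40):                # 40 made-up errors, more than fit
    log.append((0xA0801001, n))    # (error code, fake timestamp)

print(CAPACITY, len(log), log[0])
```

After the loop only the 32 most recent entries remain, so the first 8 made-up errors are gone.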

How to get the syscon error log

If the PS3 still boots up to the XMB and is able to install and run apps, you can use programs like the ones mentioned at the top of the Platform ID page
If the PS3 doesn't boot, it is still possible to retrieve the syscon error log by connecting a PC to the syscon UART port using a "USB to TTL UART adapter" and running the command errlog. There is also the command clearerrlog to empty the error table (handy to prevent confusion with old error codes that may have accumulated over months or years and are not related to the actual problem)

Error log format

There are 2 error log formats, depending on the syscon type: one for Mullion and one for Sherwood
The error codes and the timestamps are stored in little endian (read right to left)
The timestamps are in a J2000-like format: the number of elapsed seconds since 2000/1/1. The J2000 epoch itself is 2000/1/1 12:00:00, but the conversion offset given here is 946684800 seconds, which is the Unix time of 2000/1/1 00:00:00 UTC (30 years after the Unix epoch, 12 hours before J2000). To convert a timestamp to a standard Unix epoch value, add 946684800 seconds. See 1
If the battery was empty or removed when the error was triggered, the timestamp is recorded as FFFFFFFF
If the battery is replaced but the time is not configured under GameOS (either manually or by network), the errorlog seems to store timestamps starting from a date around 2005/12/31 00:00:00 (0x0B488680)
More info about errorlog timestamp formats and loops in the Talk page
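The conversion described above can be sketched in a few lines of Python. This is a minimal sketch: the function name syscon_timestamp_to_utc and the constant names are hypothetical, and treating the result as UTC is an assumption.

```python
from datetime import datetime, timezone

EPOCH_OFFSET = 946684800   # seconds to add, as quoted in the text above
NO_TIMESTAMP = 0xFFFFFFFF  # recorded when the battery was empty or removed

def syscon_timestamp_to_utc(raw: bytes):
    """Convert 4 little-endian timestamp bytes to a UTC datetime, or None."""
    value = int.from_bytes(raw, "little")
    if value == NO_TIMESTAMP:
        return None
    return datetime.fromtimestamp(value + EPOCH_OFFSET, tz=timezone.utc)

# 0x0B488680 as it would appear in a dump: 80 86 48 0B
print(syscon_timestamp_to_utc(bytes.fromhex("8086480B")))
```

For 0x0B488680 this gives 2005-12-31 00:00:00 UTC, which agrees with the date quoted above for a replaced battery with unconfigured time.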


Syscon Errorlog from CECHAxx, COK-001, CXR713120-201GB
Offset(h) 00 01 02 03  04 05 06 07  08 09 0A 0B  0C 0D 0E 0F
                                                 
00003700  01 10 80 A0  01 10 80 A0  01 10 80 A0  01 10 80 A0
00003710  01 10 80 A0  01 10 80 A0  04 10 80 A0  01 10 80 A0
00003720  01 10 80 A0  01 10 80 A0  01 10 80 A0  01 10 80 A0
00003730  01 10 80 A0  04 10 80 A0  01 10 80 A0  01 10 80 A0
00003740  01 10 80 A0  01 10 80 A0  01 10 80 A0  01 10 80 A0
00003750  01 10 80 A0  01 10 80 A0  04 30 09 A0  04 30 09 A0
00003760  04 30 09 A0  04 30 09 A0  FF FF FF FF  01 10 80 A0
00003770  01 10 80 A0  01 10 80 A0  01 10 80 A0  01 10 80 A0
                                                 
00003780  20 CF 6D 16  13 23 A7 16  3E D6 D9 16  87 13 2A 17
00003790  17 3C 7C 17  E4 A2 A3 17  A2 15 D4 17  13 FB EB 17
000037A0  CD 7D EF 17  33 85 EF 17  12 8C EF 17  A7 D9 FB 17
000037B0  58 5E 0E 18  BB C9 66 18  CD 25 B5 18  49 C4 29 19
000037C0  75 D5 F9 19  04 8B 61 1B  17 67 D0 22  2D 67 D0 22
000037D0  03 07 6C 27  12 09 6C 27  FF FF FF FF  FF FF FF FF
000037E0  FF FF FF FF  FF FF FF FF  F0 E7 27 16  06 BD 33 16
000037F0  E5 DE 38 16  DD D4 5C 16  C4 AC 6C 16  EA C7 6D 16
  • In the errorlog sample above:
    • Errorlog looped at least 1 time (1 errorcode FFFFFFFF)
    • Timestamps are valid, and the time was configured
    • Contains errors: A0801001, A0801004, A0093004
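Judging only from the sample above, the Mullion layout appears to store the 32 error codes in the first 0x80 bytes and the 32 timestamps in the last 0x80 bytes, paired by slot index. The Python sketch below parses a table under that assumption; parse_mullion_log is a hypothetical helper name, and the pairing by index is inferred from the sample rather than documented.

```python
import struct

EMPTY = 0xFFFFFFFF

def parse_mullion_log(table: bytes):
    """Split a 0x100-byte table into (code, timestamp) pairs by slot index."""
    if len(table) != 0x100:
        raise ValueError("expected a 0x100-byte table")
    codes = struct.unpack("<32I", table[:0x80])    # first half: error codes
    stamps = struct.unpack("<32I", table[0x80:])   # second half: timestamps
    return list(zip(codes, stamps))

# Slot 0 of the CECHAxx sample above; every other slot is padded with FF.
table = bytearray(b"\xff" * 0x100)
table[0x00:0x04] = bytes.fromhex("011080A0")
table[0x80:0x84] = bytes.fromhex("20CF6D16")
code, stamp = parse_mullion_log(bytes(table))[0]
print(f"{code:08X} {stamp:08X}")
```

The first slot decodes to error A0801001 with raw timestamp 166DCF20.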
Syscon Errorlog from CECHHxx, DIA-001, CXR714120-301GB
Offset(h) 00 01 02 03  04 05 06 07  08 09 0A 0B  0C 0D 0E 0F
                                                 
00003700  22 43 40 A0  34 30 40 A0  22 43 40 A0  34 30 40 A0
00003710  22 43 40 A0  34 30 40 A0  FF FF FF FF  34 30 40 A0
00003720  22 43 40 A0  34 30 40 A0  22 43 40 A0  34 30 40 A0
00003730  22 43 40 A0  34 30 40 A0  22 43 40 A0  34 30 40 A0
00003740  22 43 40 A0  34 30 40 A0  22 43 40 A0  34 30 40 A0
00003750  22 43 40 A0  34 30 40 A0  22 43 40 A0  34 30 40 A0
00003760  22 43 40 A0  34 30 40 A0  22 43 40 A0  34 30 40 A0
00003770  22 43 40 A0  34 30 40 A0  22 43 40 A0  34 30 40 A0
                                                 
00003780  FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
00003790  FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
000037A0  FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
000037B0  FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
000037C0  FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
000037D0  FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
000037E0  FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
000037F0  FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
  • In the errorlog sample above:
    • Errorlog looped at least 1 time (1 errorcode FFFFFFFF)
    • Timestamps are invalid
    • Contains errors: A0404322, A0403034
Syscon Errorlog from CECHLxx/Mxx/Pxx/Qxx, VER-001, SW-301
Offset(h) 00 01 02 03  04 05 06 07  08 09 0A 0B  0C 0D 0E 0F
                                                 
00000900  31 21 80 A0  73 D4 50 0B  30 20 40 A0  89 D4 50 0B
00000910  31 21 80 A0  8C D4 50 0B  30 20 80 A0  8E D4 50 0B
00000920  30 21 80 A0  8E D4 50 0B  31 21 80 A0  8F D4 50 0B
00000930  01 30 00 A0  FF FF FF FF  FF FF FF FF  FF FF FF FF
00000940  30 20 80 A0  75 D1 50 0B  30 20 80 A0  78 D1 50 0B
00000950  30 20 80 A0  B2 D1 50 0B  31 20 80 A0  B3 D1 50 0B
00000960  30 21 80 A0  BD D1 50 0B  30 21 80 A0  D5 D1 50 0B
00000970  30 21 80 A0  DF D1 50 0B  30 20 80 A0  E0 D1 50 0B
00000980  31 21 80 A0  84 D2 50 0B  30 21 80 A0  DC D2 50 0B
00000990  31 21 32 A0  4F D3 50 0B  31 20 40 A0  50 D3 50 0B
000009A0  31 21 80 A0  51 D3 50 0B  30 21 80 A0  57 D3 50 0B
000009B0  31 21 80 A0  59 D3 50 0B  31 21 80 A0  FF D3 50 0B
000009C0  31 21 80 A0  05 D4 50 0B  30 20 80 A0  06 D4 50 0B
000009D0  30 20 80 A0  07 D4 50 0B  31 21 80 A0  2D D4 50 0B
000009E0  31 20 80 A0  3A D4 50 0B  30 21 80 A0  42 D4 50 0B
000009F0  30 21 80 A0  72 D4 50 0B  31 21 80 A0  72 D4 50 0B
  • In the errorlog sample above:
    • Errorlog looped at least 1 time (1 errorcode FFFFFFFF)
    • Timestamps are valid, but the time was not configured
    • Contains errors: A0802131, A0402030, A0802030,
      A0802130, A0003001, A0802031,
      A0322131, A0402031
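In the Sherwood sample above each error code seems to be followed immediately by its own timestamp, giving 32 interleaved 8-byte entries. The Python sketch below parses a table under that assumption; parse_sherwood_log is a hypothetical helper name, and the layout is inferred from the sample only.

```python
import struct

def parse_sherwood_log(table: bytes):
    """Read 32 interleaved (code, timestamp) pairs, both little endian."""
    if len(table) != 0x100:
        raise ValueError("expected a 0x100-byte table")
    return list(struct.iter_unpack("<II", table))

# First row (offset 0x900) of the VER-001 sample above, rest padded with FF.
row = bytes.fromhex("312180A0" "73D4500B" "302040A0" "89D4500B")
table = row + b"\xff" * (0x100 - len(row))
for code, stamp in parse_sherwood_log(table)[:2]:
    print(f"{code:08X} {stamp:08X}")
```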
Syscon Errorlog from CECH-42xx, PQX-001, SW3-304
Offset(h) 00 01 02 03  04 05 06 07  08 09 0A 0B  0C 0D 0E 0F
                                                 
00000900  02 18 61 A0  FF FF FF FF  02 18 61 A0  FF FF FF FF
00000910  02 18 61 A0  FF FF FF FF  02 18 61 A0  FF FF FF FF
00000920  02 18 61 A0  FF FF FF FF  02 18 61 A0  FF FF FF FF
00000930  02 18 61 A0  FF FF FF FF  02 18 61 A0  FF FF FF FF
00000940  02 18 61 A0  FF FF FF FF  02 18 61 A0  FF FF FF FF
00000950  02 18 61 A0  FF FF FF FF  02 18 61 A0  FF FF FF FF
00000960  02 18 61 A0  FF FF FF FF  02 40 40 A0  FF FF FF FF
00000970  34 30 40 A0  FF FF FF FF  02 40 40 A0  FF FF FF FF
00000980  34 30 40 A0  FF FF FF FF  02 40 40 A0  FF FF FF FF
00000990  34 30 40 A0  FF FF FF FF  02 40 40 A0  FF FF FF FF
000009A0  34 30 40 A0  FF FF FF FF  02 40 40 A0  FF FF FF FF
000009B0  34 30 40 A0  FF FF FF FF  02 40 40 A0  FF FF FF FF
000009C0  34 30 40 A0  FF FF FF FF  FF FF FF FF  FF FF FF FF
000009D0  FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
000009E0  FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
000009F0  FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
  • In the errorlog sample above:
    • Errorlog not looped (more than 1 errorcode FFFFFFFF)
    • Timestamps are invalid
    • Contains errors: A0611802, A0404002, A0403034
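The notes under the samples use the number of FFFFFFFF error codes to tell whether the log has looped. The Python sketch below applies that rule and also reorders the entries oldest first. Both helper names are hypothetical. The reordering assumes that the slot after the first empty code holds the oldest entry, which fits the valid timestamps in the samples but is not confirmed.

```python
EMPTY = 0xFFFFFFFF

def has_looped(codes):
    """Rule used in the sample notes: exactly one empty code means looped."""
    return list(codes).count(EMPTY) == 1

def oldest_first(entries):
    """Rotate (code, timestamp) pairs so the slot after the first empty
    code comes first, then drop empty slots. Assumption, see lead-in."""
    entries = list(entries)
    codes = [code for code, _ in entries]
    if EMPTY not in codes:
        return entries
    cut = codes.index(EMPTY)
    rotated = entries[cut + 1:] + entries[:cut]
    return [entry for entry in rotated if entry[0] != EMPTY]

# Tiny made-up log: slot 1 is the empty marker, so slot 2 is the oldest.
demo = [(0xA0801002, 30), (EMPTY, 5), (0xA0801001, 10), (0xA0801004, 20)]
print(has_looped(code for code, _ in demo), oldest_first(demo))
```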

Error code format

The error codes follow the format: A R ST C ERR, where:

 A  (Fixed) A = This is always "A"


 R  (Reserved) 0-E = Unknown

  • F = Frequent error (For example, Motherboard Damage/Breakdown, etc.)


 ST  (Step Number) 00-7F = Step Number of the Power On Sequence (POS). This is the Power On Self Test (POST) process. If successful, the BOOT process begins, which loads the OS.


80 = Static State (Power ON). The console completed the POST and was in a static state. The error happened when the PS3 was powered on. You can get an error with Step No. 80 if your error occurs in game. For example, 80 1002 errors can happen if your NEC/TOKINs are going bad.


90 = Static State (Power OFF). The error happened when the PS3 was powering off. For example, if a problem causes the system to hang while shutting down, the console will beep before powering off, and an error with step no. 90 will be recorded in the errorlog.


A0 = Immediately after SYSCON reset. When you supply/plug-in power, the PS3 is supposed to enter Standby. A solid red LED will illuminate on early "Phat" models, for example. There is a Standby circuit in the PS3 that constantly needs power, so the PS3 can wait for the user to start the console. Sometimes this is called a Vampire circuit, because it uses power even when you're not using the console. Many electronics do this to allow you to turn them on remotely. Otherwise, you would need to physically flip a switch to turn them on.

The PS3 reset circuit consists of the SYSCON and its clock-generating crystal, the Bluetooth/WIFI card, the front PWR/EJT and LED panel, and the thermal monitor ICs. The SYSCON needs to know if you are trying to start the console, either manually with the PWR button or over Bluetooth using your controller, so those modules need to be powered. It also needs to know that the thermal monitors are functioning properly before it can safely send power to the Southbridge, CPU or GPU. Otherwise, the console would be a fire hazard! Thermal monitor ICs are like the PS3's fire alarms. They are critical safety equipment.

If there is a Hardware issue anywhere in that circuit, you will get an error immediately after SYSCON reset that will prevent you from even attempting to power on the console. The front LED will flash Red Indefinitely as soon as you plug in the console, instead of giving you a solid RED LED. This is how you know you have an error with the standby circuit. And there will be an associated error logged in the syscon that may help you track it down to the specific component.

5V_MISC powers the reset circuit, which is a collection of SMD/SMT components and ICs that power the above modules. Check the service manual (if available) for specifics. For example, in the COK-001 service manual you can find the circuit diagram on page 23/45. You can see that IC6005 (a DC/DC converter) is responsible for generating +3.3V_EVER. IC6006 generates +1.8V_EVER, and IC6009 generates +3.3V_THERMAL. These are the main voltages used by the components in the reset circuit, such as the WIFI/Bluetooth card needed to remote start the console by pressing the PS button on the controller.
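The step number ranges listed above can be summarized as a small lookup. This is a minimal Python sketch; describe_step is a hypothetical helper, and any value outside the ranges named above is simply reported as unknown.

```python
def describe_step(step: int) -> str:
    """Map a step number (the ST byte) to the phase described above."""
    if 0x00 <= step <= 0x7F:
        return "Power On Sequence step"
    if step == 0x80:
        return "Static State (Power ON)"
    if step == 0x90:
        return "Static State (Power OFF)"
    if step == 0xA0:
        return "Immediately after SYSCON reset"
    return "Unknown"

for st in (0x40, 0x80, 0x90, 0xA0):
    print(f"{st:02X}: {describe_step(st)}")
```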


 C  (Category)

  • 1 = System Error
  • 2 = Fatal Error
  • 3 = Booting Error
  • 4 = Data Error


 ERR  (Error)

  • This is a 3-Digit error code that gives specific information about the issue. For example, System error 002 (1002) means "RSX VRM Failure."
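The A R ST C ERR layout maps directly onto the hexadecimal digits of the 32-bit code, so it can be split with shifts and masks. The following Python sketch shows one way to do that; the SysconError type, its field names and the decode helper are hypothetical.

```python
from typing import NamedTuple

class SysconError(NamedTuple):
    fixed: int      # A, always 0xA
    reserved: int   # R
    step: int       # ST, two hex digits
    category: int   # C
    error: int      # ERR, three hex digits

CATEGORIES = {1: "System Error", 2: "Fatal Error",
              3: "Booting Error", 4: "Data Error"}

def decode(code: int) -> SysconError:
    """Split a 32-bit error code into its A, R, ST, C and ERR fields."""
    return SysconError(
        fixed=(code >> 28) & 0xF,
        reserved=(code >> 24) & 0xF,
        step=(code >> 16) & 0xFF,
        category=(code >> 12) & 0xF,
        error=code & 0xFFF,
    )

e = decode(0xA0801002)
print(f"step {e.step:02X}, {CATEGORIES.get(e.category, 'Unknown')} {e.error:03X}")
```

For A0801002 this yields step 80, category 1 (System Error) and error 002.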

Discussion


The 3-digit error code can repeat in other categories, but it doesn't mean the same thing. System error 001 and Fatal error 001 don't mean the same thing. 1001 is "BE VRM Power Failure" and 2001 is "BE Error." We wouldn't be able to tell the difference between them if we just referred to the error. That's not enough information by itself to understand the problem. The category gives the error context. So we use the 4-digit code CERR to differentiate them from one another.

Likewise, the Step Number (ST) provides context to the 4-digit code. For example, you can have a CELL VRM Power Failure occur while playing an intense game (Static State, Power ON). That would generate an 801001 (Step number 80). However, you can also get that error when turning the console on, during the Power On Self Test (POST), before it even has a chance to start the bootloader, which loads the OS, which in turn lets you load the game. This time it might generate error 101001. Step number 10 is lower than Step number 80, telling you this 1001 occurred earlier. The Category + Error tells you "what" happened. The Step Number tells you "when" it happened. It's building context that can help you figure out what is causing the error.

It is important to consider the context of the full errorlog, not just the 4-digit CERR. The reason is that you are attempting to build the case for "why" it happened! In the previous example, the A0801001 could mean the NEC/TOKIN Proadlizers (a type of capacitor that is part of the CELL CPU's VRM, Voltage Regulation Module) may be at fault. The A0101001 can result from other causes simply because there is a larger number of things that can go wrong. It may not be the NEC/TOKINs at all.

All of this means you really need to be familiar with the hardware to make sense of the errors stored in the SYSCON. Unfortunately, there isn't a single error for every potential issue. For example, there is no error that can tell you that Capacitor C6900 is shorted. There are a few exceptions, where most of the time a certain code means the same thing, like a blown fuse that "usually" causes said code. But we cannot rule out the possibility that a cap blowing on the same line could also cause it, however rare that may be.


Examples:

A0801002

  • System Error 002 (RSX VRM Power Fail) which occurred while the System was successfully powered On.
  • 1002 errors are known to be caused by bad NEC/TOKINs, but may not be the only cause. See Error Code section below for more details.

A0403034

  • Booting Error 034 (RSX/CELL Communication Error) which occurred at step no. 40 (BitTraining), before the Power On Sequence completed.
  • 3034s are caused by BGA/Bump Defects (among other issues). Experienced PS3 repair technicians have noted that it is almost exclusive to the RSX. While our knowledge of the hardware interface cannot rule out the possibility a CELL BE BGA/Bump defect can cause it, that has been the exception to the rule. Experience and time have shown 3034's are primarily an RSX issue. A repair technician needs to decide which processor to reball/replace based upon the more likely candidate. They have to use discretion.
  • See Error Code section below for more details.

A0213013

  • 3013 errors have been caused by a dead CELL BE CPU.

The following Error Code section will only list the last 4 digits (category + error). However, remember that the Reserved Area and Step Number can be useful to figure out "when" the error occurred and how frequent it is. The last four digits are the most important for figuring out what a specific error means, but you still need to figure out what it means in the context of your issue, so you can diagnose the error and then fix it.
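When reading a full errorlog it can help to group the entries by their last 4 digits while keeping track of how often each one occurs and at which step numbers. The Python sketch below does that for a list of raw 32-bit codes; summarize is a hypothetical helper and the input list is made up for illustration.

```python
from collections import defaultdict

EMPTY = 0xFFFFFFFF

def summarize(codes):
    """Map each 4-digit CERR string to its count and the steps it occurred at."""
    summary = defaultdict(lambda: {"count": 0, "steps": set()})
    for code in codes:
        if code == EMPTY:               # skip unused slots
            continue
        cerr = f"{code & 0xFFFF:04X}"   # category + error
        step = f"{(code >> 16) & 0xFF:02X}"
        summary[cerr]["count"] += 1
        summary[cerr]["steps"].add(step)
    return dict(summary)

# Made-up log: the same 1001 at two different steps, plus one 3034.
log = [0xA0801001, 0xA0101001, 0xA0403034, EMPTY]
for cerr, info in summarize(log).items():
    print(cerr, info["count"], sorted(info["steps"]))
```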

Error codes

System Errors


1001 (Power CELL)

  • Components Involved:
    • CELL (IC1001 on COK-001)
    • NEC/TOKIN Proadlizers (C6140/C6141/C6142/C6143 on COK-001)
    • Other nearby components of the power block

This error is due to insufficient filtering on the CPU's core voltage (VDDC) or an unexpected shutdown. There is a range of voltage ripple/noise that causes errors before it gets so bad it causes a CELL VDDC Power Failure (3003). There are numerous SMD components involved in filtering, but the main concern is the NEC/TOKIN Proadlizers (capacitors). 1002 errors are the fingerprint of bad NEC/TOKINs on the GPU, but 1001 is not as easy to diagnose. You need to witness the console YLOD under load and see that a new 1001 error was generated by it. Otherwise, the 1001 could simply mean the console wasn't shut off properly.

1001 errors can be logged naturally when the system encounters an unexpected shutdown or AC power loss. They often occur in testing, when the console is switched on/off a lot, instead of a graceful shutdown. A0801001 errors by themselves cannot be used as evidence of failing NEC/TOKINs. Such errors are commonly found in the log of perfectly working machines, and are nothing to worry about unless the system is shutting itself off unexpectedly.

A common case where 1001 errors can be misinterpreted is a machine that can power ON, but has graphical artifacts or no video (GLOD). In these cases the console must be forced off using the power rocker at the back of the console (in Phat models), or by pulling the power cord (slim & super slim models), which can cause the 1001. This can also cause 1004 errors. These errors can be ignored if they were not generated under normal circumstances. In the case of a console exhibiting artifacts/GLOD, the bigger issue should be addressed first (typically a GPU issue requiring a reball/replacement). Afterwards, if 1001 errors return during stress testing, then you can diagnose the CPU NEC/TOKINs.

Anecdote: One console with bad CPU NEC/TOKINs exhibited an A0901001 only upon shutdown. It was stable in The Last of Us (strenuous game) and did not seem to have the typical behavior associated with bad NEC/TOKINs. However, it hung in shutdown for an extended period of time, finally issuing the YLOD (3-beeps and flashing red). It needed to be reset before it could be powered on again. Replacing the NEC/TOKINs repaired the issue.

1002 (Power RSX)

  • Components Involved:
    • RSX (IC2001 on COK-001)
    • NEC/TOKIN Proadlizers (C6229/C6230/C6231/C6232 on COK-001)
    • Other nearby components of the power block

This error has been associated with insufficient filtering on the RSX_VDDC power line. There is a range of voltage ripple/noise that will cause this error before it gets so bad it causes an RSX_VDDC Power Failure (3004). YLODs that log 1002 range from occurring within 2 seconds of power-on to occurring only during intense games.

There are numerous SMD components involved in filtering, but the main concern is the NEC/TOKIN Proadlizers (capacitors). 1002 errors are the fingerprint of bad NEC/TOKINs.

1004 (Power AC/DC)

When a console loses AC power, error A0801004 may be generated. A common case where 1004 errors occur is a machine that can power ON, but has graphical artifacts or no video (GLOD). In these cases the console must be forced off using the power rocker at the back of the console (in Phat models), or by pulling the power cord (slim & super slim models). Doing this causes a loss of AC, which can cause the error.

This error can be ignored if it wasn't generated under normal circumstances, for example after a power outage or accidental unplugging. Since it didn't result from a hardware fault, it's not serious. In the case of a console exhibiting artifacts/GLOD, the bigger issue should be addressed first (typically a GPU issue requiring a reball/replacement). Afterwards, if 1004 errors return, then you should diagnose the AC/DC line: the PSU and its connection up to the DC-DC converters.

1103 (Thermal Alert SYSTEM)

  • Components Involved:
    • CELL
    • CELL temperature monitor (only in Mullion syscons; the CELL temperature monitor for PS3 slims and superslims can't send this error code)

Syscon has a pad/pin dedicated to this signal. It was given an official generic name (one that does not indicate which component triggered it) because the signal can be sent by several components. In the first PS3 models (with Mullion syscon?) it can be sent by the CELL or by the CELL temperature monitor, using the official function names SYS_THR_ALRT or THERMAL_OVERLOAD.
This electrical design is not specific to the PS3, however. There could be other devices based on the IBM/CELL and developed by Sony where this error code is sent by other components, and which could have more than one CELL. So in general we could say this error code indicates that one (or more) of the CELL processors (or its temperature monitor chip) is overheating.

1200 (Thermal CELL)

  • Components Involved:

CPU Overheat. This is a common error. The usual culprit is failed Thermal Interface Material (TIM). As the material ages it "dries", allowing air inside. Air is a heat insulator, reducing the TIM's ability to transfer enough heat away from the processor. The system fan will steadily get louder over time until it cannot keep up. Once the processor approaches its Thermal Shutdown Temperature, a Yellow LED begins flashing on the console (Early Phat Models). Once it reaches the Thermal Shutdown Temperature, the console will beep three times and hard shutdown, flashing red until the console is unplugged and the error state reset. Error 1200 is generated in the SYSCON errorlog.

First be sure the system fan is working. If so, apply new TIM Between the Internal Heat Spreader (IHS) and Heatsink (HS). If that does not resolve the problem, carefully remove the IHS (Delid) and replace the TIM between the IHS and processor DIE.

If that still doesn't work, it could be an issue with the temperature monitor chip (IC1101). Beyond that, some users have noted that dead CPUs can throw error 1200. However, that's the limit of our current understanding. The CPU could be dead, or have another unexplained issue, but usually reflowing or reballing is the last-ditch effort to revive such a console.

1201 (Thermal RSX)

  • Components Involved:

GPU Overheat. This is the same as error 1200 above, except it's for the GPU. The same repair steps apply, except its Temperature Monitor Chip is IC2101. This error is rare. Out of hundreds of consoles and years of user reports, this error has only occurred when the user forgot to replace the RSX heatsink when testing the console. It has not been reported under normal circumstances. The RSX tends to fail long before the TIM degrades to the point thermal shutdown is reached.

1203 (Thermal CELL VR)

Some PS3 motherboards (TMU-520, COK-001, COK-002) have a temperature monitor located somewhere in the CELL power block. The other retail PS3 motherboard models do not measure the temperature of the CELL VR

All the PS3 temperature monitor chips have an integrated internal thermal sensor plus 2 pins for an optional external sensor. The temperature monitors for CELL and RSX are configured to use the external sensor, but the one for CELL VR probably uses the internal sensor

1204 (Thermal South Bridge)

1205 (Thermal EE/GS)

This error is specific for COK-001/CXD2953AGB (with full PS2 hardware compatibility, EE+GS) or COK-002/CXD2972GB (with partial PS2 hardware compatibility, GS only)

1301

CELL PLL Unlock

Has been reported in a console where the CPU die was chipped during a delid attempt gone wrong. The console exhibited a Green Light of Death (GLOD) and shut down periodically with A0801301.

On another console 1301 occurred after both the CPU and RSX were reballed. The reball of the CELL likely failed or damaged it.

On a third console it was reported after nearly every chip on the Motherboard was heat gunned. They probably didn't achieve the necessary temperatures to reflow the CPU, and if they did they probably damaged it by using too much heat.

In all 3 cases the CPU was damaged or heated in some way.

14FF

Check stop

This error can occur when the console was on at the time the YLOD occurred. On consoles exhibiting this error, subsequent attempts to start the console resulted in a GLOD with 1601/1701 errors, or a YLOD within 2 seconds. SYSCON errors usually show one A0801601/A0801701 occurring at the same timestamp, followed thereafter by 3034/4xxx errors for all subsequent attempts to PWR it on. Or it'll GLOD and throw more 1701/1601's. The working theory is that there is a precarious solder joint (BGA or bump defect) teetering on the edge of breaking. It'll soon switch to 3034 with or without 4xxx errors.

Complicating the issue is the fact that sometimes people will get a 1301 or 1802 also. It likely has to do with where the joints are failing and it's involving those sub systems briefly before fully breaking.

Unlike Livelocks, which can be caused by both hardware and software conditions, Checkstops are a hardware issue. A checkstop occurs when the CPU or GPU, cache, memory, or I/O bus controller finds something in an impossible state (impossible unless the hardware is broken). The error isn't identified as a particular bus transfer in progress, or the CPU/GPU detects the console is stuck (frozen, no progress being made with that operation). When nothing can be done for a long enough period of time, the checkstop error is logged and BE ATTENTION is driven High. SYSCON immediately shuts the console down with error A08014FF and A0801701.

The most likely cause of the error is a failing GPU (RSX) solder joint (BGA or Bumps). A distant second is a failing CPU (CELL) solder joint (BGA).

1601

BE Livelock Detection

CPU is deadlocked and cannot proceed. Some kind of error occurred, preventing a process from completing. It is the software equivalent of trying to pass someone in a hallway and you both keep choosing the same direction to swerve. Now imagine you had exactly 30s to make it to the other end of the hallway to catch an elevator, and it takes 29s to get there. Neither of you can pass, and you both miss your elevators because of it. Now imagine you were supposed to pass an envelope to a person on the 3rd floor, who had 30s to read it and enter it in a spreadsheet. Now he misses his deadline too. And imagine the entire organization was micromanaged like this. One disruption can cause the whole operation to grind to a halt! That's kinda how this works.

Basically this means the console froze and had to reboot. In the PS3 this is often preceded by graphical artifacting. The cause is often a solder joint on the RSX (BGA or Bumps). Generally these errors are seen in the early stages of a GPU failure. However CPU failures cannot be ruled out. They are just less likely.

Speculation:

As the impedance of propagating solder cracks increases, the digital logic core has a harder time calibrating the FlexIO during BitTraining. Once impedance reaches the limits of the compensation network, interference causes random issues during software execution. LiveLock conditions cause BE Attention signal to be driven High and the SYSCON shuts the console down (YLOD) with errors A0801601 and A0801701.

As the console cools the microscopic gaps in the solder can be physically reconnected by thermal warping. Warping is due to differences in the Coefficients of Thermal Expansion (CTE) between materials in the motherboard and processor. This expansion and contraction can reconnect the solder joints just enough to allow the console to boot. Or it may disconnect them.

  • If they reconnect, the console will boot until it experiences another 1601/1701 event.
  • If they do not reconnect, the console cannot complete BitTraining and will fail in POST with error A0403034, often with an associated Data error such as A0404401 (if the broken solder joint affected a Data line on one of the SPI lines). If there is no Data error, the broken joint only affected the voltage for the SPI line: either RSX_VDDR or YC_RC_VDDIO.

If a YLOD turns into a GLOD after reball/reflow then 1601 (with or without 1701) could mean the RSX RAM was damaged. This is a loose association based on a few user reports.

1701

CELL attention

BE ATTENTION is an active-high output flag sent by the CPU to the SYSCON. During initialization & configuration it is used to request an operation by the SYSCON. When ATTN goes High the syscon reads the SPI Status Register to determine the cause of the Attention signal. It remains high until software resets the condition that caused it.

After Power On Reset the BE attention signal is driven low and is supposed to stay there! If there is a Checkstop error (14FF), Livelock Detection (1601), or PLL Unlock (1301) the CPU enters a fault condition and raises the Attention signal (1701) during operation. The SYSCON sees this and immediately shuts the console down with error code A0801701 and usually another error indicating the cause. One common way this happens is when a solder connection breaks while the system is on. This could be the BGA (Ball Grid Array) or the Solder Bumps under the die.

Going into more detail, BE Attention is used during Power On Reset (POR)...

  • To load CPU VID voltage from the VRM internal registers.
  • To Write configuration-ring data (Important CPU Config settings that should only be modified at boot, otherwise errors can occur).
  • To calibrate the FlexIO interface (BitTraining).

If Attention occurs during the Power ON State (Step# 80) it indicates an error condition. Basically, something is flagged by the Processor as abnormal. It's forced to attempt to resolve the problem before it can continue with whatever it was trying to do. If the error condition cannot be resolved, the CPU sends the ATTENTION signal to the SYSCON. The SYSCON immediately shuts off the console, then reads the SPI Status Register to determine the cause. Then it records the A0801701 in its errorlog along with the specific cause (if it determined one). Errors that can cause the Attention include:

  • Unresolved Checkstop errors (14FF)
  • Livelock Detection (1601)
  • PLL Unlock Condition (1301)
  • BGA/Bump Defect that occurs while the Console was On (Step# 80). Subsequent attempts to power on the console would result in 3034/4xxx errors.
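Since a 1701 is usually logged together with the error that caused it, one way to read a log is to look for the cause codes listed above near each 1701 timestamp. The Python sketch below does this for a list of (code, timestamp) pairs. The attention_causes helper and its window parameter are hypothetical, and matching on equal (or nearly equal) timestamps is an assumption. Entries without a valid timestamp are skipped because they cannot be paired this way.

```python
CAUSES = {0x14FF: "Checkstop", 0x1601: "Livelock Detection", 0x1301: "PLL Unlock"}
NO_TIME = 0xFFFFFFFF

def attention_causes(entries, window=0):
    """For every timestamped 1701 entry, list the known cause codes
    logged within `window` seconds of it."""
    results = []
    for code, stamp in entries:
        if code & 0xFFFF != 0x1701 or stamp == NO_TIME:
            continue
        found = [CAUSES[c & 0xFFFF] for c, s in entries
                 if (c & 0xFFFF) in CAUSES
                 and s != NO_TIME and abs(s - stamp) <= window]
        results.append((stamp, found))
    return results

# Made-up log: a livelock and a 1701 share a timestamp, a later 1701 stands alone.
log = [(0xA0801601, 100), (0xA0801701, 100), (0xA0403034, 160), (0xA0801701, 500)]
print(attention_causes(log))
```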

One user got this error code with a damaged hard drive. He was transferring some games via FTP when his console turned off with a YLOD. When he tried to turn it on again, he got a GLOD. The problem was fixed just by changing the HDD.

1701 has been reported from using homebrew apps that caused a software conflict. Uninstalling the software can resolve the issue. If that's not possible because the system is locked up, it may be necessary to restore the operating system (OS).

1802

RSX

A0201802 is the error the SYSCON will return when there is no RSX installed at all! Step# 20 is when the RSX is first Initialized. So if it's not responding that early in the Power On Sequence, then it's Dead-Dead or completely missing!

A0801802 occurs after the console has booted (step# 80) and causes the BE Attention (1701) alarm to be raised when a Checkstop error (14FF) occurs. Likely the 1802 was the hardware failure that caused the checkstop error. That causes BE ATTENTION to be driven High and the SYSCON shuts the console down with A0801802, A08014FF, and A0801701. That makes sense because the CPU couldn't continue with its process when the RSX interrupt occurred. These errors have been seen in consoles that were repaired by an RSX reball/replacement.

1900 (RTC Voltage)

RTC voltage

1901 (RTC Oscillator)

RTC oscillator

1902 (RTC Access)

RTC access


Fatal Errors


  • These fatal error codes seem to be repeated up to 3 times for 3 special cases. For example, errors 2003, 2103, and 2203 are all related to the southbridge; the only thing that changes in the error code is the second digit (located immediately after the category 2). If at some point we find out what that second digit means, we can join the wiki page sections together (with titles: "2001 & 2101", "2002 & 2102", "2003 & 2103", etc...)

In other words, there are 3 groups: 20xx (composed by 13 errors), 21xx (composed by 13 errors), and 22xx (composed by 1 error). See Discussion
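The grouping described above can be expressed by separating the second digit from the rest of the 4-digit code. The Python sketch below does that; split_fatal is a hypothetical helper, and since the meaning of the second digit is unknown it is only reported as a number.

```python
def split_fatal(cerr: int):
    """Split a fatal CERR such as 0x2103 into (second digit, 20xx equivalent)."""
    if (cerr >> 12) & 0xF != 2:
        raise ValueError("not a category 2 (fatal) code")
    group = (cerr >> 8) & 0xF   # the digit right after the category
    base = cerr & ~0x0F00       # same code with that digit cleared
    return group, base

for cerr in (0x2003, 0x2103, 0x2203):
    group, base = split_fatal(cerr)
    print(f"{cerr:04X}: group {group}, base {base:04X}")
```

With this split, 2003, 2103 and 2203 all share the base code 2003 and differ only in the group digit.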

2001 (CELL)

CELL (IC1001)

2002 (RSX)

RSX (IC2001)

2003 (South Bridge)

South Bridge Error (IC3001)

2010 (Clock Subsystems)

Clock Generator Error (IC5001)

2011 (Clock CELL)

Clock Generator Error (IC5003)

2012 (Clock CELL)

Clock Generator Error (IC5002)

2013 (Clock CELL, RSX, South Bridge)

Clock Generator Error (IC5004)

2020 (HDMI)

HDMI Error (IC2502)

2022 (DVE)

DVE Error (IC2406, CXM4024R MultiAV controller for analog out)

2024 (AV)

This error tends to cause a delayed Yellow Light Of Death (10s - 1min). Sometimes described as a Green Light Of Death (GLOD) or Red Light Of Death (RLOD).

2124 and 2024 errors have been fixed by replacing both the AV and HDMI encoders. One user reported 2024/2124 errors resolved by replacing the HDMI encoder. Another removed the HDMI encoder and tested the console without it. That console primarily filled the errorlog with 2124 errors, but a few 2024's as well. So it is unclear if 2124 is specific to the HDMI Encoder or AV Encoder. It seems it could be either.

2030 (Thermal Sensor, CELL)

Speculation: 2030-33 errors reported in case of dodgy PWR/EJT daughter board.

2031 (Thermal Sensor, RSX)

2033 (Thermal Sensor, South Bridge)

2101 (CELL)

CELL (IC1001)

2102 (RSX)

RSX (IC2001)

One user reported a short on the CELL side, in a NEC/TOKIN capacitor. After removing the capacitor, the error changed.

2103 (South Bridge)

Southbridge Error (IC3001)

2110 (Clock Subsystems)

Clock Generator Error (IC5001)

This error can be caused by a 5V_MISC short to ground. One user had an A0022110 after replacing IC6105 (Buck Converter) and accidentally bridging the 5V voltage input. So check the 5V line for shorts.

This error has been resolved by a number of users who had a short on F6001. It is important to note that something usually causes that fuse to blow, like a short. So it's important to troubleshoot the board to find and repair the shorting component before replacing the fuse. Otherwise the new one will blow too.

One user, who resolved this error on his C model PS3, noted "very short YLOD. Error code shows 2110[...]Some earlier code shows 1001 and 1002." The 1001 and 1002 errors he noted in the log before the 2110 appeared may have been a clue that C6019 or C6020 (which are in parallel) was deteriorating. Further investigation is needed to confirm this hypothesis, however. In his case, C6019 was shorting and caused F6001 to blow, which cut power to many subsystems, such as the HDD, USB ports, South Bridge, CPU, GPU, etc. Another user confirmed this: his error log showed code 2110, and one entry earlier showed code 1001. Checking both capacitors after removing them from the board confirmed that one was reading 140 ohms and not reading as a capacitor, so it was acting as a resistor and putting extra load on the fuse.

One particularly noteworthy component is IC6020, which supplies +3.3v_MK_Vdd to the clock generator (IC5001). When F6001 blows, a 02 2110 is generated. A step number of 02 is very early in the power on sequence (POS), which explains why 2110 is triggered instead of another error code. Since the clock generator is critical for timing, it is one of the first things the SYSCON checks during the POS.
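The "1001/1002 before 2110" pattern described above can be checked mechanically once the error codes have been copied out of the log. The sketch below is only an illustration of that check. The function name precursors_before is made up, the example log is invented, and it assumes the list is ordered oldest first, which may not match the order in which a given tool prints the log. The link between these codes and C6019/C6020 is still an unconfirmed hypothesis.

```python
def precursors_before(log, target="2110", precursors=("1001", "1002")):
    """Return the precursor codes logged before the first `target` entry.

    `log` is a list of 4-digit error codes, assumed to be oldest first.
    An empty list means the target is absent or nothing relevant precedes it.
    """
    if target not in log:
        return []
    earlier = log[:log.index(target)]
    return [code for code in earlier if code in precursors]


example_log = ["1002", "1001", "2110"]  # invented log, for illustration only
hits = precursors_before(example_log)
if hits:
    print("2110 preceded by", hits, "- consider checking C6019/C6020 and F6001")
```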

2111 (Clock CELL)

Clock Generator Error (IC5003)

2112 (Clock CELL)

Clock Generator Error (IC5002)

2113 (Clock CELL, RSX, South Bridge)

Clock Generator Error (IC5004)

2120 (HDMI)

HDMI Error (IC2502)

A0202120/A0213013 error combinations are common. They appear to be related to VDDIO. IC6301 is involved in the formation of +1.7V_MISC, which among other things provides input power to the DC-DC converters that output +1.2V_YC_RC_VDDIO, +1.5V_YC_RC_VDDA, +1.2V_SB_VDDC and +1.2V_SB_VDDR. Lack of voltage to these DC/DC converters downstream of IC6301 suggests F6302 has blown. A number of people have fixed these 2120/3013 errors by finding shorts at or near C6320 and replacing fuse F6302. But there are many other SMDs nearby that might cause this fuse to blow, so you will need to track down the source of the short and fix it, or the fuse will just blow again.

A bad thermistor (TH2501) has been reported to cause A0002120.

A0802120 and A0902120 errors may be related to the actual HDMI transmitter (IC2502). Or they can be caused by BGA/Bump defects affecting VDDIO, on the RSX or CELL. BGA defects on the RSX VDDIO pads have been confirmed with a pressure test to have caused 2120 errors.
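The 2120 reports above differ mainly by step number, so they can be arranged as a small lookup table. The following Python sketch does only that: it restates the leads from this section keyed by the two step digits. The names LEADS_2120 and leads_for_2120 are illustrative, and an empty result simply means this page has no report for that step.

```python
# Leads reported in this section for error 2120, keyed by step number.
_RUNTIME_LEADS = [
    "HDMI transmitter IC2502",
    "BGA/bump defects affecting VDDIO on the RSX or CELL",
]
LEADS_2120 = {
    "00": ["thermistor TH2501"],
    "20": ["fuse F6302 and shorts at or near C6320 (often logged with A0213013)"],
    "80": _RUNTIME_LEADS,
    "90": _RUNTIME_LEADS,
}


def leads_for_2120(full_code):
    """Return the reported leads for a full 8-digit 2120 code, or []."""
    text = full_code.strip().upper()
    if len(text) != 8 or not text.endswith("2120"):
        raise ValueError(f"not a full 2120 code: {full_code!r}")
    step = text[2:4]  # the two digits after the leading "A0"
    return list(LEADS_2120.get(step, []))


for code in ("A0002120", "A0202120", "A0802120"):
    print(code, "->", leads_for_2120(code))
```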

2122 (DVE)

DVE Error (IC2406, CXM4024R MultiAV controller for analog out)

2124 (AV)

This error tends to cause a delayed Yellow Light Of Death (10s - 1min). Sometimes described as a Green Light Of Death (GLOD) or Red Light Of Death (RLOD).

2124 and 2024 errors have been fixed by replacing both the AV and HDMI encoders. One user reported 2024/2124 errors resolved by replacing the HDMI encoder. Another removed the HDMI encoder and tested the console without it. That console primarily filled the errorlog with 2124 errors, but a few 2024's as well. So it is unclear if 2124 is specific to the HDMI Encoder or AV Encoder. It seems it could be either.

2130 (Thermal Sensor, CELL)

2131 (Thermal Sensor, RSX)

2133 (Thermal Sensor, South Bridge)

2203 (South Bridge)

South Bridge Error (IC3001)

2310


Fatal Boot Errors


3000

Power Failure

3001

12v Power Failure

Usually this is caused by a bad Power Supply Unit (PSU).

Alternatively, a failure on the 12v_main line can cause it. Check fuses, capacitors, resistors, and ICs on the 12v line. Measure the resistance across the large 2 prong 12v connector on the motherboard. It should read in the kilohm range if there is sufficient separation. Otherwise you may have a short somewhere on the line.

3002

Power Failure

3003

VDDC CELL Power Failure

This error will occur in the case of a power failure on the main core voltage of the CPU, for example if the filtering capacitors (NEC/TOKINs) are severely damaged. There are other SMDs in that filter, so it could be related to them as well.

3004

VDDC RSX Power Failure

This error will occur in the case of a power failure on the main core voltage of the GPU, for example if the filtering capacitors (NEC/TOKINs) are severely damaged. There are other SMDs in that filter, so it could be related to them as well.

3010

CELL Error

Observation: A user triggered this error by injecting 3.3V into PWRGD (power good) of IC6103 (NCP5318 CPU Buck Controller). It generated error 20 1001 and 20 3010.

3011

CELL

3012

CELL

3013

BE_SPI DI/DO ERROR

CELL is not communicating with the syscon via SPI (no output on 1.2V MC2_VDDIO and 1.2V BE_VCS). Possible shorts on the line; check C4001 and trailing caps. Possibly a dead CPU?

Another user had one on a CPU he damaged while delidding.

A0212120/A0213013 error combinations are common. They appear to be related to VDDIO. IC6301 is involved in the formation of +1.7V_MISC, which among other things provides input power to the DC-DC converters that output +1.2V_YC_RC_VDDIO, +1.5V_YC_RC_VDDA, +1.2V_SB_VDDC and +1.2V_SB_VDDR. Lack of voltage to these DC/DC converters downstream of IC6301 suggests F6302 has blown. A number of people have fixed these 2120/3013 errors by finding shorts at or near C6320 and replacing Fuse F6302. But there are many other SMD nearby that might cause these fuses to blow. So you will need to track the source of the short and fix it, or the fuse will just blow again.

One person reported A0202120/A0213013 when his CPU substrate (interposer) was cracked in half by a failed delid attempt.

3020

CELL

3030

CELL

3031

CELL

3032

CELL Error

+1.2v_YC_RC_VDDIO PWR Fail?

3033

CELL

3034

CELL / RSX Communication Error

This is the most common error seen in early Phat model PS3s with the hottest 90nm RSX and CELL processors. It is the hallmark of a BGA defect (such as a cracked solder ball). It is by no means limited to the early models, however. These errors have been seen in every model of PS3 with varying frequency. The most reliable consoles appear to be those with a CPU/GPU of smaller manufacturing process, such as the Super Slim (SS) models (42xx and later) which have a 45nm CELL and 28nm RSX. The least reliable are the PS2 Backwards Compatible A-E Models, which have 90nm RSX/CELL.

The root cause is mechanical fatigue due to thermal cycling. The materials used to construct the motherboard and processors have different properties. For example, the coefficient of thermal expansion of the FR4 fiberglass used in the motherboard and processor substrate is different from that of the copper BGA pads, which is different from that of the lead-free solder used to join them. This means they expand and contract at different rates as the chip heats up and cools down, which applies shearing force to the BGA. Over many thermal cycles this deforms the solder balls and causes a defect (such as a solder crack, a torn trace, or a ball pulling away from its pad).

3034 is triggered when the voltage or data lines connecting the CPU/GPU are broken. There is often a data error (4XXX) that also appears, but not always. The most common cause is a BGA defect on the RSX, which usually requires a reball/reflow to repair. Something about the RSX construction or workload causes it to fail more frequently, but the CPU can fail too. However, it's not always a BGA defect. The bumps on either chip can fail, Flex IO traces (the data lines that connect the CPU/GPU) can be broken/scratched, or accumulated damage from wear and tear (electromigration) can also cause this error. The true percentage of consoles with BGA defects that can be fixed with a reball/reflow is unknown. However, there is evidence to suggest that the underfill used to reinforce the CPU/GPU die and RSX RAM bumps was not as effective when the PS3 was manufactured. This could explain many of the consoles whose reball fails prematurely afterwards.

If a reflow/reball of both the CPU and GPU fails, then the chip is beyond repair and needs to be replaced. The RSX can be replaced with the same model without modification. It can be replaced with a different model using a modchip (or new syscon modification) that injects the correct RSX ID during boot. This has been nicknamed a "Frankenstein Mod." Since they are married to each other, the CPU can only be replaced if the chipset (NAND/NOR and SYSCON chips) is replaced along with it. Since the CPU can't be replaced as easily, a dead CPU is usually considered unrepairable.

3035

CELL and RSX

3036

CELL and RSX

3037

CELL and RSX

3038

CELL and RSX

3039

CELL and RSX

3040

Flash


Data Errors


  • These error codes seem to be repeated up to 5 times for 5 special cases. For example, errors 4001, 4101, 4201, 4301, 4401 are all related to CELL; the only thing that changes in the error code is the second digit (located immediately after the category digit). If at some point we find out what that digit means, we can join the wiki page sections together (with titles: "4001, 4101, 4201, 4301, 4401", etc...)

4001

CELL

4002

RSX

4003

Southbridge

4011

CELL

4101

CELL

4102

RSX

4103

Southbridge

4111

CELL

4201

CELL

4202

RSX

4203

Southbridge

4211

CELL

4212

RSX

4221

CELL

4222

RSX

4231

CELL

4261

CELL

4301

CELL

4302

RSX

4303

Southbridge

4311

CELL

4312

RSX

4321

CELL

4322

RSX

4332

RSX

4341

CELL

4401

CELL or RSX

4402

CELL or RSX

4403

CELL or RSX

4411

CELL or RSX

4412

CELL or RSX

4421

CELL or RSX

4422

CELL or RSX

4432

CELL or RSX

4441

CELL or RSX
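The merged section titles suggested in the note at the top of the Data Errors section can be generated from the list of codes by grouping on the last two digits. This is a rough Python sketch of that grouping; the list is copied from the headings above and the function name merged_titles is made up for illustration. It says nothing about what the second digit means.

```python
# Data error codes that have a heading in this section.
DATA_CODES = [
    "4001", "4002", "4003", "4011",
    "4101", "4102", "4103", "4111",
    "4201", "4202", "4203", "4211", "4212", "4221", "4222", "4231", "4261",
    "4301", "4302", "4303", "4311", "4312", "4321", "4322", "4332", "4341",
    "4401", "4402", "4403", "4411", "4412", "4421", "4422", "4432", "4441",
]


def merged_titles(codes):
    """Group codes by their last two digits and build one title per group."""
    groups = {}
    for code in codes:
        groups.setdefault(code[2:], []).append(code)
    return {suffix: ", ".join(members) for suffix, members in sorted(groups.items())}


for suffix, title in merged_titles(DATA_CODES).items():
    print(f"xx{suffix}: {title}")
```

The first line printed is the "4001, 4101, 4201, 4301, 4401" title from the note. Suffixes that appear in only one or two groups, such as 61 or 32, show where the repetition is incomplete.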