Synergistic Processing Unit (SPU)
Revision as of 07:06, 8 April 2011
SPU Application Binary Interface Specification
All information in this section is taken from this PDF.
Function Calling Sequence
The standard calling sequence requirements apply only to global functions. Local functions that are not reachable from other compilation units may use different conventions; however, using non-standard calling sequences is not recommended.
Register Usage
Register | Status | Usage |
R0 (LR) | Dedicated | Return Address / Link Register. This register contains the address to which a called function normally returns. It is volatile across function calls and must be saved by a non-leaf function. |
R1 (SP) | Dedicated | Stack pointer information. Word element 0 of the SP register contains the current stack pointer. The stack pointer is always 16-byte aligned, and it must always point to the lowest allocated valid stack frame and grow towards low addresses. The contents of the word at the stack-frame address always point to the previous allocated stack frame. Word element 1 of the SP register contains the number of bytes of Available Stack Space. |
R2 | Volatile | Environment pointer. This register is used as an environment pointer for languages that require one. |
R3-R74 | Volatile | First 72 quadwords of a function’s argument list and its return value. |
R75-R79 | Volatile | Scratch Registers. |
R80-R127 | Non-volatile | Local variable registers. These must be preserved across function calls. |
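The register table above can be restated as a small lookup, which may be handy when writing a disassembler or checking hand-written assembly. This is only a sketch in Python; the function names `classify_register` and `is_non_volatile` are hypothetical and not part of the ABI document.

```python
def classify_register(n):
    """Return (status, usage) for SPU register number n, per the table above."""
    if not 0 <= n <= 127:
        raise ValueError("SPU register numbers range from 0 to 127")
    if n == 0:
        return ("Dedicated", "Link Register (LR)")
    if n == 1:
        return ("Dedicated", "Stack Pointer (SP)")
    if n == 2:
        return ("Volatile", "Environment pointer")
    if n <= 74:
        return ("Volatile", "Argument list / return value")
    if n <= 79:
        return ("Volatile", "Scratch")
    return ("Non-volatile", "Local variable")


def is_non_volatile(n):
    """True if register n is in the non-volatile range R80-R127."""
    return classify_register(n)[0] == "Non-volatile"


if __name__ == "__main__":
    for reg in (0, 1, 2, 3, 74, 75, 79, 80, 127):
        print(f"R{reg}: {classify_register(reg)}")
```

For example, `is_non_volatile(80)` is true while `is_non_volatile(79)` is false, matching the boundary between scratch and local variable registers.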
Stack Frame Layout
In addition to using registers, each function call may have a stack frame on the runtime stack. The runtime stack grows downward from high addresses.
         +-----------------------------+ High Address
    +--->| Back Chain                  |
    |    +-----------------------------+
    |    | Register Argument Save Area |
    |    +-----------------------------+
    |    | General Register Save Area  |
    |    | (max. 48 * 16 bytes)        |
    |    +-----------------------------+
    |    | Local Variable Space        |
    |    +-----------------------------+
    |    | Parameter List Area         |
    |    +-----------------------------+
    |    | Link Register Save Area     |
    |    +-----------------------------+
    +----| Back Chain                  |
         +-----------------------------+ Low Address <---- Stack Pointer (SP/R1)
         <--------- 128 bits ---------->
In the above figure, SP denotes the stack pointer (word element 0 of the general-purpose register R1) of the called function after it has executed the code that establishes its stack frame.
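Because every frame starts with a Back Chain word that points to the previous frame, the frames form a linked list that a debugger can follow from the current SP. The sketch below models this in Python. The `memory` dictionary (address to Back Chain value) and the function name `walk_back_chain` are hypothetical, and the walk assumes that the outermost frame holds a NULL (0) Back Chain, as the Program Initialization section describes.

```python
def walk_back_chain(memory, sp):
    """Return the frame addresses reachable from sp, innermost first.

    memory is a hypothetical model of local storage: it maps the address of
    each frame to the Back Chain word stored there.
    """
    frames = []
    while sp != 0:
        if sp % 16 != 0:
            raise ValueError(f"stack pointer {sp:#x} is not 16-byte aligned")
        frames.append(sp)
        sp = memory[sp]  # Back Chain: address of the previously allocated frame
    return frames


if __name__ == "__main__":
    # Three frames; the addresses are illustrative only.
    memory = {0x3FFF0: 0x0, 0x3FFD0: 0x3FFF0, 0x3FF90: 0x3FFD0}
    print([hex(addr) for addr in walk_back_chain(memory, 0x3FF90)])
```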
Argument Passing
For the SPU, up to 72 quadwords are passed in general-purpose registers, loaded sequentially into registers R3 through R74. If fewer than 72 argument registers are needed, the unneeded registers are not loaded, and any values that they contain when entering the called function are undefined. When arguments passed to a callee function will not fit into these 72 registers, the caller function must allocate additional space for these arguments in its Parameter List Area.
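The split between register arguments and the Parameter List Area can be sketched as follows. This is an illustration in Python, not compiler code: the name `assign_arguments` is hypothetical, and it assumes that the argument list has already been flattened into whole quadwords and that each overflow quadword occupies 16 bytes of the Parameter List Area.

```python
FIRST_ARG_REG = 3   # R3
LAST_ARG_REG = 74   # R74
QUADWORD_BYTES = 16


def assign_arguments(num_quadwords):
    """Return (registers used, bytes needed in the Parameter List Area)."""
    if num_quadwords < 0:
        raise ValueError("argument count cannot be negative")
    num_arg_regs = LAST_ARG_REG - FIRST_ARG_REG + 1  # 72 argument registers
    in_registers = min(num_quadwords, num_arg_regs)
    registers = [f"R{FIRST_ARG_REG + i}" for i in range(in_registers)]
    # Quadwords beyond the 72nd go to the caller's Parameter List Area.
    overflow_bytes = (num_quadwords - in_registers) * QUADWORD_BYTES
    return registers, overflow_bytes


if __name__ == "__main__":
    print(assign_arguments(2))
    regs, overflow = assign_arguments(75)
    print(regs[0], regs[-1], overflow)
```

With 75 quadwords, the first 72 land in R3 through R74 and the remaining three need 48 bytes in the caller's Parameter List Area.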
Program Initialization
When an SPU program is first entered, the contents of register R1 (SP) are initialized to the top of the stack. Generally, the top of the stack is a minimal stack located at the largest quadword address. A system with 256 KB of local storage initializes the stack pointer to 0x3FFD0. This address contains a Back Chain pointer to 0x3FFF0. The Back Chain pointer at 0x3FFF0 contains a NULL (0) pointer. Space is allocated for the entry function to save the Link Register (address 0x3FFE0). The contents of all other registers are unspecified. Thus, if a program requires registers to have specified values, it must explicitly set them.
         +----------------------------+
    +--->| Back Chain Pointer 0x0     | 0x3FFF0
    |    +----------------------------+
    |    | Link Register Save Area    | 0x3FFE0
    |    +----------------------------+
    +----| Back Chain Pointer 0x3FFF0 | 0x3FFD0
         +----------------------------+ <------- Initial Stack Pointer
         |                            |
         ~                            ~
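The addresses in this layout follow directly from the size of local storage: three quadwords are reserved at the very top. The Python sketch below reproduces the arithmetic for the 256 KB case given above. The function name `initial_stack` is hypothetical, and applying the same pattern to other local storage sizes is an assumption rather than something the ABI text states.

```python
QUADWORD_BYTES = 16


def initial_stack(local_store_bytes=256 * 1024):
    """Return (initial SP, LR save address, Back Chain words) for a minimal stack."""
    top = local_store_bytes                     # first address past local storage
    outer_frame = top - 1 * QUADWORD_BYTES      # holds the NULL Back Chain
    lr_save = top - 2 * QUADWORD_BYTES          # Link Register Save Area
    sp = top - 3 * QUADWORD_BYTES               # initial stack pointer
    back_chain = {outer_frame: 0x0, sp: outer_frame}
    return sp, lr_save, back_chain


if __name__ == "__main__":
    sp, lr_save, back_chain = initial_stack()
    print(hex(sp), hex(lr_save), {hex(k): hex(v) for k, v in back_chain.items()})
```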
SPU Assembly Language Specification
All information in this section is taken from this PDF.
Notation and Conventions
Notation/Convention | Meaning |
ch | Channel number. Channels are specified as either $ch followed by a channel number (for example, $ch3) or a specific channel mnemonic. |
ra, rb, rc | Source register. Registers are specified as a dollar symbol ($) followed by a register number from 0 to 127. For example, $38 refers to register 38. |
rt | Target register. Registers are specified as a dollar symbol ($) followed by a register number from 0 to 127. For example, $38 refers to register 38. |
s3, s6 | 3-bit or 6-bit signed value, respectively. Encoded as a 7-bit signed immediate in which only a subset of the bits is used. |
s7 | 7-bit sign-extended value. |
s10 | 10-bit sign-extended value. |
s11 | 11-bit sign-extended value. |
s14 | 14-bit sign-extended value. |
s16 | 16-bit sign-extended value. |
s18 | 18-bit signed value, used for relative address computations. |
scale7 | 7-bit scale exponent. Values range from 0 to 127. |
spr | Special purpose register. |
u3, u5, u6 | 3-bit, 5-bit, or 6-bit unsigned value, respectively. Encoded as a 7-bit unsigned immediate in which only a subset of the bits is used. |
u7 | Unsigned 7-bit value. |
u14 | Unsigned 14-bit value. |
u16 | Unsigned 16-bit value. |
u18 | Unsigned 18-bit value. |
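The sN and uN operand kinds differ only in width and signedness, so an assembler or disassembler can handle them with a few generic helpers. The following Python sketch shows range checks and sign extension; the helper names are hypothetical and not taken from the specification.

```python
def fits_signed(value, bits):
    """True if value is representable as a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def fits_unsigned(value, bits):
    """True if value is representable as an unsigned field of `bits` bits."""
    return 0 <= value < (1 << bits)


def sign_extend(raw, bits):
    """Interpret the low `bits` bits of raw as a signed value."""
    raw &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return raw - (1 << bits) if raw & sign_bit else raw


if __name__ == "__main__":
    print(fits_signed(511, 10), fits_signed(512, 10))    # s10 upper bound
    print(fits_unsigned(127, 7), fits_unsigned(128, 7))  # u7 upper bound
    print(sign_extend(0x3FF, 10), sign_extend(0x200, 10))
```

An s10 operand therefore accepts values from -512 to 511, and a u7 operand accepts 0 to 127, which matches the range stated for scale7.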
Instruction Set
Instruction/Usage | Description |
a rt, ra, rb | Add word. Each word element of register ra is added to the corresponding word element of register rb, and the results are placed in the corresponding word elements of register rt. |
absdb rt, ra, rb | Absolute difference of bytes. Each byte element of register ra is subtracted from the corresponding byte element of register rb. The absolute values of the results are placed in the corresponding elements of register rt. |
addx rt, ra, rb | Add word extended. Each word element of register ra, the corresponding word element of register rb, and the least significant bit of the corresponding word element of register rt are added, and the results are placed in the corresponding word elements of register rt. |
ah rt, ra, rb | Add halfword. Each halfword element of register ra is added to the corresponding halfword element of register rb, and the results are placed in the corresponding halfword elements of register rt. |
ahi rt, ra, s10 | Add halfword immediate. The sign-extended immediate value s10 is added to each halfword element of register ra, and the results are placed in the corresponding halfword elements of register rt. |
ai rt, ra, s10 | Add word immediate. The sign-extended immediate value s10 is added to each word element of register ra, and the results are placed in the corresponding word elements of register rt. |
and rt, ra, rb | And. The value of register ra is logically ANDed with register rb, and the result is placed in register rt. |
andbi rt, ra, s10 | And byte immediate. The 8 least significant bits of s10 are logically ANDed with each byte element of register ra, and the results are placed in the corresponding elements of register rt. |
andc rt, ra, rb | And with complement. The value of register ra is logically ANDed with the complement of register rb, and the result is placed in register rt. |
andhi rt, ra, s10 | And halfword immediate. The sign-extended immediate value s10 is logically ANDed with each halfword element of register ra, and the results are placed in the corresponding elements of register rt. |
andi rt, ra, s10 | And word immediate. The sign-extended immediate value s10 is logically ANDed with each word element of register ra, and the results are placed in the corresponding elements of register rt. |
avgb rt, ra, rb | Average bytes. The corresponding byte elements of registers ra and rb are averaged ((a+b+1) >> 1), and the results are placed in the corresponding byte elements of register rt. |
bg rt, ra, rb | Borrow generate word. Each unsigned word element of register ra is compared to the corresponding unsigned word element of rb. If the value of ra is greater than that of rb, a 0 is placed in the corresponding element of rt; otherwise, a 1 is placed there. |
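Several of the instructions above can be modelled element by element, which helps when checking hand-computed results. In the Python sketch below, a register is represented as a plain list of unsigned elements (four words or sixteen bytes, although the functions accept any length). The function names mirror the mnemonics but are otherwise hypothetical, and the wrap-around on word overflow is an assumption that follows from the fixed 32-bit element width.

```python
WORD_MASK = 0xFFFFFFFF


def a(ra, rb):
    """Add word: element-wise sum, truncated to 32 bits."""
    return [(x + y) & WORD_MASK for x, y in zip(ra, rb)]


def ai(ra, s10):
    """Add word immediate: s10 is an already sign-extended Python int."""
    return [(x + s10) & WORD_MASK for x in ra]


def addx(rt, ra, rb):
    """Add word extended: adds the least significant bit of each rt word."""
    return [(x + y + (t & 1)) & WORD_MASK for t, x, y in zip(rt, ra, rb)]


def absdb(ra, rb):
    """Absolute difference of bytes: |rb - ra| per byte element."""
    return [abs(y - x) for x, y in zip(ra, rb)]


def avgb(ra, rb):
    """Average bytes with rounding: (a + b + 1) >> 1 per byte element."""
    return [(x + y + 1) >> 1 for x, y in zip(ra, rb)]


def bg(ra, rb):
    """Borrow generate: 0 where ra > rb (unsigned), otherwise 1."""
    return [0 if x > y else 1 for x, y in zip(ra, rb)]


if __name__ == "__main__":
    print(a([1, 2, 3, 0xFFFFFFFF], [1, 1, 1, 1]))
    print(avgb([1, 255], [2, 255]))
    print(bg([5, 3, 3, 0], [3, 5, 3, 0]))
```

Taking rt as an input to `addx` reflects the description above, where the old contents of rt supply the carry-in bit.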