Synergistic Processing Unit (SPU)
Latest revision as of 21:04, 5 February 2014
SPU Application Binary Interface Specification
All information is taken from the SPU Application Binary Interface Specification, version 1.9 (SPU_ABI-Specification_1.9.pdf).
Function Calling Sequence
The standard calling sequence requirements apply only to global functions. Local functions that are not reachable from other compilation units may use different conventions; however, using non-standard calling sequences is not recommended.
Register Usage
Register | Status | Usage |
---|---|---|
R0 (LR) | Dedicated | Return Address / Link Register. This register contains the address to which a called function normally returns. It is volatile across function calls and must be saved by a non-leaf function. |
R1 (SP) | Dedicated | Stack pointer information. Word element 0 of the SP register contains the current stack pointer. The stack pointer is always 16-byte aligned, and it must always point to the lowest allocated valid stack frame and grow towards low addresses. The contents of the word at the stack-frame address always point to the previous allocated stack frame. Word element 1 of the SP register contains the number of bytes of Available Stack Space. |
R2 | Volatile | Environment pointer. This register is used as an environment pointer for languages that require one. |
R3-R74 | Volatile | First 72 quadwords of a function’s argument list and its return value. |
R75-R79 | Volatile | Scratch Registers. |
R80-R127 | Non-volatile | Local variable registers. These must be preserved across function calls. |
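The register classes in the table above can be written as a small lookup. This is an illustrative sketch only: the function name `register_role` and the short usage strings are assumptions made for this example and are not part of the ABI.

```python
def register_role(n):
    """Return (status, usage) for SPU register number n, following the table above."""
    if not 0 <= n <= 127:
        raise ValueError("SPU registers are numbered 0 to 127")
    if n == 0:
        return ("Dedicated", "link register")
    if n == 1:
        return ("Dedicated", "stack pointer")
    if n == 2:
        return ("Volatile", "environment pointer")
    if n <= 74:
        return ("Volatile", "argument / return value")
    if n <= 79:
        return ("Volatile", "scratch")
    return ("Non-volatile", "local variable")


# Registers that a function must preserve if it uses them (R80-R127).
callee_saved = [n for n in range(128) if register_role(n)[0] == "Non-volatile"]
```

The list holds 48 registers, which is consistent with the "max. 48 * 16 bytes" limit given for the General Register Save Area.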
Stack Frame Layout
In addition to using registers, each function call may have a stack frame on the runtime stack. The runtime stack grows downward from high addresses.
     +-----------------------------+ High Address
+--->| Back Chain                  |
|    +-----------------------------+
|    | Register Argument Save Area |
|    +-----------------------------+
|    | General Register Save Area  |
|    | (max. 48 * 16 bytes)        |
|    +-----------------------------+
|    | Local Variable Space        |
|    +-----------------------------+
|    | Parameter List Area         |
|    +-----------------------------+
|    | Link Register Save Area     |
|    +-----------------------------+
+----| Back Chain                  |
     +-----------------------------+ Low Address <---- Stack Pointer (SP/R1)
     <--------- 128 bits ---------->
In the above figure, SP denotes the stack pointer (word element 0 of the general-purpose register R1) of the called function after it has executed the code that establishes its stack frame.
Argument Passing
For the SPU, up to 72 quadwords are passed in general-purpose registers, loaded sequentially into registers R3 through R74. If fewer than 72 argument registers are needed, the unneeded registers are not loaded, and any values that they contain when entering the called function are undefined. When arguments passed to a callee function will not fit into these 72 registers, the caller function must allocate additional space for these arguments in its Parameter List Area.
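As a rough illustration of the rule above, the following sketch splits an argument list, measured in quadwords, between registers R3 through R74 and the caller's Parameter List Area. The helper name `assign_arguments` is hypothetical, and the sketch assumes every argument occupies exactly one quadword.

```python
FIRST_ARG_REG = 3    # R3 holds the first argument quadword
NUM_ARG_REGS = 72    # R3 through R74


def assign_arguments(num_quadwords):
    """Return (register names used, quadwords left for the Parameter List Area)."""
    if num_quadwords < 0:
        raise ValueError("argument count cannot be negative")
    in_regs = min(num_quadwords, NUM_ARG_REGS)
    registers = [f"R{FIRST_ARG_REG + i}" for i in range(in_regs)]
    stack_quadwords = num_quadwords - in_regs
    return registers, stack_quadwords
```

For example, `assign_arguments(75)` fills R3 through R74 and leaves 3 quadwords for the Parameter List Area.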
Program Initialization
When an SPU program is first entered, the contents of register R1 (SP) are initialized to the top of the stack. Generally, the top of the stack is a minimal stack located at the largest quadword address. A system with 256 KB of local storage initializes the stack pointer to 0x3FFD0. This address contains a Back Chain pointer to 0x3FFF0. The Back Chain pointer at 0x3FFF0 contains a NULL (0) pointer. Space is allocated for the entry function to save the Link Register (address 0x3FFE0). The contents of all other registers are unspecified. Thus, if a program requires registers to have specified values, it must explicitly set them.
     +----------------------------+
+--->| Back Chain Pointer 0x0     | 0x3FFF0
|    +----------------------------+
|    | Link Register Save Area    | 0x3FFE0
|    +----------------------------+
+----| Back Chain Pointer 0x3FFF0 | 0x3FFD0
     +----------------------------+ <------- Initial Stack Pointer
     |                            |
     ~                            ~
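The addresses in the initial stack follow from the local storage size and the 16-byte quadword size. The sketch below reproduces that arithmetic and walks the Back Chain until it reaches the NULL pointer. The dictionary used as memory (address mapped to the Back Chain word stored there) and the function names are assumptions made for illustration only.

```python
QUADWORD = 16
LOCAL_STORE_SIZE = 256 * 1024  # 0x40000 bytes of local storage


def initial_stack(ls_size=LOCAL_STORE_SIZE):
    """Return (initial SP, Link Register save address, back-chain memory)."""
    top_back_chain = ls_size - QUADWORD   # highest quadword address
    lr_save = ls_size - 2 * QUADWORD      # Link Register Save Area
    sp = ls_size - 3 * QUADWORD           # initial stack pointer
    # Each entry models the Back Chain word stored at that address.
    memory = {top_back_chain: 0, sp: top_back_chain}
    return sp, lr_save, memory


def walk_back_chain(sp, memory):
    """Follow Back Chain pointers from sp until the NULL (0) pointer."""
    frames = []
    while sp != 0:
        frames.append(sp)
        sp = memory[sp]
    return frames
```

With the default 256 KB local storage, `initial_stack()` yields the addresses 0x3FFD0, 0x3FFE0, and 0x3FFF0 described above.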
SPU Assembly Language Specification
All information is taken from SPU_Assembly_Language_Specification_1.7.pdf.
Notation and Conventions
Notation/Convention | Meaning |
---|---|
ch | Channel number. Channels are specified as either $ch followed by a channel number (for example, $ch3) or a specific channel mnemonic. |
ra, rb, rc | Source register. Registers are specified as a dollar symbol ($) followed by a register number from 0 to 127. For example, $38 refers to register 38. |
rt | Target register. Registers are specified as a dollar symbol ($) followed by a register number from 0 to 127. For example, $38 refers to register 38. |
s3, s6 | 3-bit or 6-bit signed value, respectively. Encoded as a 7-bit signed immediate in which only a subset of the bits is used. |
s7 | 7-bit sign-extended value. |
s10 | 10-bit sign-extended value. |
s11 | 11-bit sign-extended value. |
s14 | 14-bit sign-extended value. |
s16 | 16-bit sign-extended value. |
s18 | 18-bit sign-extended value, used in relative address computations. |
scale7 | 7-bit scale exponent. Values range from 0 to 127. |
spr | Special purpose register. |
u3, u5, u6 | 3-bit, 5-bit, or 6-bit unsigned value, respectively. Encoded as a 7-bit unsigned immediate in which only a subset of the bits is used. |
u7 | Unsigned 7-bit value. |
u14 | Unsigned 14-bit value. |
u16 | Unsigned 16-bit value. |
u18 | Unsigned 18-bit value. |
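The signed and unsigned immediate widths in the table above determine which values an operand can hold. The following sketch shows the usual two's-complement interpretation; the helper names are hypothetical and are not taken from the specification.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def signed_range(bits):
    """Smallest and largest value of a `bits`-bit signed immediate."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def unsigned_range(bits):
    """Smallest and largest value of a `bits`-bit unsigned immediate."""
    return 0, (1 << bits) - 1
```

Under this interpretation an s10 operand covers -512 to 511, and a u7 operand covers 0 to 127, which matches the range stated for scale7.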
Instruction Set
Short reference with pipeline information
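Most entries in the table below operate on each element of a 128-bit register independently. As a minimal sketch of that model, the code here emulates the `a`, `cg`, and `addx` descriptions on registers represented as lists of four unsigned 32-bit words. That representation and the Python function signatures are assumptions for illustration; `addx` takes the old value of rt as an explicit argument because the instruction reads it.

```python
MASK32 = 0xFFFFFFFF


def a(ra, rb):
    """Add word: element-wise 32-bit sum with wraparound."""
    return [(x + y) & MASK32 for x, y in zip(ra, rb)]


def cg(ra, rb):
    """Carry generate word: carry out of each 32-bit sum in the low bit."""
    return [(x + y) >> 32 for x, y in zip(ra, rb)]


def addx(rt, ra, rb):
    """Add word extended: ra + rb + least significant bit of the old rt."""
    return [(x + y + (t & 1)) & MASK32 for t, x, y in zip(rt, ra, rb)]
```

A multi-word addition would combine these steps, moving each carry from `cg` into the neighbouring word element before `addx` consumes it; that shuffle step is not shown here.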
Instruction/Usage | Description |
---|---|
a rt, ra, rb | Add word. Each word element of register ra is added to the corresponding word element of register rb, and the results are placed in the corresponding word elements of register rt. |
absdb rt, ra, rb | Absolute difference of bytes. Each byte element of register ra is subtracted from the corresponding byte element of register rb. The absolute values of the results are placed in the corresponding elements of register rt. |
addx rt, ra, rb | Add word extended. Each word element of register ra, the corresponding word element of register rb, and the least significant bit of the corresponding word element of register rt are added, and the results are placed in the corresponding word elements of register rt. |
ah rt, ra, rb | Add halfword. Each halfword element of register ra is added to the corresponding halfword element of register rb, and the results are placed in the corresponding halfword elements of register rt. |
ahi rt, ra, s10 | Add halfword immediate. The sign-extended immediate value s10 is added to each halfword element of register ra, and the results are placed in the corresponding halfword elements of register rt. |
ai rt, ra, s10 | Add word immediate. The sign-extended immediate value s10 is added to each word element of register ra, and the results are placed in the corresponding word elements of register rt. |
and rt, ra, rb | And. The value of register ra is logically ANDed with register rb, and the result is placed in register rt. |
andbi rt, ra, s10 | And byte immediate. The 8 least significant bits of s10 are logically ANDed with each byte element of register ra, and the results are placed in the corresponding elements of register rt. |
andc rt, ra, rb | And with complement. The value of register ra is logically ANDed with the complement of register rb, and the result is placed in register rt. |
andhi rt, ra, s10 | And halfword immediate. The sign-extended immediate value s10 is logically ANDed with each halfword element of register ra, and the results are placed in the corresponding elements of register rt. |
andi rt, ra, s10 | And word immediate. The sign-extended immediate value s10 is logically ANDed with each word element of register ra, and the results are placed in the corresponding elements of register rt. |
avgb rt, ra, rb | Average bytes. The corresponding byte elements of registers ra and rb are averaged ((a+b+1) >> 1), and the results are placed in the corresponding byte elements of register rt. |
bg rt, ra, rb | Borrow generate word. Each unsigned word element of register ra is compared to the corresponding unsigned word element of rb. If the value of ra is greater than that of rb, a 0 is placed in the corresponding element of rt; otherwise, a 1 is placed there. |
bgx rt, ra, rb | Borrow generate word extended. Each word element of register ra is subtracted from the corresponding word element of register rb. An additional 1 is subtracted from the result if the least significant bit of word element rt is 0. If the result is less than 0, a 0 is placed in the corresponding element of register rt; otherwise, a 1 is placed there. |
bi ra | Branch indirect. Execution proceeds with the instruction at the address specified by word element 0 of register ra. The two least significant bits of the address are ignored. |
bid ra | Branch indirect, disable. Execution proceeds with the instruction at the address specified by word element 0 of register ra, and interrupts are disabled. The two least significant bits of this address are ignored. |
bie ra | Branch indirect, enable. Execution proceeds with the instruction at the address specified by word element 0 of register ra, and interrupts are enabled. The two least significant bits of the address are ignored. |
bihnz rc, ra | Branch indirect if not zero halfword. If halfword element 1 of register rc is 0, execution proceeds with the next sequential instruction; otherwise, execution proceeds at the address in word element 0 of register ra. The two least significant bits of this address are ignored. |
bihnzd rc, ra | Branch indirect if not zero halfword, disable. If halfword element 1 of register rc is 0, execution proceeds with the next sequential instruction; otherwise, the branch is taken, and execution proceeds at the address in word element 0 of register ra. The two least significant bits of this address are ignored. If the branch is taken, interrupts are disabled; otherwise, the interrupt enable state remains unchanged. |
bihnze rc, ra | Branch indirect if not zero halfword, enable. If halfword element 1 of register rc is 0, execution proceeds with the next sequential instruction; otherwise, the branch is taken, and execution proceeds at the address in word element 0 of register ra. The two least significant bits of this address are ignored. If the branch is taken, interrupts are enabled; otherwise, the interrupt enable state remains unchanged. |
bihz rc, ra | Branch indirect if zero halfword. If halfword element 1 of register rc is 0, execution proceeds at the address in word element 0 of register ra. The two least significant bits of this address are ignored. Otherwise, if the element rc is nonzero, execution proceeds with the next sequential instruction. |
bihzd rc, ra | Branch indirect if zero halfword, disable. If halfword element 1 of register rc is 0, the branch is taken, and execution proceeds at the address in word element 0 of register ra. The two least significant bits of this address are ignored. Otherwise, execution proceeds with the next sequential instruction. If the branch is taken, interrupts are disabled; otherwise, the interrupt enable state remains unchanged. |
bihze rc, ra | Branch indirect if zero halfword, enable. If halfword element 1 of register rc is 0, the branch is taken, and execution proceeds at the address in word element 0 of register ra. The two least significant bits of this address are ignored. Otherwise, if the element rc is nonzero, execution proceeds with the next sequential instruction. If the branch is taken, interrupts are enabled; otherwise, the interrupt enable state remains unchanged. |
binz rc, ra | Branch indirect if not zero word. If word element 0 of register rc is 0, execution proceeds with the next sequential instruction; otherwise, execution proceeds at the address in word element 0 of register ra. The two least significant bits of this address are ignored. |
binzd rc, ra | Branch indirect if not zero word, disable. If word element 0 of register rc is 0, execution proceeds with the next sequential instruction; otherwise, the branch is taken, and execution proceeds at the address in word element 0 of register ra. The two least significant bits of this address are ignored. If the branch is taken, interrupts are disabled; otherwise, the interrupt enable state remains unchanged. |
binze rc, ra | Branch indirect if not zero word, enable. If word element 0 of register rc is 0, execution proceeds with the next sequential instruction; otherwise, the branch is taken, and execution proceeds at the address in word element 0 of register ra. The two least significant bits of this address are ignored. If the branch is taken, interrupts are enabled; otherwise, the interrupt enable state remains unchanged. |
bisl rt, ra | Branch indirect and set link. The effective address of the next instruction is taken from word element 0 of register ra. The two least significant bits of this address are ignored. The address of the instruction following this instruction is placed into word element 0 of register rt, and all other word elements of rt are assigned a value of zero. |
bisld rt, ra | Branch indirect and set link, disable. The effective address of the next instruction is taken from word element 0 of register ra. The two least significant bits of this address are ignored. The address of the instruction following this instruction is placed into word element 0 of register rt, and all other word elements of rt are assigned a value of zero. Interrupts are also disabled. |
bisle rt, ra | Branch indirect and set link, enable. The effective address of the next instruction is taken from word element 0 of register ra. The two least significant bits of this address are ignored. The address of the instruction following this instruction is placed into word element 0 of register rt, and all other word elements of rt are assigned a value of zero. Interrupts are also enabled. |
bisled rt, ra | Branch indirect and set link on external data. The address of the instruction following this instruction is placed in word element 0 of register rt, and all other elements of register rt are assigned a value of zero. If the count of channel 0 is nonzero, execution continues at the effective address in word element 0 of register ra. The two least significant bits of this address are ignored. If the count of channel 0 is zero, execution continues with the next sequential instruction. |
bisledd rt, ra | Branch indirect and set link on external data, disable. The address of the instruction following this instruction is placed in word element 0 of register rt, and all other elements of register rt are assigned a value of zero. If the count of channel 0 is nonzero, the branch is taken, and execution continues at the effective address in word element 0 of register ra. The two least significant bits of this address are ignored. If the count of channel 0 is zero, execution continues with the next sequential instruction. If the branch is taken, interrupts are disabled; otherwise, the interrupt enable state remains unchanged. |
bislede rt, ra | Branch indirect and set link on external data, enable. The address of the instruction following this instruction is placed in word element 0 of register rt, and all other elements of register rt are assigned a value of zero. If the count of channel 0 is nonzero, the branch is taken, and execution continues at the effective address in word element 0 of register ra. The two least significant bits of this address are ignored. If the count of channel 0 is zero, execution continues with the next sequential instruction. If the branch is taken, interrupts are enabled; otherwise, the interrupt enable state remains unchanged. |
biz rc, ra | Branch indirect if zero word. If word element 0 of register rc is zero, execution proceeds at the effective address in word element 0 of register ra. The two least significant bits of this address are ignored. If word element 0 of rc is nonzero, execution proceeds with the next sequential instruction. |
bizd rc, ra | Branch indirect if zero word, disable. If word element 0 of register rc is zero, the branch is taken, and execution proceeds at the effective address in word element 0 of register ra. The two least significant bits of this address are ignored. If word element 0 of rc is nonzero, execution proceeds with the next sequential instruction. If the branch is taken, interrupts are disabled; otherwise, the interrupt enable state remains unchanged. |
bize rc, ra | Branch indirect if zero word, enable. If word element 0 of register rc is zero, the branch is taken, and execution proceeds at the effective address in word element 0 of register ra. The two least significant bits of this address are ignored. If word element 0 of rc is nonzero, execution proceeds with the next sequential instruction. If the branch is taken, interrupts are enabled; otherwise, the interrupt enable state remains unchanged. |
br s18 | Branch relative. Execution proceeds with the instruction addressed by the sum of the current instruction address and the sign-extended value of s18. The two least significant bits of s18 are ignored. |
bra s18 | Branch absolute. Execution proceeds with the instruction addressed by the sign-extended value of s18. The two least significant bits of s18 are ignored. |
brasl rt, s18 | Branch absolute and set link. Execution proceeds with the instruction addressed by the sign-extended value of s18. The two least significant bits of s18 are ignored. The address of the instruction following the current instruction is placed in word element 0 of register rt, and all other elements of rt are assigned a value of zero. |
brhnz rc, s18 | Branch if not zero halfword. If the halfword element 1 of register rc is nonzero, execution proceeds with the instruction addressed by the sum of the current instruction address and the sign-extended value of s18. The two least significant bits of s18 are ignored. If halfword element 1 of rc is zero, execution proceeds with the next sequential instruction. |
brhz rc, s18 | Branch if zero halfword. If the halfword element 1 of register rc is zero, execution proceeds with the instruction addressed by the sum of the current instruction address and the sign-extended value of s18. The two least significant bits of s18 are ignored. If the halfword element 1 of register rc is nonzero, execution proceeds with the next sequential instruction. |
brnz rc, s18 | Branch if not zero word. If the word element 0 of register rc is nonzero, execution proceeds with the instruction addressed by the sum of the current instruction address and the sign-extended value of s18. The two least significant bits of s18 are ignored. If word element 0 of register rc is zero, execution proceeds with the next sequential instruction. |
brsl rt, s18 | Branch relative and set link. Execution proceeds with the instruction addressed by the sum of the current instruction address and the sign-extended value of s18. The two least significant bits of s18 are ignored. The address of the instruction following the current instruction is placed in word element 0 of register rt, and all other elements of rt are assigned a value of zero. |
brz rc, s18 | Branch if zero word. If the word element 0 of register rc is zero, execution proceeds with the instruction addressed by the sum of the current instruction address and the sign-extended value of s18. The two least significant bits of s18 are ignored. If word element 0 of register rc is nonzero, execution proceeds with the next sequential instruction. |
cbd rt, u7(ra) | Generate controls for byte insertion (d-form). A control mask is generated that can be used by the shufb instruction to insert a byte at the effective address computed by the sum of register ra and the unsigned value u7. The control mask is placed in register rt. |
cbx rt, ra, rb | Generate controls for byte insertion (x-form). A control mask is generated that can be used by the shufb instruction to insert a byte at the effective address computed by the sum of registers ra and rb. The control mask is placed in register rt. |
cdd rt, u7(ra) | Generate controls for doubleword insertion (d-form). A control mask is generated that can be used by the shufb instruction to insert a doubleword at the effective address computed by the sum of register ra and unsigned value u7. The control mask is placed in register rt. |
cdx rt, ra, rb | Generate controls for doubleword insertion (x-form). A control mask is generated that can be used by the shufb instruction to insert a doubleword at the effective address computed by the sum of registers ra and rb. The control mask is placed in register rt. |
ceq rt, ra, rb | Compare equal word. Each word element of register ra is compared with the corresponding word element of register rb. If the two elements are equal, all ones are placed in the corresponding word element of register rt. Otherwise, if the two elements are not equal, zero is placed in the corresponding word element of register rt. |
ceqb rt, ra, rb | Compare equal byte. Each byte element of register ra is compared with the corresponding byte element of register rb. If the two elements are equal, all ones are placed in the corresponding byte element of register rt. Otherwise, if the elements are not equal, zero is placed in the corresponding byte element of register rt. |
ceqbi rt, ra, s10 | Compare equal byte immediate. Each byte element of register ra is compared with the eight least significant bits of s10. If the two values are equal, all ones are placed in the corresponding byte element of register rt. Otherwise, if the values are not equal, zero is placed in the corresponding byte element of register rt. |
ceqh rt, ra, rb | Compare equal halfword. Each halfword element of register ra is compared with the corresponding halfword element of register rb. If the two elements are equal, all ones are placed in the corresponding halfword element of register rt. Otherwise, if the elements are not equal, zero is placed in the corresponding halfword element of register rt. |
ceqhi rt, ra, s10 | Compare equal halfword immediate. Each halfword element of register ra is compared with the 16-bit sign-extended value s10. If the two values are equal, all ones are placed in the corresponding halfword element of register rt. Otherwise, if the values are not equal, zero is placed in the corresponding halfword element of register rt. |
ceqi rt, ra, s10 | Compare equal word immediate. Each word element of register ra is compared with the 32-bit sign-extended value s10. If the two values are equal, all ones are placed in the corresponding word element of register rt. Otherwise, if the values are not equal, zero is placed in the corresponding word element of register rt. |
cflts rt, ra, scale7 | Convert floating to signed integer. Each floating-point element of register ra is multiplied by 2^scale7, converted to a signed 32-bit integer, and placed in the corresponding word element of register rt. Values outside the range -2^31 to 2^31-1 are clamped (saturated to the nearest bound). |
cfltu rt, ra, scale7 | Convert floating to unsigned integer. Each floating-point element of register ra is multiplied by 2^scale7, converted to an unsigned 32-bit integer, and placed in the corresponding word element of register rt. Values outside the range 0 to 2^32-1 are clamped (saturated to the nearest bound). |
cg rt, ra, rb | Carry generate word. Each word element of register ra is added to the corresponding word element of register rb. The carry out is placed in the least significant bit of the corresponding word element of register rt, and 0 is placed in the remaining bits of rt. |
cgt rt, ra, rb | Compare greater than word. Each word element of register ra is compared with the corresponding word element of register rb. If the word in ra is greater than the corresponding word in rb, all ones are placed in the corresponding word element of register rt. Otherwise, if the word in ra is less than or equal to the corresponding word in rb, zeros are placed in the corresponding word element of register rt. |
cgtb rt, ra, rb | Compare greater than byte. Each byte element of register ra is compared with the corresponding byte element of register rb. If the byte in ra is greater than the corresponding byte in rb, all ones are placed in the corresponding byte element of register rt. Otherwise, if the byte in ra is less than or equal to the corresponding byte in rb, zeros are placed in the corresponding byte element of register rt. |
cgtbi rt, ra, s10 | Compare greater than byte immediate. Each byte element of register ra is compared with the eight least significant bits of s10. If the byte in ra is greater than the value in s10, all ones are placed in the corresponding byte element of register rt. Otherwise, if the byte in ra is less than or equal to the value in s10, zeros are placed in the corresponding byte element of register rt. |
cgth rt, ra, rb | Compare greater than halfword. Each halfword element of register ra is compared with the corresponding halfword element of register rb. If the halfword in ra is greater than the corresponding halfword in rb, all ones are placed in the corresponding halfword element of register rt. Otherwise, if the halfword in ra is less than or equal to the corresponding halfword in rb, zeros are placed in the corresponding halfword element of register rt. |
cgthi rt, ra, s10 | Compare greater than halfword immediate. Each halfword element of register ra is compared with the 16-bit sign-extended value s10. If the halfword in ra is greater than s10, all ones are placed in the corresponding halfword element of register rt. Otherwise, if the halfword in ra is less than or equal to s10, zeros are placed in the corresponding halfword element of register rt. |
cgti rt, ra, s10 | Compare greater than word immediate. Each word element of register ra is compared with the 32-bit sign-extended value s10. If the word in ra is greater than s10, all ones are placed in the corresponding word element of register rt. Otherwise, if the word in ra is less than or equal to s10, zeros are placed in the corresponding word element of register rt. |
cgx rt, ra, rb | Carry generate word extended. For each word element in registers ra and rb, a carry out is generated by summing the element of register ra, the corresponding element of rb, and the least significant bit of rt. The carry out is placed in the least significant bit of the corresponding word element of rt, and zeros are placed in the remaining bits. |
chd rt, u7(ra) | Generate controls for halfword insertion (d-form). A control mask is generated that can be used by the shufb instruction to insert a halfword at the effective address computed by the sum of register ra and the unsigned value u7. The control mask is placed in register rt. |
chx rt, ra, rb | Generate controls for halfword insertion (x-form). A control mask is generated that can be used by the shufb instruction to insert a halfword at the effective address computed by the sum of registers ra and rb. The control mask is placed in register rt. |
clgt rt, ra, rb | Compare logical greater than word. Each word element of register ra is logically compared with the corresponding word element of register rb. If the word in ra is greater than the corresponding word in rb, all ones are placed in the corresponding word element of register rt. Otherwise, if the word in ra is less than or equal to the corresponding word in rb, zeros are placed in the corresponding word element of register rt. |
clgtb rt, ra, rb | Compare logical greater than byte. Each byte element of register ra is logically compared with the corresponding byte element of register rb. If the byte in ra is greater than the corresponding byte in rb, all ones are placed in the corresponding byte element of register rt. Otherwise, if the byte in ra is less than or equal to the corresponding byte in rb, zeros are placed in the corresponding byte element of register rt. |
clgtbi rt, ra, s10 | Compare logical greater than byte immediate. Each byte element of register ra is logically compared with the 8 least significant bits of s10. If the byte in ra is greater than the value in s10, all ones are placed in the corresponding byte element of register rt. Otherwise, if the byte in ra is less than or equal to the byte in s10, zeros are placed in the corresponding byte element of register rt. |
clgth rt, ra, rb | Compare logical greater than halfword. Each halfword element of register ra is logically compared with the corresponding halfword element of register rb. If the halfword in ra is greater than the corresponding halfword in rb, all ones are placed in the corresponding halfword element of register rt. Otherwise, if the halfword in ra is less than or equal to the corresponding halfword in rb, zeros are placed in the corresponding halfword element of register rt. |
clgthi rt, ra, s10 | Compare logical greater than halfword immediate. Each halfword element of register ra is logically compared with the 16-bit sign-extended value s10. If the halfword in ra is greater than the value in s10, all ones are placed in the corresponding halfword element of register rt. Otherwise, if the halfword in ra is less than or equal to the value in s10, zeros are placed in the corresponding halfword element of register rt. |
clgti rt, ra, s10 | Compare logical greater than word immediate. Each word element of register ra is logically compared with the 32-bit sign-extended value s10. If the word in ra is greater than the value in s10, all ones are placed in the corresponding word element of register rt. Otherwise, if the word element in ra is less than or equal to the value in s10, zeros are placed in the corresponding word element of register rt. |
clz rt, ra | Count leading zeros. The number of zeros to the left of the first 1 in each word element of register ra is counted, and the resulting count is placed in the corresponding element of register rt. |
cntb rt, ra | Count ones in bytes. The number of ones in each byte element of register ra is counted, and the resulting count is placed in the corresponding element of register rt. |
csflt rt, ra, scale7 | Convert signed integer to floating. Each signed word element of register ra is converted to floating-point, multiplied by 2^-scale7, and placed in the corresponding floating-point elements of register rt. |
cuflt rt, ra, scale7 | Convert unsigned integer to floating. Each unsigned word element of register ra is converted to floating-point, multiplied by 2^-scale7, and placed in the corresponding floating-point elements of register rt. |
cwd rt, u7(ra) | Generate controls for word insertion (d-form). A control mask is generated that can be used by the shufb instruction to insert a word at the effective address computed by the sum of register ra and the unsigned value u7. The control mask is placed in register rt. |
cwx rt, ra, rb | Generate controls for word insertion (x-form). A control mask is generated that can be used by the shufb instruction to insert a word at the effective address computed by the sum of registers ra and rb. The control mask is placed in register rt. |
dfa rt, ra, rb | Double floating add. Each double floating-point element of register ra is added to the corresponding double floating-point element of register rb, and the results are placed in the corresponding elements of register rt. |
dfm rt, ra, rb | Double floating multiply. Each double floating-point element of register ra is multiplied by the corresponding double floating-point element of register rb, and the results are placed in the corresponding elements of register rt. |
dfma rt, ra, rb | Double floating multiply and add. Each double floating-point element of register ra is multiplied by the corresponding double floating-point element of register rb, and the corresponding double floating-point element of register rt is then added to the product. The results are placed in the corresponding elements of register rt. |
dfms rt, ra, rb | Double floating multiply and subtract. Each double floating-point element of register ra is multiplied by the corresponding double floating-point element of register rb, and the corresponding double floating-point element of register rt is subtracted from the product. The results are placed in the corresponding elements of register rt. |
dfnma rt, ra, rb | Double floating negative multiply and add. Each double floating-point element of register ra is multiplied by the corresponding double floating-point element of register rb, and the corresponding double floating-point element of register rt is added to the product. Each result is negated and placed in the corresponding element of register rt. |
dfnms rt, ra, rb | Double floating negative multiply and subtract. Each double floating-point element of register ra is multiplied by the corresponding double floating-point element of register rb, and the product is subtracted from the corresponding double floating-point element of register rt. The results are placed in corresponding elements of register rt. |
dfs rt, ra, rb | Double floating subtract. Each double floating-point element of register rb is subtracted from the corresponding double floating-point element of register ra, and the results are placed in the corresponding elements of register rt. |
dsync | Synchronize data. All pending store operations to local storage memory are completed before the processor proceeds to the next instruction. |
eqv rt, ra, rb | Equivalent. The value in register ra is logically exclusive ORed with the value in register rb, and the complement of the result is placed in register rt. |
fa rt, ra, rb | Floating add. Each floating-point element of register ra is added to the corresponding floating-point element of register rb, and the results are placed in the corresponding elements of register rt. |
fceq rt, ra, rb | Floating compare equal. Each floating-point element of register ra is compared with the corresponding floating-point element of register rb. If the two elements are equal, all ones are placed in the corresponding word element of register rt. Otherwise, if they are not equal, zeros are placed in the corresponding word element of register rt. |
fcgt rt, ra, rb | Floating compare greater than. Each floating-point element of register ra is compared with the corresponding floating-point element of register rb. If the element in ra is greater than the corresponding element in rb, all ones are placed in the corresponding word element of register rt. Otherwise, if the element in ra is less than or equal to the corresponding element in rb, zeros are placed in the corresponding word element of register rt. |
fcmeq rt, ra, rb | Floating compare magnitude equal. The absolute value of each floating-point element of register ra is compared with the absolute value of the corresponding floating-point element of register rb. If the elements are equal, all ones are placed in the corresponding word element of register rt. Otherwise, if they are not equal, zeros are placed in the corresponding word elements of register rt. |
fcmgt rt, ra, rb | Floating compare magnitude greater than. The absolute value of each floating-point element of register ra is compared with the absolute value of the corresponding floating-point element of register rb. If the absolute value in ra is greater than the corresponding absolute value in rb, all ones are placed in the corresponding word element of register rt. Otherwise, if the absolute value in ra is less than or equal to the corresponding absolute value in rb, zeros are placed in the corresponding word element of register rt. |
fesd rt, ra | Floating extend single to double. Each even single precision floating-point element of register ra is converted to double precision and then placed in the corresponding element of register rt. |
fi rt, ra, rb | Floating interpolate. Each floating-point element of register ra is interpolated to produce a more accurate estimate, using the base and step contained in the corresponding element of register rb, where rb is in the output format of a frest or frsqest instruction. The interpolated result is placed in the corresponding element of register rt. |
fm rt, ra, rb | Floating multiply. Each floating-point element of register ra is multiplied by the corresponding floating-point element of register rb, and the products are placed in the corresponding elements of register rt. |
fma rt, ra, rb, rc | Floating multiply and add. Each floating-point element of register ra is multiplied by the corresponding floating-point element of register rb, and the corresponding floating-point element of register rc is then added to the product. The results are placed in corresponding elements of register rt. |
fms rt, ra, rb, rc | Floating multiply and subtract. Each floating-point element of register ra is multiplied by the corresponding floating-point element of register rb, and the corresponding floating-point element of register rc is subtracted from the product. The results are placed in the corresponding elements of register rt. |
fnms rt, ra, rb, rc | Floating negative multiply and subtract. Each floating-point element of register ra is multiplied by the corresponding floating-point element of register rb, and the product is subtracted from the corresponding floating-point element of register rc. The results are placed in the corresponding elements of register rt. |
frds rt, ra | Floating round double to single. Each double floating-point element of register ra is rounded to single precision and placed in the corresponding even element of register rt. At the same time, a zero is placed in the corresponding odd element of rt. |
frest rt, ra | Floating reciprocal estimate. A base and a step are computed for estimating the reciprocal of each floating-point element of register ra, and the result is placed in the corresponding element of register rt. The result returned by this instruction is intended as an operand to the fi instruction. |
frsqest rt, ra | Floating reciprocal square root estimate. A base and a step are computed for estimating the reciprocal of the square root of each floating-point element of register ra, and the result is placed in the corresponding element of register rt. The result returned by this instruction is intended as an operand to the fi instruction. |
fs rt, ra, rb | Floating subtract. Each floating-point element of register rb is subtracted from the corresponding floating-point element of register ra, and the results are placed in the corresponding elements of register rt. |
fscrrd rt | Floating-point status control register read. The contents of the Floating-Point Status and Control Register (FPSCR) are read and placed in register rt. |
fscrwr ra | Floating-point status control register write. The 128-bit register ra is written into the Floating-Point Status and Control Register (FPSCR). Register rc is a false target and no value is ever written to it. If register rc is not specified, register 0 is used as the false target. |
fscrwr rc, ra | |
fsm rt, ra | Form select mask for words. The 4 least significant bits of word element 0 of register ra are used to create a mask by replicating each bit 32 times. The 128-bit result is returned in register rt. |
fsmb rt, ra | Form select mask for bytes. The 16 least significant bits of word element 0 of register ra are used to create a mask by replicating each bit 8 times. The 128-bit result is returned in register rt. |
fsmbi rt, u16 | Form select mask for byte immediate. The 16 bits of u16 are used to create a mask by replicating each bit 8 times. The 128-bit result is returned in register rt. |
fsmh rt, ra | Form select mask for halfwords. The 8 least significant bits of word element 0 of register ra are used to create a mask by replicating each bit 16 times. The 128-bit result is returned in register rt. |
gb rt, ra | Gather bits from words. A 4-bit value is formed by concatenating the least significant bit of each word element of register ra. The 4-bit value is then placed in the least significant bits of word element 0 of register rt, and zeros are placed in the remaining bits. |
gbb rt, ra | Gather bits from bytes. A 16-bit value is formed by concatenating the least significant bit of each byte element of register ra. The 16-bit value is then placed in the least significant bits of word element 0 of register rt, and zeros are placed in the remaining bits. |
gbh rt, ra | Gather bits from halfwords. An 8-bit value is formed by concatenating the least significant bit of each halfword element of register ra. The 8-bit value is then placed in the least significant bits of word element 0 of register rt, and zeros are placed in the remaining bits. |
hbr s11, ra | Hint for branch (r-form). An instruction prefetch is allowed to occur at the branch target address contained in word element 0 of register ra, for the branch instruction that is addressed by the sum of the address of this instruction and the sign-extended value s11. The two least significant bits of s11 are ignored. |
hbra s11, s18 | Hint for branch (a-form). An instruction prefetch is allowed to occur at the branch target address specified by the sign-extended value s18, for the branch instruction addressed by the sum of the address of this instruction and the sign-extended value s11. The two least significant bits of s11 and s18 are ignored. |
hbrp | Hint for branch, prefetch (r-form). A slot in the fetch unit is reserved for an inline prefetch. This instruction translates to an hbr instruction that has the P feature bit set. The field in the hbr instruction that contains the offset to the branch instruction is set to zero. |
hbrr s11, s18 | Hint for branch relative. An instruction prefetch is allowed to occur at the branch target that is addressed by the sum of the address of this instruction and the sign-extended value s18, for the branch instruction that is addressed by the sum of the address of this instruction and the sign-extended value s11. The two least significant bits of s18 and s11 are ignored. |
heq ra, rb | Halt if equal. If word element 0 of registers ra and rb are equal, the processor is halted. Register rt is a false target and is never written to. If register rt is not specified, register 0 is used as the false target. |
heq rt, ra, rb | |
heqi ra, s10 | Halt if equal immediate. If word element 0 of register ra equals the sign-extended value s10, the processor is halted. Register rt is a false target, and no value is ever written to it. If register rt is not specified, register 0 is used as the false target. |
heqi rt, ra, s10 | |
hgt ra, rb | Halt if greater than. If signed word element 0 of register ra is greater than signed word element 0 of register rb, the processor is halted. Register rt is a false target, and no value is ever written to it. If register rt is not specified, register 0 is used as the false target. |
hgt rt, ra, rb | |
hgti ra, s10 | Halt if greater than immediate. If signed word element 0 of register ra is greater than the sign-extended value s10, the processor is halted. Register rt is a false target, and no value is ever written to it. If register rt is not specified, register 0 is used as the false target. |
hgti rt, ra, s10 | |
hlgt ra, rb | Halt if logically greater than. If unsigned word element 0 of register ra is greater than unsigned word element 0 of register rb, the processor is halted. Register rt is a false target, and no value is ever written to it. If register rt is not specified, register 0 is used as the false target. |
hlgt rt, ra, rb | |
hlgti ra, s10 | Halt if logically greater than immediate. If unsigned word element 0 of register ra is logically greater than the sign-extended value s10, the processor is halted. Register rt is a false target, and no value is ever written to it. If register rt is not specified, register 0 is used as the false target. |
hlgti rt, ra, s10 | |
il rt, s16 | Immediate load word. The sign-extended value s16 is loaded into each of the word elements of rt. |
ila rt, u18 | Immediate load address. The unsigned value u18 is loaded into each of the word elements of rt. |
ilh rt, u16 | Immediate load halfword. The value u16 is loaded into each of the 8 halfword elements of rt. |
ilhu rt, u16 | Immediate load halfword upper. The value u16 is loaded into the 16 most significant bits of each of the 4 word elements of rt. |
iohl rt, u16 | Immediate OR halfword lower. The value u16 is zero-extended to 32 bits and logically ORed with each of the word elements of rt, and the results are placed back in rt. |
iretd | Interrupt return, disable. Execution proceeds with the instruction addressed by machine state save/restore register 0 (SRR0). Interrupts are disabled. Register ra is a false source, and its contents are ignored. If ra is not specified, register 0 is used as a false source. |
iretd ra | |
irete | Interrupt return, enable. Execution proceeds with the instruction addressed by machine state save/restore register 0 (SRR0). Interrupts are enabled. Register ra is a false source, and its contents are ignored. If ra is not specified, register 0 is used as a false source. |
irete ra | |
iret | Interrupt return. Execution proceeds with the instruction addressed by machine state save/restore register 0 (SRR0). Register ra is a false source, and its contents are ignored. If ra is not specified, register 0 is used as a false source. |
iret ra | |
lnop | No operation (load). A no-operation is performed on the load pipeline. |
lqa rt, s18 | Load quadword (a-form). A quadword is loaded into register rt from the effective address specified by the sign-extended value s18. The two least significant bits of s18 are ignored. |
lqd rt, s14(ra) | Load quadword (d-form). A quadword is loaded into register rt from the effective address computed by the sum of register ra and the sign-extended value s14. The four least significant bits of s14 are ignored. |
lqr rt, s18 | Load quadword instruction relative (a-form). A quadword is loaded into register rt from the effective address specified by the sum of the current instruction address and s18. The two least significant bits of s18 are ignored. |
lqx rt, ra, rb | Load quadword (x-form). A quadword is loaded into register rt from the effective address computed by the sum of registers ra and rb. |
mfspr rt, spr | Move from special purpose register. The contents of the specified special purpose register spr are moved to the word element 0 of register rt. |
mpy rt, ra, rb | Multiply. The signed 16 least significant bits of the corresponding word elements of registers ra and rb are multiplied, and the 32-bit products are placed in the corresponding word elements of register rt. |
mpya rt, ra, rb, rc | Multiply and add. The signed 16 least significant bits of the corresponding word elements of registers ra and rb are multiplied, and the 32-bit products are then added to the corresponding word elements of register rc. The results are placed in the corresponding elements of register rt. |
mpyh rt, ra, rb | Multiply high. The most significant 16 bits of the word elements of register ra are multiplied by the 16 least significant bits of the corresponding elements of register rb. The 32-bit products are then shifted left by 16 bits and placed in the corresponding word elements of register rt. |
mpyhh rt, ra, rb | Multiply high high. The signed 16 most significant bits of the word elements of registers ra and rb are multiplied, and the 32-bit products are placed in the corresponding word elements of register rt. |
mpyhha rt, ra, rb | Multiply high high and add. The signed 16 most significant bits of the word elements of registers ra and rb are multiplied. The 32-bit products are then added to the corresponding word elements of register rt, and the sums are placed in register rt. |
mpyhhau rt, ra, rb | Multiply high high unsigned and add. The unsigned 16 most significant bits of the word elements of registers ra and rb are multiplied, and the 32-bit products are then added to the corresponding word elements of register rt, and the sums are placed in register rt. |
mpyhhu rt, ra, rb | Multiply high high unsigned. The unsigned 16 most significant bits of the word elements of registers ra and rb are multiplied, and the 32-bit products are then placed in the corresponding word elements of register rt. |
mpyi rt, ra, s10 | Multiply immediate. The 16 least significant bits of each of the word elements of register ra are multiplied by the sign-extended value s10. The 32-bit products are then placed in the corresponding word elements of register rt. |
mpys rt, ra, rb | Multiply and shift right. The signed 16 least significant bits of the corresponding word elements of registers ra and rb are multiplied, and the 16 most significant bits of the 32-bit products are sign-extended and placed in the corresponding word elements of register rt. |
mpyu rt, ra, rb | Multiply unsigned. The unsigned 16 least significant bits of the corresponding word elements of registers ra and rb are multiplied, and the 32-bit products are placed in the corresponding word elements of register rt. |
mpyui rt, ra, s10 | Multiply unsigned immediate. The 16 least significant bits of each of the word elements of register ra are multiplied by the sign-extended value s10. Both operands are treated as unsigned. The 32-bit products are placed in the corresponding word elements of register rt. |
mtspr spr, ra | Move to special purpose register. The contents of word element 0 of register ra are moved to the special purpose register spr. |
nand rt, ra, rb | Nand. The value of register ra is logically ANDed with register rb, and the complement of the result is placed in register rt. |
nop | No operation (execute). A no-operation is performed on the execute pipeline. Register rt is a false target, and no value is ever written to it. If register rt is not specified, register 0 is used as the false target. |
nop rt | |
nor rt, ra, rb | Nor. The value of register ra is logically ORed with register rb, and the complement of the result is placed in register rt. |
or rt, ra, rb | Or. The value of register ra is logically ORed with register rb, and the result is placed in register rt. |
orbi rt, ra, s10 | Or byte immediate. The 8 least significant bits of s10 are logically ORed with each byte element of register ra, and the results are placed in the corresponding elements of register rt. |
orc rt, ra, rb | Or with complement. The value of register ra is logically ORed with the complement of register rb, and the result is placed in register rt. |
orhi rt, ra, s10 | Or halfword immediate. The sign-extended value s10 is logically ORed with each halfword element of register ra, and the results are placed in the corresponding elements of register rt. |
ori rt, ra, s10 | Or word immediate. The sign-extended value s10 is logically ORed with each word element of register ra, and the results are placed in the corresponding elements of register rt. |
orx rt, ra | Or word across. The four word elements of register ra are logically ORed, and the result is placed in word element 0 of register rt. Word elements 1, 2, and 3 of register rt are assigned a value of zero. |
rchcnt rt, ch | Read channel count. The channel count of the channel ch is read, and the count placed in register rt. |
rdch rt, ch | Read channel. The contents of the channel ch are read, and the contents placed in register rt. |
rot rt, ra, rb | Rotate word. The contents of each word element of register ra are rotated left according to the corresponding word element of register rb. The results are placed in the corresponding word elements of register rt. |
roth rt, ra, rb | Rotate halfword. The contents of each halfword element of register ra are rotated left according to the corresponding halfword element of register rb. The results are placed in the corresponding halfword elements of register rt. |
rothi rt, ra, s7 | Rotate halfword immediate. The contents of each halfword element of register ra are rotated left according to the 4 least significant bits of s7. The results are placed in the corresponding halfword elements of register rt. |
rothm rt, ra, rb | Rotate and mask halfword. The contents of each halfword element of register ra are right shifted according to the two's complement of the 5 least significant bits of the corresponding halfword element of register rb. The results are placed in the corresponding halfword elements of register rt. |
rothmi rt, ra, s6 | Rotate and mask halfword immediate. The contents of each halfword element of register ra are right shifted according to the two's complement of the signed value s6. The results are placed in the corresponding halfword elements of register rt. |
roti rt, ra, s7 | Rotate word immediate. The contents of each word element of register ra are rotated left according to the signed value s7. The results are placed in the corresponding word elements of register rt. |
rotm rt, ra, rb | Rotate and mask word. The contents of each word element of register ra are right shifted according to the two's complement of the 6 least significant bits of the corresponding word element of register rb. The results are placed in the corresponding word elements of register rt. |
rotma rt, ra, rb | Rotate and mask algebraic word. The contents of each word element of register ra are right shifted according to the two's complement of the 6 least significant bits of the corresponding word element of register rb. Copies of the sign bit are shifted in from the left. The results are placed in the corresponding word elements of register rt. |
rotmah rt, ra, rb | Rotate and mask algebraic halfword. The contents of each halfword element of register ra are right shifted according to the two's complement of the 5 least significant bits of the corresponding halfword element of register rb. Copies of the sign bit are shifted in from the left. The results are placed in the corresponding halfword element of register rt. |
rotmahi rt, ra, s6 | Rotate and mask algebraic halfword immediate. The contents of each halfword element of register ra are right shifted according to the two's complement of the signed value s6. Copies of the sign bit are shifted in from the left. The results are placed in the corresponding halfword elements of register rt. |
rotmai rt, ra, s7 | Rotate and mask algebraic word immediate. The contents of each word element of register ra are right shifted according to the two's complement of the signed value s7. Copies of the sign bit are shifted in from the left. The results are placed in the corresponding word elements of register rt. |
rotmi rt, ra, s7 | Rotate and mask word immediate. The contents of each word element of register ra are right shifted according to the two's complement of the signed value s7. The results are placed in the corresponding word elements of register rt. |
rotqbi rt, ra, rb | Rotate quadword by bits. The contents of register ra are rotated left by the number of bits specified by the 3 least significant bits of word element 0 of register rb. The result is placed in register rt. |
rotqbii rt, ra, u3 | Rotate quadword by bits immediate. The contents of register ra are rotated left by the number of bits according to the value u3. The result is placed in register rt. |
rotqby rt, ra, rb | Rotate quadword by bytes. The contents of register ra are rotated left by the number of bytes specified by the 4 least significant bits of word element 0 of register rb. The result is placed in register rt. |
rotqbybi rt, ra, rb | Rotate quadword by bytes from bit shift count. The contents of register ra are rotated left by the number of bytes specified by bits 25–28 of word element 0 of register rb. The result is placed in register rt. |
rotqbyi rt, ra, s7 | Rotate quadword by bytes immediate. The contents of register ra are rotated left by the number of bytes according to the signed value s7. The result is placed in register rt. |
rotqmbi rt, ra, rb | Rotate and mask quadword by bits. The contents of register ra are shifted right by the number of bits specified by the two's complement of the 3 least significant bits of word element 0 of register rb. The result is placed in register rt. |
rotqmbii rt, ra, s3 | Rotate and mask quadword by bits immediate. The contents of register ra are shifted right by the number of bits specified by the two's complement of the signed value s3. The result is placed in register rt. |
rotqmby rt, ra, rb | Rotate and mask quadword by bytes. The contents of register ra are shifted right by the number of bytes specified by the two's complement of the 5 least significant bits of word element 0 of register rb. The result is placed in register rt. |
rotqmbybi rt, ra, rb | Rotate and mask quadword by bytes from bit shift count. The contents of register ra are shifted right by the number of bytes specified by the two's complement of bits 24–28 of word element 0 of register rb. The result is placed in register rt. |
rotqmbyi rt, ra, s6 | Rotate and mask quadword by bytes immediate. The contents of register ra are shifted right by the number of bytes specified by the two's complement of the signed value s6. The result is placed in register rt. |
selb rt, ra, rb, rc | Select bits. Each bit of register rc whose value is 0 selects the corresponding bit from register ra. A bit whose value is 1 selects the corresponding bit from register rb. The quadword result is placed in register rt. |
sf rt, ra, rb | Subtract from word. Each word element of register ra is subtracted from the corresponding word element of register rb, and the results are placed in the corresponding word elements of register rt. |
sfh rt, ra, rb | Subtract from halfword. Each halfword element of register ra is subtracted from the corresponding halfword element of register rb, and the results are placed in the corresponding halfword elements of register rt. |
sfhi rt, ra, s10 | Subtract from halfword immediate. Each halfword element of register ra is subtracted from the sign-extended value s10, and the results are placed in the corresponding halfword elements of register rt. |
sfi rt, ra, s10 | Subtract from word immediate. Each word element of register ra is subtracted from the sign-extended value s10, and the results are placed in the corresponding word elements of register rt. |
sfx rt, ra, rb | Subtract from word extended. Each word element of register ra is subtracted from the corresponding word element of register rb. An additional 1 is subtracted from the result if the least significant bit of the corresponding word element of register rt is 0. The results are placed in the corresponding word elements of register rt. |
shl rt, ra, rb | Shift left word. The contents of each word element of register ra are shifted left according to the 6 least significant bits of the corresponding word element of register rb. The results are placed in the corresponding word elements of register rt. |
shlh rt, ra, rb | Shift left halfword. The contents of each halfword element of register ra are shifted left according to the 5 least significant bits of the corresponding halfword element of register rb. The results are placed in the corresponding halfword elements of register rt. |
shlhi rt, ra, u5 | Shift left halfword immediate. The contents of each halfword element of register ra are shifted left according to unsigned value u5. The results are placed in the corresponding halfword elements of register rt. |
shli rt, ra, u6 | Shift left word immediate. The contents of each word element of register ra are shifted left according to the unsigned value u6. The results are placed in the corresponding word elements of register rt. |
shlqbi rt, ra, rb | Shift left quadword by bits. The contents of register ra are shifted left by the number of bits specified by the 3 least significant bits of word element 0 of register rb. The result is placed in register rt. |
shlqbii rt, ra, u3 | Shift left quadword by bits immediate. The contents of register ra are shifted left by the number of bits specified by the unsigned value u3. The result is placed in register rt. |
shlqby rt, ra, rb | Shift left quadword by bytes. The contents of register ra are shifted left by the number of bytes specified by the 5 least significant bits of word element 0 of register rb. The result is placed in register rt. |
shlqbybi rt, ra, rb | Shift left quadword by bytes from bit shift count. The contents of register ra are shifted left by the number of bytes specified by bits 24–28 of word element 0 of register rb. The result is placed in register rt. |
shlqbyi rt, ra, u5 | Shift left quadword by bytes immediate. The contents of register ra are shifted left by the number of bytes specified by the unsigned value u5. The result is placed in register rt. |
shufb rt, ra, rb, rc | Shuffle bytes. Each byte of register rc is used to select a byte from either register ra or register rb or a constant (0, 0x80, or 0xFF). The results are placed in the corresponding bytes of register rt. |
stop u14 | Stop and signal. Execution is stopped, the current address is written to the SPU NPC register, the value u14 is written to the SPU status register, and an interrupt is sent to the PowerPC Processor Unit (PPU). |
stopd ra, rb, rc | Stop and signal with dependencies. Execution is stopped after register dependencies are met. This involves writing the current address to the SPU NPC register, writing the value 0x3FFF to the SPU status register, and interrupting the PPU. |
stqa rc, s18 | Store quadword (a-form). The quadword in register rc is stored at the effective address specified by the sign-extended value s18. The two least significant bits of s18 are ignored. |
stqd rc, s14(ra) | Store quadword (d-form). The quadword in register rc is stored at the effective address computed by the sum of register ra and the sign-extended value s14. The four least significant bits of s14 are ignored. |
stqr rc, s18 | Store quadword instruction relative (a-form). The quadword in register rc is stored at the effective address specified by the sum of the current instruction address and s18. The two least significant bits of s18 are ignored. |
stqx rc, ra, rb | Store quadword (x-form). The quadword in register rc is stored at the effective address computed by the sum of registers ra and rb. |
sumb rt, ra, rb | Sum bytes into halfword. The 4 bytes of each word element of register ra are summed and placed in the corresponding odd halfword elements of register rt, and the 4 bytes of each word element of register rb are summed and placed in the corresponding even halfword elements of register rt. |
sync | Synchronize. The processor waits until all pending store instructions have been completed before it fetches the next sequential instruction. |
syncc | Synchronize channel. The processor waits until the channel is ready and all pending store instructions have been completed before it fetches the next sequential instruction. |
wrch ch, ra | Write channel. The contents of register ra are written to the channel ch. |
xor rt, ra, rb | Exclusive or. The contents of register ra are logically exclusive ORed with the contents of register rb, and the result is placed in register rt. |
xorbi rt, ra, s10 | Exclusive or byte immediate. The 8 least significant bits of s10 are logically exclusive ORed with each byte element of register ra, and the results are placed in the corresponding elements of register rt. |
xorhi rt, ra, s10 | Exclusive or halfword immediate. The value s10 is sign-extended to 16 bits and logically exclusive ORed with each halfword element of register ra, and the results are placed in the corresponding elements of register rt. |
xori rt, ra, s10 | Exclusive or word immediate. The sign-extended value of s10 is logically exclusive ORed with each word element of register ra, and the results are placed in the corresponding elements of register rt. |
xsbh rt, ra | Extend sign byte to halfword. The least significant 8 bits of each halfword element of register ra are sign-extended to 16 bits and placed in the corresponding halfword element of register rt. |
xshw rt, ra | Extend sign halfword to word. The least significant 16 bits of each word element of register ra are sign-extended to 32 bits and placed in the corresponding word element of register rt. |
xswd rt, ra | Extend sign word to doubleword. The least significant 32 bits of each doubleword element of register ra are sign-extended to 64 bits and placed in the corresponding doubleword element of register rt. |
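The byte-level behaviour of shufb, sumb and xsbh described in the table can be modelled in a few lines of Python. This is only an illustrative sketch, not code from any SDK: the function names are made up for this page, a register is modelled as a 16-byte `bytes` object with byte 0 as the most significant byte, and the control-byte bit patterns used for the shufb constants (10xxxxxx, 110xxxxx, 111xxxxx) are taken from my reading of the SPU ISA document rather than from the table itself.

```python
def shufb(ra: bytes, rb: bytes, rc: bytes) -> bytes:
    """Model of shufb: each byte of rc selects a byte of ra||rb or a constant."""
    assert len(ra) == len(rb) == len(rc) == 16
    src = ra + rb                      # 32-byte source: ra is 0..15, rb is 16..31
    out = bytearray(16)
    for i, c in enumerate(rc):
        if c & 0xC0 == 0x80:           # 10xxxxxx -> constant 0x00 (assumed encoding)
            out[i] = 0x00
        elif c & 0xE0 == 0xC0:         # 110xxxxx -> constant 0xFF (assumed encoding)
            out[i] = 0xFF
        elif c & 0xE0 == 0xE0:         # 111xxxxx -> constant 0x80 (assumed encoding)
            out[i] = 0x80
        else:                          # low 5 bits index into ra||rb
            out[i] = src[c & 0x1F]
    return bytes(out)


def sumb(ra: bytes, rb: bytes) -> bytes:
    """Model of sumb: per word, rb's byte sum goes to the even halfword, ra's to the odd."""
    assert len(ra) == len(rb) == 16
    out = bytearray(16)
    for w in range(4):
        sum_a = sum(ra[4 * w:4 * w + 4])
        sum_b = sum(rb[4 * w:4 * w + 4])
        out[4 * w:4 * w + 2] = sum_b.to_bytes(2, "big")      # even halfword
        out[4 * w + 2:4 * w + 4] = sum_a.to_bytes(2, "big")  # odd halfword
    return bytes(out)


def xsbh(ra: bytes) -> bytes:
    """Model of xsbh: sign-extend the low byte of each halfword to 16 bits."""
    assert len(ra) == 16
    out = bytearray(16)
    for h in range(8):
        low = ra[2 * h + 1]            # least significant byte of the halfword
        out[2 * h] = 0xFF if low & 0x80 else 0x00
        out[2 * h + 1] = low
    return bytes(out)
```

For example, calling `shufb(ra, rb, bytes(range(15, -1, -1)))` reverses the byte order of ra, which is a common use of shufb for endian swaps.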
Other references
- Synergistic Processor Unit (SPU) Instruction Set Architecture (Version 1.2 / January 27, 2007)
- SPU Assembly Language Specification (Version 1.4 / October 11, 2006)
- C/C++ Language Extensions for Cell Broadband Engine™ Architecture (Version 2.3 / December 4, 2006)
- SIMD Math Library Specification for Cell Broadband Engine™ Architecture (Version 1.0 / November 6, 2006)
- SPU Application Binary Interface Specification (Version 1.6 / December 4, 2006)
IBM:
- The little broadband engine that could: Mailboxes and interrupts. Covers two means of communication between the SPE and the PPE: mailboxes and signal notification.
- http://cell.scei.co.jp/pdf/CBE_Public_Registers_v15.pdf (includes the SPE "Problem state memory map", "Privilege 1 memory map" and "Privilege 2 memory map")
VPOS:
- http://djlee.org:8080/trac/LabWorks/browser/VPOS/working-sources/include/asm-cell/cell_memory_map_spe.h?rev=79
- http://djlee.org:8080/trac/LabWorks/browser/VPOS/working-sources/include/asm-cell/cell_memory_map.h?rev=79