03.Instruction Set Architecture(ISA)

Instruction Set Architecture(ISA)

The “contract” between software and hardware

Functional definition of operations, modes, and storage locations supported by hardware
Precise description of how software can invoke and access them

Strictly speaking, ISA is the architecture

Informally, architecture is also used to talk about the big picture of implementation
Better to call this microarchitecture

Microarchitecture (微架构)

ISA specifies what hardware does, not how it does

No guarantees regarding
- How operations are implemented
- Which operations are fast and which are slow
- Which operations take more power and which take less
These issues are determined by the microarchitecture
- Microarchitecture = how hardware implements architecture
- All Pentiums implement the x86 architecture

与ISA有关的内容

The Von Neumann model
- Implicit(隐含) structure of all modern ISAs
Format
- Length and encoding
Operations
Operand model
- Where are operands(操作数) stored and how do address them?
Datatypes and operations
Control

我们用的例子: MIPS ISA

1. Von Neumann Model

Implicit model of all modern ISAs
Key: program counter (PC)
- Defines total order of dynamic instructions
  - Next PC is PC++ unless insn says otherwise
- Order and named storage define computation
  - Value flows from insn X to Y via storage A iff…
  - X names A as output, Y names A as input…
  - And Y after X in total order
Processor logically executes loop at left
- Instruction execution assumed atomic
- Instruction X finishes before insn X+1 starts

2. Instruction Format

Length (长度)

Fixed length(固定长度)
- 32 or 64 bits
- Simple implementation: compute next PC using only PC
- Code density
Variable length
- Complex implementation
- Code density
Compromise: two lengths
- Example: MIPS_16

Encoding (编码)

A few simple encodings simplify decoder implementation
Complex encoding can improve code density

MIPS Format (指令格式)

Length

32-bits
MIPS_16: 16-bit variants of common instructions for density

Encoding

3 formats, simple encoding
Q: how many operation types can be encoded in 6-bit opcode?

R Format (寄存器类型)

e.g., add $1, $2, $3

000000

00010

00011

00001

00000

100000

alu-rr

zero

add/signed

注: $1 表示 1 号寄存器. 忽视 shamt.

I Format (立即数类型)

All loads and stores use I-format
Assembly: lw $1, 100($2)
Machine:

100011

00010

00001

0000000001100100

100 (in binary)

注: 100($2) 表示基址为 2 号寄存器, 加上 100 偏移量. lw 为 load word, 取 100($2) 地址的值, 置于 $1.

ALU ops with immediates

addi $1, $2, 100
001000 00010 00001 0000000001100100

注: [00001] <= [00010] + 0000000001100100, 即 $1 = $2 + 0000000001100100.

Conditional branches

beq $1, $2, 7
000100 00001 00010 0000 0000 0000 0111
PC = PC + (0000 0000 0000 0111 << 2) // word offset

注: 条件跳转指令, if $1 == $2, goto 7. 7 为偏移量.

J Format (跳转类型)

Direct Jump:

opcode

addr

Jump to:

New PC = 4 MSB of PC || addr || 00
4+26+2 = 32 bits for jump target

注: MSB 为高四位(most significant bit).

3. Operations

Operation type encoded in instruction opcode
Many types of operations
- Integer arithmetic: add, sub, mul, div, mod/rem (signed/unsigned)
- FP arithmetic: add, sub, mul, div, sqrt
- Integer logical: and, or, xor, not, sll, srl, sra
- ...
What other operations might be useful?
More operation types == better ISA?
DEC VAX computer had LOTS of operation types
- E.g., instruction for polynomial evaluation (no joke!)
- But many of them were rarely/never used

4. Operand Model (操作数模型)

If you’re going to add, you need at least 3 operands
- Two source operands, one destination operand
Question #1: Where can operands come from?
Question #2: And how are they specified?
Running example: A = B + C
- Several options for answering both questions
Discuss: Memory-Only & Registers
Optional: Accumulator & Stack

Operand Model I: Memory Only

Memory only

mem[A] = mem[B] + mem[C]  // add A,B,C

Operand Model II: Accumulator

Accumulator : implicit single-element stack

ACC = mem[B]  // load B
ACC = ACC + mem[C]  // add C
mem[A] = ACC  // store A

注: Accumulator 相比 Memory Only, 减少了访存.

Operand Model III: Stack

Stack : top of stack (TOS) is implicit in instructions

stack[TOS++] = mem[B]  // push B
stack[TOS++] = mem[C]  // push C
stack[TOS++] = stack[--TOS] + stack[--TOS]  // add
mem[A] = stack[--TOS]  // pop A

注: Stack 相比 Accumulator, 有更多的存储单元(Acc 只有 1 个).

Operand Model IV: Registers

General - purpose registers : multiple explicit accumulators

R1 = mem[B]  // load R1,B
R1 = R1 + mem[C]  // add R1, C
mem[A] = R1  // store R1,A

Load - store : GPR and only loads/stores access memory

R1 = mem[B]  // load R1,B
R2 = mem[C]  // load R2,C
R1 = R1 + R2  // add R1,R1,R2
mem[A] = R1  // store R1,A

注: Registers 相比 Stack, 读写位置任意.

Operand Model: Pros and Cons

Metric I: static code size
- Number of instructions needed to represent program, size of each
- Evaluation: memory only(1) < register(3) < load-store(4)
Metric II: data memory traffic
- Number of bytes moved to and from memory
- Evaluation: load-store > register > memory only
Metric III: instruction latency
- Want low latency to execute instructions
- Evaluation: load-store > register > memory only

现状: Most current ISAs are load-Store

MIPS Operand Model

MIPS is load-store
- 32 32-bit integer registers
  - Actually 31: r0 is hardwired to value 0 -> why?
- 32 32-bit FP registers
  - Can also be treated as 16 64-bit FP registers
- HI, LO: destination registers for multiply/divide
Integer register conventions
- Allows separate function-level compilation and fast function calls

Memory Addressing (内存寻址)

ISAs assume “virtual” address size
- Either 32 or 64 bits
- Program can name 2^32 bytes (4GB) or 2^64 bytes (16PB)
- ISA point? no room for even one address in a 32-bit instruction
Addressing mode : way of specifying address
- Direct: ld R1,(R2) R1=mem[R2]
- Displacement: ld R1,8(R2) R1=mem[R2+8]
- Indexed: ld R1,(R2,R3) R1=mem[R2+R3]
- Memory-indirect: ld R1,@(R2) R1=mem[mem[R2]]
- Auto-update: ld R1,8(R2) R2+=8; R1=mem[R2]
- Scaled: ld R1,(R2,R3,32,8) R1=mem[R2+R3*32+8]

What high-level program idioms are these used for?

MIPS Addressing Modes: Rationality

MIPS implements only displacement

Why? Experiment on VAX (ISA with every mode) found distribution
Disp: 61%, reg-ind: 19%, scaled: 11%, mem-ind: 5%, other: 4%
80% use displacement or register indirect (=displacement 0)

How about the remain 20%?

I-type instructions: 16-bit displacement

Is 16-bits enough?
Yes! VAX experiment showed 1% accesses use displacement >16

Addressing Issue: Endian - ness

Byte Order (字节序)

Big Endian: byte 0 is 8 most significant bits
- IBM 360/370, Motorola 68k, MIPS, SPARC, HP PA-RISC
Little Endian: byte 0 is 8 least significant bits
- Intel 80x86, DEC Vax, DEC/Compaq Alpha

Addressing Issue: Alignment

Alignment: require that objects fall on address that is multiple of their size

32-bit integer
Aligned if address % 4 = 0 [% is symbol for “mod”]
Aligned: lw @XXXX00
Not: lw @XXXX10

Question: what to do with unaligned accesses (uncommon case)?

Support in hardware? Makes all accesses slow
Trap to software routine? Possibility
MIPS? ISA support: unaligned access using two instructions:
- lw @XXXX10 = lwl @XXXX10 / lwr @XXXX10

5. Datatypes

Datatypes
- Software view: property of data
- Hardware view: data is just bits, property of operations
Hardware datatypes
- Integer: 8 bits (byte), 16b (half), 32b (word), 64b (long)
- IEEE754 FP: 32b (single-precision), 64b (double-precision)
- Packed integer: treat 64b int as 8 8b int’s or 4 16b int’s

MIPS Datatypes (and Operations)

Datatypes: all the basic ones (byte, half, word, FP)
- All integer operations read/write 32-bits
  - No partial dependences on registers
- Only byte/half variants are load-store
  - lb, lbu, lh, lhu, sb, sh
- Loads sign-extend (or not) byte/half into 32-bits
Operations: all the basic ones
- Signed/unsigned variants for integer arithmetic
- Immediate variants for all instructions
  - add, addu, addi, addiu
Regularity/orthogonality: all variants available for all operations
- Makes compiler’s “life” easier

6. Control

6.1 Control Instructions I

One issue: testing for conditions

Option I: Compare and branch instructions
blti $1,10,target
- Simple, – two ALUs: one for condition, one for target address
Option II: Implicit condition codes
subi $2,$1,10 // sets “negative” CC
bn target
- Condition codes set “for free”, – implicit dependence is tricky
Option III: Condition registers, separate branch insns slti $2,$1,10
bnez $2,target
- Additional instructions, one ALU per, explicit dependence

MIPS Conditional Branches

MIPS uses combination of options II and III
- Compare 2 registers and branch: beq, bne
  - Equality and inequality only
  - Don’t need an adder for comparison
- Compare 1 register to zero and branch: bgtz, bgez, bltz, blez
  - Greater/less than comparisons
  - Don’t need adder for comparison
- Set explicit condition registers: slt, sltu, slti, sltiu, etc.
Why?

6.2 Control Instructions II

Another issue: Computing targets

Option I: PC-relative
- Position-independent within procedure
- Used for branches and jumps within a procedure
Option II: Absolute
- Position independent outside procedure
- Used for procedure calls
Option III: Indirect (target found in register)
- Needed for jumping to dynamic targets
- Used for returns, dynamic procedure calls, switches, ???
How far do you need to jump?
- Typically not so far within a procedure (they don’t get that big)
- Further from one procedure to another

MIPS Control Instructions

MIPS uses all three

PC-relative -> conditional branches: bne, beq, blez, etc.
- 16-bit relative offset, <0.1% branches need more
- PC = PC + 4 + immediate if condition is true (else PC=PC+4)
Absolute -> unconditional jumps: j target
- 26-bit offset (can address 2^28 words < 2^32 -> what gives?)
Indirect -> Indirect jumps: jr $rd

6.3 Control Instructions III

Another issue: how to support procedure calls?
- We “link” (remember) address of the calling instruction + 4 (current PC + 4) so we can return to it after the procedure
MIPS
- Implicit return address register is $ra(=$31)
- Direct jump-and-link: jal address
  -> $ra = PC+4; PC = address
- Can then return from call with: jr $ra
- Or can call with indirect jump-and-link: jalr $rd, $rs -> $rd = PC+4; PC = $rs // explicit return address register
- Then return with: jr $rd

Control习语: If - Then - Else

Understanding programs helps with architecture
- Know what common programming idioms look like in assembly
- Why? How can you MCCFif you don’t know what CC is?
First control idiom: if - then - else

if (A < B) A++; // A in $s1
else B++; // B in $s2

What's the MIPS format?

      slt $s3,$s1,$s2 // if $s1<$s2, then $s3=1
      beqz $s3,else // branch to else if !condition
      addi $s1,$s1,1
      j join // jump to join
else: addi $s2,$s2,1
join:

Control习语: Arithmetic For Loop

Second idiom: for loop with arithmetic induction

int A[100], sum, i, N;
for (i=0; i<N; i++){  // assume: i in $s1, N in $s2
  sum += A[i];  // &A[i] in $s3, sum in $s4
}

      sub $s1,$s1,$s1  // initialize i to 0
loop: slt $t1,$s1,$s2  // if i<N, then $t1=1
      beqz $t1,exit // test for exit at loop header
      lw $t1,0($s3)  // $t1 = A[i]  (not &A[i])
      add $s4,$s4,$t1 // sum = sum + A[i]
      addi $s3,$s3,4  // increment &A[i] by sizeof(int)
      addi $s1,$s1,1  // i++
      j loop  // backward jump
exit:

RISC vs. CISC

RISC: reduced-instruction set computer
- Coined by P+H in early 80’s
CISC: complex-instruction set computer
- Not coined by anyone, term didn’t exist before “RISC”
Religious war (one of several) started in mid 1980’s
- RISC (MIPS, Alpha) “won” the technology battles
- CISC (IA32 = x86) “won” the commercial war
  - Compatibility a stronger force than anyone (but Intel) thought
  - Intel beat RISC at its own game … more on this soon

Intel 80x86 ISA (aka x86 or IA - 32 now)

Long history
Binary compatibility across generations <- IBM 360
1978: 8086, 16-bit, registers have dedicated uses
1980: 8087, added floating point (stack)
1982: 80286, 24-bit
1985: 80386, 32-bit, new instrs  GPR almost
1989-95: 80486, Pentium, Pentium II
1997: Added MMX instructions (for graphics)
1999: Pentium III
2002: Pentium 4
2004: “Nocona” 64-bit extension (to keep up with AMD)

Intel x86: The Penultimate CISC (VAX ultimate)

Variable length instructions: 1-16 bytes
Few registers: 8 and each one has a special purpose
Multiple register sizes: 8,16,32 bit (for backward compatibility)
Accumulators for integer instrs, and stack for FP instrs
Multiple addressing modes: indirect, scaled, displacement
Register-register, memory - register insns
Condition codes
Instructions for memory stack management (push, pop)
Instructions for manipulating strings (entire loop in one instruction)

Summary: yuck!

80x86 Registers and Addressing Modes

Eight 32-bit registers (not truly general purpose)
- EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI
Six 16-bit Registers for code, stack, & data
2-address ISA
- One operand is both source and destination
NOT a Load/Store ISA
- One operand can be in memory

80x86 Addressing Modes

Register Indirect
- mem[reg]
- not ESP or EBP
Base + displacement (8 or 32 bit)
- mem[reg + const]
- not ESP or EBP
Base + scaled index
- mem[reg + (2^scale x index)]
- scale = 0,1,2,3
- base any GPR, index not ESP
Base + scaled index + displacement
- mem[reg + (2^scale x index) + displacement]
- scale = 0,1,2,3
- base any GPR, index not ESP

Condition Codes

x86 ISA has condition codes
Special HW register that has values set as side effect of instruction execution
Example conditions
- Zero
- Negative
Example use

subi $t0, $t0, 1

bz loop

80x86 Instruction Encoding

Variable size 1-byte to 16-bytes
Jump (JE) 2-bytes
Push 1-byte
Add Immediate 5-bytes
W bit says 32-bits or 8-bits
D bit indicates direction
- memory -> reg or reg -> memory
- movw EBX, [EDI + 45]
- movw [EDI + 45], EBX

Decoding x86 Instructions

Is a nightmare!
Instruction length is variable from 1 to 16 bytes!
Prefixes, postfixes
Crazy “formats” -> register specifiers move around
But key instructions not terrible
Yet, everything must work correctly

How x86 Won Anyway

X86 won because it was the first 16-bit chip by 2 years
IBM put it into its PCs because no competing choice
Software written to x86 so x86 is the standard
Hard to complete with Intel
- X86 is difficult ISA to implement
- Intel can amortize(缓冲) design effort over vast sales
- Intel uses RISC “underneath”
Moore’s law has helped in a big way
- Most engineering problems can be solved with more transistors

Previous02.Performance Analysis Next04.流水线基础和性能分析

Last updated 5 years ago

hashtagInstruction Set Architecture(ISA)

hashtagMicroarchitecture (微架构)

hashtag与ISA有关的内容

hashtag1. Von Neumann Model

hashtag2. Instruction Format

hashtagMIPS Format (指令格式)

hashtagR Format (寄存器类型)

hashtagI Format (立即数类型)

hashtagJ Format (跳转类型)

hashtag3. Operations

hashtag4. Operand Model (操作数模型)

hashtagOperand Model I: Memory Only

hashtagOperand Model II: Accumulator

hashtagOperand Model III: Stack

hashtagOperand Model IV: Registers

hashtagOperand Model: Pros and Cons

hashtagMIPS Operand Model

hashtagMemory Addressing (内存寻址)

hashtagMIPS Addressing Modes: Rationality

hashtagAddressing Issue: Endian - ness

hashtagAddressing Issue: Alignment

hashtag5. Datatypes

hashtagMIPS Datatypes (and Operations)

hashtag6. Control

hashtag6.1 Control Instructions I

hashtagMIPS Conditional Branches

hashtag6.2 Control Instructions II

hashtagMIPS Control Instructions

hashtag6.3 Control Instructions III

hashtagControl习语: If - Then - Else

hashtagControl习语: Arithmetic For Loop

hashtagRISC vs. CISC

hashtagIntel 80x86 ISA (aka x86 or IA - 32 now)

hashtagIntel x86: The Penultimate CISC (VAX ultimate)

hashtag80x86 Registers and Addressing Modes

hashtag80x86 Addressing Modes

hashtagCondition Codes

hashtag80x86 Instruction Encoding

hashtagDecoding x86 Instructions

hashtagHow x86 Won Anyway