05. MIPS Single-Cycle Microarchitecture
“Magic” memory and register file
Combinational read
output of the read data port is a combinational function of the register file contents and the corresponding read select port
Synchronous write
the selected register is updated on the positive clock edge when write enable is asserted
Cannot affect read output in between clock edges
Single-cycle, synchronous memory
Contrast this with memory that tells when the data is ready
i.e., Ready bit: indicating the read or write is done
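The combinational-read, synchronous-write behaviour described above can be sketched as a small software model. This is only an illustration: the class name `RegisterFile` and the methods `set_write` and `clock_edge` are hypothetical, and the hardwired-zero register 0 is a MIPS convention added here, not something stated on the slide.

```python
class RegisterFile:
    """Toy model: combinational read, write committed on the clock edge."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs
        self._pending = None  # (select, value) to commit at the next edge

    def read(self, sel):
        # Combinational: depends only on current contents and the select input.
        return self.regs[sel]

    def set_write(self, sel, value, write_enable):
        # Write inputs may change between edges without affecting read().
        self._pending = (sel, value) if write_enable else None

    def clock_edge(self):
        # Positive edge: commit the selected write if write enable was asserted.
        if self._pending is not None:
            sel, value = self._pending
            if sel != 0:  # assumption: register 0 is hardwired to zero (MIPS)
                self.regs[sel] = value & 0xFFFFFFFF
            self._pending = None


rf = RegisterFile()
rf.set_write(5, 42, write_enable=True)
before = rf.read(5)  # still the old value: the edge has not arrived yet
rf.clock_edge()
after = rf.read(5)   # the new value is visible after the edge
```

Reading between `set_write` and `clock_edge` returns the old contents, which is the "cannot affect read output in between clock edges" property.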
5 generic steps
Instruction fetch (IF)
Instruction decode and register operand fetch (ID/RF)
Execute/Evaluate memory address (EX/AG)
Memory operand fetch (MEM)
Store/writeback result (WB)
The complete MIPS datapath
Assembly (e.g., register-register signed addition)
ADD rd_reg rs_reg rt_reg
Machine encoding
Semantics
ALU Datapath
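As a rough illustration of the ADD encoding and semantics, the sketch below packs the R-type fields and then applies the register-register addition. It assumes the standard MIPS32 R-type layout (opcode 0 in bits [31:26], rs [25:21], rt [20:16], rd [15:11], funct 0x20 for ADD). The function names are hypothetical, and the overflow trap of the real ADD instruction is not modelled; the result simply wraps to 32 bits.

```python
def encode_add(rd, rs, rt):
    # R-type: opcode=0 | rs | rt | rd | shamt=0 | funct=0x20 (ADD)
    return (0 << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (0 << 6) | 0x20


def execute_add(gpr, pc, inst):
    # Decode the three 5-bit register fields.
    rs = (inst >> 21) & 0x1F
    rt = (inst >> 16) & 0x1F
    rd = (inst >> 11) & 0x1F
    # GPR[rd] <- GPR[rs] + GPR[rt] (wrapping; overflow trap not modelled).
    if rd != 0:
        gpr[rd] = (gpr[rs] + gpr[rt]) & 0xFFFFFFFF
    # PC <- PC + 4
    return pc + 4


gpr = [0] * 32
gpr[1], gpr[2] = 10, 32
inst = encode_add(rd=3, rs=1, rt=2)
next_pc = execute_add(gpr, 0x1000, inst)
```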
Assembly (e.g., register-immediate signed addition)
ADDI rt_reg rs_reg immediate_16
Machine encoding
Semantics
ALU Datapath
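The register-immediate case differs from ADD mainly in that the second ALU operand is the sign-extended 16-bit immediate and the destination is rt. A minimal sketch, assuming the standard I-type layout (opcode 0x08 for ADDI, rs in [25:21], rt in [20:16], immediate in [15:0]); the helper names are hypothetical and overflow trapping is again not modelled.

```python
def sign_extend16(value):
    # Interpret the low 16 bits as a two's-complement number.
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute_addi(gpr, pc, inst):
    rs = (inst >> 21) & 0x1F
    rt = (inst >> 16) & 0x1F
    # GPR[rt] <- GPR[rs] + sign-extend(immediate)
    if rt != 0:
        gpr[rt] = (gpr[rs] + sign_extend16(inst)) & 0xFFFFFFFF
    # PC <- PC + 4
    return pc + 4


gpr = [0] * 32
gpr[1] = 5
inst = (0x08 << 26) | (1 << 21) | (2 << 16) | 0xFFFF  # ADDI r2, r1, -1
next_pc = execute_addi(gpr, 0x2000, inst)
```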
How do we complete the datapath?
Note: add two multiplexers so that the same datapath can execute the different instructions.
Assembly (e.g., load 4-byte word)
LW rt_reg offset_16(base_reg)
Machine encoding
Semantics
LW datapath
Assembly (e.g., store 4-byte word)
SW rt_reg offset_16(base_reg)
Machine encoding
Semantics
SW datapath
Note: rt occupies the second 5-bit register field of the machine encoding, so Read data 2 == [rt].
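Both LW and SW compute the effective address as base register plus sign-extended offset; they differ only in the direction of the data transfer. The sketch below assumes the standard opcodes (0x23 for LW, 0x2B for SW) and models memory as a dictionary keyed by word-aligned byte address. That memory model and the function names are assumptions made for illustration.

```python
def sign_extend16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(gpr, inst):
    base = (inst >> 21) & 0x1F
    # The ALU adds the base register and the sign-extended 16-bit offset.
    return (gpr[base] + sign_extend16(inst)) & 0xFFFFFFFF


def execute_lw(gpr, mem, inst):
    rt = (inst >> 16) & 0x1F
    # GPR[rt] <- MEM[base + offset]
    if rt != 0:
        gpr[rt] = mem.get(effective_address(gpr, inst), 0)


def execute_sw(gpr, mem, inst):
    rt = (inst >> 16) & 0x1F
    # MEM[base + offset] <- GPR[rt]; Read data 2 supplies the store data.
    mem[effective_address(gpr, inst)] = gpr[rt]


gpr = [0] * 32
mem = {}
gpr[4], gpr[5] = 0x100, 0xDEADBEEF
execute_sw(gpr, mem, (0x2B << 26) | (4 << 21) | (5 << 16) | 8)  # SW r5, 8(r4)
execute_lw(gpr, mem, (0x23 << 26) | (4 << 21) | (6 << 16) | 8)  # LW r6, 8(r4)
```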
Note: this datapath integrates all of the instructions so far. The difference is the additional multiplexer at the bottom (MemtoReg).
Assembly
J immediate_26
Machine encoding
Semantics
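Datapath for the unconditional jump instruction
Note: take the upper four bits [31:28] of PC+4, concatenate them with the instruction's 26-bit immediate, append 00 (equivalently, shift the immediate left by two bits), and assign the result to PC.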
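The jump-target construction can be written out as bit operations. A minimal sketch; the function name `jump_target` is hypothetical.

```python
def jump_target(pc, inst):
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper = pc_plus_4 & 0xF0000000      # bits [31:28] of PC+4
    imm26 = inst & 0x03FFFFFF           # 26-bit immediate from the instruction
    return upper | (imm26 << 2)         # append two zero bits


inst = (0x02 << 26) | 0x100             # J with immediate 0x100
target = jump_target(0x40000000, inst)
```

Because only 26 bits come from the instruction, a jump can only reach addresses within the same 256 MB region as PC+4.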
Assembly (e.g., branch if equal)
BEQ rs_reg rt_reg immediate_16
Machine encoding
Semantics (assuming no branch delay slot)
Conditional Branch Datapath (for you to finish)
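Note: first compute PC + 4. Sign-extend the low 16 bits of the instruction, shift the result left by two bits (multiply by 4), and add it to PC + 4. The sum is the potential branch target, which feeds the PCSrc multiplexer. The register file reads the two source registers, and the ALU subtracts them to test whether they are equal (whether the result is 0). If they are equal, the target is assigned to PC; otherwise PC becomes PC + 4.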
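The branch decision and target computation for BEQ can be sketched as follows, assuming no branch delay slot as stated above. The function names are hypothetical.

```python
def sign_extend16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def beq_next_pc(pc, inst, gpr):
    rs = (inst >> 21) & 0x1F
    rt = (inst >> 16) & 0x1F
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Potential target: PC + 4 + (sign-extended immediate << 2)
    target = (pc_plus_4 + (sign_extend16(inst) << 2)) & 0xFFFFFFFF
    # The ALU subtracts the operands; a zero result means they are equal.
    taken = ((gpr[rs] - gpr[rt]) & 0xFFFFFFFF) == 0
    return target if taken else pc_plus_4


gpr = [0] * 32
gpr[1] = gpr[2] = 7
inst = (0x04 << 26) | (1 << 21) | (2 << 16) | 0xFFFE  # BEQ r1, r2, -2
taken_pc = beq_next_pc(0x100, inst, gpr)
gpr[2] = 8
not_taken_pc = beq_next_pc(0x100, inst, gpr)
```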
As combinational function of Inst=MEM[PC]
Note: the control signals are normally generated from the combination of opcode and funct.
Consider
All R-type and I-type ALU instructions
LW and SW
BEQ, BNE, BLEZ, BGTZ
J, JR, JAL, JALR
| Signal | When de-asserted | When asserted | Equation |
| --- | --- | --- | --- |
| RegDest | GPR write select according to rt, i.e., inst[20:16] | GPR write select according to rd, i.e., inst[15:11] | opcode==0 |
| ALUSrc | 2nd ALU input from 2nd GPR read port | 2nd ALU input from sign-extended 16-bit immediate | (opcode!=0) && (opcode!=BEQ) && (opcode!=BNE) |
| MemtoReg | Steer ALU result to GPR write port | Steer memory load to GPR write port | opcode==LW |
| RegWrite | GPR write disabled | GPR write enabled | (opcode!=SW) && (opcode!=Bxx) && (opcode!=J) && (opcode!=JR) |
| MemRead | Memory read disabled | Memory read port returns load value | opcode==LW |
| MemWrite | Memory write disabled | Memory write enabled | opcode==SW |
| PCSrc_1 | According to PCSrc_2 | Next PC is based on 26-bit immediate jump target | (opcode==J) or (opcode==JAL) |
| PCSrc_2 | Next PC = PC + 4 | Next PC is based on 16-bit immediate branch target | (opcode==Bxx) && "bcond is satisfied" |
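The control equations listed above are pure functions of the instruction, so they can be sketched as one combinational function. The opcode values are the standard MIPS32 ones. JR is an R-type instruction (opcode 0, funct 0x08), so the "opcode != JR" shorthand is modelled here with a funct check; that modelling choice, the function name `control_signals`, and the dictionary representation are assumptions made for illustration.

```python
LW, SW = 0x23, 0x2B
BEQ, BNE, BLEZ, BGTZ = 0x04, 0x05, 0x06, 0x07
J, JAL = 0x02, 0x03
BXX = {BEQ, BNE, BLEZ, BGTZ}
JR_FUNCT = 0x08  # JR is encoded as opcode 0 with this funct value


def control_signals(opcode, funct=0, bcond=False):
    is_jr = opcode == 0 and funct == JR_FUNCT
    return {
        "RegDest": opcode == 0,
        "ALUSrc": opcode != 0 and opcode not in (BEQ, BNE),
        "MemtoReg": opcode == LW,
        "RegWrite": opcode not in ({SW, J} | BXX) and not is_jr,
        "MemRead": opcode == LW,
        "MemWrite": opcode == SW,
        "PCSrc1": opcode in (J, JAL),
        "PCSrc2": opcode in BXX and bcond,
    }


add_ctrl = control_signals(0, funct=0x20)   # R-type ADD
lw_ctrl = control_signals(LW)
sw_ctrl = control_signals(SW)
beq_ctrl = control_signals(BEQ, bcond=True)
```

For instructions where a signal does not matter (for example ALUSrc for J), the function still returns whatever the equation yields, which matches a "don't care" in hardware.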
case opcode
‘0’ -> select operation according to funct
‘ALUi’ -> select operation according to opcode
‘LW’ -> select addition
‘SW’ -> select addition
‘Bxx’ -> select bcond generation function
__ -> don’t care
Example ALU operations
ADD, SUB, AND, OR, XOR, NOR, etc.
bcond on equal, not equal, LE zero, GT zero, etc.
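The case analysis above maps directly onto a lookup. The sketch below uses standard MIPS32 funct and opcode values for a handful of instructions. The operation names returned as strings and the function name `alu_operation` are hypothetical, and only a subset of instructions is listed.

```python
# funct field -> operation, for R-type instructions (opcode == 0)
FUNCT_OPS = {0x20: "ADD", 0x22: "SUB", 0x24: "AND",
             0x25: "OR", 0x26: "XOR", 0x27: "NOR"}
# opcode -> operation, for immediate ALU instructions (ADDI, ANDI, ORI, XORI)
ALUI_OPS = {0x08: "ADD", 0x0C: "AND", 0x0D: "OR", 0x0E: "XOR"}
# opcode -> branch-condition function (BEQ, BNE, BLEZ, BGTZ)
BCOND_OPS = {0x04: "BCOND_EQ", 0x05: "BCOND_NE",
             0x06: "BCOND_LEZ", 0x07: "BCOND_GTZ"}
LW, SW = 0x23, 0x2B


def alu_operation(opcode, funct=0):
    if opcode == 0:
        return FUNCT_OPS.get(funct)   # select according to funct
    if opcode in ALUI_OPS:
        return ALUI_OPS[opcode]       # select according to opcode
    if opcode in (LW, SW):
        return "ADD"                  # address = base + offset
    if opcode in BCOND_OPS:
        return BCOND_OPS[opcode]      # bcond generation function
    return None                       # don't care
```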
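Note: for an R-type instruction the next PC is PC+4. It is not a jump, so PCSrc1 = 0, and PCSrc2 (the branch-taken signal) is 0, so PC+4 is fed back to PC. RegDest = 1 selects rd (inst[15:11]) as the destination register. Read data 1 and Read data 2 are read from the register file, and the ALU operation is decoded from the funct field. Memory is not written; the ALU result is written back through the register file's Write data port.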
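Note: for I-type ALU instructions the PC+4 path is the same as for R-type and is not repeated here. The ALU inputs come from a register and the immediate. The result is again written back through Write data.
Note: for LW the PC path is unchanged (PCSrc as before). The ALU inputs are Read data 1 (the base address) and the immediate (the offset). Data memory is then read, and the memory's read data is written back through the register file's Write data port.
Note: SW moves data in the opposite direction from LW. The PC path is the same. SW does not write the register file (Write register and Write data are unused). The ALU inputs are Read data 1 and the immediate; the sum drives the memory address port, and Read data 2 (the register selected by the 5-bit rt field) supplies the memory write data.
Note: when the condition of a conditional branch is not satisfied, the branch is said to be not taken. Read data 1 and Read data 2 are compared; if the condition fails, the branch-taken signal is 0 and the next PC is still PC+4 (no jump).
Note: the branch is taken when Read data 1 and Read data 2 are equal, which sets the branch-taken signal to 1, so PCSrc2 = BrTaken. The branch target is the low 16 bits sign-extended and shifted left by two bits, plus PC+4. The target is then written to PC.
Note: the jump case is simpler. After fetch and decode, PCSrc1 = Jump. The upper 4 bits of PC+4 are concatenated with the low 26 bits of the instruction and two zero bits are appended, forming the 32-bit jump target address, which is then written to PC.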
Note: how are the control signals generated?
By combinational logic or by sequential logic.
Combinational Logic -> Hardwired Control
Idea: Control signals generated combinationally based on instruction
Necessary in a single-cycle microarchitecture…
Sequential Logic -> Sequential/Microprogrammed Control
Idea: A memory structure contains the control signals associated with an instruction
Control Store
Every instruction takes 1 cycle to execute
CPI (Cycles per instruction) is strictly 1
How long each instruction takes is determined by how long the slowest instruction takes to execute
Even though many instructions do not need that long to execute
Clock cycle time of the microarchitecture is determined by how long it takes to complete the slowest instruction
Critical path of the design is determined by the processing time of the slowest instruction
Let's go back to the basics
All five phases of the instruction processing cycle take a single machine clock cycle to complete
Instruction fetch (IF)
Instruction decode and register operand fetch (ID/RF)
Execute/Evaluate memory address (EX/AG)
Memory operand fetch (MEM)
Store/writeback result (WB)
Does each of the above phases take the same time (latency) for all instructions?
Assume
memory units (read or write): 200 ps
ALU and adders: 100 ps
register file (read or write): 50 ps
other combinational logic: 0 ps
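Under the latencies assumed above, the time each instruction class needs can be estimated by summing the units on its path. The list of units each class uses (for example, that a jump needs only the instruction fetch) is an assumption based on the usual textbook treatment, not something stated in the notes, and the names `LATENCY_PS` and `STEPS` are hypothetical.

```python
# Unit latencies in picoseconds, taken from the assumptions above.
LATENCY_PS = {"mem": 200, "alu": 100, "rf": 50}

# Assumed units on each instruction class's path, in order:
# instruction memory, register read, ALU, data memory, register write.
STEPS = {
    "R-type": ["mem", "rf", "alu", "rf"],
    "LW":     ["mem", "rf", "alu", "mem", "rf"],
    "SW":     ["mem", "rf", "alu", "mem"],
    "Branch": ["mem", "rf", "alu"],
    "Jump":   ["mem"],
}

latency = {name: sum(LATENCY_PS[unit] for unit in units)
           for name, units in STEPS.items()}

# In a single-cycle design the clock period must cover the slowest instruction.
cycle_time_ps = max(latency.values())

for name, ps in latency.items():
    print(f"{name}: {ps} ps")
print(f"cycle time: {cycle_time_ps} ps")
```

With these assumptions LW is the slowest instruction, so every instruction, including a jump that needs only a fraction of that time, is charged the full LW latency per cycle.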