05. MIPS Single-Cycle Microarchitecture

State Elements in the CPU

Some assumptions (not practical)

  • “Magic” memory and register file

  • Combinational read

    • output of the read data port is a combinational function of the register file contents and the corresponding read select port

  • Synchronous write

    • the selected register is updated on the positive clock edge when write enable is asserted

    • cannot affect the read output in between clock edges

  • Single-cycle, synchronous memory

    • Contrast this with memory that tells when the data is ready

    • i.e., Ready bit: indicating the read or write is done
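The read/write timing assumptions above can be modeled in a few lines. This is a minimal Python sketch, not a hardware description. `RegisterFile`, `read`, and `clock_edge` are hypothetical names. Hardwiring register 0 to zero is a standard MIPS convention that this sketch adds; the notes above do not state it.

```python
class RegisterFile:
    """Sketch of a 32 x 32-bit register file (hypothetical model)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs, rt):
        # Combinational read: output depends only on current contents and selects.
        return self.regs[rs], self.regs[rt]

    def clock_edge(self, write_enable, rd, data):
        # Synchronous write: state changes only here, at the positive clock edge,
        # so a pending write cannot affect read() between edges.
        if write_enable and rd != 0:  # assumption: $zero is hardwired to 0
            self.regs[rd] = data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(True, 8, 5)
print(rf.read(8, 0))  # (5, 0)
```

A write with `write_enable` deasserted leaves the contents unchanged, matching the "update only when write enable is asserted" rule.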

MIPS Instruction Processing

5 generic steps

  • Instruction fetch (IF)

  • Instruction decode and register operand fetch (ID/RF)

  • Execute/Evaluate memory address (EX/AG)

  • Memory operand fetch (MEM)

  • Store/writeback result (WB)

The Complete MIPS Datapath

Single-Cycle Datapath for Arithmetic and Logical Instructions

R-Type ALU Instructions

  • Assembly (e.g., register-register signed addition)

    • ADD rd_reg rs_reg rt_reg

  • Machine encoding

  • Semantics

if MEM[PC] == ADD rd rs rt
    GPR[rd] <- GPR[rs] + GPR[rt]
    PC <- PC + 4
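The ADD semantics above can be sketched in Python by first slicing the 32-bit word into the standard MIPS R-type fields and then applying the register transfer. `decode_rtype` and `execute_add` are hypothetical helpers. Two simplifications apply. The sum wraps modulo 2^32, like the overflow-free semantics shown here, whereas real MIPS ADD raises an overflow exception. Writes to register 0 are dropped.

```python
FUNCT_ADD = 0x20  # funct code of ADD in the MIPS32 encoding


def decode_rtype(instr):
    """Split a 32-bit R-type instruction word into its six fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits [31:26], 0 for R-type
        "rs": (instr >> 21) & 0x1F,      # bits [25:21]
        "rt": (instr >> 16) & 0x1F,      # bits [20:16]
        "rd": (instr >> 11) & 0x1F,      # bits [15:11]
        "shamt": (instr >> 6) & 0x1F,    # bits [10:6]
        "funct": instr & 0x3F,           # bits [5:0]
    }


def execute_add(gpr, pc, instr):
    """Apply ADD to the register list `gpr` and return the next PC."""
    f = decode_rtype(instr)
    if f["opcode"] != 0 or f["funct"] != FUNCT_ADD:
        raise ValueError("not an ADD instruction")
    if f["rd"] != 0:  # writes to $zero are discarded
        # Wraps modulo 2**32 (simplification: no overflow exception).
        gpr[f["rd"]] = (gpr[f["rs"]] + gpr[f["rt"]]) & 0xFFFFFFFF
    return pc + 4
```

For example, `add $8, $9, $10` encodes to `0x012A4020`. With `gpr[9] = 3` and `gpr[10] = 4`, executing it sets `gpr[8]` to 7 and returns PC + 4.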

ALU Datapath

I-Type ALU Instructions

  • Assembly (e.g., register-immediate signed additions)

    • ADDI rt_reg rs_reg immediate_16

  • Machine encoding

  • Semantics

if MEM[PC] == ADDI rt rs immediate
    GPR[rt] <- GPR[rs] + sign-extend(immediate)
    PC <- PC + 4
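The only new step relative to ADD is `sign-extend(immediate)`. The 16-bit field is read as a two's-complement number before the addition. Below is a minimal sketch with hypothetical helper names. It assumes the MIPS32 opcode 0x08 for ADDI and, as in the ADD sketch, wraps instead of trapping on overflow.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def execute_addi(gpr, pc, instr):
    """Apply ADDI rt, rs, imm to the register list `gpr`; return the next PC."""
    if (instr >> 26) & 0x3F != 0x08:  # assumption: MIPS32 ADDI opcode
        raise ValueError("not an ADDI instruction")
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F          # I-type destination is rt, not rd
    if rt != 0:
        gpr[rt] = (gpr[rs] + sign_extend16(instr)) & 0xFFFFFFFF
    return pc + 4
```

For instance, `addi $8, $9, -1` encodes to `0x2128FFFF`. The immediate `0xFFFF` extends to -1, so with `gpr[9] = 10` the result in `gpr[8]` is 9.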

ALU Datapath

Datapath for R-Type and I-Type ALU Instructions

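How do we complete the datapath?

Note: Add two multiplexers so that one datapath handles both instruction types. One (RegDst) selects rd or rt as the destination register. The other (ALUSrc) selects the second ALU operand: either GPR[rt] or the sign-extended immediate.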
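Below is a minimal sketch of the two multiplexers. The select polarities are assumed: RegDst = 1 picks rd, and ALUSrc = 1 picks the immediate. The function names are hypothetical. In hardware both muxes are purely combinational, so they are modeled as plain functions of their inputs.

```python
def mux2(sel, in0, in1):
    """2-to-1 multiplexer: returns in1 when sel is 1, else in0."""
    return in1 if sel else in0


def alu_inputs(instr, gpr, is_itype):
    """Return (ALU operand A, ALU operand B, destination register number)."""
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    rd = (instr >> 11) & 0x1F
    imm = instr & 0xFFFF
    simm = (imm - 0x10000 if imm & 0x8000 else imm) & 0xFFFFFFFF  # 32-bit pattern
    reg_dst = 0 if is_itype else 1   # assumed polarity: 1 -> rd (R-type), 0 -> rt
    alu_src = 1 if is_itype else 0   # assumed polarity: 1 -> immediate, 0 -> GPR[rt]
    dest = mux2(reg_dst, rt, rd)
    return gpr[rs], mux2(alu_src, gpr[rt], simm), dest
```

With the earlier encodings, `add $8, $9, $10` (`0x012A4020`) yields operands GPR[9] and GPR[10] with destination 8. `addi $8, $9, -1` (`0x2128FFFF`) yields GPR[9] and `0xFFFFFFFF`, also with destination 8.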

Single-Cycle Datapath for Data Transfer Instructions

Load Instructions

  • Assembly (e.g., load 4-byte word)

    • LW rt_reg offset_16(base_reg)

  • Machine encoding

  • Semantics

if MEM[PC] == LW rt offset_16(base)
    EA = sign-extend(offset) + GPR[base]  // effective address = sign-extended offset + base register
    GPR[rt] <- MEM[ translate(EA) ]  // load the word at the translated address into rt
    PC <- PC + 4
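Here is a sketch of the LW semantics. Memory is modeled as a Python dict from word address to 32-bit value. `translate` is a hypothetical identity function standing in for address translation (no virtual memory), and unwritten addresses read as 0. None of these modeling choices come from the notes.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def translate(ea):
    """Hypothetical address translation: identity (no virtual memory)."""
    return ea


def execute_lw(gpr, mem, pc, instr):
    """LW rt, offset(base); `mem` maps word addresses to 32-bit values."""
    base = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    ea = (sign_extend16(instr) + gpr[base]) & 0xFFFFFFFF  # effective address
    if rt != 0:
        gpr[rt] = mem.get(translate(ea), 0)  # unwritten memory reads as 0 here
    return pc + 4
```

As an example, `lw $8, -4($9)` encodes to `0x8D28FFFC`. With `gpr[9] = 0x1004`, the effective address is `0x1000`.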

LW Datapath

Store Instructions

  • Assembly (e.g., store 4-byte word)

    • SW rt_reg offset_16(base_reg)

  • Machine encoding

  • Semantics

if MEM[PC] == SW rt offset_16(base)
    EA = sign-extend(offset) + GPR[base]
    MEM[ translate(EA) ] <- GPR[rt]
    PC <- PC + 4

SW Datapath

Note: rt occupies the second 5-bit register field of the machine encoding (bits [20:16]), so Read data 2 carries GPR[rt], the value to be stored.
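SW mirrors LW. The same effective-address computation drives the memory address, but data flows from register rt into memory, and no register is written. This sketch reuses the same assumptions as a LW model: a dict as memory and a hypothetical identity `translate`.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def translate(ea):
    """Hypothetical address translation: identity (no virtual memory)."""
    return ea


def execute_sw(gpr, mem, pc, instr):
    """SW rt, offset(base); writes GPR[rt] to `mem`, leaves `gpr` unchanged."""
    base = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F   # second register field -> Read data 2
    ea = (sign_extend16(instr) + gpr[base]) & 0xFFFFFFFF
    mem[translate(ea)] = gpr[rt]
    return pc + 4
```

For instance, `sw $8, 8($9)` encodes to `0xAD280008`. With `gpr[9] = 0x1000`, it stores GPR[8] at address `0x1008`.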

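Load-Store Datapath

Datapath for Non-Control-Flow Instructions

Note: This datapath combines all of the instructions above. The difference is the extra multiplexer at the bottom (MemtoReg), which chooses whether the register write data comes from the ALU result or from data memory.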
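The write-back end of the combined datapath can be sketched as the MemtoReg multiplexer followed by a gated register write. `write_back` and its parameter names are hypothetical. The signal settings in the usage note are the usual values for these instruction classes.

```python
def write_back(gpr, dest, reg_write, mem_to_reg, alu_result, mem_data):
    """Register write-back with the MemtoReg multiplexer (hypothetical helper)."""
    data = mem_data if mem_to_reg else alu_result  # MemtoReg mux
    if reg_write and dest != 0:                    # $zero is never written
        gpr[dest] = data & 0xFFFFFFFF
```

R-type and I-type ALU instructions use RegWrite = 1 and MemtoReg = 0. LW uses RegWrite = 1 and MemtoReg = 1. SW uses RegWrite = 0, so the MemtoReg setting does not matter.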

Single-Cycle Datapath for Control-Flow Instructions

Unconditional Jump Instruction

  • Assembly

    • J immediate_26

  • Machine encoding

  • Semantics

if MEM[PC] == J immediate_26
    target = { (PC+4)[31:28], immediate_26, 2'b00 }
    PC <- target

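Unconditional Jump Datapath

Note: Concatenate the upper four bits of PC + 4 ([31:28]) with the 26-bit immediate from the instruction, then append 00 (equivalently, shift the immediate left by two bits). Write the resulting 32-bit target to the PC.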
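The concatenation `{ (PC+4)[31:28], immediate_26, 2'b00 }` maps directly onto masks and shifts. Here is a minimal sketch with a hypothetical `jump_target` helper.

```python
def jump_target(pc, instr):
    """Compute the J target: {(PC+4)[31:28], immediate_26, 2'b00}."""
    imm26 = instr & 0x03FFFFFF           # low 26 bits of the instruction
    upper4 = (pc + 4) & 0xF0000000       # (PC+4)[31:28], kept in place
    return (upper4 | (imm26 << 2)) & 0xFFFFFFFF
```

The upper bits come from PC + 4, not PC. The two differ only when the jump sits in the last word of a 256 MB region. At `pc = 0x9FFFFFFC`, for example, PC + 4 is `0xA0000000`, so the target lands in the `0xA...` region.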

Conditional Branch Instructions

  • Assembly (e.g., branch if equal)

    • BEQ rs_reg rt_reg immediate_16

  • Machine encoding

  • Semantics (assuming no branch delay slot)

if MEM[PC] == BEQ rs rt immediate_16
    target = PC + 4 + sign-extend(immediate) × 4
    if GPR[rs] == GPR[rt] then PC <- target
    else PC <- PC + 4

Conditional Branch Datapath (for you to finish)

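Note: First compute PC + 4. Sign-extend the low 16 bits of the instruction and multiply by 4 (shift left by 2 bits), then add this to PC + 4 to obtain the potential branch target, which feeds the PCSrc multiplexer. Meanwhile, the register file reads the two source registers, and the ALU subtracts them to test for equality (a zero result). If they are equal, the target is written to the PC; otherwise the PC gets PC + 4.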
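The BEQ procedure just described can be sketched as follows. Equality is tested the way the datapath does it: subtract and check for a zero result. `beq_next_pc` is a hypothetical name, and no branch delay slot is modeled, matching the stated semantics.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def beq_next_pc(gpr, pc, instr):
    """Next PC for BEQ rs, rt, imm (no branch delay slot)."""
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target = (pc_plus_4 + sign_extend16(instr) * 4) & 0xFFFFFFFF
    zero = ((gpr[rs] - gpr[rt]) & 0xFFFFFFFF) == 0  # ALU subtract, zero flag
    return target if zero else pc_plus_4
```

For example, `beq $8, $9, -2` encodes to `0x1109FFFE`. At `pc = 0x1000`, the target is `0x1004 - 8 = 0xFFC`.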

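Single-Cycle Control Logic

Single-Cycle Hardwired Control

  • As a combinational function of Inst = MEM[PC]

Note: Control signals are usually generated from the combination of the opcode and funct fields.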

  • Consider

    • All R-type and I-type ALU instructions

    • LW and SW

    • BEQ, BNE, BLEZ, BGTZ

    • J, JR, JAL, JALR
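A hardwired main-control decoder is a pure function from opcode to control bits. The sketch below uses the standard MIPS32 opcodes. The signal names follow common textbook usage and are assumptions here, and don't-care bits are simply set to 0. It covers only the R-type ALU, ALU-immediate, LW/SW, branch, and J groups. JR, JAL, and JALR need extra mux inputs (link register $31, a register-sourced target), so this sketch leaves them out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    reg_dst: int = 0     # 1: write rd, 0: write rt
    alu_src: int = 0     # 1: second ALU operand is the immediate
    mem_to_reg: int = 0  # 1: write-back data comes from memory
    reg_write: int = 0
    mem_read: int = 0
    mem_write: int = 0
    branch: int = 0
    jump: int = 0


OP_RTYPE, OP_J, OP_LW, OP_SW = 0x00, 0x02, 0x23, 0x2B
OP_BRANCH = {0x04, 0x05, 0x06, 0x07}                  # BEQ, BNE, BLEZ, BGTZ
OP_ALUI = {0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E}  # ADDI ... XORI


def main_control(opcode):
    """Combinational decode of the opcode into control bits (sketch)."""
    if opcode == OP_RTYPE:          # R-type ALU ops only (JR/JALR not handled)
        return Control(reg_dst=1, reg_write=1)
    if opcode in OP_ALUI:
        return Control(alu_src=1, reg_write=1)
    if opcode == OP_LW:
        return Control(alu_src=1, mem_to_reg=1, reg_write=1, mem_read=1)
    if opcode == OP_SW:
        return Control(alu_src=1, mem_write=1)
    if opcode in OP_BRANCH:
        return Control(branch=1)
    if opcode == OP_J:
        return Control(jump=1)
    raise ValueError(f"opcode {opcode:#04x} not handled in this sketch")
```

Because the function has no state, it corresponds to the "combinational function of Inst" view. The same table could just as well be stored in a ROM indexed by opcode.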

Single-Bit Control Signals

ALU Control

  • case opcode

    • ‘0’ -> select operation according to funct

    • ‘ALUi’ -> select operation according to opcode

    • ‘LW’ -> select addition

    • ‘SW’ -> select addition

    • ‘Bxx’ -> select bcond generation function

    • any other opcode -> don't care

  • Example ALU operations

    • ADD, SUB, AND, OR, XOR, NOR, etc.

    • bcond on equal, not equal, LE zero, GT zero, etc.
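The case analysis above can be sketched as a small lookup. The funct and opcode values are the standard MIPS32 encodings. The returned operation names (strings such as `"SUB"` or `"BCOND_NE"`) are hypothetical labels, not real ALU control codes. `bcond` treats register values as signed 32-bit numbers for the LE/GT-zero tests.

```python
FUNCT_OPS = {0x20: "ADD", 0x21: "ADDU", 0x22: "SUB", 0x23: "SUBU",
             0x24: "AND", 0x25: "OR", 0x26: "XOR", 0x27: "NOR", 0x2A: "SLT"}
ALUI_OPS = {0x08: "ADD", 0x09: "ADDU", 0x0A: "SLT",
            0x0C: "AND", 0x0D: "OR", 0x0E: "XOR"}
BCOND_OPS = {0x04: "EQ", 0x05: "NE", 0x06: "LEZ", 0x07: "GTZ"}


def alu_control(opcode, funct):
    """Select the ALU operation following the case analysis above."""
    if opcode == 0x00:              # '0' -> decide by funct
        return FUNCT_OPS[funct]
    if opcode in ALUI_OPS:          # 'ALUi' -> decide by opcode
        return ALUI_OPS[opcode]
    if opcode in (0x23, 0x2B):      # LW, SW -> addition for the address
        return "ADD"
    if opcode in BCOND_OPS:         # Bxx -> branch-condition function
        return "BCOND_" + BCOND_OPS[opcode]
    return None                     # don't care


def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def bcond(kind, a, b):
    """Evaluate a branch condition on two 32-bit register values."""
    a, b = to_signed(a), to_signed(b)
    return {"EQ": a == b, "NE": a != b, "LEZ": a <= 0, "GTZ": a > 0}[kind]
```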

R-Type ALU

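Note: For an R-type instruction the next PC is PC + 4. It is not a jump, so PCSrc1 = 0, and PCSrc2 = BrTaken = 0, so PC + 4 is passed back to the PC. The destination register is rd (bits [15:11]). Read data 1 and Read data 2 are read from the register file, and the ALU operation is decoded from the function code (funct). Nothing is written to memory; the instruction finishes by writing the ALU result back through Write data.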

I-Type ALU

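Note: The PC + 4 path for I-type instructions is the same as for R-type and is not repeated here. The ALU operands come from a register (rs) and the immediate. The result is likewise written back through Write data.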

LW

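Note: LW does not change the control flow, so the PCSrc path is the same. The ALU inputs are register 1 (the base) and the immediate (the offset). Memory is then accessed, and the data read from memory is written back through Write data.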

SW

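Note: SW moves data in the opposite direction from LW. The PCSrc path is the same. SW does not write the register file, so Write register and Write data are unused. The ALU inputs are register 1 and the immediate, and their sum drives the memory Address input. The value read from register 2 (selected by the 5-bit rt field) is the data written to memory.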

Branch (Not Taken)

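Note: A conditional branch whose condition is not satisfied is called not taken. Read data 1 and Read data 2 from the register file are compared; the condition fails, so BrTaken = 0 and the PC is still updated to PC + 4 (no branch).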

Branch (Taken)

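Note: The branch is taken when Read data 1 equals Read data 2, so BrTaken = 1. PCSrc2 therefore selects the branch target (the sign-extended low 16 bits shifted left by two, plus PC + 4), and this target is written to the PC.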

Jump

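Note: JMP is simpler. After fetch and decode, PCSrc1 = Jump selects the jump target: the upper 4 bits of PC + 4 are concatenated with the 26-bit immediate of the instruction and two trailing zero bits, forming the 32-bit target address, which is written to the PC.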
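The five walkthroughs above all share one next-PC structure: the PCSrc2 mux chooses between PC + 4 and the branch target, and the PCSrc1 mux then chooses between that result and the jump target. Here is a sketch of that structure, assuming this mux ordering (branch mux feeding the jump mux) and hypothetical names. Both targets are always computed, as in hardware; only the selects differ per instruction.

```python
def next_pc(pc, instr, jump, br_taken):
    """Next-PC logic: PCSrc2 (branch) mux feeding the PCSrc1 (jump) mux."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    imm16 = instr & 0xFFFF
    simm = imm16 - 0x10000 if imm16 & 0x8000 else imm16      # sign-extend
    br_target = (pc_plus_4 + (simm << 2)) & 0xFFFFFFFF       # PC+4 + imm*4
    j_target = (pc_plus_4 & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
    after_branch = br_target if br_taken else pc_plus_4      # PCSrc2 mux
    return j_target if jump else after_branch                # PCSrc1 mux
```

With the earlier encodings, R-type, I-type, LW, SW, and a not-taken branch all use `jump=0, br_taken=0` and get PC + 4. A taken `beq` (`0x1109FFFE` at `0x1000`) gets `0xFFC`, and `j` (`0x08100008` at `0x00400000`) gets `0x00400020`.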

What is in the Control Box?

Note: How are the control signals generated? With either combinational logic or sequential logic.

  • Combinational Logic -> Hardwired Control

    • Idea: Control signals generated combinationally based on instruction

    • Necessary in a single-cycle microarchitecture…

  • Sequential Logic -> Sequential/Microprogrammed Control

    • Idea: A memory structure contains the control signals associated with an instruction

    • Control Store

Evaluating the MIPS Single-Cycle Microarchitecture

A Single-Cycle Microarchitecture: Analysis

  • Every instruction takes 1 cycle to execute

    • CPI (Cycles per instruction) is strictly 1

  • How long each instruction takes is determined by how long the slowest instruction takes to execute

    • Even though many instructions do not need that long to execute

  • Clock cycle time of the microarchitecture is determined by how long it takes to complete the slowest instruction

    • Critical path of the design is determined by the processing time of the slowest instruction

What is the Slowest Instruction to Process?

  • Let's go back to the basics

  • All five phases of the instruction processing cycle together take a single machine clock cycle to complete

    • Instruction fetch (IF)

    • Instruction decode and register operand fetch (ID/RF)

    • Execute/Evaluate memory address (EX/AG)

    • Memory operand fetch (MEM)

    • Store/writeback result (WB)

  • Does each of the above phases take the same time (latency) for all instructions?

Single-Cycle Datapath Analysis

  • Assume

    • memory units (read or write): 200 ps

    • ALU and adders: 100 ps

    • register file (read or write): 50 ps

    • other combinational logic: 0 ps

Finding the Critical Path

R-Type and I-Type ALU

LW

SW

Branch Taken

Jump
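Using the latency assumptions above, the critical path of each instruction class can be tallied. This sketch assumes a simplified model. The PC + 4 adder and the branch-target adder run in parallel with instruction fetch, register read, and the ALU compare, and finish before the compare does, so they never lengthen the path. A jump needs only the instruction fetch plus zero-cost wiring. Under these assumptions the sketch yields 400 ps for ALU instructions, 600 ps for LW, 550 ps for SW, 350 ps for a taken branch, and 200 ps for a jump.

```python
MEM, ALU, RF = 200, 100, 50  # ps, from the assumptions above; other logic 0 ps

# Serial phases on each class's critical path (simplified model):
# instruction memory -> register read -> ALU -> data memory -> register write
PATHS = {
    "R-type / I-type ALU": [MEM, RF, ALU, RF],
    "LW": [MEM, RF, ALU, MEM, RF],
    "SW": [MEM, RF, ALU, MEM],
    "Branch taken": [MEM, RF, ALU],   # adders overlap fetch/read/compare
    "Jump": [MEM],                    # target formed by wiring only
}

latency = {name: sum(phases) for name, phases in PATHS.items()}
cycle_time = max(latency.values())  # single-cycle clock must fit the slowest

for name, ps in latency.items():
    print(f"{name:20s} {ps:4d} ps")
print(f"single-cycle clock period = {cycle_time} ps")
```

Because CPI is strictly 1, every instruction occupies one full 600 ps cycle set by LW. That includes a jump, which needs only 200 ps.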
