01.Introduction
The Computer Revolution
Progress in computer technology
Underpinned(巩固) by Moore’s Law
Makes novel applications feasible(可行)
Computers in automobiles
Cell phones
Human genome project
World Wide Web
Search Engines
Computers are pervasive(无处不在)
Classes of Computers
Personal Mobile Device(PMD)
e.g. smartphones, tablet computers
Emphasis on energy efficiency and real-time
Desktop Computer
Emphasis on price-performance
Servers
Emphasis on availability, scalability, throughput(吞吐率)
Clusters / Warehouse Scale Computers
Used for “Software as a Service(SaaS)”
Emphasis on availability and price-performance
Embedded Computers
Emphasis: price
The PostPC Era
Personal Mobile Device(PMD)
BaIery operated
Connects to the Internet
Hundreds of dollars
Smart phones, tablets, electronic glasses
Cloud computing
Warehouse Scale Computers(WSC)
Sogware as a Service(SaaS)
Porcon of sogware run on a PMD and a porcon run in the Cloud
Amazon and Google
What is Computer Architecture
Application <-> Physics
二者之间的巨大差距很难通过一步来连接
In its broadest definition, computer architecture is the design of the abstraction/implementation layers that allow us to execute information processing applications efficiently using manufacturing technologies
Abstractions in Modern Computing Systems
注: 红色部分为计算机体系结构所关注的内容.
Great Ideas in Computer Architectures
Design for Moore’s Law
Use abstraction to simplify design
Make the common case fast
Performance via parallelism
Performance via pipelining
Performance via prediction
Hierarchy of memories
Dependability via redundancy
Sequential Processor Performance
课程内容
Computer Organization
Basic Pipelined Processor
Components of a Computer
Same components for all kinds of computer
Desktop, server, embedded
Input/output includes
User-interface devices
Display, keyboard, mouse
Storage devices
Hard disk, CD/DVD, flash
Network adapters
For communicacng with other computers
Computer Architecture
Instruction Level Parallelism
Superscalar
Very Long Instruction Word(VLIW)
Long Pipelines(Pipeline Parallelism)
Advanced Memory and Caches
Data Level Parallelism
Vector
GPU
Thread Level Parallelism
Multithreading
Multiprocessor
Multicore
Manycore
Architecture vs. Microarchitecture
“Architecture”/Instruction Set Architecture:
Programmer visible state(Memory & Register)
Operations(Instructions and how they work)
Execution Semantics(语义学)(interrupts)
Input/Output
Data Types/Sizes
Instruction Set Architecture
注: 一个软件与硬件之间的契约.
Properces of a good abstraccon
Lasts through many generations(portability)
Used in many different ways(generality)
Provides convenient functionality to higher levels
Permits an efficient Implementations at lower levels
Microarchitecture/Organization:
Trade offs on how to implement ISA for some metric(Speed, Energy, Cost)
Examples: Pipeline depth, number of pipelines, cache size, silicon area, peak power, execution ordering, bus widths, ALU widths
软件的发展
Up to 1955, Libraries of numerical routines
Floating point operations
Transcendental functions
Matrix manipulation, equation solvers, ...
1955-60, High level Languages - Fortran 1956 Operating Systems
Assemblers, Loaders, Linkers, Compilers
Accounting programs to keep track of usage and charges
IBM的兼容性问题
By early 1960’s, IBM had 4 incompatible lines of computers!
701 ⇒ 7094 650 ⇒ 7074 702 ⇒ 7080 1401 ⇒ 7010
Each system had its own
Instruction set
I/O system and Secondary Storage: magnetic tapes, drums and disks
assemblers, compilers, libraries, ...
market niche business, scientific, real time, ...
这会导致什么问题? 软件兼容性问题
=> IBM 360 解决该问题
IBM 360: A General-Purpose Register(GPR) Machine
Processor State
16 General-Purpose 32-bit Registers
may be used as index and base register
Register 0 has some special properties
4 Floating Point 64-bit Registers
A Program Status Word(PSW)
PC, Condition codes, Control flags
A 32-bit machine with 24-bit addresses
But no instruction contains a 24-bit address!
Data Formats
8-bit bytes, 16-bit half-words, 32-bit words, 64-bit double-words
IBM 360: Initial Implementations
-
Model 30
Model 70
Storage
8K - 64 KB
256K - 512 KB
Datapath
8-bit
64-bit
Circuit Delay
30 nsec/level
5 nsec/level
Local Store
Main Store
Transistor Registers
Control Store
Read only 1µsec
Conventional circuits
IBM 360 instruction set architecture(ISA) completely hid the underlying technological differences between various models.
Milestone: The first true ISA designed as portable hardware-software interface!
With minor modifications it still survives today!
IBM 360: Over 50 years later…
The zSeries z14 Microprocessor
5.2 GHz in IBM 14nm SOI CMOS technology
6.1 billion transistors in 696 mm
64-bit virtual addressing
original S/360 was 24-bit, and S/370 was 31-bit extension
10-core design
6-fetch/cycle
10-issue/cycle out-of-order superscalar pipeline
Out-of-order memory accesses
Redundant datapaths
every instruction performed in two arallel datapaths and results compared
128KB L1 I-cache, 128KB L1 D-cache on-chip
2MB private I-cache L2 per core
4MB private D-cache L2 per core
On-Chip 128MB eDRAM L3 cache
Up to 672MB eDRAM L4
Same Architecture Different Microarchitecture
AMDPhenomX4
IntelAtom
IBMPOWER7
X86InstructionSet
X86InstructionSet
PowerInstructionSet
QuadCore
SingleCore
EightCore
125W
2W
200W
Decode3Instructions/Cycle/Core
Decode2Instructions/Cycle/Core
Decode6Instructions/Cycle/Core
64KBL1ICache,64KBL1DCache
32KBL1ICache,24KBL1DCache
32KBL1ICache,32KBL1DCache
512KBL2Cache
512KBL2Cache
256KBL2Cache
Out-of-order
In-order
Out-of-order
2.6GHz
1.6GHz
4.25GHz
Architectural Challenges
Massive(ca.4X)increaseinconcurrency
Mulccore(4-<100)àManycores(100s–1ks)
Heterogeneity
System-level(accelerators)vschiplevel(embedded)
Computepowerandmemoryspeedchallenges(twowalls)
500xcomputepowerand30xmemoryof2PFHW
Memoryaccess'melagsfurtherbehind
Last updated