# CSE 560 Computer Systems Architecture

Technology

# **Technology Unit Overview**

- Technology basis
  - Transistors
  - Transistor scaling (Moore's Law)
- The metrics
  - Cost
  - Transistor speed
  - Power
  - Reliability

How do the metrics change with transistor scaling? How do these changes affect the job of a computer architect?




# **Technology Generations**

**1950-1959** Vacuum Tubes

**1960-1968** Transistors

**1969-1977** Integrated Circuit (multiple transistors on chip)

**1978-1999** LSI & VLSI (10Ks & 100Ks transistors on chip)

**2000-20xx** VLSI (millions, now billions of transistors on chip)






# The Silicon in Silicon Valley

- MOS: metal-oxide-semiconductor
- N-Type Silicon: negative free-carriers (free electrons)
- P-Type Silicon: positive free-carriers (holes)







# Moore's Law: Technology Scaling

- Channel length: characteristic parameter (short → fast)
  - Aka "feature size" or "technology"
  - Currently: 0.003 micron (μm), 3 nanometers (nm)
- Moore's Law: aka "technology scaling"
  - Continued miniaturization (≈ channel length)
  - + Improves: switching speed, power/transistor, area(cost)/transistor
  - − Reduces: transistor reliability

# **Technology Trends**

[Figure: log-scale plot (1E-1 to 1E+7, SpecINT) over years starting at 1985; Bresniker et al., 2015]



| | Desktop | Laptop | Netbook | Phone |
|---|---|---|---|---|
| \$ | \$100-\$300 | \$150-\$350 | \$50-\$100 | \$10-\$20 |
| % of total | 10-30% | 10-20% | 20-30% | 20-30% |

Other costs: memory, display, power supply/battery, storage, software

# Cost

- We are concerned about *chip cost*
- Metric: \$
- Unit cost: cost to manufacture individual chips
- Startup cost: cost to design the chip and build the manufacturing facility

The CPU is only a fraction of total cost; so is profit (Intel's, Dell's)




# Unit Cost: Integrated Circuit (IC)

- Cost / wafer is constant, f(wafer size, number of steps)
- Chip (die) cost related to area
  - Larger chips → fewer chips/wafer
    - → fewer *working* ones
- Chip cost ~ chip area<sup>α</sup>
  - α = 2 to 3
- Why? Random defects
- Wafer yield: % of wafer that is chips
- Die yield: % of chips that work
  - Yield is increasingly non-binary: fast vs. slow chips
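The relation chip cost ~ chip area<sup>α</sup> can be made concrete with a small calculation. This is only a sketch of the proportionality stated above; the function name `relative_chip_cost` is made up for illustration, and it models nothing beyond the power law (no wafer size, no defect density).

```python
def relative_chip_cost(area_ratio: float, alpha: float) -> float:
    """Cost multiplier when die area grows by `area_ratio`.

    Uses only the slide's proportionality: cost ~ area ** alpha.
    """
    if area_ratio <= 0:
        raise ValueError("area_ratio must be positive")
    return area_ratio ** alpha


# Doubling the die area at both ends of the alpha range given above.
for alpha in (2.0, 3.0):
    multiplier = relative_chip_cost(2.0, alpha)
    print(f"alpha = {alpha}: 2x area -> {multiplier:.0f}x cost")
```

With α between 2 and 3, doubling the die area multiplies the cost by somewhere between 4 and 8, which is why large dies are so expensive.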


# **Fixed Costs**

- For new chip design
  - Design & verification: ~\$100M (500 person-years @ \$200K per person-year)
  - Amortized over "proliferations", e.g., Xeon/Celeron cache variants
- For new (smaller) technology generation
  - ~\$3B for a new fab
  - Amortized over multiple designs
  - Amortized by "rent" from companies w/o their own fabs
- Intel's tick-tock (smaller → better)
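A rough sketch of how the design cost above is amortized. The 500 person-years and \$200K figures come from the slide; the sales volumes are hypothetical values chosen only to show how the per-chip share shrinks with volume, and the function names are illustrative.

```python
def design_cost(person_years: float, cost_per_person_year: float) -> float:
    """Total design & verification cost in dollars."""
    return person_years * cost_per_person_year


def fixed_cost_per_chip(total_fixed_cost: float, chips_sold: int) -> float:
    """Share of the fixed cost carried by each chip sold."""
    if chips_sold <= 0:
        raise ValueError("chips_sold must be positive")
    return total_fixed_cost / chips_sold


total = design_cost(500, 200_000)  # figures from the slide
print(f"design cost: ${total:,.0f}")

# Hypothetical volumes (assumed, not from the slide).
for volume in (1_000_000, 10_000_000, 100_000_000):
    share = fixed_cost_per_chip(total, volume)
    print(f"{volume:>11,} chips -> ${share:,.2f} of design cost per chip")
```

This is why "proliferations" matter: every variant sold from the same design adds to the volume over which the fixed cost is spread.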


**Transistor Speed** 


# Moore's Speed Effect #1: Transistor Speed

**Transistor length:** the "process generation", e.g., 45 nm = transistor gate length

### **Shrink** transistor length:

- + ↓ resistance of channel (shorter)
- + ↓ gate/source/drain capacitance

Result: switching speed ↑ linearly as gate length ↓

- Source of much of past performance gains

But 2<sup>nd</sup>-order effects are more complicated

- Process variation across chip increasing
  - Some transistors slow, some fast
  - Increasingly active research area: dealing with this

[Figure: transistor cross-section showing source, drain, and bulk Si. Diagrams © Krste Asanovic, MIT]

# Moore's Speed Effect #2: More Transistors

Linear shrink in each of 2 dimensions

- 180 nm, 130 nm, 90 nm, 65 nm, 45 nm, 32 nm, 22 nm, 14 nm, 10 nm, 7 nm, 5 nm, 3 nm, ...
- Each generation is a 1.414 linear shrink
- Results in 2x more transistors (1.414 x 1.414)
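The arithmetic behind the list above can be checked with a short script. It is a sketch that assumes an exact √2 shrink per generation; the named nodes are rounded labels, so they track the computed values only approximately.

```python
import math

SHRINK = math.sqrt(2)  # ~1.414 linear shrink per generation


def next_node(feature_nm: float) -> float:
    """Feature size after one generation of linear shrink."""
    return feature_nm / SHRINK


def density_gain(generations: int) -> float:
    """Transistors per unit area relative to the starting node.

    Each generation shrinks both dimensions, so density grows by
    SHRINK * SHRINK (about 2x) per generation.
    """
    return (SHRINK * SHRINK) ** generations


# Node names as listed on the slide.
nodes = [180, 130, 90, 65, 45, 32, 22, 14, 10, 7, 5, 3]

for current, named in zip(nodes, nodes[1:]):
    print(f"{current:>3} nm / 1.414 = {next_node(current):6.1f} nm (named {named} nm)")

steps = len(nodes) - 1
print(f"{steps} generations -> about {density_gain(steps):.0f}x transistors per area")
```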

More transistors → increased performance

- Job of computer architect: figure out what to do with the ever-increasing # of transistors
- Examples: caches, branch predictors, exploiting parallelism at all levels


# Moore's Speed Effect #3: Psychological

Moore's Curve: common interpretation of Moore's Law

- "CPU performance doubles every 18 months"
- Self fulfilling prophecy: 2x in 18 months is ~1% per week
  - Q: Would you add a feature that improved performance 20% if it would delay the chip 8 months?
- Processors under Moore's Curve (arrive too late) fail spectacularly
  - E.g., Intel's Itanium, Sun's Millennium
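The "~1% per week" figure and the question about the 8-month delay can both be worked out from the doubling period. The sketch below assumes the competition tracks the 18-month doubling curve exactly, which is a simplification; the names are illustrative.

```python
DOUBLING_PERIOD_MONTHS = 18.0
WEEKS_PER_YEAR = 52.0


def curve_growth(months: float) -> float:
    """Performance multiplier along Moore's Curve after `months`."""
    return 2.0 ** (months / DOUBLING_PERIOD_MONTHS)


# One week expressed in months is 12 / 52.
weekly_gain = curve_growth(12.0 / WEEKS_PER_YEAR) - 1.0
print(f"weekly gain on the curve: {weekly_gain:.2%}")

# The slide's question: +20% performance for an 8-month delay.
feature_speedup = 1.20
delay_months = 8.0
curve_during_delay = curve_growth(delay_months)
relative_to_curve = feature_speedup / curve_during_delay
print(f"curve advances {curve_during_delay:.2f}x during the delay")
print(f"delayed chip relative to the curve: {relative_to_curve:.2f}x")
```

Under this assumption the curve moves by roughly 1.36x in 8 months, so a 1.2x feature leaves the delayed chip below the curve.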


# Power/Energy Increasingly Important

- Battery life for mobile devices
  - Laptops, phones, cameras
- Tolerable temperature for devices without active cooling
  - Power means temperature, active cooling means cost
  - No fan in a cell phone, no market for a hot cell phone
- Electric bill for compute/data centers
  - Pay for power twice: once in, once out (to cool)
- Environmental concerns
  - "Computers" account for a growing fraction of energy consumption

# **Energy & Power**

- Energy: total amount of energy stored/used
  - Battery life, electric bill, environmental impact
- Power: energy per unit time
  - Related to "performance" (also a "per unit time" metric)
  - Power impacts power supply, cooling needs (cost)
  - Peak power vs. average power
    - E.g., camera power "spikes" when you take a picture
- Two sources:
  - Dynamic power: active switching of transistors
  - Static power: transistors leak even when inactive

# How to Reduce Dynamic Power

- Target each component: P<sub>dynamic</sub> ~ N × C × V<sup>2</sup> × f × A
- Reduce number of transistors (N)
  - Use fewer transistors/gates
- Reduce capacitance (C)
  - Smaller transistors (Moore's law)
- Reduce voltage (V)
  - Quadratic reduction in energy consumption!
  - But also slows transistors (transistor speed is roughly proportional to V)
- Reduce frequency (f)
  - Slow the clock frequency (e.g., MacBook Air)
- Reduce activity (A)
  - "Clock gating": disable clocks to unused parts of chip
  - Don't switch gates unnecessarily
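To see how each knob moves dynamic power, the proportionality P<sub>dynamic</sub> ~ N × C × V<sup>2</sup> × f × A can be evaluated with every factor expressed relative to a baseline of 1.0. This is a sketch of the proportionality only, not a calibrated power model; the function name is illustrative.

```python
def relative_dynamic_power(n: float = 1.0, c: float = 1.0, v: float = 1.0,
                           f: float = 1.0, a: float = 1.0) -> float:
    """Dynamic power relative to a baseline where every factor is 1.0.

    n: transistor count, c: capacitance, v: supply voltage,
    f: clock frequency, a: activity factor (all as ratios to baseline).
    """
    return n * c * v ** 2 * f * a


# Lowering only the voltage by 10%: quadratic effect.
print(f"V x0.9:           {relative_dynamic_power(v=0.9):.3f}")

# Lowering voltage and frequency together by 10% each.
print(f"V x0.9, f x0.9:   {relative_dynamic_power(v=0.9, f=0.9):.3f}")

# Clock gating half of the switching activity.
print(f"A x0.5:           {relative_dynamic_power(a=0.5):.3f}")

# Voltage ratio for 5 V -> 1.1 V (the 486 -> Core2 figures cited in these slides),
# holding all other factors fixed, which real chips of course do not.
print(f"V 5 V -> 1.1 V:   {relative_dynamic_power(v=1.1 / 5.0):.4f}")
```

Because frequency usually has to drop along with voltage, scaling both together gives a roughly cubic reduction in power, at the price of lower performance.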

# How to Reduce Static Power

- Target each component: P<sub>static</sub> ~ N × V × e<sup>-V<sub>t</sub></sup>
- Reduce number of transistors (N)
  - Use fewer transistors/gates
- Reduce voltage (V)
  - Linear reduction in static energy consumption
  - But also slows transistors (transistor speed is roughly proportional to V)
- Disable transistors (also targets N)
  - "Power gating": disable power to unused parts (long time to power up)
  - Power down units (or entire cores) not being used
- **Dual V<sub>t</sub>**: use a mixture of high and low V<sub>t</sub> transistors (slow ones for SRAM)
  - Requires extra fabrication steps (cost)
- Low-leakage transistors
  - High-K/Metal-Gates in Intel's 45 nm process
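The static power proportionality P<sub>static</sub> ~ N × V × e<sup>-Vt</sup> can be explored the same way. In this sketch the threshold voltage is in normalized units (an assumption, since the slide gives only the proportionality and no scaling constant), so only the ratios between cases are meaningful.

```python
import math


def relative_static_power(n: float, v: float, vt: float) -> float:
    """Static power up to a constant factor: N * V * exp(-Vt).

    `vt` is in normalized units (assumed); compare results as ratios only.
    """
    return n * v * math.exp(-vt)


baseline = relative_static_power(n=1.0, v=1.0, vt=1.0)

# Supply voltage down 10%: linear effect.
lower_v = relative_static_power(n=1.0, v=0.9, vt=1.0) / baseline

# Threshold voltage up by one normalized unit: exponential effect.
higher_vt = relative_static_power(n=1.0, v=1.0, vt=2.0) / baseline

# Power gating half of the transistors: acts on N.
gated = relative_static_power(n=0.5, v=1.0, vt=1.0) / baseline

print(f"V x0.9      -> {lower_v:.3f} of baseline leakage")
print(f"Vt + 1 unit -> {higher_vt:.3f} of baseline leakage")
print(f"N x0.5      -> {gated:.3f} of baseline leakage")
```

The exponential term is why a mixture of high and low V<sub>t</sub> transistors is attractive: small threshold changes move leakage far more than comparable supply voltage changes.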

# Moore's Effect on Power

- + Reduces power/transistor
  - Reduced sizes and surface areas reduce capacitance (C)
- − Increases power density and total power
  - By increasing transistors/area and total transistors
  - Faster transistors → higher frequency → more power
  - Hotter transistors leak more (thermal runaway)
- What to do? Reduce voltage [486 (5V) → Core2 (1.1V)]
  - + ↓ dynamic power quadratically, static power linearly
  - − But reducing V means either:
    - Keeping V<sub>t</sub> the same and reducing frequency (F)
    - Lowering V<sub>t</sub> and increasing leakage exponentially
    - Or using new techniques like high-K and dual-V<sub>t</sub>

**Intel found a solution for High-k and metal gate**



[Figure (Intel): high-k dielectric reduces leakage substantially. A 1.2 nm SiO<sub>2</sub> gate dielectric is compared with a 3.0 nm high-k dielectric, both on a silicon substrate. Benefits of high-k vs. SiO<sub>2</sub>: capacitance 60% greater (much faster); gate dielectric leakage > 100x reduction (far cooler).]

# **Continuation of Moore's Law**

[Table: process generations 0.25 µm, 0.18 µm, 0.13 µm, 90 nm, 65 nm, 45 nm, 32 nm, and 22 nm. Rows: wafer size (200 → 200/300 → 300), inter-connect (Al → Cu), channel (Si → strained Si), gate dielectric (SiO<sub>2</sub> → high-k), gate electrode (poly-silicon → metal).]





# Technology Basis for Reliability

- Transient faults
  - A bit "flips" randomly, temporarily
  - Cosmic rays etc. (more common at higher altitudes!)
- Permanent (hard) faults
  - A gate or memory cell wears out, breaks and stays broken
  - Temperature & electromigration slowly deform components
- Solution for both: redundancy to detect and tolerate


# Moore's Good Effect on Reliability

- − Scaling makes devices less reliable
- + Scaling increases device density to enable **redundancy**
- Examples
  - Error correcting code for memory (DRAM), caches (SRAM)
  - Core-level redundancy: paired-execution, hot-spare, etc.
  - Intel's Core i7 (Nehalem) uses 8-transistor SRAM cells
    - Versus the standard 6-transistor cells
- Big open questions
  - Can we protect logic efficiently? (w/o 2-3x overhead)
  - Can architectural techniques help hardware reliability?
  - Can software techniques help?
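As an illustration of redundancy with roughly 3x overhead, here is a minimal sketch of triple modular redundancy (TMR) with a bitwise majority vote. TMR is a standard technique but is not named on the slide; it is used here only to show what "detect and tolerate" can look like, and the function name is illustrative.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies of the same word.

    Each output bit takes the value held by at least two of the copies,
    so a flipped bit in any single copy is outvoted.
    """
    return (a & b) | (a & c) | (b & c)


value = 0b1011_0010

# A transient fault flips one bit in one of the three copies.
faulty_copy = value ^ 0b0000_1000
recovered = majority_vote(value, faulty_copy, value)
print(f"one faulty copy recovered:  {recovered == value}")

# Flips in different bit positions of two copies are still outvoted per bit.
copy_a = value ^ 0b0000_0001
copy_b = value ^ 0b1000_0000
print(f"two different bits flipped: {majority_vote(copy_a, copy_b, value) == value}")

# The same bit flipped in two copies defeats the vote.
print(f"same bit flipped twice:     {majority_vote(faulty_copy, faulty_copy, value) == value}")
```

Storing or computing everything three times is the kind of overhead the open question above hopes to avoid for logic.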


# Moore's Law in the Future

- Won't last forever, approaching physical limits
  - But betting against it has proved foolish in the past
  - Likely to "slow" rather than stop abruptly
- Transistor count will likely continue to scale
  - "Die stacking" is on the cusp of becoming mainstream
  - Uses the third dimension to increase transistor count
- But transistor performance scaling?
  - Running into physical limits
  - Example: gate oxide is less than 10 silicon atoms thick!
    - Can't decrease it much further
  - Power is becoming a limiting factor
# Moore's Bad Effect on Reliability

- − Transient faults:
  - Small (low charge) transistors are more easily flipped
  - Even low-energy particles can flip a bit now
- − Permanent faults:
  - Small transistors and wires deform and break more quickly
  - Higher temperatures accelerate the process
- Wasn't a problem until ~10 years ago (except in satellites)
  - Memory (DRAM): these dense, small devices hit first
  - Then on-chip memory (SRAM)
  - Logic is starting to have problems...


# Summary of Device Scaling

- + Reduces unit cost
  - − But increases startup cost
- + Increases performance
  - Reduces transistor/wire delay
  - Gives us more transistors with which to increase performance
- + Reduces local power consumption
  - − Quickly undone by increased integration, frequency
  - − Aggravates power-density and temperature problems
- − Aggravates reliability problem
  - + But gives us the transistors to solve it via redundancy
