

# Lecture 4&5 CMOS Circuits

# Xuan 'Silvia' Zhang Washington University in St. Louis

http://classes.engineering.wustl.edu/ese566/

# Worst-Case V<sub>OL</sub>



a) Identify the worst-case input combination(s) for $V_{OL}$.

b) Calculate the worst-case value of $V_{OL}$. (Assume that all pull-down transistors have the same body bias and, initially, that $V_{OL} \approx 5\% V_{DD}$.)





# Combinational Logic (Delay Analysis)

Sequential Circuits

Memory

### **RC Delay**



- Lumped Model
  - C only
  - RC model



**Figure 4.11** Distributed versus lumped capacitance model of wire.  $C_{lumped} = L \times c_{wire}$ , with *L* the length of the wire and  $c_{wire}$  the capacitance per unit length. The driver is modeled as a voltage source and a source resistance  $R_{driver}$ .

$$C_{lumped} \frac{\mathrm{d}V_{out}}{\mathrm{d}t} + \frac{V_{out} - V_{in}}{R_{driver}} = 0$$

$$\tau = R_{driver} \times C_{lumped}$$

$$V_{out}(t) = (1 - e^{-t/\tau}) V$$

$$t_{50\%} = 0.69 \times 10\ \text{k}\Omega \times 11\ \text{pF} = 76\ \text{nsec}$$

$$t_{90\%} = 2.2 \times 10\ \text{k}\Omega \times 11\ \text{pF} = 242\ \text{nsec}$$
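The lumped-model numbers can be checked with a minimal Python sketch of the first-order step response $V_{out}(t) = (1 - e^{-t/\tau})V$. The function name `lumped_rc_delay` is hypothetical; the 10 kΩ and 11 pF values are the ones used in the example. Note that the 2.2 coefficient equals $\ln 9$, which is the 10% to 90% interval of the exponential.

```python
import math

def lumped_rc_delay(r_driver, c_lumped, fraction):
    """Time for a first-order RC step response to reach `fraction` of its final value."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    tau = r_driver * c_lumped
    # Solve 1 - exp(-t / tau) = fraction for t.
    return -tau * math.log(1.0 - fraction)

R_DRIVER = 10e3    # ohms, from the example
C_LUMPED = 11e-12  # farads, from the example

t50 = lumped_rc_delay(R_DRIVER, C_LUMPED, 0.5)
rise_10_90 = (lumped_rc_delay(R_DRIVER, C_LUMPED, 0.9)
              - lumped_rc_delay(R_DRIVER, C_LUMPED, 0.1))

print(f"t50%       = {t50 * 1e9:.1f} ns")
print(f"10% to 90% = {rise_10_90 * 1e9:.1f} ns")
```

The small differences from 76 nsec and 242 nsec come from rounding $\ln 2$ to 0.69 and $\ln 9$ to 2.2.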

### Elmore Delay Formula



- Assumptions regarding the RC network
  - the network has a single input node
  - all the capacitors are between a node and the ground
  - the network does not contain any resistive loops (tree)
- Unique resistive path
  - path resistance  $R_{44} = R_1 + R_3 + R_4$
  - shared path resistance  $R_{ik} = \sum R_j \Rightarrow (R_j \in [path(s \rightarrow i) \cap path(s \rightarrow k)])$
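As a rough illustration, the shared path resistance can be turned into the Elmore delay at node $i$, $\tau_{Di} = \sum_k C_k R_{ik}$. The sketch below assumes a small hypothetical tree (node 1 at the source, nodes 2 and 3 hanging off node 1, nodes 4 and 5 hanging off node 3) with made-up unit values. Each branch resistance is labeled by the node it enters, so the path to node 4 is $R_1 + R_3 + R_4$ as in the bullet above.

```python
def elmore_delays(parent, resistance, capacitance):
    """Return {node: Elmore delay} for an RC tree.

    parent[n]      : upstream node of n (None if n connects to the source)
    resistance[n]  : resistance of the branch entering node n
    capacitance[n] : capacitance from node n to ground
    """
    def path(node):
        # Collect every branch (named by its downstream node) from source to `node`.
        branches = set()
        while node is not None:
            branches.add(node)
            node = parent[node]
        return branches

    paths = {n: path(n) for n in parent}
    delays = {}
    for i in parent:
        total = 0.0
        for k in parent:
            # R_ik: resistance shared by the source->i and source->k paths.
            r_ik = sum(resistance[j] for j in paths[i] & paths[k])
            total += capacitance[k] * r_ik
        delays[i] = total
    return delays

# Hypothetical tree and values (arbitrary consistent units).
parent = {1: None, 2: 1, 3: 1, 4: 3, 5: 3}
resistance = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}
capacitance = {n: 1.0 for n in parent}

delays = elmore_delays(parent, resistance, capacitance)
print(delays)
```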







#### Example: RC Ladder/Chain







$$\tau_{DN} = \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j = \sum_{i=1}^{N} C_i R_{ii}$$

#### What if distributed?

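$$\begin{aligned} \tau_{DN} &= \left(\frac{L}{N}\right)^2 (rc + 2rc + \dots + Nrc) = (rcL^2) \frac{N(N+1)}{2N^2} = RC \frac{N+1}{2N} \\ \lim_{N \to \infty} \tau_{DN} &= \frac{RC}{2} = \frac{rcL^2}{2} \end{aligned}$$

Here $R = rL$ and $C = cL$ are the total resistance and capacitance of the wire.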
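A quick numerical check of the ladder formula, assuming a wire of total resistance $R$ and total capacitance $C$ split into $N$ equal segments (the function name `ladder_elmore_delay` is hypothetical). The sum equals $RC(N+1)/(2N)$ and approaches $RC/2$ as $N$ grows.

```python
def ladder_elmore_delay(r_total, c_total, n_segments):
    """Elmore delay at the end of an N-segment RC ladder with equal segments."""
    r_seg = r_total / n_segments
    c_seg = c_total / n_segments
    # Node i sees i series segment resistances between itself and the source.
    return sum(c_seg * (i * r_seg) for i in range(1, n_segments + 1))

for n in (1, 2, 10, 100, 1000):
    tau = ladder_elmore_delay(1.0, 1.0, n)
    print(f"N = {n:4d}  tau / RC = {tau:.4f}")
```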

#### **Distributed RC Line**





### **Calculate Wire Delay**



- Rule of Thumb
  - RC delay should only be considered when $t_{pRC} \gg t_{pgate}$
  - RC delay should only be considered when the rise/fall time at the line input is smaller than RC: $t_{rise} < RC$

| Voltage range       | Lumped RC network | Distributed RC network |
|---------------------|-------------------|------------------------|
| 0 → 50% ($t_p$)     | 0.69 RC           | 0.38 RC                |
| 0 → 63% ($\tau$)    | RC                | 0.5 RC                 |
| 10% → 90% ($t_r$)   | 2.2 RC            | 0.9 RC                 |
| 0 → 90%             | 2.3 RC            | 1.0 RC                 |

#### **Inverter Propagation Delay**



- Simplified switch model
  - find equivalent resistance

$$R_{eq} = \frac{1}{V_{DD}/2} \int_{V_{DD}/2}^{V_{DD}} \frac{V}{I_{DSAT}(1+\lambda V)} dV \approx \frac{3}{4} \frac{V_{DD}}{I_{DSAT}} \left(1 - \frac{7}{9} \lambda V_{DD}\right)$$
  
with  $I_{DSAT} = k' \frac{W}{L} \left((V_{DD} - V_T) V_{DSAT} - \frac{V_{DSAT}^2}{2}\right)$ 

$$t_{pHL} = \ln(2)R_{eqn}C_L = 0.69R_{eqn}C_L$$
  
$$t_{pLH} = 0.69R_{eqp}C_L$$

$$t_p = \frac{t_{pHL} + t_{pLH}}{2} = 0.69 C_L \left(\frac{R_{eqn} + R_{eqp}}{2}\right)$$
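The closed-form approximation of $R_{eq}$ can be compared against a direct numerical average of the integrand. This is only a sketch: the device values ($V_{DD}$ = 2.5 V, $I_{DSAT}$ = 0.4 mA, $\lambda$ = 0.06 V⁻¹) and the load and resistance values in the delay call are hypothetical, and all function names are made up for illustration.

```python
def r_eq_numeric(v_dd, i_dsat, lam, steps=10_000):
    """Midpoint-rule average of V / (I_DSAT * (1 + lam * V)) over V_DD/2 .. V_DD."""
    lo, hi = v_dd / 2.0, v_dd
    dv = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        v = lo + (k + 0.5) * dv
        total += v / (i_dsat * (1.0 + lam * v)) * dv
    return total / (v_dd / 2.0)

def r_eq_approx(v_dd, i_dsat, lam):
    """Closed-form approximation: (3/4)(V_DD / I_DSAT)(1 - (7/9) lam V_DD)."""
    return 0.75 * v_dd / i_dsat * (1.0 - (7.0 / 9.0) * lam * v_dd)

def propagation_delay(r_eqn, r_eqp, c_load):
    """Return (t_pHL, t_pLH, t_p) from the switch model with 0.69 * R * C."""
    t_phl = 0.69 * r_eqn * c_load
    t_plh = 0.69 * r_eqp * c_load
    return t_phl, t_plh, 0.5 * (t_phl + t_plh)

# Hypothetical device parameters.
V_DD, I_DSAT, LAMBDA = 2.5, 0.4e-3, 0.06
exact = r_eq_numeric(V_DD, I_DSAT, LAMBDA)
approx = r_eq_approx(V_DD, I_DSAT, LAMBDA)
print(f"numeric R_eq = {exact:.0f} ohm, approximate R_eq = {approx:.0f} ohm")

# Hypothetical pull-down / pull-up resistances and load.
t_phl, t_plh, t_p = propagation_delay(10e3, 20e3, 10e-15)
print(f"t_pHL = {t_phl * 1e12:.1f} ps, t_pLH = {t_plh * 1e12:.1f} ps, t_p = {t_p * 1e12:.1f} ps")
```

The approximation drops higher-order terms in $\lambda V_{DD}$, so the two values agree to within a few percent for small $\lambda V_{DD}$.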




### Minimize Inverter Delay

- Reduce C<sub>L</sub>
  - keep the drain diffusion areas as small as possible
- Increase W/L ratio
  - minimizes the delay only until the intrinsic capacitance dominates → "self-loading"
- Increase V<sub>DD</sub>
  - reliability concerns
- NMOS/PMOS ratio

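$$C_{L} = (C_{dp1} + C_{dn1}) + (C_{gp2} + C_{gn2}) + C_{W}$$

With $\beta = (W/L)_p / (W/L)_n$, the PMOS capacitances scale as $C_{dp1} \approx \beta C_{dn1}$ and $C_{gp2} \approx \beta C_{gn2}$, so

$$C_L = (1+\beta)(C_{dn1}+C_{gn2}) + C_W$$

$$\begin{aligned} t_p &= \frac{0.69}{2} ((1+\beta)(C_{dn1}+C_{gn2})+C_W) \left(R_{eqn} + \frac{R_{eqp}}{\beta}\right) \\ &= 0.345 ((1+\beta)(C_{dn1}+C_{gn2})+C_W) R_{eqn} \left(1+\frac{r}{\beta}\right) \end{aligned}$$

where $r = R_{eqp}/R_{eqn}$ is the resistance ratio of identically sized PMOS and NMOS transistors. Setting $\partial t_p / \partial \beta = 0$ gives

$$\beta_{opt} = \sqrt{r\left(1 + \frac{C_W}{C_{dn1} + C_{gn2}}\right)}$$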
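The optimum ratio can be sanity-checked numerically. The sketch below assumes that $r$ is the PMOS-to-NMOS resistance ratio for identically sized devices, normalizes the delay by $0.345\,R_{eqn}(C_{dn1}+C_{gn2})$, and uses hypothetical values ($r = 3$, $C_W/(C_{dn1}+C_{gn2}) = 0.5$). Function names are made up for illustration.

```python
import math

def relative_delay(beta, r, c_wire_ratio):
    """t_p divided by 0.345 * R_eqn * (C_dn1 + C_gn2).

    c_wire_ratio is C_W / (C_dn1 + C_gn2).
    """
    return ((1.0 + beta) + c_wire_ratio) * (1.0 + r / beta)

def beta_opt(r, c_wire_ratio):
    """Closed-form optimum PMOS/NMOS ratio."""
    return math.sqrt(r * (1.0 + c_wire_ratio))

R_RATIO = 3.0       # hypothetical r
C_WIRE_RATIO = 0.5  # hypothetical C_W / (C_dn1 + C_gn2)

# Brute-force search over a fine grid of beta values.
betas = [0.5 + 0.001 * k for k in range(5501)]
best_beta = min(betas, key=lambda b: relative_delay(b, R_RATIO, C_WIRE_RATIO))

print(f"closed form: {beta_opt(R_RATIO, C_WIRE_RATIO):.3f}, grid search: {best_beta:.3f}")
```

With no wire capacitance the optimum reduces to $\sqrt{r}$, which is smaller than the ratio $r$ that would equalize the rise and fall delays.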





#### Sizing Inverter for Performance

• Inverter delay model

$$\begin{aligned} t_p &= 0.69 R_{eq}(C_{int} + C_{ext}) \\ &= 0.69 R_{eq} C_{int}\left(1 + \frac{C_{ext}}{C_{int}}\right) = t_{p0}\left(1 + \frac{C_{ext}}{C_{int}}\right) \end{aligned}$$

• Size scaling factor (S)

$$\begin{aligned} t_p &= 0.69 \frac{R_{ref}}{S} (S C_{iref}) \left(1 + \frac{C_{ext}}{S C_{iref}}\right) \\ &= 0.69 R_{ref} C_{iref} \left(1 + \frac{C_{ext}}{S C_{iref}}\right) = t_{p0}\left(1 + \frac{C_{ext}}{S C_{iref}}\right) \end{aligned}$$
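A short sketch of how the delay behaves as the size scaling factor $S$ grows. The function name and the numbers (intrinsic delay normalized to 1, external load equal to ten times the reference intrinsic capacitance) are hypothetical. The delay falls toward the intrinsic delay $t_{p0}$ but never goes below it, so very large $S$ buys little.

```python
def sized_inverter_delay(t_p0, c_ext, c_iref, s):
    """Delay of an inverter scaled by S: t_p0 * (1 + C_ext / (S * C_iref))."""
    return t_p0 * (1.0 + c_ext / (s * c_iref))

T_P0 = 1.0     # hypothetical intrinsic delay (normalized)
C_IREF = 1.0   # hypothetical intrinsic capacitance of the reference inverter
C_EXT = 10.0   # hypothetical external load

for s in (1, 2, 5, 10, 100):
    print(f"S = {s:3d}  t_p / t_p0 = {sized_inverter_delay(T_P0, C_EXT, C_IREF, s):.2f}")
```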



### Sizing Inverter Chain

• Intrinsic delay

$$C_{int} = \gamma C_g$$

$$t_p = t_{p0} \left( 1 + \frac{C_{ext}}{\gamma C_g} \right) = t_{p0} (1 + f/\gamma)$$

where $f = C_{ext}/C_g$ is the effective fanout.

• Inverter delay chain





#### **Optimal Number of Inverters in the Chain**



$$\gamma + \sqrt[N]{F} - \frac{\sqrt[N]{F}\ln F}{N} = 0$$

or equivalently  $f = e^{(1 + \gamma/f)}$ 

Figure: optimum effective fanout $f$ as a function of the self-loading factor $\gamma$ in an inverter chain.

Table 5.3 $t_{opt}/t_{p0}$ versus $F$ for various driver configurations.

| F      | Unbuffered | Two Stage | Inverter Chain |
|--------|------------|-----------|----------------|
| 10     | 11         | 8.3       | 8.3            |
| 100    | 101        | 22        | 16.5           |
| 1000   | 1001       | 65        | 24.8           |
| 10,000 | 10,001     | 202       | 33.1           |
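The table entries can be reproduced with a small sketch under two assumptions that are not stated on the slide: $\gamma = 1$ (inferred from the unbuffered column, which equals $1 + F$) and a non-integer number of stages $N = \ln F / \ln f$ for the inverter-chain column. The sketch uses the $N$-stage delay $t_p = N t_{p0}(1 + \sqrt[N]{F}/\gamma)$ and solves $f = e^{(1 + \gamma/f)}$ by bisection. All function names are hypothetical.

```python
import math

def optimal_fanout(gamma, lo=2.0, hi=10.0, iterations=60):
    """Solve f = exp(1 + gamma / f) by bisection on [lo, hi]."""
    def residual(f):
        return f - math.exp(1.0 + gamma / f)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def unbuffered(F, gamma=1.0):
    """Single stage driving the full load: 1 + F / gamma."""
    return 1.0 + F / gamma

def two_stage(F, gamma=1.0):
    """Two equal stages, each with fanout sqrt(F)."""
    return 2.0 * (1.0 + math.sqrt(F) / gamma)

def inverter_chain(F, gamma=1.0):
    """Chain with the optimal fanout and a (non-integer) stage count."""
    f = optimal_fanout(gamma)
    n_stages = math.log(F) / math.log(f)
    return n_stages * (1.0 + f / gamma)

print(f"optimal fanout for gamma = 1: {optimal_fanout(1.0):.2f}")
for F in (10, 100, 1000, 10_000):
    print(f"F = {F:6d}  unbuffered = {unbuffered(F):8.0f}  "
          f"two stage = {two_stage(F):6.1f}  chain = {inverter_chain(F):5.1f}")
```

In a real design $N$ must be an integer, so the chain column is a lower bound rather than an achievable value.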



#### **Examples: Inverter Sizing and Delay**









$$\begin{aligned} t_p &= 0.69 R_{dr} C_{int} + (0.69 R_{dr} + 0.38 R_w) C_w + 0.69 (R_{dr} + R_w) C_{fan} \\ &= 0.69 R_{dr} (C_{int} + C_{fan}) + 0.69 (R_{dr} c_w + r_w C_{fan}) L + 0.38 r_w c_w L^2 \end{aligned}$$
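The two forms of the driver-plus-wire delay can be checked against each other with $R_w = r_w L$ and $C_w = c_w L$. The parameter values below are hypothetical and only serve to exercise the formula; the function names are made up for illustration.

```python
import math

def wire_path_delay(r_dr, c_int, c_fan, r_w_per_len, c_w_per_len, length):
    """Delay with the wire treated as a distributed RC line (0.38 factor)."""
    r_w = r_w_per_len * length
    c_w = c_w_per_len * length
    return (0.69 * r_dr * c_int
            + (0.69 * r_dr + 0.38 * r_w) * c_w
            + 0.69 * (r_dr + r_w) * c_fan)

def wire_path_delay_expanded(r_dr, c_int, c_fan, r_w_per_len, c_w_per_len, length):
    """Same delay grouped by powers of the wire length L."""
    return (0.69 * r_dr * (c_int + c_fan)
            + 0.69 * (r_dr * c_w_per_len + r_w_per_len * c_fan) * length
            + 0.38 * r_w_per_len * c_w_per_len * length ** 2)

# Hypothetical values: ohms, farads, ohm/m, F/m, metres.
params = dict(r_dr=1e3, c_int=5e-15, c_fan=10e-15,
              r_w_per_len=75e3, c_w_per_len=100e-12, length=1e-3)

t_a = wire_path_delay(**params)
t_b = wire_path_delay_expanded(**params)
print(f"t_p = {t_a * 1e12:.2f} ps (grouped by component), {t_b * 1e12:.2f} ps (grouped by L)")
```

The expanded form makes the quadratic dependence on wire length explicit: for long wires the $0.38\,r_w c_w L^2$ term dominates.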

#### **Propagation Delay of Complex Logic Gates**






| Input data pattern | Delay (psec) |
|--------------------|--------------|
| A = B = 0→1        | 69           |
| A = 1, B = 0→1     | 62           |
| A = 0→1, B = 1     | 50           |
| A = B = 1→0        | 35           |
| A = 1, B = 1→0     | 76           |
| A = 1→0, B = 1     | 57           |





(b) RC equivalent model

Internal node capacitance matters



Figure 6.11 Four input NAND gate and its RC model.

 $t_{pHL} = 0.69(R_1 \cdot C_1 + (R_1 + R_2) \cdot C_2 + (R_1 + R_2 + R_3) \cdot C_3 + (R_1 + R_2 + R_3 + R_4) \cdot C_L)$ 
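The pull-down delay expression is an Elmore sum over the series transistor stack and can be sketched for any stack height. The function name `stack_t_phl` and the unit values are hypothetical; the resistances are ordered $R_1, R_2, \dots$ and the capacitances $C_1, C_2, \dots, C_L$ exactly as in the formula above.

```python
import math

def stack_t_phl(resistances, capacitances):
    """0.69 * sum_i (R_1 + ... + R_i) * C_i, with the last capacitance being C_L."""
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    total = 0.0
    r_path = 0.0
    for r, c in zip(resistances, capacitances):
        r_path += r          # running sum R_1 + ... + R_i
        total += r_path * c  # capacitor i discharges through that resistance
    return 0.69 * total

# Hypothetical unit values for a four-transistor stack: C_1, C_2, C_3, C_L.
t_phl_stack = stack_t_phl([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 2.0])
t_phl_no_internal = stack_t_phl([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 2.0])
print(t_phl_stack, t_phl_no_internal)
```

Comparing the two results shows how much of the delay comes from the internal node capacitances rather than from the load.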

### Sizing Combinational Network for Performance



- Inverter delay  $t_p = t_{p0} \left(1 + \frac{C_{ext}}{\gamma C_g}\right) = t_{p0} (1 + f/\gamma)$
- Complex logic delay  $t_p = t_{p0}(p + gf/\gamma)$ 
  - p: ratio of the intrinsic (unloaded) delay of the complex gate and the simple inverter. Affected by both topology and layout style
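  - g: logical effort (how much more input capacitance the gate presents than an inverter delivering the same output current)
  - f: electrical effort (effective fanout, $C_{ext}/C_g$)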

| Gate type         | p          |
|-------------------|------------|
| Inverter          | 1          |
| n-input NAND      | n          |
| n-input NOR       | n          |
| n-way multiplexer | 2n         |
| XOR, NXOR         | $n2^{n-1}$ |

### Logic Effort (g)

• For a given capacitive load, complex gates have to work harder than an inverter to produce a similar response



Table 6.4 Logic efforts of common logic gates, assuming a PMOS/NMOS ratio of 2.

| Gate type   | 1 input | 2 inputs | 3 inputs | n inputs |
|-------------|---------|----------|----------|----------|
| Inverter    | 1       |          |          |          |
| NAND        |         | 4/3      | 5/3      | (n+2)/3  |
| NOR         |         | 5/3      | 7/3      | (2n+1)/3 |
| Multiplexer |         | 2        | 2        | 2        |
| XOR         |         | 4        | 12       |          |

### **Optimal Sizing of Combinational Network**

• Gate effort

$$h = fg$$

$$t_p = \sum_{j=1}^{N} t_{p,j} = t_{p0} \sum_{j=1}^{N} \left( p_j + \frac{f_j g_j}{\gamma} \right)$$

• Optimal delay condition

$$f_1g_1 = f_2g_2 = \dots = f_Ng_N$$

$$F = f_1f_2\dots f_N = C_L/C_{g_1}$$

$$G = g_1g_2\dots g_N$$

$$h = \sqrt[N]{FG} = \sqrt[N]{H}$$

$$G = 1 \times \frac{5}{3} \times \frac{5}{3} \times 1 = \frac{25}{9}$$

$$D = t_{p_0} \left(\sum_{j=1}^N p_j + \frac{N(\sqrt[N]{H})}{\gamma}\right)$$
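The sizing procedure can be sketched end to end. The logical efforts below match the example product $G = 25/9$; the gate identities (inverter, 3-input NAND, 2-input NOR, inverter), the corresponding intrinsic delays $p_j$, the path fanout $F = 5$, and $\gamma = 1$ are assumptions made for illustration, since the slide's figure is not reproduced here. The function name `size_path` is hypothetical.

```python
import math

def size_path(logical_efforts, intrinsic_delays, path_fanout, gamma=1.0):
    """Return (G, H, h, stage_fanouts, D / t_p0) for a path sized for minimum delay."""
    n = len(logical_efforts)
    G = math.prod(logical_efforts)            # path logical effort
    H = path_fanout * G                       # path effort H = F * G
    h = H ** (1.0 / n)                        # equal gate effort per stage
    stage_fanouts = [h / g for g in logical_efforts]  # f_j = h / g_j
    D = sum(intrinsic_delays) + n * h / gamma
    return G, H, h, stage_fanouts, D

g = [1.0, 5.0 / 3.0, 5.0 / 3.0, 1.0]  # assumed: inverter, NAND3, NOR2, inverter
p = [1.0, 3.0, 2.0, 1.0]              # assumed intrinsic delays for those gates
F = 5.0                               # assumed path fanout C_L / C_g1

G, H, h, stage_fanouts, D = size_path(g, p, F)
print(f"G = {G:.3f}, H = {H:.3f}, h = {h:.3f}")
print("stage fanouts:", [round(f, 3) for f in stage_fanouts])
print(f"D = {D:.2f} t_p0")
```

The product of the stage fanouts equals $F$, which is a useful consistency check when sizing by hand.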










### Level-Sensitive Latch













### SR Latch



• Basic NOR latch




**Other SR Latches** 



• Clocked



• NAND SR latch



**Edge-Sensitive Flip-Flop** 








### Static RAM



- Applications
  - CPU register file, cache, embedded memory, DSP
- Characteristics
  - 6 transistors per cell (other topologies exist)
  - no need to refresh
  - access time ~ cycle time
  - no charge to leak
  - faster, more area, more expensive

## **SRAM Operation**

- Standby
  - word line (WL) de-asserted
- Read
  - precharge bit lines (BL)
  - assert WL
  - BL rises/drops slightly
- Write
  - apply value to BL
  - assert WL
  - input drivers are stronger than the cell, so they overwrite its state







source: semiengineering.com





source: semiengineering.com



# Questions?

### Comments?

#### Discussion?

### Homework #3



- Posted on class website
- Due on 2/6 at 2:30pm
- Solution will be posted on 2/5 evening
- Use it as an exercise to prepare for exam
- Will release excerpts from textbook on BlackBoard

### In-Class Exam

- 2/6 in the lecture room
- Starts at 2:40pm and ends at 4:00pm
- Designed to be completed in 60min
- 75% material similar to HW0 and HW1
- 25% material similar to HW2 and HW3



### **Design Tool Tutorials**



• Standard-cell based design flow



- Functional Simulation
  - tool: Synopsys VCS
  - simulate your HDL (e.g. Verilog) code to verify functionality
- Logic Synthesis
  - tool: Synopsys Design Compiler (DC)
  - convert/synthesize behavioral/RTL-level HDL to a gate-level netlist (i.e. connectivity list)
- Physical Design (Place & Route)
  - tool: Cadence Encounter
  - given the gate-level netlist, place and route the design to complete an IC chip in its final physical form









### Lab1: Design Tool Tutorials

- Will be posted on 2/7 before the Wed lecture
- TA will give a hands-on introduction on 2/8
- Please bring your laptop
- Please set up your SEAS account
- Please send your Github ID to Yunfei
- Please walk through the Linuxlab tutorial
- Please read Lab1 before the lecture, so you can ask questions
- Due on 2/22 at 2:30pm



#### Acknowledgement

- Jan Rabaey, "Digital Integrated Circuits", 2006
- Cornell University, ECE 5745