VLSI PD Essentials

Monday, August 26, 2019

Notes 4 Power planning and Placement

Power planning

Power planning is done to provide a uniform supply voltage and ground to all cells in the design so that they can operate. Power and ground are delivered through the metal layers above the diffusion and well areas. The layers are connected through vias, giving a seamless power supply to all cells from the highest metal layer down to the lowest.

Core Power management

VDD and VSS rings are formed around the core and macros.
Power straps are created in the core area to tap power from core rings.
Standard cell rails are created to tap power from power straps to std cell power/ground pins.

I/O Power Management

IO rings for power are established through IO cell abutment and through IO filler cells.

Power planning pre-requisite:

Make sure all the I/O ports are placed and fixed
Make sure all the macros are placed and fixed.
Make sure all power pins are connected to PG nets logically.

Apply Global Net connection :

Connecting the cell PG terms to PG nets.

4 basic elements :

Pads
Rings
Straps
Rails

IR drop and EM Analysis:

The supply voltage drops as current flows through the resistance of the power network.
The drop depends on :

Power requirement of the design
Power network structure.

Electromigration (EM) checks on the power network depend on :

Design current requirement
Width of power meshes

IR Drop Causes :

Too few PG stripes
Core ring too narrow
PG stripes too narrow
Not choosing the right metal layers
Missing power vias
Not using multi-cut vias
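
As a rough illustration of why narrow stripes hurt, the sketch below estimates the static drop on a single strap from V = I x R, with R taken as sheet resistance times length over width. The function names and all the numbers (sheet resistance, strap length, current) are assumptions made up for this example, not values from any technology file.

```python
def strap_resistance(sheet_res_ohm_per_sq, length_um, width_um):
    """Resistance of a rectangular metal strap: R = Rs * (L / W)."""
    return sheet_res_ohm_per_sq * length_um / width_um


def ir_drop(current_a, resistance_ohm):
    """Static voltage drop across a resistance: V = I * R."""
    return current_a * resistance_ohm


# Hypothetical values: 0.05 ohm/sq sheet resistance, 1000 um strap, 10 mA load.
narrow = ir_drop(0.01, strap_resistance(0.05, 1000.0, 2.0))  # 2 um wide strap
wide = ir_drop(0.01, strap_resistance(0.05, 1000.0, 4.0))    # 4 um wide strap

print(f"narrow strap drop: {narrow * 1000:.1f} mV")
print(f"wide strap drop:   {wide * 1000:.1f} mV")
```

Doubling the strap width halves its resistance, so the drop halves for the same current. A real IR drop analysis works on the whole mesh, not on one strap.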

Placement

Placement is an important stage in the PnR process. Instances should be placed close to the ports and cells they connect to, so that signals cover the least distance and timing is fast. If an instance is placed far from its input, the signal takes longer to reach the flip-flop; that is, the transition time of the rising edge seen by a positive edge triggered flip-flop increases, so the edge becomes weaker.

What is placement ?

Placement gives a legal location to every standard cell in the design.

The placement must also meet timing and congestion requirements.
A large number of standard cells are placed automatically: given the timing constraints, the tool searches for an optimized placement that meets the timing requirements.
Macros are placed manually when there are only a few of them, because automatic macro placement may not give a uniform core area.

The goal is exact placement of the modules to meet PPR - timing, routability and power.
Steps of placement:

Logic Connectivity
Trial Route Connectivity - timing check
Timing optimization based on trial route
Legalization and placement freezing

Pre-Place

It is the process of placing certain cells before asking the tool to place all the standard cells.
There are certain standard cells which need to be placed across the core area to meet the foundry requirements.
Sometimes, we place a few standard cells to control the placement.

Types of pre-placed cells (several of these are also called physical cells, because they have no logic function) :

End-cap cells - placed at the ends of the rows to terminate them and protect the boundary cells from edge effects
Tap cells - tie the n-well and substrate to the supply rails, keep the n-well continuous and prevent latch-up problems
I/O buffers - placed at the I/O ports to maintain transition time and signal integrity
Spare cells - extra cells placed but not connected, kept for future requirements
Others
Filler cells - fill the rest of the area and provide continuity; they contain only dummy poly
Decap cells - act as local charge reservoirs so the supply to the cells stays stable when the load fluctuates
Tie cells - tie constant inputs to logic high or low instead of connecting the gate directly to the rails, protecting the gate and preventing cell damage

Tie high cells
Tie low cells

Placement Methods :

Timing driven
Congestion driven
Power driven
Area driven

The rise and fall times are slowed by the parasitics of the interconnect, that is, its resistance and capacitance. If cells are placed far from their inputs and clocks, these parameters are affected :

Leakage power
Internal power
Dynamic power

Adding buffers at each stage improves the signal transition. For a 40 nm technology, the minimum wire width is 0.1 µm and the length is 40 µm. The W/L ratio varies accordingly and so affects the source-to-drain current I. The transition time due to interconnect capacitance can increase to 600 ps.

Signal Degradation can be explained by,

Q=CxV;

I=dQ/dt;

dQ/dt = C x dV/dt;

dV/dt = rate of change of voltage (slew); a larger dV/dt means a shorter transition time;

I = C x dV/dt;

dV/dt = I/C;

where C is the output capacitance or interconnect capacitance.
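
Rearranging I = C x dV/dt gives a first-order estimate of the transition time, t = C x dV / I, if the drive current is assumed constant during the swing. The sketch below applies that estimate; the function name and the capacitance, swing and current values are hypothetical and only meant to show the trend.

```python
def transition_time(cap_f, delta_v, current_a):
    """First-order transition time t = C * dV / I, assuming constant drive current."""
    return cap_f * delta_v / current_a


# Hypothetical load: 100 fF charged through a 1 V swing.
weak = transition_time(100e-15, 1.0, 0.2e-3)    # weak driver, 0.2 mA
strong = transition_time(100e-15, 1.0, 0.8e-3)  # stronger driver (e.g. a buffer), 0.8 mA

print(f"weak driver:   {weak * 1e12:.0f} ps")
print(f"strong driver: {strong * 1e12:.0f} ps")
```

With four times the current the same capacitance charges in a quarter of the time, which is the reason a stronger buffer sharpens the edge.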

Characteristics of a buffer :

Improves signal quality - a higher drive strength means wider transistors, so more current flows through the buffer and its driving capability increases
Affects timing
Affects signal quality
More current flows through the buffer, which allows the load capacitor to charge/discharge fast
Reduces the load/interconnect capacitance seen by the driving cell
Power consumption increases as the number of buffers increases
Area increases

In the PnR process the number of cells increases because buffers and inverter pairs are inserted; as a result the gate count increases.

We place buffers or inverter pairs to reduce the skew,

Skew = Latest time at which the positive clock edge reaches a load/instance (Cn) - Earliest time at which it reaches a different load/instance (C1);

where Cn > C1;

Therefore, pre-placement is done so that the load on all outputs is the same.
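
A minimal sketch of the skew calculation, taking skew as the magnitude of the difference between the earliest and the latest clock arrival. The instance names and arrival times are hypothetical.

```python
def clock_skew(arrival_times_ps):
    """Skew = latest clock arrival minus earliest clock arrival, in ps."""
    times = arrival_times_ps.values()
    return max(times) - min(times)


# Hypothetical clock arrival times at three loads, in picoseconds.
arrivals = {"C1": 120.0, "C2": 135.0, "Cn": 160.0}
skew = clock_skew(arrivals)

print(f"skew = {skew:.1f} ps")
```

Here the skew is 40 ps. Inserting delay on the early branch (C1) or balancing the loads brings the arrival times closer together.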

Scan chain reordering reconnects the scan registers, from the start register to the end register, in an order that follows their placement.

The reordering selects the shortest path from the start instance to the end instance, rather than a roundabout path that wanders between some or all of the intermediate instances, which reduces congestion and improves timing.

Cell padding is added on both sides of the macros to reserve space, which eliminates the possibility of overlap and abutment.

Judging placement quality :

Unplaced cells - should be 0
Cells overlap - should be 0
Utilization
Timing
Congestion
Number of instances after optimization
total area after optimization

Congestion :

Congestion = Required routing tracks/Available routing tracks;

Pins inside a GRC decide the required tracks
Technology decides the available tracks
If a GRC has a ratio > 1, the design is congested in that region
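
The ratio check above can be sketched as follows. The GRC names and track counts are made-up inputs; a real tool reports these from its global router.

```python
def congestion_ratio(required_tracks, available_tracks):
    """Congestion = required routing tracks / available routing tracks."""
    return required_tracks / available_tracks


def congested_cells(grcs):
    """Return the names of GRCs whose ratio is greater than 1."""
    return [
        name
        for name, (required, available) in grcs.items()
        if congestion_ratio(required, available) > 1.0
    ]


# Hypothetical GRCs: name -> (required tracks, available tracks).
grcs = {"grc_0_0": (18, 20), "grc_0_1": (24, 20), "grc_1_0": (20, 20)}
hot = congested_cells(grcs)

print(hot)
```

Only grc_0_1 is flagged, because a ratio of exactly 1 still fits in the available tracks.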

Looking at congestion maps we can fix it :

add placement blockages
partial placement blockage to reduce local cell density
reordering scan chain to reduce congestion
placement with congestion-driven and area_recovery options
continue the iterations until the congestion results are good
add cell padding to the cells which are causing the congestion.

Notes 3 Floorplan

Floorplan

What is floorplan?
A floorplan is the plan of your floor (the die) that gives optimised congestion and meets the timing requirement, with adequate power supply to all pins of all instances.

Planning your floor - floor is die
Die Size Estimation
IO and macro placement
Channel length estimation
Planning power distribution

Steps

Die size estimation - Dies are cut from a wafer; the die size is estimated so that the maximum number of gates fits, as technology scaling follows Moore's law.
IO port placement - The IO ports are placed on the die-to-core boundary.
Macro placement - Macros are placed according to data-line connectivity and hierarchy, with appropriate channel width to accommodate all the connections, and in such a way as to reduce wire length.
Row creation - Rows are created for placing the standard cells around the macros, once the macros are placed and fixed.
Power routing / Power planning - Power grids are then provided to each instance according to its requirement.

A floorplan is judged by Quality of Result (QoR) parameters. An optimal floorplan is a trade-off between

Power consumed - due to IR drop, electromigration, antenna effects, wire length
Performance at a particular frequency - timing is met and critical nets are checked
Area (congestion) - keep the usage of buffers and spare cells to a minimum.

Floorplan is important in ASIC/SoC design

Procedure to perform floorplan

Calculate the die size - check whether it can accommodate all the gate-level area plus additional area for blockages, etc.

Challenges

Routing (congestion of the netlist's functional elements)
Utilisation ratio (Core Area/Die Area) should be about 70%
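
A quick way to sanity-check a die size estimate is to work backwards from the area that must be accommodated and the target utilisation. The sketch below assumes utilisation means used area divided by available area and assumes a square floorplan; both are simplifications, and the function names and numbers are hypothetical.

```python
import math


def required_area(used_area_mm2, target_utilisation):
    """Area needed so that used_area / area equals the target utilisation."""
    return used_area_mm2 / target_utilisation


def square_side(area_mm2):
    """Side length of a square with the given area."""
    return math.sqrt(area_mm2)


# Hypothetical design: 0.7 mm^2 of cells and macros at 70% utilisation.
area = required_area(0.7, 0.70)
side = square_side(area)

print(f"required area = {area:.2f} mm^2, side = {side:.2f} mm")
```

A lower target utilisation leaves more room for routing and blockages but costs die area.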

IO and macro placement
Channel length Estimation - for interconnection of pins of instances
Power planning distribution - check the IR drop, which reduces the voltage at the cells; this increases delay, which impacts performance, and can leave other elements without enough power. All metals have IR drop due to the intrinsic resistance of the metal; with a robust power grid we can reduce the impact of IR drop.

Floorplan types by design limitation

Based on outer boundary of die

Core limited design : The chip size is limited by the core size
Pad limited design : The chip size is limited by the pad area in the design.

Latency - amount of time for clock to reach from one point to another point.

For clock port placement

Reduce latency

For data port placement

Macro placement

Same hierarchy macros are kept together which results in optimization of power, buffer, timing

Channel Length estimation

DRC violation - when the design violates the technology file rules, it becomes difficult to fabricate and process further, and signal integrity is impacted.

Timing constraints - maximum transition and maximum fanout are met
Physical constraints - minimum spacing, minimum width are met

Physical constraints - DRC

Wire shorted
Wire open

In the higher metal layers the metal thickness increases

Cross-sectional area = thickness x width;

Resistance is inversely proportional to this area, so as the area increases the resistance decreases

Power dissipated in the wire is directly proportional to resistance (P = I^2 x R), hence it decreases.
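
The chain of reasoning above can be checked numerically with R = resistivity x length / (thickness x width) and P = I^2 x R. The sketch below uses the bulk resistivity of copper (about 1.7e-8 ohm-metre) and made-up wire dimensions and current; real thin-film resistivity is higher, so treat the numbers as illustrative only.

```python
def wire_resistance(resistivity_ohm_m, length_m, thickness_m, width_m):
    """R = rho * L / A, with cross-sectional area A = thickness * width."""
    area = thickness_m * width_m
    return resistivity_ohm_m * length_m / area


def resistive_power(current_a, resistance_ohm):
    """Power dissipated in a resistance: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm


RHO_CU = 1.7e-8  # bulk copper resistivity in ohm-metre (approximate)

# Hypothetical 1 mm long, 1 um wide wire on a thin lower layer and a thick upper layer.
r_thin = wire_resistance(RHO_CU, 1e-3, 0.2e-6, 1e-6)
r_thick = wire_resistance(RHO_CU, 1e-3, 0.8e-6, 1e-6)

print(f"thin layer:  {r_thin:.2f} ohm, {resistive_power(1e-3, r_thin) * 1e6:.2f} uW at 1 mA")
print(f"thick layer: {r_thick:.2f} ohm, {resistive_power(1e-3, r_thick) * 1e6:.2f} uW at 1 mA")
```

Four times the thickness gives a quarter of the resistance, which is why power straps are usually drawn on the upper, thicker layers.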

PLL - is used in the design because of its useful characteristics

minimizes the clock edges
defined phase relation
less jitter in the edges of the clock
due to its feedback mechanism

IO pads in a chip provide -

ESD protection (protection against electrostatic discharge, for example from handling)
Signal Integrity

Poly always runs in the vertical direction, so macros are placed only in the horizontal orientation.

I/O Port Placement

Assigning Physical location for the I/O ports present in the design
Typical I/O port location format:

TCL file
DEF
Excel sheet

Core Area

Core Area is defined for the placement of standard cells and hard macros.
Standard cell rows are created in the core area for standard cell placement.
The height of such a row is the Cell Row Height (CRH).
The height of a standard cell is equal to the CRH or an integral multiple of it.
These standard cell rows can be stacked, abutted or flipped as the technology and design require.
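
The row-height rule is easy to check mechanically. The sketch below works in integer nanometres to avoid floating-point rounding; the row height and the cell names and heights are hypothetical.

```python
def fits_row_grid(cell_height_nm, row_height_nm):
    """True if the cell height equals the row height or an integral multiple of it."""
    return cell_height_nm >= row_height_nm and cell_height_nm % row_height_nm == 0


ROW_HEIGHT_NM = 1200  # hypothetical Cell Row Height (CRH)

# Hypothetical library cells: name -> height in nm.
cells = {"INV_X1": 1200, "DFF_DH": 2400, "BAD_CELL": 1800}
illegal = [name for name, height in cells.items() if not fits_row_grid(height, ROW_HEIGHT_NM)]

print(illegal)
```

Single-height and double-height cells pass; a cell 1.5 rows tall cannot be legalized on the row grid.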

Macro Placement

Macro placement is done based on Connectivity information.

Macros to IO cells
Macro to Macro
Macro placement is very critical for congestion and timing
Macro placement should result in uniform standard cell area.

Macro Placement Requires

Fly-line Analysis
Data-flow Analysis
Design Module Hierarchy Analysis
Channel length calculation

Types of Macros

Memories -RAM, ROM
PLL/DLL
ADC/DAC
DSP Cores
ARM Cores
Graphics Cores

Macro Placement Guidelines

Group the macros based on

Instance Hierarchy
Connectivity(fly-lines)
Data-flow Diagram

Never place macros in the middle or centre of the core unless they are timing critical

Reason :

Macros have routing blockages up to M5, leading to congestion
More buffers added due to routing detour
More delay & power
IR drop issues

Place the macros where there are no I/O ports
or place the macros where there are fewer I/O ports
If macros must be placed where more I/O ports are present, provide more routing space
Provide uniform cell area

Placement blockages

What are they?

totally or partially block cell placement in an area
can prevent any cell or particular type of cells from being placed
helps reduce congestion & timing costs

Benefits :

To prevent (or reduce) cell placement in local areas
To reduce congestion in local areas

General floor planning guidelines

Place the macros close to the boundary of the core
see that macro pins face core side
group the macros belonging to same logic block
keep sufficient channels between macros
may have to control the cell placement around macros
avoid notches in floorplan

Congestion:

Design area is divided into regions - called Global Route Cells(GRC)
Based on technology each GRC has fixed number of tracks in every layer
Demand tracks - Available Tracks = GRC Overflow
This overflow situation leads to Routing Congestion
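
The overflow formula can be sketched as below. Clamping at zero, so that a GRC with spare tracks reports no overflow rather than a negative one, is an assumption of this sketch, and the demand and capacity numbers are made up.

```python
def grc_overflow(demand_tracks, available_tracks):
    """Overflow = demand - available, clamped at zero (assumption of this sketch)."""
    return max(0, demand_tracks - available_tracks)


def total_overflow(track_pairs):
    """Sum of the overflow over all GRCs."""
    return sum(grc_overflow(demand, available) for demand, available in track_pairs)


# Hypothetical (demand, available) track counts for three GRCs.
track_pairs = [(24, 20), (18, 20), (27, 20)]
per_grc = [grc_overflow(d, a) for d, a in track_pairs]
total = total_overflow(track_pairs)

print(per_grc, total)
```

The total gives a single number for comparing two floorplans, while the per-GRC values show where the hot spots are.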

Reasons:

High Cell Density
High pin density
Blockages
Large Macros Nearby

Solutions :

Reduce local cell & pin density
remove unwanted blockages
review macro placement

Floorplan Qualification :

No I/O ports shorted
All macros on the placement grid
No macros overlapping
Do power routing
No DRC & LVS violations on PG nets
Perform quick placement
Analyze the results; if they are not satisfactory, repeat the above steps until the floorplan qualifies.

Notes 2 Design Inputs

Design Inputs

Parasitic Capacitance

GDS - sent to the foundry to fabricate chips; TSMC is the largest foundry.

ASIC/SoC method

Front end

Developing the chip - architecting the chip, VHDL/Verilog RTL, verification on the netlist, validation when the chip is back.

Backend

PnR
post layout STA
Physical Verification

AND gate in netlist format

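module AND(A, B, Z);

input A, B;

output Z;

assign Z = A & B;  // behavioural (RTL) description

endmodule

module AND(A, B, Z);

input A, B;

output Z;

AND2X1 i_and (.i0(A), .i1(B), .O(Z));  // gate-level: one library AND cell instance

endmodule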

PnR -> takes the design from gate level to physical level; the output file format is GDS.

AND gate needs 3 PMOS , 3 NMOS

PnR steps

Design Input

Library models - creating the Milkyway library
Timing - delay information for every cell of the standard cell library, as seen by IC Compiler (ICC).
Sequential circuit - output = f(previous output, present input)
Combinational circuit - output = f(present input)

Sequential element - a memory element; for simplicity we take a D flip-flop, which has 1 bit of memory storage.

sequential circuit = sequential element + combinational circuit;

Flip-flops are edge triggered: positive (+ve) edge triggered or negative (-ve) edge triggered. Edge triggered means the output changes with respect to the clock edge and not with respect to the input.

3 timing arcs are present in the flip-flop

clock to Q delay
Setup time
Hold time
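
These three numbers are what a timing tool combines into setup and hold slack on a flop-to-flop path. The sketch below uses the usual single-clock form of the checks and assumes zero clock skew; the function names and the delays are hypothetical.

```python
def setup_slack(clock_period, clk_to_q, comb_delay_max, setup_time):
    """Time to spare before the capturing edge; negative means a setup violation."""
    return clock_period - (clk_to_q + comb_delay_max + setup_time)


def hold_slack(clk_to_q, comb_delay_min, hold_time):
    """Margin by which new data arrives after the hold window; negative means a hold violation."""
    return (clk_to_q + comb_delay_min) - hold_time


# Hypothetical delays in picoseconds for a 1 GHz clock (1000 ps period).
s = setup_slack(clock_period=1000, clk_to_q=100, comb_delay_max=700, setup_time=50)
h = hold_slack(clk_to_q=100, comb_delay_min=30, hold_time=40)

print(f"setup slack = {s} ps, hold slack = {h} ps")
```

Setup depends on the clock period and the slowest path, while hold depends only on the fastest path, so lowering the frequency can fix a setup violation but never a hold violation.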

Block Level Implementation- Inputs

Library models - Timing (LIB), Physical (LEF)
Technology data (LEF) - metal layers, DRC rules
Design netlist (Verilog or VHDL)
Timing constraints(.sdc)
Block size and shape
Pins location(Pad location at top level)
Blockage information- Placement blockage to avoid placement at some areas, routing blockage to avoid routing in some areas
Power network- Pushdown from top level, build from bottom-up

Block Level implementation -Outputs

DEF-Design representation with placement and routing
LEF-Abstracted model of subchip for top level use with pins location/Blockages.
ILM-Abstracted timing model for top level use
SPEF - RC parasitics based on real routing for each net in the design
Spice netlist - for running LVS
GDSII - physical data (binary format) understood by different tools.
Netlist

Block Level Implementation

Load all inputs required into implementation tool
Run sanity checks
Save design for next step

Design Import - Sanity Checks

Library Checks
Netlist Checks
SDC Checks
Netlist vs SDC checks

Library Checks -

All cells have timing data(.libs for all cells)
All cells have Physical data(.LEF for all cells)

Netlist Checks

Floating nets/pins
Multi driven nets
Black- boxed modules
Combinational loops
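
Two of these checks are simple enough to sketch on a toy netlist. The dictionary layout below (net name mapped to its driver pins and load pins) is a hypothetical data structure chosen for illustration, not the format of any real tool, and a net is treated as floating here when it has no driver or no load.

```python
def multi_driven_nets(netlist):
    """Nets with more than one driver pin."""
    return sorted(net for net, pins in netlist.items() if len(pins["drivers"]) > 1)


def floating_nets(netlist):
    """Nets with no driver or no load (definition assumed for this sketch)."""
    return sorted(
        net for net, pins in netlist.items() if not pins["drivers"] or not pins["loads"]
    )


# Hypothetical netlist: net -> driver pins and load pins.
netlist = {
    "n1": {"drivers": ["u1/Z"], "loads": ["u2/A"]},
    "n2": {"drivers": ["u2/Z", "u3/Z"], "loads": ["u4/A"]},
    "n3": {"drivers": [], "loads": ["u4/B"]},
}

print("multi-driven:", multi_driven_nets(netlist))
print("floating:    ", floating_nets(netlist))
```

In a real flow the implementation tool reports these during design import; the point here is only what each check looks for.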

SDC checks

Clocks reaching all clock pins of flops
Ports missing input/output delays
Ports missing slew/Load constraints
Multiple clocks driving same register

Netlist vs. SDC checks

Pre-Layout Timing

LEF (Library Exchange Format) provides 2 views - FRAM view and LEF view

LEF is mainly used for showing the physical information of a cell

The physical view also has 2 views:
1. Abstract view - only the required information
2. Layout view - the circuit as fabricated on silicon, with all information present

Netlist contains

The standard cells and components, logically connected
Memories and analog blocks
Their connections

Timing Constraints

Clock information - used by the ICC tool to optimise the design so that setup and hold requirements are met
Environment information - load, interaction with other blocks, signal strength
IO delay information - delays at the input and output ports, so that signals can be sent to other modules without deterioration
Block size and shape

Notes 1 -CMOS Fundamentals

Hello Electro Geeks,

Welcome back!!!

To learn more every day, it is necessary to read and gain more knowledge. This post contains my notes, which I am frequently asked about. In it I would like to familiarize you with the basics of the PnR flow: what components are used, why they are used, and the fundamentals and theory behind them.

CMOS FUNDAMENTALS

In chip design we generally use standard cells built from NAND gates in MOS logic.

There are 2 universal gates which can implement any logic equation : NAND and NOR gates.
We use the NAND gate because it occupies less area than the NOR gate: PMOS takes more area than NMOS because of its lower carrier (hole) mobility, and NOR places its PMOS devices in series. NAND transition time is also less than NOR.
Of the many devices that came out after the ENIAC, the key invention was the transistor :
BJT - Bipolar Junction Transistor. But due to its high power consumption it was disregarded.

MOS - Metal Oxide Semiconductor device

MOS characteristics

It is a device used as a switch and as an amplifier. Switch - ON [1] in the linear region and OFF [0] in the cutoff region. Amplifier - in the saturation region.
All digital gates can be designed with MOS.
Usually we go with CMOS - Complementary MOS, having both PMOS and NMOS.

Why we choose CMOS over BJT?

In design we optimize 3 factors : Power, Performance, Area. BJT qualifies in performance and area, but fails in power optimization: it is a current-controlled device and consumes a lot of power to run a gate, which puts an upper limit on the number of gates used.
MOS, in contrast, is a voltage-controlled device that gives huge integration densities and low power consumption. Compared with PMOS-only and NMOS-only logic, CMOS optimizes all 3 factors.
The number of terminals is 4 : Source, Gate, Drain, Substrate/body

Pic 1

MOS is a gate-controlled device: when the gate voltage of an NMOS is greater than the threshold voltage, the input is transferred to the output. Hence, the gate controls whether the device is ON or OFF.
Gate ON means : the source supplies carriers and the drain collects them. Connectivity is established and current flows.
Source and drain are interchangeable in MOS.

Pic 2

Power Dissipation

Pic 3

NMOS

Vth (threshold voltage) = the gate voltage at which the device just turns ON
NMOS forms the pull-down network
therefore, Vb (voltage of body) = 0
Vb directly proportional to Vth

Condition:

If Vth is low, the device is much faster and gives maximum performance.
If Vth increases, static power dissipation decreases (a power-saving choice), but the device becomes slower, which is not desirable for performance.

Pic 4

PMOS

Works when Vg < 0
Vout =1 when Vg=0
Here Vb is inversely proportional to Vth
Hence Vb should be at a higher potential to keep Vth lower.

Body Electric fields

There are 2 E-fields in the body vertical and horizontal
Vertical : provides channel formation
Horizontal : provides current flow

Supply voltage scales down with the technology node

Because as the technology node shrinks, the distance between the 2 ports, drain and source, decreases.
As you know E = V/d
If V is kept the same while d decreases, the electric field increases and can become strong enough to burn the device.
So the solution is to reduce the voltage as the source-to-drain distance, and hence the technology node, decreases.
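
A small numeric check of E = V/d shows why the voltage has to come down with the channel length. The voltages and source-to-drain distances below are round, hypothetical numbers chosen only to show the trend.

```python
def electric_field(voltage_v, distance_m):
    """Average field between two terminals: E = V / d, in volts per metre."""
    return voltage_v / distance_m


# Hypothetical values: 5 V across 1 um, then the same 5 V across 0.1 um,
# then a reduced 1 V across 0.1 um.
old_node = electric_field(5.0, 1.0e-6)
same_voltage = electric_field(5.0, 0.1e-6)
scaled_voltage = electric_field(1.0, 0.1e-6)

print(f"old node:        {old_node:.1e} V/m")
print(f"same voltage:    {same_voltage:.1e} V/m")
print(f"scaled voltage:  {scaled_voltage:.1e} V/m")
```

Shrinking the distance tenfold at constant voltage raises the field tenfold; lowering the supply brings the field back towards the original level.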

Parameters to optimize are:

Package
Cost
Performance
Reliability

MOS power dissipation affects the above parameters, hence reducing power dissipation is the most important step.

MOS power dissipation

Static Power dissipation
Dynamic power dissipation
Leakage power

A MOS device behaves as a resistor when it is ON, and it also has intrinsic capacitance and inductance components. Current flowing through the MOS contributes to power dissipation.

As the technology node shrinks, Vth decreases, which in turn increases static power dissipation.
Subthreshold conduction increases.
The noise margin degrades, and logic voltages are no longer equal to the supply rails.
Leakage current increases static power dissipation.
Dynamic power dissipation is not affected by it.
A trade-off is needed between the supply and the threshold voltage for each operating device.
The least static power is achieved by putting non-active devices in standby mode.
Designing mutually exclusive pull-up and pull-down networks eliminates static power dissipation.
Pseudo-NMOS and DCVSL logic can reduce the area requirement of the logic; DCVSL also avoids static power dissipation, whereas pseudo-NMOS draws static current whenever its output is low.

Static Power Dissipation

Battery drain out
The pn junctions inside the MOS form intrinsic diodes; in reverse bias these cause leakage current and power dissipation.

Dynamic Power Dissipation

Usually Dynamic Power Dissipation > Static Power Dissipation
Caused by charging and discharging of capacitor
Energy per transition = 1/2 C(Vdd)^2
Dynamic Power Dissipation decreases as Vdd decreases.
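
The quadratic dependence on Vdd is easiest to see with numbers. The sketch below uses the commonly quoted switching-power expression P = a x C x Vdd^2 x f, where the activity factor a and the clock frequency f are not part of these notes and are introduced here as assumptions, along with all the values.

```python
def dynamic_power(activity, cap_f, vdd_v, freq_hz):
    """Switching power P = a * C * Vdd^2 * f (a = activity factor, assumed)."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


# Hypothetical block: 1 nF total switched capacitance, 10% activity, 1 GHz clock.
p_nominal = dynamic_power(0.1, 1e-9, 1.0, 1e9)
p_scaled = dynamic_power(0.1, 1e-9, 0.8, 1e9)

print(f"Vdd = 1.0 V: {p_nominal * 1000:.1f} mW")
print(f"Vdd = 0.8 V: {p_scaled * 1000:.1f} mW")
```

A 20% cut in Vdd removes about 36% of the dynamic power, which is why supply scaling is the first lever for power reduction.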

Short circuit power (Internal power)

When switching 0-1 or 1-0, there is a time when both NMOS and PMOS are ON.
During that time the current finds a short path from Vdd to gnd through the two resistive devices; the power dissipated there is the internal (short circuit) power.

CMOS

Due to slow rise time, the parasitic capacitance present at the gate is charged and discharged.
The gate poly and the channel form the 2 plates of the capacitor, and the SiO2 gate oxide layer is the dielectric.
As gate area increases power dissipation decreases.

When the technology node shrinks, the device becomes faster and the area decreases, causing an increase in dynamic and static power dissipation.

Short channel length effects
less noise margin
electro migration causes channel break

ASIC/ SoC PHYSICAL DESIGN FLOW

SEMICUSTOM FLOW

3 major tools used are :

ICC (IC Compiler) for PnR (place and route) by Synopsys
Innovus by Cadence
Blast by Mentor Graphics