Monday, August 26, 2019

Notes 4 Power planning and Placement

Power planning 

Power planning is done to provide a uniform supply voltage and ground to all cells in the design so that they can operate. Power and ground are delivered through the metal layers above the diffusion and well areas. The layers are connected through vias, which gives a seamless power supply path to all cells from the highest metal layer down to the lowest.


  • Core Power management
  1. VDD and VSS rings are formed around the core and macros.
  2. Power straps are created in the core area to tap power from core rings.
  3. Standard cell rails are created to tap power from power straps to std cell power/ground pins.
  • I/O Power Management
  1. IO rings for power are established through IO cell abutment and through IO filler cells.
Power planning prerequisites:

  • Make sure all the I/O ports are placed and fixed
  • Make sure all the macros are placed and fixed.
  • Make sure all power pins are connected to PG nets logically.
Apply global net connections:
  • Connect the PG terminals of each cell to the PG nets.
4 basic elements :
  • Pads
  • Rings
  • Straps
  • Rails
IR drop and EM analysis:

  • IR drop is the fall in supply voltage as current flows through the resistance of the power network.
  • It depends on:
  1. Power requirement of the design
  2. Power network structure
EM
  • Electromigration (EM) checks on the power network depend on:
  1. Design current requirement
  2. Width of the power meshes
IR drop causes:

  • Too few PG stripes
  • Core ring too narrow
  • PG stripes too narrow
  • Not choosing the right metal layers
  • Missing power vias
  • Not using multi-cut vias
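To see why stripe width matters for IR drop, here is a minimal Python sketch. It assumes a single rectangular strap whose resistance is sheet resistance times length over width. The function names and all numbers are hypothetical and only for illustration; a real power grid is analysed by the sign-off tool.

```python
def strap_resistance(sheet_res_ohm_per_sq, length_um, width_um):
    """Resistance of a rectangular metal strap: R = Rsheet * (L / W)."""
    if width_um <= 0:
        raise ValueError("width must be positive")
    return sheet_res_ohm_per_sq * length_um / width_um


def ir_drop(sheet_res_ohm_per_sq, length_um, width_um, current_a):
    """Voltage lost along the strap for a given current: V = I * R."""
    return current_a * strap_resistance(sheet_res_ohm_per_sq, length_um, width_um)


# Hypothetical values: same layer, same length, same current, two widths.
narrow = ir_drop(0.05, length_um=1000, width_um=2, current_a=0.01)
wide = ir_drop(0.05, length_um=1000, width_um=8, current_a=0.01)
print(f"narrow strap drop: {narrow * 1e3:.1f} mV")
print(f"wide strap drop:   {wide * 1e3:.1f} mV")
```

Making the strap four times wider cuts the drop to a quarter, which is the reason narrow stripes and rings appear in the list of causes.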
Placement

Placement is an important stage in the PnR process. Instances should be placed close to the ports and cells they connect to, so that signals cover the least distance and timing is faster. If an instance is placed far from its input, the signal takes longer to reach the flip-flop: the extra wire resistance and capacitance increase the transition time of the edge arriving at a positive edge triggered flip-flop, so the signal becomes weaker.

What is placement ?

  • Giving a legal location to all standard cells in the design.
  1. A legal location sits on a row and a placement site without overlapping other cells; a good placement also meets the timing and congestion targets.
  2. Large numbers of standard cells are placed automatically: given the timing constraints, the tool searches for an optimized placement that meets the timing requirement.
  3. Macros are placed manually when there are only a few of them, because automatic placement may not give a uniform core area.
  • Exact placement of the modules to meet PPR - timing, routability and power
  • Steps of placement:
  1. Logic connectivity
  2. Trial-route connectivity - timing check
  • Timing optimization based on trial route
  • Legalization and placement freezing
Pre-place

  • Pre-placement is the process of placing certain cells before asking the tool to place all the standard cells.
  • Some standard cells need to be placed across the core area to meet foundry requirements.
  • Sometimes we place a few standard cells by hand to control the placement.
Types of pre-placed cells (several of these are also called physical cells because they have no logic function):

  • End-cap cells - placed at the ends of each row to terminate the row and the wells cleanly and to protect the cells at the boundary
  • Tap cells - tie the n-well to VDD and the substrate to VSS at regular intervals, which prevents latch-up and keeps the n-well of the standard cells from floating
  • I/O buffers - placed at the I/O ports to maintain transition time and signal integrity
  • Spare cells - extra cells placed but not connected, kept for future requirements
  • Others
  • Filler cells - fill the remaining gaps in the rows and keep the wells and power rails continuous; they contain no active logic
  • Decap cells - decoupling capacitors between power and ground that keep the supply to the cells steady when the load fluctuates
  • Tie cells - drive constant logic values onto gate inputs so that gates are not connected directly to the supply rails, which protects the gate and prevents cell damage
  1. Tie-high cells
  2. Tie-low cells
Placement Methods :
  • Timing driven
  • Congestion driven
  • Power driven
  • Area driven


The rise time and fall time are limited by the parasitics of the interconnect, that is, its resistance and capacitance. If cells are placed far from the inputs and clocks, these parameters are affected:

  • Leakage power
  • Internal power
  • Dynamic power
Adding buffers at each stage improves the signal transition. For a 40 nm technology, the minimum width of the wire is 0.1 µm and the length is 40 µm. Thus the W/L ratio varies accordingly, and this affects the source-to-drain current I. The transition time due to interconnect capacitance increases to 600 ps.

Signal degradation can be explained by:

Q = C x V;
I = dQ/dt;
dQ/dt = C x dV/dt;
I = C x dV/dt;
dV/dt = I/C (the slew rate);
transition time dt = C x dV / I;

where C is the output capacitance or interconnect capacitance.
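Rearranging I = C x dV/dt gives the time needed to swing a node by dV, which is dt = C x dV / I. The short Python sketch below plugs in hypothetical values (they are not taken from any library) to show how a stronger driver shortens the transition.

```python
def transition_time(cap_f, delta_v, current_a):
    """Time to change the voltage on a capacitor by delta_v
    with a constant current: dt = C * dV / I."""
    if current_a <= 0:
        raise ValueError("current must be positive")
    return cap_f * delta_v / current_a


# Hypothetical load: 50 fF of wire + pin capacitance, 1 V swing.
weak_driver = transition_time(50e-15, 1.0, 100e-6)    # 100 uA drive
strong_driver = transition_time(50e-15, 1.0, 400e-6)  # 400 uA drive
print(f"weak driver:   {weak_driver * 1e12:.0f} ps")
print(f"strong driver: {strong_driver * 1e12:.0f} ps")
```

The constant-current assumption is a simplification; a real cell's current changes during the transition, so library tables are used in practice.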

Characteristics of a buffer:
  • Improves signal quality - a higher drive strength means wider transistors, so more current flows through the buffer and its driving capability increases
  • Affects timing
  • Affects signal quality
  • More current flows through the buffer, which lets the load capacitor charge and discharge faster
  • Reduces the load/interconnect capacitance seen by each driver, because a long net is split into shorter segments
  • Power consumption increases as the number of buffers increases
  • Area increases
In the PnR process the number of cells increases because buffers and inverter pairs are inserted, and as a result the gate count increases.

We place buffers / inverter pairs to reduce the skew:

Skew = Latest time at which the positive clock edge reaches a load/instance (Cn) - Earliest time at which the same edge reaches a different load/instance (C1);

where Cn > C1;
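As a small illustration, skew can be computed as the gap between the latest and the earliest clock arrival across the loads. The sketch below takes that magnitude; the instance names and arrival times (in ps) are hypothetical.

```python
def clock_skew(arrival_ps):
    """Skew across a set of clock sinks: latest arrival minus earliest arrival.

    arrival_ps maps a sink name to the time (ps) the clock edge reaches it.
    """
    if not arrival_ps:
        raise ValueError("need at least one clock sink")
    times = list(arrival_ps.values())
    return max(times) - min(times)


# Hypothetical arrival times of the same clock edge at three flops.
arrivals = {"C1": 120, "C2": 135, "Cn": 160}
skew = clock_skew(arrivals)
print(f"skew = {skew} ps")
```

Balancing the buffers on each branch brings the arrival times closer together and drives this number towards zero.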

Therefore, pre-placement is done so that the load seen by all outputs is the same.

Scan chain reordering reconnects the scan flip-flops in an order that follows their placed locations, tracing the data path from the start register to the end register of the chain.

Reordering picks the shortest path from the start instance to the end instance, rather than a roundabout path that wanders through some or all of the intermediate instances, which reduces routing congestion and helps timing.

Cell padding adds keep-out space on both sides of a cell or macro, so that neighbouring cells cannot abut or overlap it.

Judging placement quality :
  • Unplaced cells - should be 0
  • Cells overlap - should be 0
  • Utilization
  • Timing
  • Congestion
  • Number of instances after optimization
  • total area after optimization
Congestion:

Congestion = Required routing tracks / Available routing tracks;
  • The pins inside a GRC (global route cell) decide the required tracks
  • The technology decides the available tracks
  • If a GRC has a ratio > 1, the design is congested there
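The ratio check above can be sketched in a few lines of Python. The GRC names and track counts below are hypothetical; a real tool reports these numbers in its congestion map.

```python
def congestion_ratio(required_tracks, available_tracks):
    """Congestion = required routing tracks / available routing tracks."""
    if available_tracks <= 0:
        raise ValueError("available tracks must be positive")
    return required_tracks / available_tracks


def congested_grcs(grcs):
    """Return the names of GRCs whose ratio is greater than 1.

    grcs maps a GRC name to a (required, available) tuple.
    """
    return [name for name, (req, avail) in grcs.items()
            if congestion_ratio(req, avail) > 1]


# Hypothetical per-GRC track counts.
grcs = {"grc_0_0": (8, 10), "grc_0_1": (12, 10), "grc_1_0": (10, 10)}
hot_spots = congested_grcs(grcs)
print(hot_spots)
```

A GRC that needs exactly as many tracks as it has (ratio of 1) is full but not counted as congested here, matching the "> 1" rule.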
Looking at congestion maps, we can fix it:
  • Add placement blockages
  • Add partial placement blockages to reduce local cell density
  • Reorder the scan chain to reduce congestion
  • Run placement with congestion-driven and area_recovery options
  • Continue the iterations until the congestion results are good
  • Add cell padding to the cells that are causing the congestion



Notes 3 Floorplan

 Floorplan

What is floorplan?
A floorplan is the planning of your floor (the die) to get optimised congestion and meet the timing requirement, with adequate power supply to all pins of all instances.

  • Planning your floor - floor is die
  • Die Size Estimation
  • IO and macro placement
  • Channel length estimation
  • Planning power distribution
Steps

  • Die size estimation - The die is cut from a wafer; the size is estimated so that the maximum number of gates fits, as the technology follows Moore's law.
  • IO port placement - The IO ports are placed along the boundary between the die and the core.
  • Macro placement - Macros are placed according to data-line connectivity and hierarchy, with appropriate channel width to accommodate all the connections, and in such a way as to reduce wire length.
  • Row creation - Rows are created for placing the standard cells around the macros once the macros are placed and fixed.
  • Power routing / power planning - Power grids are then provided to each instance according to its requirement.



A floorplan is judged by Quality of Result (QoR) parameters. An optimal floorplan is a trade-off between

  1. Power consumed - affected by IR drop, electromigration, antenna effects and wire length
  2. Performance at a particular frequency - timing is met and critical nets are checked
  3. Area (congestion) - keep the use of buffers and spare cells to a minimum
The floorplan is important in ASIC/SoC design.

Procedure to perform floorplan
  1. Calculate the die size - check whether it can accommodate all the gate-level area plus additional area for blockages, etc.
Challenges
  1. Routing (congestion of the netlist (functional elements))
  2. Utilisation ratio should be about 70%, where utilisation = (standard cell area + macro area) / core area
  • IO and macro placement
  • Channel length estimation - for interconnection of pins of instances
  • Power planning distribution - check IR drop, which reduces the supply voltage and therefore adds delay; this impacts performance and leaves other elements without enough power. All metals have IR drop because of their intrinsic resistance; a robust power grid reduces its impact.
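A rough way to do the size estimate is to divide the area that must be placed by the target utilisation. The Python sketch below assumes utilisation is taken as (standard cell area + macro area) / core area, which is a common definition but an assumption here, and it assumes a square core. All areas are hypothetical.

```python
import math


def required_core_area(std_cell_area, macro_area, target_utilisation):
    """Core area needed so that cells and macros fill only the target fraction."""
    if not 0 < target_utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return (std_cell_area + macro_area) / target_utilisation


def square_side(area):
    """Side length of a square with the given area."""
    return math.sqrt(area)


# Hypothetical design: 0.5 mm^2 of standard cells, 0.2 mm^2 of macros, 70% target.
core = required_core_area(0.5, 0.2, 0.7)
side = square_side(core)
print(f"core area = {core:.2f} mm^2, side = {side:.2f} mm")
```

The remaining 30% of the core is what is left for routing, buffers added during optimization, and blockages.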
Floorplan types by design limitation

  • Based on outer boundary of die
  1. Core limited design : The chip size is limited by the core size
  2. Pad limited design : The chip size is limited by the pad area in the design. 
Latency - the amount of time the clock takes to travel from one point to another.

For clock port placement
  1. Reduce latency
For data port placement

Macro placement

  1. Macros of the same hierarchy are kept together, which optimizes power, buffer count and timing
Channel Length estimation

DRC violation - when the design violates the rules in the technology file, it becomes difficult to fabricate and process further, and signal integrity is affected.

  1. Timing constraints - maximum transition and maximum fanout are met
  2. Physical constraints - minimum spacing and minimum width are met
Physical constraints - DRC

  1. Wires shorted
  2. Wires open (physical constraint violated)
As we go up the metal layers, the thickness increases.

Area = thickness x width;

The cross-sectional area increases, and resistance is inversely proportional to area, so resistance decreases.

For a given current, the power lost in the wire (I^2 x R) is directly proportional to resistance, hence it decreases.
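The relation can be checked numerically with R = resistivity x length / (thickness x width) and P = I^2 x R for a fixed current. In this Python sketch the resistivity is roughly that of copper, and the wire dimensions and current are hypothetical.

```python
def wire_resistance(resistivity_ohm_m, length_m, thickness_m, width_m):
    """R = rho * L / A, with cross-sectional area A = thickness * width."""
    area = thickness_m * width_m
    if area <= 0:
        raise ValueError("cross-section must be positive")
    return resistivity_ohm_m * length_m / area


def resistive_power(current_a, resistance_ohm):
    """Power lost in the wire for a fixed current: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm


RHO = 1.7e-8  # ohm*m, approximately copper

# Same 1 mm long, 1 um wide wire on a thin lower layer and a thick upper layer.
r_thin = wire_resistance(RHO, 1e-3, 0.2e-6, 1e-6)
r_thick = wire_resistance(RHO, 1e-3, 0.8e-6, 1e-6)
p_thin = resistive_power(1e-3, r_thin)
p_thick = resistive_power(1e-3, r_thick)
print(f"thin:  {r_thin:.2f} ohm, {p_thin * 1e6:.2f} uW")
print(f"thick: {r_thick:.2f} ohm, {p_thick * 1e6:.2f} uW")
```

This is why power straps are usually drawn on the thick upper metal layers.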

PLL - used in the design because of its useful characteristics
  1. minimizes the clock edges
  2. defined phase relation
  3. less jitter in the edges of the clock
  4. all of these come from its feedback mechanism

IO pads in a chip provide:

  1. ESD protection (protection against electrostatic discharge, for example from handling)
  2. Signal integrity
Poly always runs in one direction (vertical here), so macros are placed only in orientations that keep the poly direction and cannot be rotated by 90 degrees.

I/O Port Placement

  • Assigning Physical location for the I/O ports present in the design
  • Typical I/O port location format:
  1. TCL file
  2. DEF
  3. Excel sheet
Core Area
  • Core Area is defined for the placement of standard cells and hard macros.
  • Standard cell rows are created in the core area for standard cell placement.
  • Height of such row is Cell Row Height(CRH).
  • Height of Standard Cells is equal to or integral multiple of CRH.
  • These Std cell rows could be stacked, abutted, flipped as the technology and design may require.
Macro Placement

  • Macro placement is done based on Connectivity information.
  1. Macros to IO cells
  2. Macro to Macro
  3. Macro placement is very critical for congestion and timing
  4. Macro placement should result in uniform standard cell area.
  • Macro Placement Requires
  1. Fly-line Analysis
  2. Data-flow Analysis
  3. Design Module Hierarchy Analysis
  4. Channel length calculation
Types of Macros
  • Memories -RAM, ROM
  • PLL/DLL
  • ADC/DAC
  • DSP Cores
  • ARM Cores
  • Graphics Cores
Macro Placement Guidelines
  • Group the macros based on
  1. Instance Hierarchy
  2. Connectivity(fly-lines)
  3. Data-flow Diagram
  • Never place macros in the middle or center of the core unless they are timing critical
Reason:
  • Macros have routing blockages up to M5, leading to congestion
  • More buffers are added because of routing detours
  • More delay and power
  • IR drop issues
  1. Place the macros where there are no I/O ports
  2. Otherwise, place the macros where there are fewer I/O ports
  3. If macros must be placed where more I/O ports are present, provide more routing space
  4. Provide a uniform cell area
Placement blockages

What are they?
  • totally or partially block cell placement in an area
  • can prevent any cell or particular type of cells from being placed
  • helps reduce congestion & timing costs
Benefits :
  1. To prevent (or reduce) cell placement in local areas
  2. To reduce congestion in local areas
General floor planning guidelines
  • Place the macros close to the boundary of the core
  • see that macro pins face core side
  • group the macros belonging to same logic block
  • keep sufficient channels between macros
  • may have to control the cell placement around macros
  • avoid notches in floorplan
Congestion:

  • Design area is divided into regions - called Global Route Cells(GRC)
  • Based on technology each GRC has fixed number of tracks in every layer
  • Demand tracks - Available Tracks = GRC Overflow
  • This overflow situation leads to Routing Congestion
Reasons:
  • High Cell Density
  • High pin density
  • Blockages
  • Large Macros Nearby
Solutions :
  • Reduce local cell & pin density
  • remove unwanted blockages
  • review macro placement
Floorplan Qualification : 
  • No I/O ports are shorted
  • All macros are on the placement grid
  • No macros overlap
  • Do power routing
  • No DRC or LVS violations on PG nets
  • Perform a quick placement
  • Analyze the results; if they are not satisfactory, repeat the steps above until the floorplan qualifies.







Notes 2 Design Inputs

Design Inputs
Parasitic Capacitance

GDS - the layout file sent to the foundry to fabricate chips. TSMC is the largest foundry.
ASIC/SoC method
  • Front end
  1. Developing the chip - architecting the chip, RTL in VHDL/Verilog, verification of the netlist, and validation when the chip is back.
  • Back end
  1. PnR
  2. Post-layout STA
  3. Physical verification
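AND gate in netlist format

// Behavioural description of a 2-input AND gate
module AND(A, B, Z);
input A, B;
output Z;
assign Z = A & B;
endmodule

or, as a gate-level netlist that instantiates a library cell:

// AND2X1 stands for the library cell written as "AND 2x1" in these notes;
// the actual cell and pin names depend on the standard cell library.
module AND(A, B, Z);
input A, B;
output Z;
AND2X1 i_and (.i0(A), .i1(B), .O(Z));
endmodule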

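PnR -> takes the design from gate level to physical level, written out by the tool in GDS file format.

A CMOS AND gate needs 3 PMOS and 3 NMOS transistors (a 2-input NAND followed by an inverter).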

PnR steps
  1. Design Input
  • Library models - creating the Milkyway library
  • Timing - delay information for every cell of the standard cell library, as seen by IC Compiler (ICC).
  • Sequential circuit - output = f(previous output, present input)
  • Combinational circuit - output = f(present input)
A sequential element is a memory element; here, for simplicity, we take a D flip-flop, which has 1 bit of storage.
sequential circuit = sequential element + combinational circuit;

Flip-flops are edge triggered: positive (+ve) edge triggered or negative (-ve) edge triggered. Edge triggered means the output changes with respect to the clock edge and not directly with respect to the input.
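A tiny behavioural model makes the edge-triggered idea concrete. The Python class below is only a sketch of a positive edge triggered D flip-flop, not a timing-accurate model, and the class and method names are made up for this example.

```python
class DFlipFlop:
    """Positive edge triggered D flip-flop holding 1 bit."""

    def __init__(self):
        self.q = 0            # stored bit (the "previous output")
        self._prev_clk = 0    # clock level seen on the previous step

    def step(self, clk, d):
        """Apply new clock and data levels; Q updates only on a 0 -> 1 clock edge."""
        if self._prev_clk == 0 and clk == 1:
            self.q = d
        self._prev_clk = clk
        return self.q


ff = DFlipFlop()
# (clk, d) pairs: d changes while clk is high, but Q follows only at rising edges.
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
trace = [ff.step(clk, d) for clk, d in stimulus]
print(trace)
```

In the third step the input drops to 0 while the clock is still high, yet Q stays at 1 until the next rising edge.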

3 timing arcs are present in the flip-flop:
  1. Clock-to-Q delay
  2. Setup time
  3. Hold time
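These three arcs are what a timing check combines for a flop-to-flop path. The Python sketch below uses the usual textbook slack equations under simplifying assumptions: one clock, and a single skew term equal to capture latency minus launch latency. All numbers are hypothetical and in ps.

```python
def setup_slack(period, clk_to_q, max_comb_delay, setup_time, skew=0):
    """Setup check: data must arrive setup_time before the next capture edge."""
    data_arrival = clk_to_q + max_comb_delay
    data_required = period + skew - setup_time
    return data_required - data_arrival


def hold_slack(clk_to_q, min_comb_delay, hold_time, skew=0):
    """Hold check: data must stay stable hold_time after the same capture edge."""
    data_arrival = clk_to_q + min_comb_delay
    data_required = hold_time + skew
    return data_arrival - data_required


# Hypothetical path, all values in ps.
s = setup_slack(period=1000, clk_to_q=100, max_comb_delay=700, setup_time=50)
h = hold_slack(clk_to_q=100, min_comb_delay=30, hold_time=40)
h_skewed = hold_slack(clk_to_q=100, min_comb_delay=30, hold_time=40, skew=100)
print(s, h, h_skewed)
```

Positive slack means the check passes. The last case shows how clock skew towards the capture flop can turn a passing hold check into a violation.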
Block Level Implementation- Inputs

  • Library models -Timing (LIB), Physical(LEF)
  • Technology data (LEF)-metal layers,drc rules
  • Design Netlist(.verilog or vhdl)
  • Timing constraints(.sdc)
  • Block size and shape
  • Pins location(Pad location at top level)
  • Blockage information- Placement blockage to avoid placement at some areas, routing blockage to avoid routing in some areas
  • Power network- Pushdown from top level, build from bottom-up
Block Level implementation -Outputs
  • DEF-Design representation with placement and routing
  • LEF-Abstracted model of subchip for top level use with pins location/Blockages.
  • ILM-Abstracted timing model for top level use
  • SPEF - RC parasitics based on the real routing for each net in the design
  • Spice netlist- For running LVS
  • GDSII- physical data (Binary format) understood by different tools.
  • Netlist
Block Level Implementation
  • Load all inputs required into implementation tool
  • Run sanity checks
  • Save design for next step
Design Import - Sanity Checks
  • Library checks
  1. All cells have timing data (.lib for all cells)
  2. All cells have physical data (.LEF for all cells)
  • Netlist checks
  1. Floating nets/pins
  2. Multi-driven nets
  3. Black-boxed modules
  4. Combinational loops
  • SDC checks
  1. Clocks reaching all clock pins of flops
  2. Ports missing input/output delays
  3. Ports missing slew/load constraints
  4. Multiple clocks driving the same register
  • Netlist vs. SDC checks
  1. Pre-layout timing
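Two of the netlist checks, floating nets and multi-driven nets, are easy to picture with a toy data structure. The Python sketch below is not any tool's API; the dictionary layout and the net and pin names are made up for illustration.

```python
def check_nets(nets):
    """Flag nets with no driver (floating) or more than one driver (multi-driven).

    nets maps a net name to {"drivers": [...], "loads": [...]}.
    """
    floating = sorted(n for n, c in nets.items() if len(c["drivers"]) == 0)
    multi_driven = sorted(n for n, c in nets.items() if len(c["drivers"]) > 1)
    return {"floating": floating, "multi_driven": multi_driven}


# Hypothetical connectivity extracted from a small netlist.
nets = {
    "n1": {"drivers": ["u1/Z"], "loads": ["u2/A"]},
    "n2": {"drivers": [], "loads": ["u2/B"]},               # nothing drives it
    "n3": {"drivers": ["u3/Z", "u4/Z"], "loads": ["u5/A"]},  # two drivers fight
}
report = check_nets(nets)
print(report)
```

A clean design import should report empty lists for both categories before the flow moves on.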
LEF (Library Exchange Format) - the physical library provides 2 views: FRAM view and LEF view.
LEF is mainly used to describe the physical information of a cell.

The physical view also has 2 views:
  1. Abstract view - only the required information
  2. Layout view - the circuit as fabricated on silicon, with all information present

Netlist contains
  1. Standard cells and other components, logically connected
  2. Memories and analog blocks
  3. Their connections
Timing constraints
  1. Clock information - used by the ICC tool to optimise the design so that the setup and hold requirements are met
  2. Environment information - load, interaction with other blocks, signal strength
  3. IO delay information - delays at the input and output ports, so that signals reach other modules without deterioration
  4. Block size and shape


Notes 1 -CMOS Fundamentals

Hello Electro Geeks,

Welcome back!!!

To learn more every day, it is necessary to read and gain more knowledge. This post contains my notes on topics I am frequently asked about. In it I would like to familiarize you with the basics of the PnR flow: which components are used, why they are used, and the fundamentals and theory behind them.

CMOS FUNDAMENTALS

In chip design we generally build logic from standard cells such as NAND gates, implemented as MOS logic gates.

There are 2 universal gates that can implement any logic equation: NAND and NOR.
We prefer the NAND gate because it occupies less area than the NOR gate: PMOS takes more area than NMOS for the same drive because of its lower carrier mobility, and a NOR gate puts its PMOS transistors in series. NAND transition time is also less than NOR.
Many devices have appeared since the ENIAC, beginning with the invention of the transistor:
BJT - Bipolar Junction Transistor. It was set aside for large-scale logic because of its high power consumption.

MOS - Metal Oxide Semiconductor device

MOS characteristics

  • It is a device used as a switch and as an amplifier. Switch - ON in the linear region [1] and OFF in the cutoff region [0]. Amplifier - in the saturation region.
  • All digital gates can be designed with MOS devices.
  • Usually we go with CMOS (Complementary MOS), which has both PMOS and NMOS.
Why do we choose CMOS over BJT?
  • In design we optimize 3 factors: Power, Performance, Area. BJT qualifies on performance and area but fails on power, because it is a current-controlled device that consumes a lot of power to run a gate, which puts an upper limit on the number of gates that can be used.
  • MOS, in contrast, is a voltage-controlled device that gives huge integration densities. Compared with PMOS-only or NMOS-only logic, CMOS consumes less power and optimizes all 3 factors.
  • A MOS device has 4 terminals: Source, Gate, Drain, Substrate/body

Pic 1

  • A MOS is a gate-controlled device: when the gate voltage of an NMOS is greater than the threshold voltage, the input is transferred to the output. Hence the gate controls whether the device is ON or OFF.
  • Gate ON means the source supplies carriers and the drain collects them. Connectivity is established and current flows.
  • Source and drain are interchangeable in a MOS.

Pic 2

Power Dissipation


Pic 3

NMOS
  • Vth (threshold voltage) = the gate voltage at which the device just turns ON
  • NMOS forms the pull-down network
  • Therefore its body is tied to ground, Vb (voltage of body) = 0
  • Vth depends on Vb (body effect): it rises as the body is taken further below the source
Condition:
  • If Vth is low, the device is much faster and has maximum performance.
  • If Vth increases, static power dissipation decreases (power saving), but the device becomes slower, which is not always acceptable.

Pic 4
PMOS
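  • Conducts when Vg is below the source voltage by more than |Vth| (Vgs < 0)
  • Vout = 1 when Vg = 0
  • Here the dependence of Vth on Vb is opposite to that of NMOS
  • Hence the body (n-well) is tied to the highest potential, VDD, the same as the source of the pull-up network, so that no body effect is added to Vth.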
Body Electric fields
  • There are 2 E-fields in the body: vertical and horizontal
  • Vertical : provides channel formation
  • Horizontal : provides current flow
Supply voltage scales down with the technology node

  • As the technology node shrinks, the distance between the 2 ports, drain and source, decreases.
  • As you know, E = V/d
  • As d decreases, the electric field increases if V is kept the same, and it can become strong enough to burn the device.
  • So the solution is to reduce the voltage while decreasing the distance between source and drain, and hence the technology node.
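Plugging numbers into E = V/d shows the effect. The Python sketch below uses hypothetical voltages and source-to-drain distances, chosen only to make the scaling visible.

```python
def electric_field(voltage_v, distance_m):
    """Average field between two points: E = V / d, in V/m."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return voltage_v / distance_m


# Hypothetical scaling steps.
e_old = electric_field(5.0, 1.0e-6)     # 5 V across 1 um
e_shrunk = electric_field(5.0, 0.1e-6)  # same 5 V across 0.1 um
e_scaled = electric_field(1.0, 0.1e-6)  # voltage reduced along with distance
print(f"{e_old:.1e} {e_shrunk:.1e} {e_scaled:.1e} V/m")
```

Shrinking the distance by 10x at the same voltage raises the field 10x; lowering the voltage as well brings it back towards the original level.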
Parameters to optimize are:
  1. Package 
  2. Cost
  3. Performance
  4. Reliability
MOS power dissipation affects all of the above parameters, hence reducing power dissipation is the most important step.

MOS power dissipation
  1. Static Power dissipation
  2. Dynamic power dissipation
  3. Leakage power
A MOS behaves as a resistor when it is ON, and it also has intrinsic capacitance and inductance components. Current flowing through the MOS contributes to power dissipation.

  • As the technology node shrinks, Vth decreases, which in turn increases static power dissipation.
  • Subthreshold conduction increases.
  • The noise margin degrades, and logic voltages are no longer equal to the supply rails.
  • Leakage current increases static power dissipation.
  • Dynamic power dissipation is not affected by it.
  • A trade-off is needed between the supply and the threshold voltage for each operating device.
  • The least static power is achieved by putting non-active devices in standby mode.
  • Designing mutually exclusive pull-up and pull-down networks eliminates static power dissipation.
  • Pseudo-NMOS and DCVSL logic can reduce the area requirement of the logic; DCVSL does so without a static current path, whereas pseudo-NMOS draws static current whenever its output is low.
Static Power Dissipation
  • Drains the battery
  • The pn junctions of the MOS form intrinsic diodes; in reverse bias these cause leakage current and power dissipation.
Dynamic Power Dissipation
  • Usually Dynamic Power Dissipation > Static Power Dissipation
  • Caused by charging and discharging of the load capacitor
  • Energy dissipated per transition is 1/2 C(Vdd)^2, so power = activity x C x (Vdd)^2 x f
  • Dynamic Power Dissipation decreases as Vdd decreases.
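Since the energy per transition depends on the square of Vdd, a modest voltage reduction gives a large saving. The Python sketch below uses the standard switching power relation P = activity x C x Vdd^2 x f; the capacitance, frequency and voltages are hypothetical.

```python
def switching_energy(cap_f, vdd):
    """Energy dissipated in one charge or discharge of the load: 1/2 * C * Vdd^2."""
    return 0.5 * cap_f * vdd ** 2


def dynamic_power(cap_f, vdd, freq_hz, activity=1.0):
    """Average switching power: activity * C * Vdd^2 * f."""
    return activity * cap_f * vdd ** 2 * freq_hz


# Hypothetical: 1 pF total switched capacitance at 1 GHz.
p_high = dynamic_power(1e-12, 1.2, 1e9)
p_low = dynamic_power(1e-12, 0.9, 1e9)
print(f"{p_high * 1e3:.2f} mW at 1.2 V, {p_low * 1e3:.2f} mW at 0.9 V")
```

Dropping Vdd from 1.2 V to 0.9 V (a 25% reduction) cuts the switching power by almost 44%.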
Short circuit power (Internal power)
  • When switching 0-1 or 1-0, there is a time when both NMOS and PMOS are ON
  • During that time the current finds a direct path from Vdd to gnd through the 2 ON resistances, which dissipate power. This is the internal power.
CMOS
  • With a slow rise time, the parasitic capacitance present at the gate takes longer to charge and discharge.
  • The gate and the channel between source and drain form the 2 plates of the capacitor, and the SiO2 gate oxide under the poly forms the dielectric.
  • As gate area increases, gate capacitance increases, so switching power dissipation increases.
As the technology node shrinks, devices become faster and area decreases, which increases dynamic and static power dissipation per unit area.
  • Short channel effects
  • Less noise margin
  • Electromigration can cause breaks in the metal

ASIC/ SoC PHYSICAL DESIGN FLOW
SEMICUSTOM FLOW




3 major tools used are:

  • ICC (IC Compiler) for PnR (place and route) by Synopsys
  • Innovus by Cadence
  • Blast by Mentor Graphics