DAAD Program Akademischer Neuaufbau Südosteuropa Embedded System Design Course SOC Design: From...
-
Upload
alexus-riden -
Category
Documents
-
view
219 -
download
2
Transcript of DAAD Program Akademischer Neuaufbau Südosteuropa Embedded System Design Course SOC Design: From...
DAAD Program „Akademischer Neuaufbau Südosteuropa“
Embedded System Design Course
SOC Design: From System to Transistor
Zoran Stamenković
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Outline
• Modeling Systems
• Simulation and Verification
• Analog Integrated Circuits
• Digital Integrated Circuits
• Embedded Memories
• Logic Synthesis
• Design for Testability
• Layout Generation
• Design for Manufacturability
• SOC Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Domains and Levels
• ESL Design
• Basics of HDL
• Gate Modeling
• Delay Modeling
• Power Modeling
• Effects of Parasitics
• Logic Optimization
Modeling Systems
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Domains and Levels
• Open Systems Interconnection (OSI) model of network communication
Local area network (LAN) technologies are defined by standards that describe unique functions at both the Physical and the Data Link layers
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• 802.11 Wireless LAN modem
Modulates outgoing digital signals from a computer or other digital device to an analogue (radio) signal
Demodulates the incoming analogue (radio) signal and converts it to a digital signal for the digital device
Domains and Levels
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
MIMO and MIMAX WLAN Modems
Signal processing performed in the analogue RF domain
Number of the digital basebands reduced to a single one
Signal processing performed in the digital baseband
Domains and Levels
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
BehavioralStructural
Physical
Analysis
Synthesis
Generation
Extraction
RefinementAbstraction
Domains and Levels
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
BehavioralStructural
Physical
Algorithm(behavioral)
Register-TransferLanguage
Boolean Equations
Differential Equations
Behavioral Domain
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
BehavioralStructural
Physical
Processor-MemorySystem
Registers
Gates
Transistors
Structural Domain
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
BehavioralStructural
Physical
Polygons
Sticks
Standard Cells
Floor Plan
Physical Domain
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• The point of a system level model is to capture the intent of the design
Design does exactly what it is defined to do, and the model is the definition of what the design does
It allows software developers to test their code on a working model
• The value of system level modeling is in helping us to understand the implications of our intent
To explore responses to the stimulus in an useful way
• ESL Languages
UML, SystemC, SystemVerilog
• ESL Verification
“No amount of experimentation can ever prove me right; a single experiment can prove me wrong” – Albert Einstein
The system level testbench languages and methodologies that exist today are woefully inadequate
If one tries to capture enough information in ESL to verify RTL, then one might as well write RTL
Electronic System Level Design
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• The environment that provides models of memories, connectors, and queues that can be interconnected with configured processors into an overall system model
Processor and device interfaces are at the transaction level
Transaction-level modeling requests for SOC architecture assembly and simulation tools
If RTL IP blocks present, HW/SW co-verification tools needed
Electronic System Level Design
XtensaProcessorGenerator
Use standard ASIC/COT design techniques andlibraries for any IC fabrication process
Complete Hardware DesignSource pre-verified RTL, EDA scripts, test suite
CustomizedSoftware Tools
C/C++ compiler Debuggers SimulatorsRTOSes
ProcessorExtensions
int main( ){ int i; short c[100]; for (i=0;i<N/2;i++) {
int main( ){ int i; short c[100]; for (i=0;i<N/2;i++) {
ANSI C/C++ Code
XPRESCompiler
ProcessorConfiguration1. Select from menu2. Add instruction
description (TIE)3. Automatic instruction
discovery (XPRES)
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Electronic System Level Design
Auto-generated XTMP model based on memory maps
• Specify chip-level memory maps for shared/private memories
• Place interrupt and reset vectors• Assign code/data to distributed
memories
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Motivation for HDL
Increased hardware complexity
Design space exploration
Inexpensive alternative to prototyping
• General features
Support for describing circuit connectivity
High-level programming language support for describing behavior
Support for timing information (constraints, etc.)
Support for concurrency
• VHDL
IEEE Standard 1076-1987
IEEE Standard 1076-1993
Extension VHDL-AMS-1999
• Verilog
IEEE Standard 1364-1995
IEEE Standard 1364-2000
Hardware Description Languages
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Entity (VHDL) or Module (Verilog) declaration
Describes the input/output ports of a module
entity reg3 isport ( d0, d1, d2, en, clk : in bit;
q0, q1, q2 : out bit );end;
name port names
port mode (direction)
port type (VHDL only)
reserved words punctuation
module reg3 ( d0, d1, d2, en, clk, q0, q1, q2 ); input d0, d1, d2, en, clk;output q0, q1, q2;
endmodule
VHDL
Verilog
Modeling Interfaces
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Architecture Body (VHDL)
Describes an implementation of an entity
May be several per entity
• Module (Verilog)
Is unique
• Behavioral Architecture
Describes the algorithm performed by the module
Contains
Procedural Statements, each containing
Sequential Statements, including
Assignment Statements and
Wait Statements
Modeling Behavior
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
entity reg3 isport ( d0, d1, d2, en, clk : in bit;
q0, q1, q2 : out bit );end;architecture behav of reg3 isbegin
process ( d0, d1, d2, en, clk ) begin
if en = '1' and clk = '1' then q0 <= d0 after 5 ns; q1 <= d1 after 5 ns; q2 <= d2 after 5 ns;end if;
end process;end;
`timescale 1ns/10psmodule reg3 ( d0, d1, d2, en, clk,
q0, q1, q2 ); input d0, d1, d2, en, clk;output q0, q1, q2; reg q0, q1, q2;
always @ ( d0 or d1 or d2 or en or clk ) if ( en & clk ) begin q0 <= #5 d0; q1 <= #5 d1; q2 <= #5 d2; endendmodule Verilog
VHDL
Behavior Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Structural Architecture
Implements the module as a composition of components
Contains
Signal Declarations (entity ports are also signals)
Declare internal connections
Component Instances
Instantiate previously declared entity/architecture pairs
Port Maps in component instances
Connect signals to component ports
Wait Statements
Suspend a process or procedure
Modeling Structure
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Structure Example
d0
d1
d2
q0
q1
q2
bit0
d_latch
d
clk
q
bit1
d_latch
d
clk
q
bit2
d_latch
d
clk
q
int_clken
clk
gate
and2
a
b
y
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
entity d_latch isport ( d, clk : in bit; q : out bit );
end;architecture basic of d_latch isbegin
process ( d, clk ) begin
if clk = ‘1’ thenq <= d after 5 ns;
end if;end process;
end;
entity and2 isport ( a, b : in bit; y : out bit );
end;architecture basic of and2 isbegin
process ( a, b )begin
y <= a and b after 5 ns;end process;
end;
`timescale 1ns/10psmodule d_latch ( d, clk, q );
input d, clk;output q; reg q;
always @ ( d or clk ) if ( clk ) begin q <= #5 d; endendmodule
`timescale 1ns/10psmodule and2 ( a, b, y );
input a, b;output y; reg y;
always @ ( a or b ) begin y <= #5 ( a & b ); endendmodule
VHDL
VHDL
Verilog
Verilog
Structure Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
entity reg3 isport ( d0, d1, d2, en, clk : in bit;
q0, q1, q2 : out bit );end;architecture struct of reg3 is
component d_latchport ( d, clk : in bit; q : out bit );
end component;
component and2port ( a, b : in bit; y : out bit );
end component;
signal int_clk : bit;
begin
bit0 : d_latch port map ( d0, int_clk, q0 );
bit1 : d_latch port map ( d1, int_clk, q1 );
bit2 : d_latch port map ( d2, int_clk, q2 );
gate : and2 port map ( en, clk, int_clk );
end;
module reg3 ( d0, d1, d2, en, clk, q0, q1, q2 );
input d0, d1, d2, en, clk;output q0, q1, q2;
wire int_clk; d_latch bit0 ( d0, int_clk, q0 ); d_latch bit1 ( d1, int_clk, q1 ); d_latch bit2 ( d2, int_clk, q2 ); and2 gate ( en, clk, int_clk );endmodule
VHDL
Verilog
Structure Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• An architecture can contain both behavioral and structural parts
Process Statements and Component Instances
Collectively called Concurrent Statements
Processes can read and assign to signals
• Example: Register-transfer-language model
Data-path described structurally
Control section described behaviorally
Mixing Behavior and Structure
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
shift_reg
reg
shift_adder
control_section
multiplier multiplicand
product
Mixed Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
entity multiplier isport ( clk, reset : in bit; multiplicand, multiplier : in integer;
product : out integer );end;
architecture mixed of multiplier is
signal partial_product, full_product : integer;signal arith_control, result_en, mult_bit, mult_load : bit;
begin
arith_unit : entity work.shift_adder(behavior)port map ( addend => multiplicand, augend => full_product,
sum => partial_product,add_control => arith_control );
result : entity work.reg(behavior)port map ( d => partial_product, q => full_product,
en => result_en, reset => reset );
...
Mixed Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
…
multiplier_sr : entity work.shift_reg(behavior)port map ( d => multiplier, q => mult_bit,
load => mult_load, clk => clk );
product <= full_product;
control_section : process is-- variable declarations for control_section-- …
begin-- sequential statements to assign values to control signals-- …wait on clk, reset;
end process control_section;
end;
Mixed Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Function
f = a’b + ab’: a is a variable, a and a’ are literals, ab’ is a term
• Irredundant Function
No literal can be removed without changing its value
• Implementing logic functions is non-trivial
No logic gates in the library for all logic expressions
A logic expression may map into gates that consume a lot of area, time, or power
• A set of functions f1, f2, ... is complete if every Boolean function can be generated by a combination of the functions from the set
NAND is a complete set
NOR is a complete set
AND and OR are not complete
Transmission gates are not complete
• Incomplete set of logic gates
No way to design arbitrary logic
Logic Functions
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
a out
+
c
a
Inverter
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Inverter
P-type substrate (bulk)
SiO2
metal (Al)
N+N+ drainN+ source
N well
P+ sourceP+ drain
INPUT INPUT
V+
OUTPUTGND
N well
Active areas
Polysilicon
P+(N+) doping
Contacts
Metal
(a)
(b)
(c)
Input
Output
V+
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Complementary switch produces full-supply voltages for both logic 0 and logic 1
n-type transistor conducts logic 0
p-type transistor conducts logic 1
Switches
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
b
+
a
out
b
a
VDD
GND
tubties
out
NAND Gate
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
b
a
out
a
b
VDD
GND
tub ties
out
NOR Gate
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• AOI = and/or/invert
• OAI = or/and/invert
• Implement larger functions
• Pull-up and pull-down networks are compact
Smaller area, higher speed than NAND/NOR network equivalents
• AOI312
And 3 inputs
And 1 input (dummy)
And 2 inputs
Or together these terms
Invert
AOI/OAI Gates
and
or
invert
out = [ab+c]’
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Solid logic 0/1 defined by VSS/VDD
• Inner bounds of logic values VL/VH are not directly determined by circuit properties, as in some other logic families
logic 1
logic 0
unknown
VDD
VSS
VH
VL
• Levels at output of one gate must be sufficient to drive next gate
Logic Levels
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Choose threshold voltages at points where slope of transfer curve is -1
• Inverter has
High gain between VIL and VIH points
Low gain at outer regions of transfer curve
• Note that logic 0 and 1 regions are not equally sized
In this case, high pull-up resistance leads to smaller logic 1 range
• Noise margins are VDD-VIH and VIL-VSS
Noise must exceed noise margin to make second gate produce wrong output
Inverter Transfer Curve
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Resistor model of transistor
Ignores saturation region
Mischaracterizes linear region
Gives acceptable results
Inverter Delay
• Only one transistor is on at the time
Rise time (pull-up on)
Fall time (pull-up off)
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Delay
Time required for gate’s output to reach 50% of final value
• Transition time
Time required for gate’s output to reach 10% (logic 0) or 90% (logic 1) of final value
• Gate delay based on RC time constant
Vout(t) = VDD exp{-t/[(Rn+RL)CL]}
td = 0.69 RnCL
tf = 2.3 RnCL
• 0.5 m process
Rn = 3.9 k
CL = 0.68 fF
td = 0.69 x 3.9 x .68E-15 = 1.8 ps
tf = 2.3 x 3.9 x .68E-15 = 6.1 ps
• For pull-up time, use pull-up resistance
• Current source model (in power/delay studies)
tf = CL (VDD-VSS)/[0.5 k’ (W/L) (VDD-VSS -Vt)2]
• Fitted model
Fit curve to measured circuit characteristics
RC Model for Delay
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Step Input (VGS = VDD) Approximation
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Source voltage of gates in middle of network may not equal substrate voltage
Difference between source and substrate voltages causes body effect
0
0
Source above VSS
Early arriving signal
• To minimize body effect
Put early arriving signals at transistors closest to power supply
Body Effect
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Clock frequency
f = 1/t
• Energy
E = CL(VDD - VSS)2
• Power
E x f = f CL(VDD - VSS)2
• Almost all power consumption comes from switching behavior
A single cycle requires one charge and one discharge of capacitor
• Static power dissipation
Comes from leakage currents
• Surprising result
Resistance of the pull-up/pull-down transistor drops out of energy calculation
Power consumption is independent of the sizes of the pull-up and pull-down transistors
• Static CMOS power-delay product is independent of frequency
Voltage scaling depends on this fact
Power Consumption
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Capacitance on power supply is not bad
Can be good in absence of inductance
• Resistance slows down static gates
May cause pseudo-nMOS circuits to fail
• Increasing capacitance/resistance
Reduces input slope
• Resistance near source is more damaging
It must charge more capacitance
Effects of Parasitics
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Sometimes, large loads must be driven
Off-chip or by long wires on-chip
• Sizing up the driver transistors only pushes back the problem
Driver now presents larger capacitance to earlier stage
• Use a chain of inverters
Each stage has transistors larger than previous stage
is the driver size ratio, Cbig/Cd=n, ln(Cbig/Cd) = n ln
• Minimize total delay through the driver chain
ttot = ln(Cbig/Cd)(/ln)td
• Optimal driver size ratio is opt = e
• Optimal number of stages is nopt = ln(Cbig/Cd)
Optimal Sizing
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Driving Large Fan-Out
• Fan-out adds capacitance
• Increase sizes of driver transistors
Must take into account rules for driving large loads
• Add intermediate buffers
This may require/allow restructuring of the logic
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Network delay is measured over paths through network
Can trace a causality chain from inputs to worst-case output
• Critical path creates longest delay
Can trace transitions which cause delays that are elements of the critical path delay
• To reduce circuit delay, speed up the critical path
Reducing delay off the path doesn’t help
• There may be more than one path of the same delay
Must speed up all equivalent paths to speed up circuit
Path Delay
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Logic gates are not simple nodes
Some input changes don’t cause output changes
• A false path is a path which cannot be exercised due to Boolean gate conditions
False paths cause pessimistic delay estimates
False Paths
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Rewrite by using sub-expressions
Logic rewrites may affect gate placement
• Flattening logic
Increases gate fan-in
• Logic synthesis programs
Transform Boolean expressions into logic gate networks in a particular library
Deep LogicShallow Logic
Logic Transformations
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Optimization goals
Minimize area, meet delay constraint
• Technology-independent optimization
Works on Boolean expression equivalent
Estimates size based on number of literals
Uses factorization, resubstitution, minimization, etc.
Uses simple delay models
• Technology-dependent optimization
Maps Boolean expressions into a particular cell library
May perform some optimizations on addition to simple mapping
Allows more accurate delay models
Logic Optimization
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Simulation
• Verification
• Annotation
Simulation and Verification
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Simulation
Tests the functionality of a design’s elaborated model
Needs a test bench and a simulation tool
Advances in discrete time steps
• Test Bench
Includes an instance of the design under test
Applies sequences of test values to inputs
Monitors signal values on outputs using simulator
• Simulation Tools
NCSIM (Cadence)
VSIM (Mentor Graphics)
VCS (Synopsys)
Simulation
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Event-driven simulation is designed for digital circuit characteristics
Small number of signal values
Relatively sparse activity over time
• Event-driven simulators try to update only those signals which change in order to reduce CPU time requirements
An event is a change in a signal value
A time-wheel is a queue of events
• Simulator traces structure of circuit to determine causality of events
Event at input of one gate may cause new event at gate’s output
Event-Driven Simulation
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Special type of event-driven simulation optimized for MOS transistors
Treats the transistor as a switch
Takes capacitance into account to model charge sharing
• Can also be enhanced to model the transistor as a resistive switch
A
B
CD
Switch Simulation
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
entity test_bench isend;
architecture test_reg3 of test_bench is
signal d0, d1, d2, en, clk, q0, q1, q2 : bit;
begin
dut : entity work.reg3(behav)port map ( d0, d1, d2, en, clk, q0, q1, q2 );
stimulus : process isbegin
d0 <= ’1’; d1 <= ’1’; d2 <= ’1’; wait for 20 ns; en <= ’0’; clk <= ’0’; wait for 20 ns;en <= ’1’; wait for 20 ns;clk <= ’1’; wait for 20 ns;d0 <= ’0’; d1 <= ’0’; d2 <= ’0’; wait for 20 ns;…wait;
end process stimulus;
end;
Test Bench Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• To test a refinement of a design
Low-level structural model must be functionally the same as a corresponding behavioral model
• To include two instances of a design in the test bench
To stimulate both with same test values on inputs
To compare values of outputs for equality
• To take account of timing differences
Zero delay
Unit delay
Gate delay
RC delay
Verification
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
architecture regression of test_bench is
signal d0, d1, d2, d3, en, clk : bit;signal q0a, q1a, q2a, q3a, q0b, q1b, q2b, q3b : bit;
begin
dut_a : entity work.reg4(struct)port map ( d0, d1, d2, d3, en, clk, q0a, q1a, q2a, q3a );
dut_b : entity work.reg4(behav)port map ( d0, d1, d2, d3, en, clk, q0b, q1b, q2b, q3b );
stimulus : process isbegin
d0 <= ’1’; d1 <= ’1’; d2 <= ’1’; d3 <= ’1’; wait for 20 ns; en <= ’0’; clk <= ’0’; wait for 20 ns;en <= ’1’; wait for 20 ns;clk <= ’1’; wait for 20 ns;…wait;
end process stimulus;
...
Verification Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
…
verify : process isbegin
wait for 10 ns;assert q0a = q0b and q1a = q1b and q2a = q2b and q3a = q3b
report ”implementations have different outputs”severity error;
wait on d0, d1, d2, d3, en, clk;end process verify;
end architecture regression;
Verification Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Standard Delay Format (SDF) annotation
Design timing is stored in an SDF file
Used to iteratively improve design
• Updates a more-abstract design with information from later design stages
Annotation of logic schematic with extracted parasitic resistances and capacitances
• Back annotation requires tools to know more about each other
Simulation tools
Synthesis tools
Layout tools
Annotation
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
(DELAYFILE
(SDFVERSION "OVI 1.0")
(DESIGN "tcp_1_chip")
(DATE "Fri Apr 30 09:48:22 2004")
(VENDOR "cdr3synPwcslV225T125")
(PROGRAM "Synopsys Design Compiler cmos")
(VERSION "2003.06")
(DIVIDER /)
(VOLTAGE 2.25:2.25:2.25)
(PROCESS)
(TEMPERATURE 125.00:125.00:125.00)
(TIMESCALE 1ns)
(CELL
(CELLTYPE "tcp_1_chip")
(INSTANCE)
(DELAY
(ABSOLUTE
(INTERCONNECT U5/x U81/a (0.000:0.000:0.000))
(INTERCONNECT U73/x U74/a (0.000:0.000:0.000))
...
)
)
)
(CELL
(CELLTYPE "exnor2_1")
(INSTANCE i_aes_wr/U_ALG/U6533)
(DELAY
(ABSOLUTE
(IOPATH a x (0.662:1.045:1.045) (0.682:1.076:1.076))
(IOPATH b x (1.379:1.416:1.416) (1.454:1.492:1.492))
)
)
)
...
(CELL
(CELLTYPE "mux2_2")
(INSTANCE i_mips/u0/ejt_tap\/pa_addr_reg_next\/bit_00i/U1)
(DELAY
(ABSOLUTE
(IOPATH d0 x (0.395:0.395:0.395) (0.464:0.464:0.464))
(IOPATH d1 x (0.387:0.403:0.403) (0.447:0.477:0.477))
(IOPATH sl x (1.768:1.781:1.781) (1.879:1.892:1.892))
)
)
)
)
Standard Delay Format
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Filters
• Amplifiers
• Phase Lock Loop
• Voltage Control Oscillator
• Modulator/Demodulator
Analog Integrated Circuits
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Fairchild Semiconductor μA741 Op-Amp
• In 1963, a 26-year-old engineer named Robert Widlar designed the first monolithic op-amp IC, the μA702
• Price at the beginning was $300
• Fairchild and competitors have sold it in the hundreds of millions
• Now, for $300 you can get about a thousand of today’s 741 chips
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Signetics NE555 Timer
• A simple IC from 1971 that could function as a timer or an oscillator
• It would become a best seller in analog semiconductors
Kitchen appliances
Toys
Spacecraft
A few thousand other things
• Many billions have been sold
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Intersil ICL8038 Waveform Generator
• A generator of sine, square, triangular, sawtooth, and pulse waveforms from 1983
• Countless applications
Music synthesizers
“Blue boxes”
• Hundreds of millions sold
• Intersil discontinued the production in 2002
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
LNA in BiCMOS Technology
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
PLL for 802.11a WLAN
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Oscillator
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Modulator
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Adders
• Multipliers
• Shifters
• Carry Units
• Arithmetic-Logic Units
Digital Integrated Circuits
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Computes one-bit sum and carry
si = ai bi cin
cout = aibi + aici + bicin
• Ripple-carry adder: n-bit adder built from full adders
• Delay of ripple-carry adder goes through all carry bits
Full Adder
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
0 1 1 0 multiplicand
x 1 0 0 1 multiplier
0 1 1 0
+ 0 0 0 0
0 0 1 1 0
+ 0 0 0 0
0 0 0 1 1 0
+ 0 1 1 0
0 1 1 0 1 1 0
partial product
Combinational Multiplier
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Array multiplier is an efficient layout of a combinational multiplier
• Array multipliers may be pipelined to decrease clock period at the expense of latency
xny0
+ 0+
P(2n-1) P(2n-2)
+
x0y0x1y0x2y00
x0y1+ x1y1
0
+ x0y2+ x1y2
P0
Array Multiplier
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Reduces depth of adder chain• Built from carry-save adders
Three inputs a, b, c
Produces two outputs y, z
y + z = a + b + c• Carry-save equations
yi = parity (ai,bi,ci)
zi = majority (ai,bi,ci)
• At each stage, i numbers are combined to form 2i/3-sums
• Final adder completes the summation
• Wiring is more complex
Wallace Tree
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Used in serial-arithmetic operations• Multiplicand can be held in place by register• Multiplier is shifted into array
Serial-Parallel Multiplier
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Can perform n-bit shifts in a single cycle
• Accepts 2n data inputs and n control signals, producing n data outputs
• Selects arbitrary contiguous n bits out of 2n input buts
• Examples
Right shift: data into top, 0 into bottom
Left shift: 0 into top, data into bottom
Rotate: data into top and bottom
data 1data 2n bits
n bits
output
n bits
Barrel Shifter
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Two-dimensional array of 2n vertical X n horizontal cells
• Input data travels diagonally upward
Output wires travel horizontally
• Control signals run vertically
Exactly one control signal is set to 1, turning on all transmission gates in that column
• Large number of cells, but each one is small
• Delay is large, considering long wires and transmission gates
Barrel Shifter
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• First computes carry propagate and generate
Pi = ai + bi
Gi = aibi
• Computes sum and carry from P and G
si = ci Pi Gi
ci+1 = Gi + Pici
• Can recursively expand carry formula
ci+1 = Gi + Pi(Gi-1 + Pi-1ci-1)
ci+1 = Gi + PiGi-1 + PiPi-1 (Gi-2 + Pi-1ci-2)
• Expanded formula does not depend on intermediate carries
• Allows carry for each bit to be computed independently
Carry-Lookahead Unit
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Deepest carry expansion requires gates with large fan-in
Large and slow
• Carry-lookahead unit requires complex wiring between adders and lookahead unit
Values must be routed back from lookahead unit to adder
Depth-4 Carry-Lookahead Unit
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Looks for cases in which carry out of a set of bits is identical to carry in
• Typically organized into m-bit stages
• If ai = bi for every bit in stage, then bypass gate sends stage’s carry input directly to carry output
Carry-Skip Adder
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Computes two results in parallel, each for different carry input assumptions
• Uses actual carry in to select correct result
• Reduces delay to multiplexer
Carry-Select Adder
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Precharged carry chain which uses P and G signals
• Propagate signal connects adjacent carry bits
• Generate signal discharges carry bit
• Worst-case discharge path goes through entire carry chain
Manchester Carry Chain
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• May be used in signal-processing arithmetic where fast computation is important but latency is unimportant
• LSB control signal clears the carry shift register
Serial Adder
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Computes a variety of logical and arithmetic functions based on opcode
• May offer complete set of functions of two variables or a subset
• Built around adder, since carry chain determines delay
• Function block may be used to compute required intermediate signals for a full-function ALU
Transmission gates may introduce significant delay
Arithmetic-Logic Unit
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• P and G compute intermediate values from inputs
May not correspond to carry lookahead P and G for non-addition functions
• Add unit is adder of choice
• Output unit computes from sum, propagate signal
Arithmetic-Logic Unit
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• 32-bit RISC microprocessor from 1985
• The simplicity made all the difference
Small, low power, and easy to program
• ARM architecture has become the dominant embedded processor
More than 10 billion ARM cores have been used in all sorts of gadgetry, including the iPhone
Acorn Computers ARM1 Processor
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Russell Fish and Chuck Moore 1988 found a way to have the processor run its own super fast internal clock while still staying synchronized with the rest of the computer
• In the years since Sh-Boom’s invention, the speed of processors had by far surpassed that of motherboards, and so practically every maker of computers and consumer electronics wound up using the same solution
Since 2006, Patriot Scientific (and Moore) have reaped over US $125 million in licensing fees from Intel, AMD, Sony, Olympus, and others
Computer Cowboys Sh-Boom Processor
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Microchip Technology PIC16C84 8-bit microcontroller in 1993
Incorporates EEPROM
Does not need UV light to be erased as EPROM needs
8-bit Microprocessors
• Radiation-hardened RCA CDP 1802 8-bit microprocessor in 1976
One of the first, if not the first, CMOS processors
Low power consumption, wide range of operating voltages and military operating temperature range
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Read-Only Memory
• Static Random-Access Memory
• Dynamic Random-Access Memory
• Memory Generators
Embedded Memories
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Address is divided into row and column
Row may contain full word or more than one word
• Selected row drives/senses bit lines in columns
• Amplifiers/drivers read/write bit lines
Memory Architecture
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• ROM core is organized as an array of NOR gates
Pull-down transistors of NOR determine programming
• Erasable ROMs require special processing that is not typically available
• ROMs on digital ICs are generally mask-programmed
Placement of pull-downs determines ROM contents
Read-Only Memory (ROM)
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Core cell uses six-transistor circuit to store value
• Value is stored symmetrically
Both true and complement are stored on cross-coupled transistors
• SRAM retains value as long as power is applied to circuit
• Read
Precharge bit and bit’ high
Set select line high from row decoder
One bit line will be pulled down
• Write
Set bit/bit’ to desired (complementary) values
Set select line high
Drive on bit lines will flip state if necessary
Static Random-Access Memory (SRAM)
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Differential pair
Takes advantage of complementarity of bit lines
• One bit line goes low
One arm of diff pair reduces its current, causing compensating increase in current of another arm
• Sense amp can be cross-coupled to increase speed
SRAM Sense Amplifier
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Dynamic Random-Access Memory (DRAM)
• Cell can easily be made with a CMOS digital technology process
• Dynamic RAM loses value due to charge leakage
Must be refreshed
• Value is stored on gate capacitance of transistor t1
• Read
read = 1, write = 0, read_data’ is precharged
t1 will pull down read_data’ if 1 is stored
• Write
read = 0, write = 1, write_data = value
Guard transistor writes value onto gate capacitance
• Modern commercial DRAMs use one-transistor cell
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• In 1980, Fujio Masuoka recruited four engineers to a project aimed at designing a memory chip that could store lots of data and would be affordable
Team came up with a variation of EEPROM that featured a memory cell consisting of a single transistor (at the time, conventional EEPROM needed two transistors per cell)
• Why is it named “flash”?
Because of the chip’s ultrafast erasing capability
• In 1984 Masuoka presented a paper at the IEEE International Electron Devices Meeting
In 1988 Intel developed a type of flash based on NOR logic gates (a 256‑kilobit chip)
Toshiba’s first NAND flash (greater storage densities but trickier to manufacture) hit the market in 1989
Toshiba NAND Flash Memory
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Memory Generators
• A software tool which can create memories (ROM or RAM blocks) in a range of sizes as needed
The customer usually wants a particular number of words (depth) and bits (width) for each memory ordered
Each of the final building blocks (physical layout) will be implemented as a stand-alone, densely packed, pitch-matched array
• Complex layout generators and state-of-the-art logic and circuit design techniques offer
Embedded memories of extreme density and performance
• Each memory generator is a set of various, parameterized generators
Layout generator generates an array of custom, pitch-matched leaf cells
Schematic generator and Net-lister extracts a net-list used for both layout vs. schematic and functional verification
Function and Timing model generators create models for gate level simulation, dynamic/static timing analysis and synthesis
Symbol generator generates schematic
Critical Path generator is used for both circuit design and timing characterization
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Logic Synthesis
• Logic Synthesis Flow
• Optimization
• Technology Mapping
• Low-Power Techniques
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Optimized Net-listSDF File
Optimized Net-listSDF File
yes
Architectural DescriptionArchitectural Description
RTL DescriptionRTL Description
Computer-Aided SynthesisComputer-Aided Synthesis
Design ConstraintsDesign Constraints
Constraints Metno
Gate-Level Net-listGate-Level Net-list
Standard Cell LibraryStandard Cell Library
Logic Synthesis Flow
• Goal is to create a logic gate network which performs a given set of functions
Input is Boolean formulae
Output is gates implementing Boolean functions
Several iterations needed for generation of the optimized gate-level description
• Logic synthesis
Maps onto available gates
Restructures for delay, area, testability, power, etc.
• Automated logic synthesis has enabled
Enormous reduction of the time needed for conversion of a design from high-level to gate-level description
Saving of designer resources for architectural and RTL descriptions, and optimization of the standard cell library
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Scheduling determines
Number of clock cycles required
As-soon-as-possible (ASAP) schedule puts every operation as early in time as possible
As-late-as-possible (ALAP) schedule puts every operation as late in schedule as possible
• Binding determines
Area and cycle time
• Area tradeoffs must consider
Shared function units vs. multiplexers and control
• Delay tradeoffs must consider
Cycle time vs. number of cycles
High-Level Synthesis
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Technology-independent optimizations
A Boolean network is the main representation of the logic functions
Each node can be represented as sum-of-products (or product-of-sums)
Functions in the network need not correspond to logic gates
• Technology mapping (library binding)
Design transformation from technology-independent to technology-dependent
• Technology-dependent optimizations
Work in the available set of logic gates
Logic Synthesis Phases
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Area is estimated by number of literals
Literal is true or complement form of a variable
• Simplification
Rewrites a node to reduce the number of literals in it
• Network restructuring
Introduces new nodes for common factors
Collapses several nodes into one new node
• Delay restructuring
Changes factorization to reduce path length
out1 = k2 + x2’ out2 = k3 + x1
k2 = x1’ x2 x4 + k1k3 = k1 x4’
k1 = x2 + x3
x1 x2 x3 x4
Technology-Independent Optimization
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Function is defined by
On-set: set of inputs for which output is 1
Off-set: set of inputs for which output is 0
Don’t-care-set: set of inputs for which output is don’t-care
• Each way to write a function as a sum-of-products is a cover
It covers the on-set
• A cover is composed of cubes
Cubes are product terms that define a subspace cube in the function space
x1
x2
x3
0
1
1
1
Covers and Cubes
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Larger cover
x1’ x2’ x3’ + x1 x2’ x3’ + x1’ x2 x3’ + x1 x2 x3
Requires four cubes (12 literals)
• Smaller cover
x2’ x3’ + x1’ x3’ + x1 x2 x3
Requires three cubes (7 literals)
x1’ x2 x3’ is covered by two cubes
• Don’t-cares
Can be implemented in either on-set or off-set
Provide the greatest opportunities for minimization in many cases
• Espresso
A two-level logic optimizer
Expands, makes irredundant and reduces
Optimization loop refines cover to reduce its size
Covers and Optimizations
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Based on division
Formulate candidate divisor
Test how it divides into the function
If g = f/c, we can use c as an intermediate function
• Algebraic division
Doesn’t take into account Boolean simplification
Less expensive then Boolean division
• Three steps
Generate potential common factors and compute literal savings if used
Choose factors to substitute into network
Restructure the network to use the new factors
• Algebraic/Boolean division is used to implement first step
Factorization
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Rewrites Boolean network
In terms of available logic functions
• Optimizes for
Area
Delay
• Can be viewed as a pattern matching problem
Find pattern match which minimizes area/delay cost
• Procedure
Write Boolean network in canonical NAND form
Write each library gate in canonical NAND form
Assign cost to each library gate
Use dynamic programming to select minimum-cost cover of network by library gates
Technology Mapping
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Breaking into Trees
not optimal, but reasonable cuts usually work well
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
after three levels of matching
Mapping Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
after four levels of matching
Mapping Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Architecture-driven supply voltage scaling
Add extra logic to increase parallelism so that system can run at lower frequency
Power improvement for n parallel units over Vref
Pn(n) = [1 + Ci(n)/nCref + Cx(n)/Cref](V/Vref)
• Dynamic voltage and frequency scaling
Decreased to parts of the circuit where it does not adversely affect the performance
Dynamic scaling is regulated by software based on system load
• Reducing capacitances
Parasitic capacitances of the transistors
Parasitic capacitances of the wires
Low Power Techniques
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Reducing switching activity
Deactivate the clock to unused registers (clock gating)
Deactivate signals if not used (signal gating)
Deactivate VDD for unused hardware blocks (power gating)
Low Power Techniques
• Distributed clocks: Globally Asynchronous Locally Synchronous
Eliminating centrally synchronous clocks and utilizing local clocks
Distinct local clocks, possibly running at different frequencies
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Design for Testability
• DFT Methods
• Scan Design
• Test Pattern Generation
• Built-In Self-Test
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Make the system as testable as possible
Keep minimum cost in hardware and testing time
Use knowledge of architecture to help in selection of testability points
Modify architecture to improve testability
• DFT for digital circuits
Ad-hoc methods
Avoid asynchronous feedback
Make flip-flops initializable
Avoid redundant gates, large fan-in gates and gated clocks
Provide test control for difficult-to-control signals
Consider ATE requirements (tri-states, etc.)
Structured methods
Scan Design
Built-in self-test (BIST)
Boundary scan
Design for Testability Methods
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Circuit is designed using pre-specified design rules
• Test structure (hardware) is added to the verified design
Add a test control (TC) primary input
Replace flip-flops by scan flip-flops (SFF) and connect to form one or more shift registers (scan-chains) in the test mode
Make input/output of each scan-chain controllable/observable from primary input/primary output
• Use combinational ATPG to obtain tests for all testable faults in the combinational logic
• Add shift register tests and convert ATPG tests into scan sequences for use in manufacturing test
• Full scan is expensive
Must roll out and roll in state many times during a set of tests
• Partial scan selects some registers (not all) for scanability to reduce the chain length
Analysis is required to choose which registers are best for scan
Scan Design
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
DTC
SD
CK
Q
QMUX
D flip-flop
Master latch Slave latch
CK
TC Normal mode, D selected Scan mode, SD selected
Master open Slave opent
t
Logicoverhead
Scanable Flip-Flop
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Level-Sensitive Scanable Flip-Flop
D
SD
MCK
Q
Q
D flip-flop
Master latch Slave latch
SCK
TCK
SCK
MCK
TCK Norm
al
mode
MCK
TCK Sca
nm
ode
Logic
overhead
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Scan Structure
SCANIN
SCANOUT
TC or TCK
SFF
SFF
SFF
Combinationallogic
PI PO
Not shown: CK orMCK/SCK feed allSFFs
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Combinational Test Vectors
I2 I1
O1 O2
PI
PO
SCANIN
SCANOUT
S1 S2
N1 N2
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0TC
Sequence length = (ncomb + 1) nsff + ncomb clock periods
ncomb = number of combinational vectors
nsff = number of scan flip-flops
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Scan-chain must be tested prior to application of scan test sequences
• A shift sequence 00110011 . . . of length nsff+4 in scan mode (TC=0)
Produces 00, 01, 11 and 10 transitions in all flip-flops
Observes the result at SCANOUT output
• Total scan test length
(ncomb + 2) nsff + ncomb + 4 clock periods
• Example
2,000 scan flip-flops, 500 comb. vectors, total scan test length ~ 106 clocks
• Multiple scan-chains reduce test length
Testing Scan Chain
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Errors are introduced during manufacturing
Testing weeds out infant mortality
• Varieties of testing
Functional testing
Performance testing
• Fault model
Possible locations of faults
I/O behavior produced by the fault
With a fault model, we can test the network for every possible instantiation of that type of fault
It is difficult to enumerate all types of manufacturing faults
• Testing procedure
Set inputs
Observe output
Compare fault-free and observed output
Testing and Faults
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Logic gate output is always stuck at 0 or 1 independently on input values
• Correspondence to manufacturing defects depends on logic family
• Experiments show that 100% stuck-at-0/1 fault coverage corresponds to high overall fault coverage
• Testing NAND
Three ways to test it for stuck-at-0
Only one way to test it for stuck-at-1
• Testing NOR
Three ways to test it for stuck-at-1
Only one way to test it for stuck-at-0
a b OK SA0SA1
0 0 1 0 1
0 1 1 0 1
1 0 1 0 1
1 1 0 0 1
a b OK SA0SA1
0 0 1 0 1
0 1 0 0 1
1 0 0 0 1
1 1 0 0 1
Stuck-At-0/1 Faults
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Can test both NANDs for stuck-at-0 simultaneously
abc = 000
• Cannot test both NANDs for stuck-at-1 simultaneously due to inverter
Must use two vectors
Must also test inverter
Multiple Test Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Transistors always on/off
• t1 is stuck open (switch cannot be closed)
No path from VDD to output capacitance
• Testing requires two cycles
Must discharge capacitor
Try to operate t1 to charge capacitor
Stuck-At-Open/Closed Model
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Two parts of testing
Controlling the inputs of (possibly interior) gates
Observing the outputs of (possibly interior) gates
• Delay faults
Gate delay model assumes that all delays are lumped into one gate
Path delay model takes into account the delay of a path through network
Performance problems
Functional problems in some types of circuits
Combinational Testing Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Goal
Test gate D for stuck-at-0 fault
• First step
Justify 0 values on gate inputs
• Work backward from gate to primary inputs
w1 = 0 (A output = 0)
i1 = i2 = 1
• Observe the fault at a primary output
o1 gives different values if D is true/faulty
• Work forward and backward
F’s other input must be 0 to detect true/fault
Justify 0 at E’s output
• In general, may have to propagate fault through multiple levels of logic to primary outputs
Testing Procedure
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Redundant logic can mask faults
• Testing NOR for SA0 requires setting both inputs to 0
• Network topology ensures that one NOR input (for instance b) will always be 1
• Function reduces to 0
f = ((a+b)’ + b)’ = (a + b)b’ = 0
• Redundant logic can introduce delay faults and other problems
Redundancy and Testing
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Much harder than combinational testing
Can’t set memory element values directly
• Must apply sequences
To put machine in proper state for test
To observe value of test
• Testing of NAND for stuck-at-1
Set both NAND inputs to 1
Primary input i1 can be controlled directly
Lower input is 1 if ps0/ps1 = 1
Sequential Testing
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• A model for sequential test
Unroll machine in time
• A single-stuck-at fault in sequential machine appears to be the multiple-stuck-at fault
Time-Frame Expansion
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Automatic test pattern generator (ATPG) generates a set of test vectors
Boolean network (combinational ATPG)
Sequential machine (sequential ATPG)
• D (from Discrepancy) allows us to quickly write fault
D value on a node means that good and faulty circuits have different values at that point
• If a test for a particular fault exists, D-algorithm will find it by an exhaustive search of all sensitized paths
Start at the faulty gate
Suppose initially a stuck-at fault on gate output
“Primitive D-cube of failure” (PDCF) of gate summarizes minimal assignment of input values to highlight fault
• Propagation D-cube (PDC) has D or D’ on output and on at least one input
Summarizes “non-controlling” values for other inputs to allow propagation of D signal
Test Pattern Generation
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• PODEM stands for Path-Oriented DEcision Making
Circuit-based, fault-oriented ATPG algorithm
• Goal
Propagate D value to primary outputs
• Signal values are explicitly assigned at primary inputs only
Other values are computed by implication
• Backtracking means reassigning primary inputs when a contradiction occurs
Uses implicit enumeration
• Uses five values: 0, 1, D, D’, and X
Start all values at X
• In worst case, must examine all possible inputs
Can be implemented to run quickly
PODEM Algorithm
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Fault Propagation Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Includes on-chip machine responsible for
Generating tests
Evaluating correctness of tests
• Allows many tests to be applied
• Can’t afford large memory for test results
Rely on compression and statistical analysis
• Uses a linear-feedback shift register (LFSR) to generate a pseudo-random sequence of bit vectors
Built-In Self-Test (BIST)
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• One LFSR generates test sequence
• Another LFSR captures/compresses results
• Can store a small number of signatures which contain expected compressed results for valid system
• Usually used for testing memory blocks
BIST Architecture
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Layout Generation
• Layout Generation Flow
• Design Rules
• Layout Tools
• Standard Cells
• Floorplanning
• Placement
• Routing
• Clock Tree
• Pads
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
VerifyVerify
Net-listNet-list
FloorplanningFloorplanning
Plan Power RoutingPlan Power Routing
PlacementPlacement
Adjust SizeAdjust Size
Generate Clock TreesGenerate Clock Trees
Optimize PlacementOptimize Placement
RouteRoute
Post-layout Net-listSDF, DEF, GDSII
Post-layout Net-listSDF, DEF, GDSII
Cell Library(LEF, TLF)
Cell Library(LEF, TLF)
Design Constraints
Design Constraints • Library Exchange Format (LEF) files
To create a library database (standard cells, I/O cells, and macro blocks)
• Timing Library Format (TLF) file
Timing constraints
• General Constraints Format (GCF) file
Design constraints
• Verilog net-list
To create a design database
Layout Generation Flow
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Floorplanning
To create a core area with rows (or columns) and I/O rows around the core area
• Power planning and routing
To plan, modify and rout power paths, power rings and power stripes
• Placement
An I/O constraints file may be used to place the I/O pads
Block placement
Cell placement
• Size adjustment
To estimate the die size
To resize the design to make it routable
Layout Generation Flow
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Generating clock trees
The clock buffer space and clock net must be defined
Generating clock trees is iterative process
At this point, the physical net-list differ from the logical (original) net-list
• Placement optimization
To resize gates and insert buffers to correct timing and electrical violations
• Routing
To perform both global and final route on a placed design
• Verification
To check for shorts and design rule violations
Layout Generation Flow
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Masks are tools for manufacturing
• Manufacturing processes have inherent limitations in accuracy
• Design rules specify geometry of masks which will provide reasonable yields
• Design rules are determined by experience
• MOSIS SCMOS
Designed to scale across a wide range of technologies
Designed to support multiple vendors
Designed for educational use
Fairly conservative
• Lambda () design rules
Size of a minimum feature defines
Specifying particularizes the scalable rules
Parasitics are generally not specified in units
Design Rules
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
metal 36
metal 23
metal 13
pdiff/ndiff3
poly2
Wires
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
2
3
1
3 2
5
Transistors
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Types of via
Metal1/diff
Metal1/poly
Metal2/metal1
Metal3/metal2
...
4
1
4
2
• Highest via
Cut: 3 x 3
Overlap by metal2: 1
Minimum spacing: 3
Minimum spacing to via1: 2
Vias
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Diffusion/diffusion
3
• Poly/poly
2
• Poly/diffusion
1
• Via/via
2
• Metal1/metal1
3
• Metal2/metal2
4
• Metal3/metal3
4
Spacings
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Cut in passivation layer
Connection for bonding wire
• Minimum bonding pad
100
• Pad overlap of glass opening
6
• Minimum pad spacing to unrelated metal2/3
30
• Minimum pad spacing to unrelated metal1, poly, active
15
Overglass
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Layout editors are interactive tools
• Design rule checkers identify errors on the layout
• Circuit extractors extract the net-list from the layout
• Connectivity verification systems (CVS) compare extracted and original net-lists
CADENCE Virtuoso’s Layout-versus-Schematic (LVS) tool
• Standard cell layouts are created from pre-designed cells using the custom routing
Silicon Ensemble (CADENCE)
Encounter (CADENCE)
Physical Compiler (SYNOPSYS)
Layout Tools
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
routing area
routing area
rout
ing
are
a
routin
g area
• Layout made of small cells
Gates, flip-flops, etc.
Cells are hand-designed
• Assembly of cells is automatic
Cells arranged in rows
Wires routed between and through cells
• Pitch is the height of a cell
All cells have same pitch, may have different widths
• VDD/VSS connections are designed to run through cells
• A feedthrough area allows wires to be routed over the cell
Standard Cell Layout
VSS
n tub
p tub
Intra-cell wiring
pullups
pulldowns
pin
pin
Fee
dthr
ough
are
aVDD
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Floorplanning must take into account
Blocks of varying function, size, and shape
Space allocation
Signal routing
Power supply routing
Clock distribution
Floorplanning Strategy
MIPS Logic
Analog RAM
Test Logic
Processor Logic
Analog RAM
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Develop a wiring plan
Think about how layers will be used to distribute important wires
• Draw separate wiring plans for power and clocking
These are important design tasks which should be tackled early
• Sweep small components into larger blocks
A floorplan with a single NAND gate in the middle will be hard to work with
• Design wiring that looks simple
If it looks complicated, it is complicated
• Design planar wiring
Planarity is the essence of simplicity
Do it where feasible (and where it doesn’t introduce unacceptable delay)
Floorplanning Tips
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Placement of components interacts with routing of wires
• Quality metrics for layout
Area and delay
• Area and delay determined in part by
Wiring
• How do we judge a placement without wiring?
Estimate wire length without actually performing routing
bad placement good placement
Placement Metrics
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• To construct an initial solution
• To improve an existing solution
• Pairwise interchange is a simple improvement metric
Interchange a pair, keep the swap if it helps wire length
Heuristic determines which two components to swap
• Placement by partitioning
Works well for components of fairly uniform size
Partition net-list to minimize total wire length using min-cut criterion
• Kernighan-Lin Algorithm
Computes min-cut criterion, count total net-cut change
Exchanges sets of nodes to perform hill-climbing finding improvements where no single swap will improve the cut
Recursively subdivide to determine placement detail
Placement Techniques
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Major phases in routing
Global routing assigns nets to routing areas
Detailed routing designs the routing areas
• Net ordering determines quality of result
Net ordering is a heuristic
• Blocks and wiring
Blocks divide wiring area into routing channels
Large wiring areas may force rearrangement of block placement
• Channel routing
Channel grows in one dimension to accommodate wires
Pins generally on only two sides
• Switchbox routing
Box cannot grow in any dimension
Pins are on all four sides
Routing
chan
nel
switch
box
channel
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Tracks form a grid for routing
Spacing between tracks is center-to-center distance between wires
Track spacing depends on wire layer used
• Density (vertical and horizontal)
Gives the number of wire segments crossing a vertical/horizontal grid segment
• Different layers are used for horizontal and vertical wires
Horizontal and vertical wires can be routed relatively independently
• Placement of cells determines placement of pins
• Pin placement determines difficulty of routing problem
Routing Channels
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Assumes one horizontal segment per net
• Sweep pins from left to right
Assign horizontal segment to lowest available track
• Limitations
Some combinations of nets require more than one horizontal segment per net (a dog-leg wire)
• Aligned pins form vertical constraints
Wire to lower pin must be on lower track
Wire to upper pin must be above lower pin’s wire
A B C
A B B C
B A
A B
aligned
?
Left-Edge Algorithm
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Global routing
Assign wires to paths through channels
Don’t worry about exact routing of wires within channel
Can estimate channel height using congestion
• Detailed routing
Dog-leg router breaks net into multiple segments as needed
Minimize number of dog-leg segments per net to minimize congestion for future nets
Use left-edge criterion on each dog-leg segment to fill up the channel
Global and Detailed Routing
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Multipoint nets are harder to design than two-point nets
Rectilinear Steiner tree problem:
Find the minimum-length tree interconnecting all net’s points
Find additional points (so-called Steiner points) in the plane, if they contribute to a shorter tree length
• Steiner tree algorithm
Compute the spanning tree
Optimize the tree length by flipping L-shaped branches
Steiner points always have degree three
Multipoint Nets
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Place and route to minimize
Capacitance of nodes with high glitching activity
• Feed back wiring capacitance values
To better estimate power consumption
• Size wires to be able to handle current
Requires designing topology of VDD/VSS networks
• Keep power network in metal
Requires designing planar wiring
Layout for Low Power
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Clock delay varies with position
Deliver clock to memory elements with acceptable skew
Deliver clock edges with acceptable sharpness
• Clocking network design
The greatest challenge in the design of a large chip
Clock Delay
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Clocks are generally distributed via wiring trees
• Use low-resistance interconnect to minimize delay
• Use multiple drivers to distribute driver requirements
Use optimal sizing principles to design buffers
• Clock lines can create significant crosstalk
Clock Distribution Tree
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Pads are placed on top-layer metal
Provide a place to bond to the package
• Pads are typically placed around periphery of chip
Some advanced packaging systems bond directly to package without bonding wire
Some allow pads across entire chip surface
• Supply power/ground to
Each pad
Chip core
• Positions of pads
May be determined by pin requirements
• Distribute power/ground pins as evenly as possible
To minimize power distribution problems
Pad Architecture
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Main purpose is to provide electrostatic discharge (ESD) protection
• Gate of transistor is very sensitive
Can be permanently damaged by high voltage
Static electricity in room is sufficient to cause damage
• Resistor is used in series with pad to limit current caused by voltage spike
• May use parasitic bipolar transistors to drain away high voltages
One for positive pulses
Another for negative pulses
• Must design layout to avoid latch-up
Input Pad
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Don’t need ESD protection
Transistor gates not connected to pad
• Must be able to drive capacitive pad load and outside load
• May need voltage level shifting
To be compatible with other logic families
Output Pad
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Combination of input and output
Controlled by a mode-input on chip
• Pad includes logic to disconnect output driver when pad is used as an input
• Must be protected against ESD
Three-State Pad
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Design for Manufacturability
• Defects and Faults
• Critical Area Modeling
• Critical Area Extraction
• Yield Modeling
• Yield Control
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Design faults
• Random faults
Shorts between lines
Breaks of lines
Leakage of insulation layers
• Permanent faults
• Dynamic faults
• Transient faults
• Parametric test
Power consumption
Input resistance
Input/output current
• Functional test
Test signals applied
Output observed
Defects, Faults, and Tests
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Defect density distribution
g(D)
• Defect size distribution
h(X)
Defects Statistics
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
D
g(D)
2DD
1
2D
1
D
Defect Density Distribution
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Defect Size Distribution
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Sample size
Number of measurements
• Structure size
High and low defect densities
• Critical dimensions
As defect sizes
• Self-isolation
Sensitive on one defect type
• Measurability
Electrical
Simple
Reliable
Test Structures
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Long parallel conductors Real Patterns
Critical Area Models
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
0
dx)x(h)x(AA
xws
wsxs
sx
for
for
for
swNL
sxNLxs
A
2
2
00
xsw
swxw
wx
for
for
for
wsNL
wxNLxo
A
2
2
00
Critical Area Models
)(
)/xexp(x)x(h
1
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Point defects Lithographic defects
)yy)(xx(A l 1212
)]yy()xx[(zA v 12122
Critical Area Extraction
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Shorts BreaksVerticalshorts
Critical Area Extraction
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
0
dDDgDAexpY
Yield Models
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Yield Control
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Design Methodology
• SOC Design Flow
• An Example
SOC Example
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Every company has its own design methodology
• Methodology depends on
Size of chip
Design time constraints
Cost/performance
Available tools
• Driven by contradictory impulses
Customer concerns about cost and performance
Forecasts of feasibility of cost and performance
• Design features, performance, power, etc.
May be negotiated at early stages
Negotiation at later stages creates problems
Design Methodology
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Design styles
Full custom logic design (very tedious)
Semi custom or standard cell design
Field Programmable Gate Array (FPGA) design
• Full custom design
Most likely for data-paths
Least likely for random logic off critical path
• Standard cell design
Application Specific Integrated Circuits (ASIC)
• FPGA design
Prototyping oriented
Design Styles
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Functionality, speed, power consumption, area, etc.
• Estimation techniques of circuit performance vary with module
Memories may be generated once size is known
Data-paths may be estimated from previous design
Controllers are hard to estimate without details
• Clock distribution
• Layout design rule check
• Testing
Generation of simulation test vectors
Generation of scan test vectors (ATPG)
Manufacturing test vectors comprise of both
Design Validation
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
MPL Library
New RTL Designs
HDL Top Module Definition
Simulation
OK?
Logic Synthesis
Simulation
OK?
Layout Synthesis
Simulation
OK?
Final Chip Layout
Test Benches
yes
Configurable Modules(synthesizable RTL code)
Pre-defined Modules(synthesized net-lists, layouts, standard cells)
New logic runsufficient?
yes
yes
New layout runsufficient?
yes
no
no
no
no
yes
no
Applications
System Specification
A
A
A
SOC Design Flow
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Wireless Broadband Networks
A highly integrated broadband wireless modem
• Wireless Internet
A new terminal oriented TCP/IP for wireless systems
RFBasebandDLC
ApplicationEngine
Mobile Bus.Engine
ProtocolEngine
WirelessInternet
PowerManagement
WirelessInternet
TestEngine TestProject
Wireless Broadband Network
• Wireless Sensor Networks
A flexible sensor network node architecture for medical applications
• Mobile Business Engine
A specific application processor for highly efficient encryption operations
Applications
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• SOC designs are a mix of
Intellectual Property (IP) blocks
Standard functions (in-house developed)
Application specific blocks (in-house developed)
• RTL descriptions, net-lists and layouts
Soft-core MIPS32 4KEp
Soft-core LEON-2
Hard-core LEON-3
Soft-core IPMS430
UART, GPIO, PCMCIA
AMBA, I2C bus
Controllers
Hardware accelerators
SRAM memory generator
Hard-core flash
Reusable Modules
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
Area (mm2) 27
Transistors (x106) 1.5
Memory including BIST (kB) 20
Power/Frequency (mW/MHz) 8.1
Max Frequency (MHz) 70
Scan Chain (#FF) 11203
CPU
I-Cache
D-Cache
AHB Controller
DCLDSU
AHB
AHB/APBBridge
Memory Controller
APB
SRAM Flash
IO PortIrq Ctrl
TimersUARTs
CPU
I-Cache
D-Cache
AHB Controller
DCLDSU
AHB
AHB/APBBridge
Memory Controller
APB
SRAM Flash
IO PortIrq Ctrl
TimersUARTs
Configurable Processor
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Customisation addresses three architectural levels
Instruction extension: designer specifies its functionality
Inclusion/exclusion of predefined blocks: special registers, BIST, etc.
Parameterisation: cache size, number of registers, etc.
• To customise the extensible processor to a specific application
It starts with profiling the application using instruction-set simulator
• To evaluate customisations using retargetable tool generation
Retargetable techniques automatically generate compilers and simulators aware of the new or extended instructions
• Major players in the field
Tensilica, Improv Systems, ARC, Coware, and Target Compiler Techn.
NEC’s TCP/IP offload engine integrates 10 extensible processors
Extensible Processor
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• “The crisis of complexity” in SOC design
SOC designs do not exploit the possible 100 M transistors per chip
250 M gates per SOC are feasible by 2005, most SOCs would use only 50 M gates
• More design starts for Application-Specific Standard Products (ASSP) than for ASIC
Increase of the number of programmable processing units
Decrease of the gate count for custom logic blocks
• The trend is expected to continue, yielding a “sea of processors”
Many heterogeneous processors connected by a network-on-chip
• System-level design tools combined with use of off-the-shelf components are needed
Application-Specific Instruction-set Processors (ASIP) from ASSP
Extensible processor platform is state-of-the-art in ASIP technology
SOC Design Trends
DAAD Program „Akademischer Neuaufbau Südosteuropa“, Embedded System Design Course at University of Nis,Faculty of Electronic Engineering, 29.06.-03.07.2009
• Optimisation and search through a large design space
Right set of extensible instructions and its constraints
• Communication of many extensible SOC processors on chip
A customised Network-on-Chip (NOC)
• Shift in SoC design distinction
To the process of customising extensible processors
To the software design expertise needed to program them
• “Structured” ASICs introduce pre-built blocks (logic, configurable memories, and test structures)
Most of the metal layers predefined
Customisation of the upper two or three metal layers
All customers share the same prefabricated die
Open Issues and Alternatives