From Combinational to Sequential Circuits to Simple Processors


Page 1: From Combinational to Sequential Circuits to Simple Processors

From Combinational to Sequential Circuits to Simple Processors

Page 2: From Combinational to Sequential Circuits to Simple Processors

Reminder: Embedded Systems

Page 3: From Combinational to Sequential Circuits to Simple Processors

Outline

• Introduction
• Combinational logic
• Sequential logic
• FSM design
• Custom single-purpose processor design
• RT-level custom single-purpose processor design

Page 4: From Combinational to Sequential Circuits to Simple Processors

SYNTHESIS METHODOLOGIES

Page 5: From Combinational to Sequential Circuits to Simple Processors

Increasing abstraction level in design specification

• Higher abstraction levels are the focus of hardware/software design evolution
  – Descriptions are smaller and easier to capture
    • E.g., one line of sequential program code can translate to 1000 gates
  – Many more possible implementations are available
    • (a) Like a flashlight: the higher above the ground, the more ground is illuminated
      – Sequential program designs may differ in performance/transistor count by orders of magnitude
      – Logic-level designs may differ by only a power of 2
    • (b) The design process proceeds to lower abstraction levels, narrowing in on a single implementation

[Figure: panels (a) and (b) each run from idea down to implementation through four abstraction levels: back-of-the-envelope, sequential program, register-transfers, logic. Moving down, modeling cost increases and opportunities decrease.]

Page 6: From Combinational to Sequential Circuits to Simple Processors

What is Synthesis

• Synthesis: automatically converting a system's behavioral description to a structural implementation
  – Structure: a complex whole formed by parts
  – The structural implementation must optimize design metrics
• Synthesis tools are more expensive and more complex than compilers
  – Cost = $100s to $10,000s
  – User controls 100s of synthesis options
  – Optimization is critical
    • Otherwise could use software
  – Optimizations differ for each user
  – Run time = hours, days

Page 7: From Combinational to Sequential Circuits to Simple Processors

Gajski's Y-chart

• Each axis represents a type of description
  – Behavioral
    • Defines outputs as a function of inputs
    • Algorithms but no implementation
  – Structural
    • Implements behavior by connecting components with known behavior
  – Physical
    • Gives size/locations of components and wires on chip/board
• Synthesis converts behavior at a given level to structure at the same level or lower
  – E.g.,
    • FSM → gates, flip-flops (same level)
    • FSM → transistors (lower level)
    • FSM X registers, FUs (higher level; X marks a conversion synthesis does not perform)
    • FSM X processors, memories (higher level; likewise not performed)

[Figure: Gajski's Y-chart with three axes.]
Behavior axis (highest to lowest level): Sequential programs; Register transfers; Logic equations/FSM; Transfer functions
Structural axis (highest to lowest level): Processors, memories; Registers, FUs, MUXs; Gates, flip-flops; Transistors
Physical axis (highest to lowest level): Boards; Chips; Modules; Cell layout

FU = functional unit
FSM = finite state machine

Page 8: From Combinational to Sequential Circuits to Simple Processors

Example of Custom Processor

• Processor
  – Digital circuit that performs computation tasks
  – Controller and datapath
  – General-purpose: variety of computation tasks
  – Single-purpose: one particular computation task
  – Custom single-purpose: non-standard task
• A custom single-purpose processor may be
  – Fast, small, low power
  – But with high NRE (non-recurring engineering) cost, longer time-to-market, and less flexibility

[Figure: digital camera chip. Blocks: lens, CCD, A2D, CCD preprocessor, pixel coprocessor, D2A, microcontroller, JPEG codec, DMA controller, memory controller, ISA bus interface, UART, LCD ctrl, display ctrl, multiplier/accumulator.]

Page 9: From Combinational to Sequential Circuits to Simple Processors

CMOS transistor on silicon

• Transistor
  – The basic electrical component in digital systems
  – Acts as an on/off switch
  – Voltage at “gate” controls whether current flows from source to drain
  – Don’t confuse this “gate” with a logic gate

[Figure: cross-section of a transistor on a silicon substrate inside an IC package, showing source, drain, gate, oxide, and channel; the transistor symbol (gate, source, drain) is annotated "conducts if gate=1".]

Page 10: From Combinational to Sequential Circuits to Simple Processors

CMOS transistor implementations

• Complementary Metal Oxide Semiconductor (CMOS)
• We refer to logic levels
  – Typically 0 is 0 V, 1 is 5 V
• Two basic CMOS transistor types
  – nMOS conducts if gate=1
  – pMOS conducts if gate=0
  – Hence “complementary”
• Basic gates
  – Inverter: F = x'
  – NAND gate: F = (xy)'
  – NOR gate: F = (x+y)'

[Figure: nMOS and pMOS transistor symbols (gate, source, drain), and transistor-level schematics of the inverter, NAND gate, and NOR gate connected between logic 1 and logic 0.]

Page 11: From Combinational to Sequential Circuits to Simple Processors

Basic logic gates

Single-input gates

Gate       Equation    F (x=0)   F (x=1)
Driver     F = x          0         1
Inverter   F = x'         1         0

Two-input gates (F listed for each input pair x y)

Gate   Equation         00   01   10   11
AND    F = x y           0    0    0    1
OR     F = x + y         0    1    1    1
XOR    F = x ⊕ y         0    1    1    0
NAND   F = (x y)'        1    1    1    0
NOR    F = (x + y)'      1    0    0    0
XNOR   F = (x ⊕ y)'      1    0    0    1
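The gate definitions above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function names (`gate_and`, `truth_column`, and so on) are made up for illustration, and each logic level is assumed to be the integer 0 or 1.

```c
#include <assert.h>

/* Each gate takes and returns a single logic level, 0 or 1. */
int gate_driver(int x)      { return x; }
int gate_inverter(int x)    { return !x; }
int gate_and(int x, int y)  { return x & y; }
int gate_or(int x, int y)   { return x | y; }
int gate_xor(int x, int y)  { return x ^ y; }
int gate_nand(int x, int y) { return !(x & y); }
int gate_nor(int x, int y)  { return !(x | y); }
int gate_xnor(int x, int y) { return !(x ^ y); }

/* Packs the F column of a two-input gate's truth table into 4 bits.
   The row for (x, y) = (0, 0) ends up in the most significant bit,
   the row for (1, 1) in the least significant bit. */
int truth_column(int (*gate)(int, int)) {
    int column = 0;
    for (int x = 0; x <= 1; x++) {
        for (int y = 0; y <= 1; y++) {
            int f = gate(x, y);
            assert(f == 0 || f == 1);   /* a gate output is one logic level */
            column = (column << 1) | f;
        }
    }
    return column;
}
```

For example, `truth_column(gate_nand)` yields binary 1110, matching the NAND row of the table.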

Page 12: From Combinational to Sequential Circuits to Simple Processors

Combinational logic design

A) Problem description

y is 1 if a is 1, or if b and c are both 1. z is 1 if b or c is 1, but not both, or if a, b, and c are all 1.

B) Truth table

Inputs     Outputs
a b c      y z
0 0 0      0 0
0 0 1      0 1
0 1 0      0 1
0 1 1      1 0
1 0 0      1 0
1 0 1      1 1
1 1 0      1 1
1 1 1      1 1

C) Output equations

y = a'bc + ab'c' + ab'c + abc' + abc

z = a'b'c + a'bc' + ab'c + abc' + abc

D) Minimized output equations

K-map for y (rows a, columns bc):

a \ bc   00  01  11  10
0         0   0   1   0
1         1   1   1   1

y = a + bc

K-map for z (rows a, columns bc):

a \ bc   00  01  11  10
0         0   1   0   1
1         0   1   1   1

z = ab + b'c + bc'

E) Logic gates

[Figure: gate-level circuit with inputs a, b, c and outputs y, z.]
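One way to gain confidence in the minimized equations is to compare them with the problem description over all eight input rows. The sketch below does that; the function names are illustrative and not from the slides, and inputs are assumed to be 0 or 1.

```c
/* Specification, transcribed from the problem description. */
int spec_y(int a, int b, int c) { return a || (b && c); }
int spec_z(int a, int b, int c) { return (b != c) || (a && b && c); }

/* Minimized equations: y = a + bc, z = ab + b'c + bc'. */
int min_y(int a, int b, int c) { return a | (b & c); }
int min_z(int a, int b, int c) { return (a & b) | (!b & c) | (b & !c); }

/* Returns how many (row, output) pairs disagree between the
   specification and the minimized equations; 0 means equivalent. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int row = 0; row < 8; row++) {
        int a = (row >> 2) & 1;
        int b = (row >> 1) & 1;
        int c = row & 1;
        if (spec_y(a, b, c) != min_y(a, b, c)) mismatches++;
        if (spec_z(a, b, c) != min_z(a, b, c)) mismatches++;
    }
    return mismatches;
}
```

Exhaustive comparison is practical here only because there are three inputs; the number of rows doubles with each added input.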

Page 13: From Combinational to Sequential Circuits to Simple Processors

Combinational components

• n-bit, m x 1 multiplexor
  – Inputs I0 … I(m-1), each n bits wide; select inputs S0 … S(log m - 1); n-bit output O
  – O = I0 if S=0..00, I1 if S=0..01, …, I(m-1) if S=1..11
• log n x n decoder
  – Inputs I0 … I(log n - 1); outputs O0 … O(n-1)
  – O0 = 1 if I=0..00, O1 = 1 if I=0..01, …, O(n-1) = 1 if I=1..11
  – With enable input e, all O’s are 0 if e=0
• n-bit adder
  – Inputs A, B (n bits each); outputs sum (n bits) and carry
  – sum = A+B (first n bits); carry = (n+1)’th bit of A+B
  – With carry-in input Ci, sum = A + B + Ci
• n-bit comparator
  – Inputs A, B (n bits each); outputs less, equal, greater
  – less = 1 if A<B; equal = 1 if A=B; greater = 1 if A>B
• n-bit, m-function ALU
  – Inputs A, B (n bits each); select inputs S0 … S(log m - 1); n-bit output O
  – O = A op B, where op is determined by S
  – May have status outputs carry, zero, etc.

Students should be able to use all kinds of combinational blocks when synthesizing solutions to various problems.
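The behavior of these blocks can be modeled in a few lines. The sketch below fixes concrete sizes that are assumptions for illustration only (an 8-bit 4 x 1 multiplexor, a 3 x 8 decoder, 8-bit adder and comparator); the type and function names are likewise hypothetical.

```c
#include <stdint.h>

/* 8-bit, 4 x 1 multiplexor: select (0..3) picks one of four inputs. */
uint8_t mux4(uint8_t i0, uint8_t i1, uint8_t i2, uint8_t i3, unsigned select) {
    switch (select & 3u) {
    case 0:  return i0;
    case 1:  return i1;
    case 2:  return i2;
    default: return i3;
    }
}

/* 3 x 8 decoder with enable: one-hot output, all zeros when enable is 0. */
uint8_t decoder3to8(unsigned input, int enable) {
    return enable ? (uint8_t)(1u << (input & 7u)) : (uint8_t)0;
}

typedef struct { uint8_t sum; int carry; } AdderOut;

/* 8-bit adder with carry-in: sum is the low 8 bits, carry is the 9th bit. */
AdderOut adder8(uint8_t a, uint8_t b, int carry_in) {
    unsigned total = (unsigned)a + (unsigned)b + (carry_in ? 1u : 0u);
    AdderOut out = { (uint8_t)(total & 0xFFu), (int)(total >> 8) };
    return out;
}

typedef struct { int less, equal, greater; } CompareOut;

/* 8-bit comparator: exactly one of the three outputs is 1. */
CompareOut comparator8(uint8_t a, uint8_t b) {
    CompareOut out = { a < b, a == b, a > b };
    return out;
}
```

Returning small structs keeps each block's multiple outputs together, much as a component symbol groups its output pins.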

Page 14: From Combinational to Sequential Circuits to Simple Processors

Levels of synthesis

• Logic-level behavior to structural implementation
  – Logic equations and/or FSM to connected gates
• Combinational logic synthesis
  – Two-level minimization (sum of products/product of sums)
    • Best possible performance
      – Longest path = 2 gates (AND gate + OR gate, or OR gate + AND gate)
    • Minimize size
      – Minimum cover
      – Minimum cover that is prime
      – Heuristics
  – Multilevel minimization
    • Trade performance for size
    • Pareto-optimal solution
      – Heuristics
• FSM synthesis and control unit synthesis
  – State minimization
  – State encoding
  – State decomposition
  – Special architectures
  – Microprogramming
  – Petri nets, etc.
• Block-level synthesis
• System synthesis

Page 15: From Combinational to Sequential Circuits to Simple Processors

Minimum Cover

Page 16: From Combinational to Sequential Circuits to Simple Processors

Two-level logic minimization

• Represent logic function as sum of products (or product of sums)
  – AND gate for each product
  – OR gate for each sum
• Gives best possible performance
  – At most 2 gate delays
• Goal: minimize size
  – Minimum cover
    • Minimum # of AND gates (sum of products)
  – Minimum cover that is prime
    • Minimum # of inputs to each AND gate (sum of products)

Sum of products:

F = abc'd' + a'b'cd + a'bcd + ab'cd

Direct implementation: four 4-input AND gates and one 4-input OR gate → 40 transistors

[Figure: direct implementation, a gate-level circuit with inputs a, b, c, d and output F.]

Page 17: From Combinational to Sequential Circuits to Simple Processors

Minimum cover

• Minimum # of AND gates (sum of products)
• Literal: variable or its complement
  – a or a', b or b', etc.
• Minterm: product of literals
  – Each variable appears exactly once, either plain or complemented
    • abc'd', ab'cd, a'bcd, etc.
• Implicant: product of literals
  – Each variable appears no more than once
    • abc'd', a'cd, etc.
  – Covers 1 or more minterms
    • a'cd covers a'bcd and a'b'cd
• Cover: set of implicants that covers all minterms of the function
• Minimum cover: cover with minimum # of implicants

Page 18: From Combinational to Sequential Circuits to Simple Processors

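Minimum cover: K-map approach

• Karnaugh map (K-map)
  – 1 represents minterm
  – Circle represents implicant
• Minimum cover
  – Covering all 1’s with min # of circles
  – Example: direct vs. min cover
    • Fewer gates
      – 4 vs. 5
    • Fewer transistors
      – 28 vs. 40

K-map for F (rows ab, columns cd):

ab \ cd   00  01  11  10
00         0   0   1   0
01         0   0   1   0
11         1   0   0   0
10         0   0   1   0

K-map: sum of products. Each of the four 1's is circled on its own.
K-map: minimum cover. The 1's in rows 00 and 01 of column 11 share one circle (a'cd); abc'd' and ab'cd keep their own circles.

Minimum cover:

F = abc'd' + a'cd + ab'cd

Minimum cover implementation: two 4-input AND gates, one 3-input AND gate, and one 3-input OR gate (one input per product term) → 28 transistors

[Figure: minimum cover implementation, a gate-level circuit with inputs a, b, c, d and output F.]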

Page 19: From Combinational to Sequential Circuits to Simple Processors

Minimum cover that is a prime cover

• Minimum # of inputs to AND gates
• Prime implicant
  – Implicant not covered by any other implicant
  – Max-sized circle in K-map
• Minimum cover that is prime
  – Covering with min # of prime implicants
  – Min # of max-sized circles
  – Example: prime cover vs. min cover
    • Same # of gates
      – 4 vs. 4
    • Fewer transistors
      – 26 vs. 28

K-map: minimum cover that is prime (rows ab, columns cd):

ab \ cd   00  01  11  10
00         0   0   1   0
01         0   0   1   0
11         1   0   0   0
10         0   0   1   0

The circle for ab'cd grows to b'cd by wrapping from row 10 around to row 00.

Minimum cover that is prime:

F = abc'd' + a'cd + b'cd

Implementation: one 4-input AND gate, two 3-input AND gates, and one 3-input OR gate (one input per product term) → 26 transistors

[Figure: implementation, a gate-level circuit with inputs a, b, c, d and output F.]
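The three sum-of-products forms of F (direct, minimum cover, prime cover) should compute the same function while costing 40, 28, and 26 transistors. The sketch below checks both claims. It assumes 2 transistors per gate input and one OR-gate input per product term; all identifiers are illustrative rather than taken from the slides.

```c
/* Direct form: F = abc'd' + a'b'cd + a'bcd + ab'cd */
int f_direct(int a, int b, int c, int d) {
    return (a & b & !c & !d) | (!a & !b & c & d) | (!a & b & c & d) | (a & !b & c & d);
}

/* Minimum cover: F = abc'd' + a'cd + ab'cd */
int f_min_cover(int a, int b, int c, int d) {
    return (a & b & !c & !d) | (!a & c & d) | (a & !b & c & d);
}

/* Minimum cover that is prime: F = abc'd' + a'cd + b'cd */
int f_prime_cover(int a, int b, int c, int d) {
    return (a & b & !c & !d) | (!a & c & d) | (!b & c & d);
}

/* Returns 1 if all three forms agree on every one of the 16 input rows. */
int covers_agree(void) {
    for (int row = 0; row < 16; row++) {
        int a = (row >> 3) & 1, b = (row >> 2) & 1;
        int c = (row >> 1) & 1, d = row & 1;
        int expected = f_direct(a, b, c, d);
        if (f_min_cover(a, b, c, d) != expected) return 0;
        if (f_prime_cover(a, b, c, d) != expected) return 0;
    }
    return 1;
}

/* Number of inputs of each AND gate in the three implementations. */
const int DIRECT_AND_INPUTS[] = {4, 4, 4, 4};
const int MIN_COVER_AND_INPUTS[] = {4, 3, 4};
const int PRIME_COVER_AND_INPUTS[] = {4, 3, 3};

/* Two-level cost: every AND input plus one OR input per AND gate,
   at an assumed 2 transistors per gate input. */
int transistor_count(const int *and_inputs, int num_and_gates) {
    int gate_inputs = num_and_gates;            /* inputs of the OR gate */
    for (int i = 0; i < num_and_gates; i++) {
        gate_inputs += and_inputs[i];
    }
    return 2 * gate_inputs;
}
```

Under these assumptions the counts reproduce the slide figures only when the OR gate has as many inputs as there are product terms.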

Page 20: From Combinational to Sequential Circuits to Simple Processors

Minimum cover: heuristics

• K-maps give optimal solution every time
  – Functions with > 6 inputs too complicated
  – Use computer-based tabular method
    • Finds all prime implicants
    • Finds min cover that is prime
    • Also optimal solution every time
    • Problem: 2^n minterms for n inputs
      – 32 inputs = 4 billion minterms
      – Exponential complexity
• Heuristic
  – Solution technique where optimal solution not guaranteed
  – Hopefully comes close

Page 21: From Combinational to Sequential Circuits to Simple Processors

Heuristics: iterative improvement

• Start with initial solution
  – i.e., original logic equation
• Repeatedly make modifications toward better solution
• Common modifications
  – Expand
    • Replace each nonprime implicant with a prime implicant covering it
    • Delete all implicants covered by new prime implicant
  – Reduce
    • Opposite of expand
  – Reshape
    • Expands one implicant while reducing another
    • Maintains total # of implicants
  – Irredundant
    • Selects min # of implicants that cover from existing implicants
• Synthesis tools differ in the modifications used and the order in which they are used

Page 22: From Combinational to Sequential Circuits to Simple Processors

Multilevel logic minimization

• Trade performance for size
  – Increase delay for lower # of gates
  – Gray area represents all possible solutions
  – Circle with X represents ideal solution
    • Generally not possible
  – 2-level gives best performance
    • Max delay = 2 gates
    • Solve for smallest size
  – Multilevel gives Pareto-optimal solution
    • Minimum delay for a given size
    • Minimum size for a given delay

[Figure: plot of delay versus size. A gray region marks all possible solutions; labels "2-level minim." and "multi-level minim." mark parts of its boundary.]

Page 23: From Combinational to Sequential Circuits to Simple Processors

Example of logic factorization

• Minimized 2-level logic function:
  – F = adef + bdef + cdef + gh
  – Requires 5 gates with 18 total gate inputs
    • 4 ANDs and 1 OR
• After algebraic manipulation:
  – F = (a + b + c)def + gh
  – Requires only 4 gates with 11 total gate inputs
    • 2 ANDs and 2 ORs
  – Fewer inputs per gate
  – Assume each gate input = 2 transistors
    • Reduced by 14 transistors
      – 36 (18 * 2) down to 22 (11 * 2)
  – Sacrifices performance for size
    • Inputs a, b, and c now have 3-gate delay
• Iterative improvement heuristic commonly used

[Figure: two gate-level circuits with inputs a through h and output F, labeled "2-level minimized" and "multilevel minimized".]
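Factoring must not change the function. A short exhaustive check over all 256 input combinations confirms that the two forms of F agree. This is only a sketch: packing inputs a through h into bits 0 through 7 of one word is an assumption made here for convenience, and the function names are hypothetical.

```c
/* Extracts one input bit; a is bit 0, b is bit 1, ..., h is bit 7. */
static int input_bit(unsigned inputs, int index) {
    return (int)((inputs >> index) & 1u);
}

/* Two-level form: F = adef + bdef + cdef + gh */
int f_two_level(unsigned inputs) {
    int a = input_bit(inputs, 0), b = input_bit(inputs, 1), c = input_bit(inputs, 2);
    int d = input_bit(inputs, 3), e = input_bit(inputs, 4), f = input_bit(inputs, 5);
    int g = input_bit(inputs, 6), h = input_bit(inputs, 7);
    return (a & d & e & f) | (b & d & e & f) | (c & d & e & f) | (g & h);
}

/* Factored multilevel form: F = (a + b + c)def + gh */
int f_multilevel(unsigned inputs) {
    int a = input_bit(inputs, 0), b = input_bit(inputs, 1), c = input_bit(inputs, 2);
    int d = input_bit(inputs, 3), e = input_bit(inputs, 4), f = input_bit(inputs, 5);
    int g = input_bit(inputs, 6), h = input_bit(inputs, 7);
    return ((a | b | c) & d & e & f) | (g & h);
}

/* Returns the number of input combinations on which the forms differ. */
int count_factoring_mismatches(void) {
    int mismatches = 0;
    for (unsigned inputs = 0; inputs < 256; inputs++) {
        if (f_two_level(inputs) != f_multilevel(inputs)) mismatches++;
    }
    return mismatches;
}
```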

Page 24: From Combinational to Sequential Circuits to Simple Processors

Control automata

[Figure: three controller structures, Variant 1, Variant 2, and Variant 3, each with inputs and outputs. The variants combine a counter, a ROM or similar logic, a register, and (in the largest variant) a small FSM. Signal labels: "page", "address of outputs".]

Page 25: From Combinational to Sequential Circuits to Simple Processors

Control automata

[Figure: two further controller structures, Variant 4 and Variant 6, each with inputs and outputs. The blocks shown are a register/counter, a counter, a ROM or similar logic, a register, and a small FSM. Signal labels: "address of outputs", "load new address", "load/count".]

Page 26: From Combinational to Sequential Circuits to Simple Processors

FSM synthesis

Page 27: From Combinational to Sequential Circuits to Simple Processors

FSM synthesis

• FSM to gates
• State minimization
  – Reduce # of states
    • Identify and merge equivalent states
      – Outputs, next states same for all possible inputs
  – Tabular method gives exact solution
    • Table of all possible state pairs
    • If n states, n^2 table entries
    • Thus, heuristics used with large # of states
• State encoding
  – Unique bit sequence for each state
  – If n states, at least log2(n) bits, rounded up
  – n! possible encodings
  – Thus, heuristics common

Page 28: From Combinational to Sequential Circuits to Simple Processors

Sequential components

• n-bit register (inputs I, load, clear; output Q)
  – Q = 0 if clear=1; I if load=1 and clock=1; Q(previous) otherwise
• n-bit shift register (inputs I, shift; output Q)
  – Q = lsb; content shifted; I stored in msb
• n-bit counter (output Q)
  – Q = 0 if clear=1; Q(prev)+1 if count=1 and clock=1
• A reversible shifter shifts left and right
• A reversible counter counts up and down
• Reading is an operation in most registers (generalized registers)

Page 29: From Combinational to Sequential Circuits to Simple Processors

Sequential logic design

A) Problem description

You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.

B) State diagram

[Figure: four states 0, 1, 2, 3 in a ring. Each state advances to the next when a=1 and holds when a=0. Output x=1 in state 3 and x=0 in states 0, 1, and 2.]

C) Implementation model

[Figure: a state register with outputs Q1, Q0 feeds a combinational logic block that also takes input a; the logic produces output x and the next-state values I1, I0, which are loaded back into the state register.]

D) State table (Moore-type)

Inputs        Outputs
Q1 Q0 a       I1 I0   x
0  0  0       0  0    0
0  0  1       0  1    0
0  1  0       0  1    0
0  1  1       1  0    0
1  0  0       1  0    0
1  0  1       1  1    0
1  1  0       1  1    1
1  1  1       0  0    1

• Given this implementation model
  – Sequential logic design quickly reduces to combinational logic design

Page 30: From Combinational to Sequential Circuits to Simple Processors

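Sequential logic design (cont.)

E) Minimized output equations

K-map for I1 (rows a, columns Q1Q0):

a \ Q1Q0   00  01  11  10
0           0   0   1   1
1           0   1   0   1

I1 = Q1'Q0a + Q1a' + Q1Q0'

K-map for I0 (rows a, columns Q1Q0):

a \ Q1Q0   00  01  11  10
0           0   1   1   0
1           1   0   0   1

I0 = Q0a' + Q0'a

K-map for x (rows a, columns Q1Q0):

a \ Q1Q0   00  01  11  10
0           0   0   1   0
1           0   0   1   0

x = Q1Q0

F) Combinational logic

[Figure: gate-level circuit with inputs a, Q1, Q0 and outputs I1, I0, x.]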
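The minimized equations can be exercised as a cycle-by-cycle simulation of the clock divider. This is a minimal sketch under stated assumptions: the state is held as the 2-bit value Q1Q0, the machine starts in state 0, and the function names are invented for illustration.

```c
/* Next-state logic from the minimized equations:
   I1 = Q1'Q0a + Q1a' + Q1Q0'   and   I0 = Q0a' + Q0'a */
unsigned divider_next_state(unsigned state, int a) {
    int q1 = (int)((state >> 1) & 1u);
    int q0 = (int)(state & 1u);
    a = a ? 1 : 0;
    int i1 = (!q1 & q0 & a) | (q1 & !a) | (q1 & !q0);
    int i0 = (q0 & !a) | (!q0 & a);
    return (unsigned)((i1 << 1) | i0);
}

/* Moore output: x = Q1Q0, so it depends on the current state only. */
int divider_output(unsigned state) {
    return (int)(((state >> 1) & 1u) & (state & 1u));
}

/* Holds a=1, starts in state 0, and counts the cycles in which x=1. */
int divider_count_pulses(int num_cycles) {
    unsigned state = 0;
    int pulses = 0;
    for (int cycle = 0; cycle < num_cycles; cycle++) {
        pulses += divider_output(state);
        state = divider_next_state(state, 1);
    }
    return pulses;
}
```

With a held at 1 the state cycles 0, 1, 2, 3, 0, …, so x is high in one cycle out of every four, which is the behavior the problem description asks for.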

Page 31: From Combinational to Sequential Circuits to Simple Processors

Custom single-purpose processor basic model

[Figure: controller and datapath. The controller has external control inputs and external control outputs; the datapath has external data inputs and external data outputs. The controller drives the datapath control inputs and receives the datapath control outputs.]

[Figure: a view inside the controller and datapath. The controller contains a state register and the next-state and control logic; the datapath contains registers and functional units.]

Page 32: From Combinational to Sequential Circuits to Simple Processors

Example: greatest common divisor

Page 33: From Combinational to Sequential Circuits to Simple Processors

Example: greatest common divisor

(a) Black-box view

[Figure: GCD block with inputs go_i, x_i, y_i and output d_o.]

(b) Desired functionality

0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

(c) State diagram

[Figure: FSMD with one state per numbered statement plus join states: 1:, 2:, 2-J:, 3: (x = x_i), 4: (y = y_i), 5:, 6:, 7: (y = y - x), 8: (x = x - y), 6-J:, 5-J:, 9: (d_o = x), 1-J:. Transitions carry the conditions 1 / !1, !go_i / !(!go_i), x!=y / !(x!=y), and x<y / !(x<y).]

• First create the algorithm
• Convert the algorithm to a “complex” state machine
  – Known as FSMD: finite-state machine with datapath
  – Can use templates to perform such conversion

Page 34: From Combinational to Sequential Circuits to Simple Processors

State diagram templates

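Assignment statement

  a = b
  next statement

[State diagram: a state performing a = b, followed by the state for the next statement.]

Loop statement

  while (cond) {
    loop-body-statements
  }
  next statement

[State diagram: condition state C: goes to loop-body-statements when cond holds and to the next statement when !cond; the loop body ends in join state J:, which returns to C:.]

Branch statement

  if (c1)
    c1 stmts
  else if c2
    c2 stmts
  else
    other stmts
  next statement

[State diagram: condition state C: goes to c1 stmts on c1, to c2 stmts on !c1*c2, and to others on !c1*!c2; all three branches meet at join state J:, which leads to the next statement.]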

Page 35: From Combinational to Sequential Circuits to Simple Processors

Creating the datapath

• Create a register for any declared variable
• Create a functional unit for each arithmetic operation
• Connect the ports, registers and functional units
  – Based on reads and writes
  – Use multiplexors for multiple sources
• Create a unique identifier for each datapath component control input and output

[Figure: the GCD FSMD from the previous slides, repeated.]

Datapath

[Figure: registers 0: x and 0: y are each fed by an n-bit 2x1 multiplexor (select inputs x_sel and y_sel, load inputs x_ld and y_ld) that chooses between the port input (x_i or y_i) and a subtractor output (8: x-y or 7: y-x). Comparators 5: x!=y and 6: x<y produce x_neq_y and x_lt_y. Register 9: d (load input d_ld) drives d_o.]

Page 36: From Combinational to Sequential Circuits to Simple Processors

Creating the controller’s FSM

• Same structure as FSMD
• Replace complex actions/conditions with datapath configurations

[Figure: the GCD FSMD and the GCD datapath from the previous slides, repeated.]

Controller

[Figure: controller FSM with the same 13 states as the FSMD. State encodings, states, and actions:]

0000  1:
0001  2:    branches on !go_i / !(!go_i)
0010  2-J:
0011  3:    x_sel = 0, x_ld = 1
0100  4:    y_sel = 0, y_ld = 1
0101  5:    branches on x_neq_y / !x_neq_y
0110  6:    branches on x_lt_y / !x_lt_y
0111  7:    y_sel = 1, y_ld = 1
1000  8:    x_sel = 1, x_ld = 1
1001  6-J:
1010  5-J:
1011  9:    d_ld = 1
1100  1-J:

Page 37: From Combinational to Sequential Circuits to Simple Processors

Splitting into a controller and datapath

[Figure: the controller FSM from the previous slide, with states encoded 0000 through 1100 and conditions written as x_neq_y=1 / x_neq_y=0 and x_lt_y=1 / x_lt_y=0.]

Controller implementation model

[Figure: a 4-bit state register (Q3, Q2, Q1, Q0) feeds a combinational logic block with inputs go_i, x_neq_y, and x_lt_y; the logic produces the next-state values I3, I2, I1, I0 and the control outputs x_sel, y_sel, x_ld, y_ld, d_ld.]

(b) Datapath

[Figure: the GCD datapath, repeated from the previous slides.]

Page 38: From Combinational to Sequential Circuits to Simple Processors

Controller state table for the GCD example

(* = don't-care input, X = don't-care output)

Inputs                               | Outputs
Q3 Q2 Q1 Q0  x_neq_y  x_lt_y  go_i   | I3 I2 I1 I0  x_sel  y_sel  x_ld  y_ld  d_ld
0  0  0  0      *       *      *     | 0  0  0  1     X      X     0     0     0
0  0  0  1      *       *      0     | 0  0  1  0     X      X     0     0     0
0  0  0  1      *       *      1     | 0  0  1  1     X      X     0     0     0
0  0  1  0      *       *      *     | 0  0  0  1     X      X     0     0     0
0  0  1  1      *       *      *     | 0  1  0  0     0      X     1     0     0
0  1  0  0      *       *      *     | 0  1  0  1     X      0     0     1     0
0  1  0  1      0       *      *     | 1  0  1  1     X      X     0     0     0
0  1  0  1      1       *      *     | 0  1  1  0     X      X     0     0     0
0  1  1  0      *       0      *     | 1  0  0  0     X      X     0     0     0
0  1  1  0      *       1      *     | 0  1  1  1     X      X     0     0     0
0  1  1  1      *       *      *     | 1  0  0  1     X      1     0     1     0
1  0  0  0      *       *      *     | 1  0  0  1     1      X     1     0     0
1  0  0  1      *       *      *     | 1  0  1  0     X      X     0     0     0
1  0  1  0      *       *      *     | 0  1  0  1     X      X     0     0     0
1  0  1  1      *       *      *     | 1  1  0  0     X      X     0     0     1
1  1  0  0      *       *      *     | 0  0  0  0     X      X     0     0     0
1  1  0  1      *       *      *     | 0  0  0  0     X      X     0     0     0
1  1  1  0      *       *      *     | 0  0  0  0     X      X     0     0     0
1  1  1  1      *       *      *     | 0  0  0  0     X      X     0     0     0
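To see the controller and datapath working together, the state table can be transcribed into a small cycle-level simulation. This is a sketch under several assumptions that are not in the slides: registers are 32 bits wide, go_i is held at 1, don't-care select outputs are driven as 0, both inputs must be nonzero (otherwise the subtraction loop never ends), and the type and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned next_state;
    int x_sel, y_sel, x_ld, y_ld, d_ld;
} ControlWord;

typedef struct { uint32_t d_o; unsigned cycles; } GcdResult;

/* Next-state and control logic, one case per row group of the state table. */
ControlWord gcd_controller(unsigned state, int x_neq_y, int x_lt_y, int go_i) {
    ControlWord w = {0, 0, 0, 0, 0, 0};
    switch (state) {
    case 0x0: w.next_state = 0x1; break;                      /* 1:   */
    case 0x1: w.next_state = go_i ? 0x3 : 0x2; break;         /* 2:   */
    case 0x2: w.next_state = 0x1; break;                      /* 2-J: */
    case 0x3: w.x_ld = 1; w.next_state = 0x4; break;          /* 3: x = x_i */
    case 0x4: w.y_ld = 1; w.next_state = 0x5; break;          /* 4: y = y_i */
    case 0x5: w.next_state = x_neq_y ? 0x6 : 0xB; break;      /* 5:   */
    case 0x6: w.next_state = x_lt_y ? 0x7 : 0x8; break;       /* 6:   */
    case 0x7: w.y_sel = 1; w.y_ld = 1; w.next_state = 0x9; break; /* 7: y = y - x */
    case 0x8: w.x_sel = 1; w.x_ld = 1; w.next_state = 0x9; break; /* 8: x = x - y */
    case 0x9: w.next_state = 0xA; break;                      /* 6-J: */
    case 0xA: w.next_state = 0x5; break;                      /* 5-J: */
    case 0xB: w.d_ld = 1; w.next_state = 0xC; break;          /* 9: d_o = x */
    default:  w.next_state = 0x0; break;   /* 1-J: and unused encodings */
    }
    return w;
}

/* Clocks the controller and datapath until d is loaded once. */
GcdResult gcd_processor(uint32_t x_i, uint32_t y_i) {
    assert(x_i > 0 && y_i > 0);   /* assumption: nonzero inputs */
    uint32_t x = 0, y = 0, d = 0;
    unsigned state = 0, cycles = 0;
    for (;;) {
        ControlWord w = gcd_controller(state, x != y, x < y, 1);
        /* Multiplexors: 0 selects the input port, 1 the subtractor. */
        uint32_t x_next = w.x_sel ? x - y : x_i;
        uint32_t y_next = w.y_sel ? y - x : y_i;
        if (w.x_ld) x = x_next;
        if (w.y_ld) y = y_next;
        if (w.d_ld) d = x;
        state = w.next_state;
        cycles++;
        if (w.d_ld) {
            GcdResult result = { d, cycles };
            return result;
        }
    }
}
```

In this model each pass through the subtraction loop costs five states (5, 6, 7 or 8, 6-J, 5-J), which is the kind of overhead the later FSMD optimizations remove.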

Page 39: From Combinational to Sequential Circuits to Simple Processors

Completing the GCD custom single-purpose processor design

• We finished the datapath
• We have a state table for the next state and control logic
  – All that’s left is combinational logic design
• This is not an optimized design, but we see the basic steps

[Figure: a view inside the controller and datapath. The controller contains a state register and the next-state and control logic; the datapath contains registers and functional units.]

You may be asked in homework, exams, or projects to optimize the design with respect to a metric such as area, speed, power, or testability.

Page 40: From Combinational to Sequential Circuits to Simple Processors

Example: Bus Bridge Design

Page 41: From Combinational to Sequential Circuits to Simple Processors

RT-level custom single-purpose processor design – Example “Bus Bridge”

• We often start with a state machine
  – Rather than an algorithm
  – Cycle timing is often too central to functionality
• Example
  – Bus bridge that converts a 4-bit bus to an 8-bit bus
  – Start with FSMD
  – Known as register-transfer (RT) level
  – Exercise: complete the design

Problem specification

[Figure: Sender → Bridge → Receiver. The sender drives rdy_in and data_in(4) into the bridge; the bridge drives rdy_out and data_out(8) into the receiver; a clock is shared.]

Bridge: a single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.

FSMD

Inputs rdy_in: bit; data_in: bit[4];
Outputs rdy_out: bit; data_out: bit[8]
Variables data_lo, data_hi: bit[4];

States and actions (transitions between states are guarded by rdy_in=0 or rdy_in=1):

  WaitFirst4
  RecFirst4Start    data_lo = data_in
  RecFirst4End
  WaitSecond4
  RecSecond4Start   data_hi = data_in
  RecSecond4End
  Send8Start        data_out = data_hi & data_lo (& joins the two 4-bit values into 8 bits), rdy_out = 1
  Send8End          rdy_out = 0
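The bridge FSMD can be tried out in software before any hardware is designed. The sketch below is illustrative only. The state names and actions come from the slide, but the exact transition structure is an assumption reconstructed from those names: each Wait state waits for rdy_in=1, each End state waits for rdy_in=0, and Start states always advance. The `Bridge` type and the `bridge_*` functions are hypothetical.

```c
#include <stdint.h>

typedef enum {
    WAIT_FIRST4, REC_FIRST4_START, REC_FIRST4_END,
    WAIT_SECOND4, REC_SECOND4_START, REC_SECOND4_END,
    SEND8_START, SEND8_END
} BridgeState;

typedef struct {
    BridgeState state;
    uint8_t data_lo, data_hi;   /* 4-bit variables of the FSMD */
    uint8_t data_out;           /* 8-bit output */
    int rdy_out;
} Bridge;

void bridge_reset(Bridge *b) {
    b->state = WAIT_FIRST4;
    b->data_lo = 0;
    b->data_hi = 0;
    b->data_out = 0;
    b->rdy_out = 0;
}

/* One clock cycle: perform the current state's action, then move on.
   The transition guards are assumed, as explained above. */
void bridge_clock(Bridge *b, int rdy_in, uint8_t data_in) {
    switch (b->state) {
    case WAIT_FIRST4:
        if (rdy_in) b->state = REC_FIRST4_START;
        break;
    case REC_FIRST4_START:
        b->data_lo = data_in & 0xFu;
        b->state = REC_FIRST4_END;
        break;
    case REC_FIRST4_END:
        if (!rdy_in) b->state = WAIT_SECOND4;
        break;
    case WAIT_SECOND4:
        if (rdy_in) b->state = REC_SECOND4_START;
        break;
    case REC_SECOND4_START:
        b->data_hi = data_in & 0xFu;
        b->state = REC_SECOND4_END;
        break;
    case REC_SECOND4_END:
        if (!rdy_in) b->state = SEND8_START;
        break;
    case SEND8_START:
        b->data_out = (uint8_t)((b->data_hi << 4) | b->data_lo);
        b->rdy_out = 1;
        b->state = SEND8_END;
        break;
    case SEND8_END:
        b->rdy_out = 0;
        b->state = WAIT_FIRST4;
        break;
    }
}

/* Plays the sender for one transfer: two nibbles, each with a two-cycle
   rdy_in pulse. Returns data_out as seen while rdy_out is 1, or -1 if
   rdy_out never pulsed. */
int bridge_transfer(uint8_t low_nibble, uint8_t high_nibble) {
    const int rdy[8] = {1, 1, 0, 1, 1, 0, 0, 0};
    const uint8_t data[8] = {low_nibble, low_nibble, 0,
                             high_nibble, high_nibble, 0, 0, 0};
    Bridge b;
    int result = -1;
    bridge_reset(&b);
    for (int i = 0; i < 8; i++) {
        bridge_clock(&b, rdy[i], data[i]);
        if (b.rdy_out) result = b.data_out;
    }
    return result;
}
```

Sending 0xA followed by 0x3 produces 0x3A: the first nibble lands in data_lo and the second in data_hi.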

Page 42: From Combinational to Sequential Circuits to Simple Processors

RT-level custom single-purpose processor design (cont.) – Example “Bus Bridge”

(a) Controller

[Figure: the bridge FSM with datapath actions replaced by control signals; input rdy_in, output rdy_out. Actions by state:]

  RecFirst4Start    data_lo_ld = 1
  RecSecond4Start   data_hi_ld = 1
  Send8Start        data_out_ld = 1, rdy_out = 1
  Send8End          rdy_out = 0

(b) Datapath

[Figure: registers data_lo and data_hi both load from data_in(4), under data_lo_ld and data_hi_ld; register data_out loads their combined 8-bit value under data_out_ld and drives the data_out port; clk goes to all registers.]

Page 43: From Combinational to Sequential Circuits to Simple Processors

Optimization in Synthesis

Page 44: From Combinational to Sequential Circuits to Simple Processors

Optimizing single-purpose processors

• Optimization is the task of making design metric values the best possible

• Optimization opportunities
  – Original program
  – FSMD
  – Datapath
  – FSM

Page 45: From Combinational to Sequential Circuits to Simple Processors

Optimizing the original program

• Analyze program attributes and look for areas of possible improvement
  – Number of computations
  – Size of variable
  – Time and space complexity
  – Operations used
    • Multiplication and division are very expensive

Page 46: From Combinational to Sequential Circuits to Simple Processors

Optimizing the original program (cont’)

Original program

0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

Optimized program

int x, y, r;
while (1) {
  while (!go_i);
  // x must be the larger number
  if (x_i >= y_i) {
    x = x_i;
    y = y_i;
  }
  else {
    x = y_i;
    y = x_i;
  }
  while (y != 0) {
    r = x % y;
    x = y;
    y = r;
  }
  d_o = x;
}

Replace the subtraction operation(s) with the modulo operation in order to speed up the program.

Original program, GCD(42, 8): the loop condition is evaluated 9 times (8 subtractions). The x and y values evolve as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).

Optimized program, GCD(42, 8): the loop condition is evaluated 3 times (2 modulo operations). The x and y values evolve as follows: (42, 8), (8, 2), (2, 0).
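The iteration counts for the two programs can be reproduced with instrumented versions of the two loops. This is a sketch: the `GcdSteps` struct and the function names are invented here, and the I/O ports are replaced by ordinary parameters. The `steps` field counts executions of the loop body, which is one less than the number of (x, y) pairs listed for each program.

```c
#include <assert.h>

typedef struct { unsigned gcd; unsigned steps; } GcdSteps;

/* Original program: repeated subtraction. */
GcdSteps gcd_subtract(unsigned x, unsigned y) {
    assert(x > 0 && y > 0);   /* a zero input would loop forever */
    unsigned steps = 0;
    while (x != y) {
        if (x < y) y = y - x;
        else       x = x - y;
        steps++;
    }
    GcdSteps result = { x, steps };
    return result;
}

/* Optimized program: modulo, with x forced to be the larger number. */
GcdSteps gcd_modulo(unsigned x_i, unsigned y_i) {
    unsigned x, y, steps = 0;
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }
    while (y != 0) {
        unsigned r = x % y;
        x = y;
        y = r;
        steps++;
    }
    GcdSteps result = { x, steps };
    return result;
}
```

For GCD(42, 8) the subtraction loop body runs 8 times and the modulo loop body twice. Whether that is a net gain in hardware depends on how expensive the modulo unit is compared with a subtractor.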

Page 47: From Combinational to Sequential Circuits to Simple Processors

Optimizing the FSMD

• Areas of possible improvement
  – Merge states
    • States with constants on transitions can be eliminated; the transition taken is already known
    • States with independent operations can be merged
  – Separate states
    • States which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
  – Scheduling

Page 48: From Combinational to Sequential Circuits to Simple Processors

Optimizing the FSMD (cont.)

Original FSMD

[Figure: the 13-state GCD FSMD from the earlier slides, with the declaration int x, y;.]

Optimized FSMD

[Figure: six-state FSMD with the declaration int x, y;. States and actions:]

  2:   loops on !go_i, advances on go_i
  3:   x = x_i, y = y_i
  5:   branches on x<y and x>y
  7:   y = y - x
  8:   x = x - y
  9:   d_o = x

eliminate state 1 – transitions have constant values

merge state 2 and state 2J – no loop operation in between them

merge state 3 and state 4 – assignment operations are independent of one another

merge state 5 and state 6 – transitions from state 6 can be done in state 5

eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8, respectively

eliminate state 1-J – transition from state 1-J can be done directly from state 9

Page 49: From Combinational to Sequential Circuits to Simple Processors

Optimizing the datapath

• Sharing of functional units
  – One-to-one mapping, as done previously, is not necessary
  – If the same operation occurs in different states, they can share a single functional unit
• Multi-functional units
  – ALUs support a variety of operations, so one ALU can be shared among operations occurring in different states

Page 50: From Combinational to Sequential Circuits to Simple Processors

Optimizing the FSM

• State encoding
  – Task of assigning a unique bit pattern to each state in an FSM
  – Size of state register and combinational logic vary
  – Can be treated as an ordering problem
• State minimization
  – Task of merging equivalent states into a single state
    • Two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state

Page 51: From Combinational to Sequential Circuits to Simple Processors

Technology mapping

• Library of gates available for implementation
  – Simple
    • Only 2-input AND, OR gates
  – Complex
    • Various-input AND, OR, NAND, NOR, etc. gates
    • Efficiently implemented meta-gates (e.g., AND-OR-INVERT, MUX)
• Final structure consists of specified library’s components only
• If technology mapping is integrated with logic synthesis
  – More efficient circuit
  – More complex problem
  – Heuristics required

Page 52: From Combinational to Sequential Circuits to Simple Processors

Complexity impact on user

• As complexity grows, heuristics used
• Heuristics differ tremendously among synthesis tools
  – Computationally expensive
    • Higher quality results
    • Variable optimization effort settings
    • Long run times (hours, days)
    • Requires huge amounts of memory
    • Typically needs to run on servers, workstations
  – Fast heuristics
    • Lower quality results
    • Shorter run times (minutes, hours)
    • Smaller amount of memory required
    • Could run on PC
• Super-linear-time (e.g., n^3) heuristics usually used
  – User can partition large systems to reduce run times/size
  – 100^3 > 50^3 + 50^3 (1,000,000 > 250,000)

Page 53: From Combinational to Sequential Circuits to Simple Processors

Integrating logic design and physical design

• Past
  – Gate delay much greater than wire delay
  – Thus, performance evaluated as # of levels of gates only
• Today
  – Gate delay shrinking as feature size shrinks
  – Wire delay increasing
    • Performance evaluation needs wire length
  – Transistor placement (needed for wire length) is the domain of physical design
  – Thus, simultaneous logic synthesis and physical design required for efficient circuits

[Figure: plot of delay versus reduced feature size, with one curve for wire delay and one for transistor delay.]

Page 54: From Combinational to Sequential Circuits to Simple Processors

Embedded Systems Case Study

Elevator Controller

Page 55: From Combinational to Sequential Circuits to Simple Processors


Page 56: From Combinational to Sequential Circuits to Simple Processors

Elevator System

• CRC cards is a well-known method for analyzing a system and developing an architecture.
• CRC
  – Classes: logical groupings of data and functionality
  – Responsibilities: describe what the class does
  – Collaborators: other classes with which a given class works
• Elevator Control Classes
  – Elevator car, Passenger, Floor control, Car control, Car sensors, etc.
• Architectural Classes
  – Car state, Floor control reader, Car control reader, Car control sender, Scheduler

Page 57: From Combinational to Sequential Circuits to Simple Processors

[Figure: elevator system with F floors and N hoistways.]

Page 58: From Combinational to Sequential Circuits to Simple Processors


Page 59: From Combinational to Sequential Circuits to Simple Processors


Page 60: From Combinational to Sequential Circuits to Simple Processors


Page 61: From Combinational to Sequential Circuits to Simple Processors


Page 62: From Combinational to Sequential Circuits to Simple Processors


Physical Interfaces

Page 63: From Combinational to Sequential Circuits to Simple Processors


Page 64: From Combinational to Sequential Circuits to Simple Processors

Architecture

• Computation and I/O occur at:
  – Floor control panels/displays
  – Elevator cars
  – System controller
• Panels Controller
• Car Controller
  – Read buttons and send events to system controller
  – Read sensor inputs and send to system controller

Page 65: From Combinational to Sequential Circuits to Simple Processors

System Controller

• Must take inputs from many sources
• Must control cars to hard real-time deadlines
• User interface, scheduling are soft deadlines
• Testing
  – Build an elevator simulator using SystemC, Verilog, VHDL and FPGA
    • Simulate multiple elevators
    • Simulate real-time control demands

Page 66: From Combinational to Sequential Circuits to Simple Processors

Homework

• The simplest possible custom single-purpose processor
– Design a processor to multiply two numbers. The initial data are in registers/counters A and B. The result should be in register/counter C.
– Only reversible counters (with reading) may be used in the data path.
– The counters perform the following operations:
   • Add one
   • Subtract one
   • Read new value
– Invent the algorithm for multiplication. Use the minimum number of counters.
– Design the reversible counter by hand using logic gates and D FFs.
– Design the control unit.
– Design the data path.
– Draw the timing diagram of the whole system.
– You can use VHDL or Verilog to help you, but I need your design by hand.
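One possible starting point for the homework is a behavioral model that checks a candidate algorithm before any gates are drawn. The sketch below is only an illustration, not the expected solution. It assumes that "Read new value" means a parallel load, that each counter exposes a zero flag to the control unit, and that a fourth scratch counter T is allowed. The names ReversibleCounter, is_zero and T are hypothetical, and no claim is made that four counters is the minimum.

```python
class ReversibleCounter:
    """Behavioral model of an up/down counter with parallel load."""

    def __init__(self, value=0):
        self.value = value

    def add_one(self):
        self.value += 1

    def subtract_one(self):
        self.value -= 1

    def load(self, new_value):
        # Assumption: "Read new value" is interpreted as a parallel load.
        self.value = new_value

    def is_zero(self):
        # Assumption: a status signal from the data path to the control unit.
        return self.value == 0


def multiply(a, b):
    """Return a * b for non-negative integers using only counter operations."""
    A = ReversibleCounter(a)
    B = ReversibleCounter(b)
    C = ReversibleCounter(0)   # result counter
    T = ReversibleCounter(0)   # hypothetical scratch counter
    while not A.is_zero():         # outer loop: once per unit of A
        A.subtract_one()
        T.load(B.value)            # copy B into the scratch counter
        while not T.is_zero():     # inner loop: add B to C one step at a time
            T.subtract_one()
            C.add_one()
    return C.value
```

Each loop test corresponds to a conditional transition in the control unit, and each counter operation to a control signal asserted in one state, so the model maps directly onto an FSMD.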

Page 67: From Combinational to Sequential Circuits to Simple Processors

Summary

• Custom single-purpose processors
– Straightforward design techniques
– Can be built to execute algorithms
– Typically start with FSMD
– CAD tools can be of great assistance

Page 68: From Combinational to Sequential Circuits to Simple Processors

Questions to Exams (1)

1. What are the main methods of combinational logic design?
2. What is a Mealy FSM (Finite State Machine)?
3. What is a Moore state machine?
4. Think about a robot controller as a sequential logic circuit. What are the blocks and their roles?
5. What is the role of abstraction in FSM design? Give examples.
6. Explain the concepts from Gajski's chart in custom single-purpose processor design.
7. RT-level custom single-purpose processor design. Explain briefly all design stages from the bottom of the design hierarchy (layout) to the top (system design), using a GCD processor as an example.
8. List and explain logic gates.
9. List and explain combinational blocks.
10. List and explain sequential blocks.
11. List and explain sensors to be used with embedded systems of FSM type.
12. List and explain actuators to be used with such embedded systems.
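For questions 2 and 3, a small simulation may help contrast the two machine types. The sketch below is an illustration only: the detector for two consecutive 1s, the state names S0/S1/S2 and the function names are all assumptions chosen for the example, not taken from the slides. In the Mealy version the output depends on the current state and the current input. In the Moore version it depends on the current state alone.

```python
def mealy_detect_11(bits):
    """Mealy machine: output is a function of (state, input)."""
    state = "S0"                      # S0: last input was 0, S1: last input was 1
    outputs = []
    for x in bits:
        outputs.append(1 if (state == "S1" and x == 1) else 0)
        state = "S1" if x == 1 else "S0"
    return outputs


def moore_detect_11(bits):
    """Moore machine: output is a function of the state only."""
    next_on_one = {"S0": "S1", "S1": "S2", "S2": "S2"}
    state = "S0"                      # S2: two or more 1s in a row were seen
    outputs = []
    for x in bits:
        outputs.append(1 if state == "S2" else 0)   # sampled before the clock edge
        state = next_on_one[state] if x == 1 else "S0"
    return outputs
```

On the same input the Moore output is the Mealy output delayed by one clock cycle, and the Moore machine needs one more state. This is the usual tradeoff between the two styles.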

Page 69: From Combinational to Sequential Circuits to Simple Processors

Questions to Exams (2)

1. What are the main synthesis processes and CAD tools in combinational logic design?
2. What are the methods to solve the covering problem?
3. Explain the concept of search and give examples.
4. Explain the concept of a heuristic in search and give examples. SOP minimization can be very useful here, as can ESOP.
5. Explain design tradeoffs and Pareto optimization on one practical example.
6. Explain in detail, on an example, the basic synthesis method for a Mealy FSM, from specification to a circuit of D-type flip-flops (FFs) and logic gates.
7. Explain and illustrate how D, T and JK flip-flops work.
8. What is the difference between:
   • Register with enable
   • Register without enable
   • Reversible register
9. Draw the schematic of the FSMD.
10. Explain Euclid's GCD algorithm on examples.
11. Without looking at the slides, convert the GCD algorithm to an FSMD.
12. How can we optimize GCD?
13. Apply these ideas to the Least Common Multiple algorithm and its FSMD for two numbers.
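For questions 10, 11 and 13, the following sketch simulates a subtraction-based version of Euclid's algorithm as a state machine with data, one state per clock cycle. It is a minimal illustration under stated assumptions: the state names LOAD, TEST, SUB_X, SUB_Y and DONE and the function names are hypothetical and need not match the FSMD in the slides. The LCM helper uses the identity lcm(a, b) = a·b / gcd(a, b).

```python
def gcd_fsmd(x_in, y_in):
    """Simulate a GCD FSMD; returns (gcd, number of clock cycles)."""
    if x_in <= 0 or y_in <= 0:
        raise ValueError("subtraction-based GCD needs positive inputs")
    state = "LOAD"
    x = y = 0            # data-path registers
    cycles = 0
    while state != "DONE":
        cycles += 1
        if state == "LOAD":          # latch the inputs into the registers
            x, y = x_in, y_in
            state = "TEST"
        elif state == "TEST":        # comparator outputs drive the control unit
            if x == y:
                state = "DONE"
            elif x < y:
                state = "SUB_Y"
            else:
                state = "SUB_X"
        elif state == "SUB_Y":       # y := y - x
            y = y - x
            state = "TEST"
        elif state == "SUB_X":       # x := x - y
            x = x - y
            state = "TEST"
    return x, cycles


def lcm(a, b):
    """Least common multiple built on top of the GCD machine."""
    g, _ = gcd_fsmd(a, b)
    return a * b // g
```

For inputs 12 and 8 the registers go through (12, 8), (4, 8), (4, 4), so the machine stops with 4. The cycle count returned alongside the result gives a simple way to compare optimized variants for question 12.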

Page 70: From Combinational to Sequential Circuits to Simple Processors

Questions to Exams (3)

1. What is the role of GO-TO commands in FSMD design? Are they good or bad? Give examples. What is the role of structured design of an FSMD?
2. How is the data path created from the FSMD? This is one of the main topics of this whole class. You have to know it well.
3. How is the CU (Control Unit) created from the FSMD? This is one of the main topics of this whole class. You have to know it well.
4. Compare the state graph, the state transition table and the flow chart. Why do we need all of them?
5. In this class we are not optimizing combinational logic or FSMs too much. But if you have taken the ECE 572 or ECE 573 classes, you know many methods to optimize on these levels. Can you give practical examples of these optimizations in GCD or another similar system?
6. Complete the "Bus bridge" FSMD that converts a 4-bit bus to an 8-bit bus and is given in these slides.
7. Discuss optimizing single-purpose processors. Give examples. Explain the levels of optimization: the original program, the FSMD, the data path, the CU, the registers, the combinational logic and finally the technology mapping.
8. Design the complete elevator system for a villa of a crazy millionaire artist from Hollywood. Cost does not count. You have to amaze his guests.

Page 71: From Combinational to Sequential Circuits to Simple Processors


Sources

Slides from S. Mohammadi

Vahid, Givargis, Marwedel and Siamak Mohammadi

• EECE 353-1
• Real-Time Systems
• T. John Koo
• Embedded Computing Systems Laboratory
• Institute for Software Integrated Systems
• Department of Electrical Engineering and Computer Science
• Vanderbilt University
• 5306 Stevenson Center
• January 16, 2006

Page 72: From Combinational to Sequential Circuits to Simple Processors

What can we cover at the Monday meeting?

1. Design of SOP circuits from KMaps. Prime implicants and covering.
2. Design of POS circuits from KMaps. Prime implicates and covering.
3. Design of ESOP circuits from KMaps. Algebraic rules for AND/EXOR logic.
4. Design using NAND and NOR gates. De Morgan's rules.
5. Factorization.
6. Multiplexers.
7. Iterative circuits and their types.
8. Using state machines to design one-directional iterative circuits.
9. Predicates.
10. Oracles.
11. SAT oracles.
12. Graph coloring oracles and distributed processors.
13. The SEND+MORE=MONEY problem and its oracle.
14. The idea of constraint satisfaction and distributed software/hardware for it.
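To make items 9, 10 and 13 concrete, an oracle can be viewed as a predicate that answers yes or no for one candidate assignment, with a separate search procedure deciding which candidates to ask about. The sketch below is one assumed formulation for SEND+MORE=MONEY. The function names and the brute-force search are illustrative choices, not the method from the lecture.

```python
from itertools import permutations

LETTERS = "SENDMORY"   # the eight distinct letters of the puzzle


def word_value(word, digit_of):
    """Interpret a word as a decimal number under a letter-to-digit map."""
    value = 0
    for letter in word:
        value = 10 * value + digit_of[letter]
    return value


def oracle(digit_of):
    """Return True iff the assignment solves SEND + MORE = MONEY."""
    digits = [digit_of[letter] for letter in LETTERS]
    if len(set(digits)) != len(digits):              # letters must map to distinct digits
        return False
    if digit_of["S"] == 0 or digit_of["M"] == 0:     # no leading zeros
        return False
    return (word_value("SEND", digit_of) + word_value("MORE", digit_of)
            == word_value("MONEY", digit_of))


def exhaustive_search():
    """Ask the oracle about every candidate (about 1.8 million calls, so slow)."""
    for perm in permutations(range(10), len(LETTERS)):
        candidate = dict(zip(LETTERS, perm))
        if oracle(candidate):
            return candidate
    return None
```

For example, the assignment S=9, E=5, N=6, D=7, M=1, O=0, R=8, Y=2 satisfies the oracle, since 9567 + 1085 = 10652. Because each oracle call is independent of the others, the candidates can be split across many processors, which is the link to item 14.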