A comparison of DSP Architectures BlackFin ADSP-BFXXX Compute Unit
Based on an ENEL619.23 white paper prepared by Darrell Anklovitch
Overview
• Architecture Overview
• Register Map
• ALU features and sample instructions
• Multiplier features and sample instructions
• Shifter features and sample instructions
References
• ADSP-BF535 Blackfin Processor Hardware Reference, Rev 2, April 2004, Analog Devices. – Section 2
• Blackfin Processor Instruction Set Reference, Rev 2, May 2003, Analog Devices. – Sections 8 ~ 10, 14 & 15
• A number of the figures in this presentation are based on figures found in the ADSP-BF535 Blackfin Processor Hardware Reference.
ADSP-2106x Core Architecture
[Figure: ADSP-2106x core block diagram. Blocks: DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), cache memory (32 x 48), program sequencer, timer, flags, JTAG test & emulation, bus connect, register file (16 x 40), floating & fixed-point multiplier with fixed-point accumulator, 32-bit barrel shifter, floating-point & fixed-point ALU. Buses: PMA bus (24), DMA bus (32), PMD bus (48), DMD bus (40).]
Register File and COMPUTE Units
• Key issues
– 5 data paths FROM COMPUTE units
– 5 data paths TO COMPUTE units
– Highly parallel operations UNDER THE RIGHT CONDITIONS
BF533 Memory Accesses
Under the right conditions -- 4 memory accesses at the same time: 64 bit Instruction Fetch, 2 x 32 bit Data Loads, 32 bit Data Store
PLUS up to 2 ALU (32 bit) and 2 MAC (16 bit) operations at the same time
PLUS background DMA activity
Compute Unit Architecture
2 Multipliers
2 ALUs
1 set of Video ALUs
Shifter
Register File
Register File
8 x 32 bit OR
16 x 16 bit
2 x 40 bit accumulators
DATA REGISTER SYNTAX:
• R0, R1 etc. refer to 32 bit registers
• R0.L refers to the low 16 bits of the R0 32 bit register
• R0.H refers to the high 16 bits of the R0 register
ACCUMULATOR SYNTAX:
• A0.L => low 16 bits
• A0.H => next 16 bits
• A0.W => least significant 32 bit word
• A0.X => most significant 8 bit extension
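The register and accumulator naming above can be illustrated with a small Python model. This is only a sketch of the bit fields as listed on the slide. The function names `reg_fields` and `acc_fields` are hypothetical helpers invented for this illustration and are not part of any Analog Devices tool.

```python
def reg_fields(r):
    """Split a 32-bit data register value into its .L and .H halves."""
    r &= 0xFFFFFFFF                 # keep 32 bits
    return {
        "L": r & 0xFFFF,            # R0.L: low 16 bits
        "H": (r >> 16) & 0xFFFF,    # R0.H: high 16 bits
    }


def acc_fields(a):
    """Split a 40-bit accumulator value into .L, .H, .W and .X fields."""
    a &= 0xFFFFFFFFFF               # keep 40 bits
    return {
        "L": a & 0xFFFF,            # A0.L: low 16 bits
        "H": (a >> 16) & 0xFFFF,    # A0.H: next 16 bits
        "W": a & 0xFFFFFFFF,        # A0.W: least significant 32-bit word
        "X": (a >> 32) & 0xFF,      # A0.X: most significant 8-bit extension
    }


r0 = reg_fields(0x12345678)
a0 = acc_fields(0xAB12345678)
print({k: hex(v) for k, v in r0.items()})
print({k: hex(v) for k, v in a0.items()})
```

Note that A0.W overlaps A0.H and A0.L: it is the same 32 bits viewed as one word.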
SHARC – 16 32-bit data registers, integer and float
There is a pair of SHARC accumulator registers too
ALU Data Flow
2 x 32 bit paths to dual Multiplier/ALU units
2 x 32 bit paths back to register file
Sample instructions
Blackfin
R0 = R1 + R2;
R0.L = R1.L + R2.H;
R0 = R1 +|- R2;
Means R0.L = R1.L - R2.L in parallel with R0.H = R1.H + R2.H
SHARC
R0 = R1 + R2;
Closest: R0 = R1 + R2, R4 = R1 - R2;
68K
MOVE.L R2, R0
ADD.L R1, R0

MOVE.W R2, R0
ADD.W R1, R0

MOVE.L R2, R0
ASR.L #16, R0
MOVE.L R1, R3
ASR.L #16, R3
ADD.W R3, R0
ASL.L #16, R0
MOVE.W R2, R0
ADD.W R1, R0
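The Blackfin dual 16 bit form R0 = R1 +|- R2 can be modelled in a few lines of Python to make the "high half adds, low half subtracts" behaviour concrete. This is a sketch under stated assumptions: `dual_add_sub` is a hypothetical name, and each 16 bit result simply wraps around on overflow. Any saturation behaviour of the real hardware is not modelled.

```python
def dual_add_sub(r1, r2):
    """Model R0 = R1 +|- R2 on 32-bit register values.

    High halves are added, low halves are subtracted, and the two
    16-bit results are packed back into one 32-bit value.
    Assumption: results wrap modulo 2**16 (no saturation).
    """
    r1 &= 0xFFFFFFFF
    r2 &= 0xFFFFFFFF
    hi = ((r1 >> 16) + (r2 >> 16)) & 0xFFFF         # R0.H = R1.H + R2.H
    lo = ((r1 & 0xFFFF) - (r2 & 0xFFFF)) & 0xFFFF   # R0.L = R1.L - R2.L
    return (hi << 16) | lo


r0 = dual_add_sub(0x00050009, 0x00030004)
print(hex(r0))  # high half 5 + 3, low half 9 - 4
```

The two halves never interact: a borrow out of the low half does not affect the high half, which is what distinguishes this from a plain 32 bit operation.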
ALU Features
Can be:
• Single 16 bit ops
• Dual 16 bit ops
• Dual 16 bit cross
• Single 32 bit ops
[Figure: operand routing from source registers Rm and Rp to destination register Rn for each case]
ALU Sample Instructions
[Figure: example instructions for single 16 bit ops, dual 16 bit ops, quad 16 bit ops, single 32 bit ops and dual 32 bit ops, with the callouts "Does not work in parallel" and "Must have this option"]
• A & B registers must stay on the same side of the ‘|’ for both instructions
• For dual and quad 16 bit operations the (CO) option causes the destination registers to cross
• Operator order is important: + must come before -
Multiply Data Flow
2 x 32 bit paths to dual Multiplier/ALU units
2 x 32 bit paths back to register file
2 x 40 bit accumulators
The multipliers share the same operand/result buses as the ALUs
Multiply Features
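[Figure: operand selection for the 16 x 16 bit multiply; the register halves can be combined as H x H, H x L, L x H or L x L]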
• Multiplies are signed fractional by default
• Signed fractional multiply result is automatically left shifted 1 bit
• Signed fractional multiply != signed integer multiply
• Rounding available on fractional number multiplies and special option of integer number multiplies
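The effect of the automatic 1 bit left shift can be seen in a short Python sketch. It assumes the 16 bit operands are interpreted in signed 1.15 fractional format (values in the range -1 to just under +1), so the 32 bit product is in 1.31 format. The helper names are hypothetical, and saturation (for example of -1 x -1) is deliberately not modelled.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's complement integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mul_signed_int(a, b):
    """Signed integer multiply of two 16-bit values."""
    return to_signed16(a) * to_signed16(b)


def mul_signed_frac(a, b):
    """Signed fractional multiply: integer product shifted left by 1 bit.

    Assumption: operands are 1.15 fractions, result is a 1.31 fraction.
    Saturation is not modelled.
    """
    return mul_signed_int(a, b) << 1


half = 0x4000                                   # 0.5 in 1.15 format
int_product = mul_signed_int(half, half)        # raw integer product
frac_product = mul_signed_frac(half, half)      # product shifted left 1 bit
print(hex(int_product), hex(frac_product), frac_product / 2**31)
```

Two 1.15 operands produce a product with two sign bits; the left shift removes the redundant one, so 0.5 x 0.5 lands on the 1.31 bit pattern for 0.25.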
Rounding
2 cases:
[Figure, case 1: 0x8000 is added to the accumulator value; the top 16 bits go to destination register Rd]
[Figure, case 2: Rm and Rp are multiplied to give a 32 bit result; 0x8000 is added; the top 16 bits go to destination register Rd]
Rounding adds 0x8000 to the 32 bit multiplier result or accumulator value before extracting a 16 bit value to the destination register
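The rounding rule above amounts to adding half of one output LSB before truncating. A minimal Python sketch, assuming a 32 bit input held as a two's complement bit pattern; `round_to_16` is a hypothetical name, and overflow near the top of the range simply wraps here rather than saturating:

```python
def round_to_16(value32):
    """Add 0x8000 to a 32-bit value, then keep the top 16 bits.

    Assumption: value32 is a 32-bit two's complement bit pattern;
    overflow wraps (saturation is not modelled).
    """
    value32 &= 0xFFFFFFFF
    return ((value32 + 0x8000) >> 16) & 0xFFFF


below_half = round_to_16(0x12347FFF)   # low half just under 0x8000: rounds down
at_half = round_to_16(0x12348000)      # low half exactly 0x8000: rounds up
print(hex(below_half), hex(at_half))
```

Without the added 0x8000 both inputs would truncate to the same 16 bit value.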
Fractional Multiply
• When extracting a 16 bit fractional value from an accumulator the high 16 bits are taken
• Where in the destination register it goes depends on which accumulator is being extracted from
Fractional Multiply != Integer Multiply
Integer Multiply
• When extracting a 16 bit integer value from an accumulator the low 16 bits are taken
• Where in the destination register the 16 bit value goes depends on which accumulator is being extracted from
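The difference between the two extraction rules can be sketched in Python: the fractional case keeps the high 16 bits of the shifted product, the integer case keeps the low 16 bits of the unshifted product. The function names are hypothetical, the operands are assumed to be 16 bit two's complement values, and the sketch truncates without rounding or saturation.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's complement integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def frac_mul_extract(a, b):
    """Signed fractional multiply, then take the high 16 bits of the result."""
    product = (to_signed16(a) * to_signed16(b)) << 1   # 1 bit left shift
    return (product >> 16) & 0xFFFF


def int_mul_extract(a, b):
    """Signed integer multiply, then take the low 16 bits of the result."""
    product = to_signed16(a) * to_signed16(b)
    return product & 0xFFFF


frac_result = frac_mul_extract(0x4000, 0x2000)   # 0.5 * 0.25 as 1.15 fractions
int_result = int_mul_extract(0x0003, 0x0005)     # 3 * 5 as integers
print(hex(frac_result), hex(int_result))
```

Swapping the rules gives 0 in both cases: the integer product of 3 and 5 has nothing in its high 16 bits, and the fractional product of 0.5 and 0.25 has nothing in its low 16 bits.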
Multiply Sample Instructions
A1 += R1.H * R2.L, A0 += R1.L * R2.L;
16 bit extraction from ACC 1 and ACC 0:
R3.H = (A1 += R1.H * R2.L), R3.L = (A0 += R1.L * R2.L);
Any combination of .H and .L in the 2 operands is allowed
32 bit extraction:
R3 = (A1 += R1.H * R2.L), R2 = (A0 += R1.L * R2.L);
Where destination registers must be paired as follows: R[1,0], R[3,2], R[5,4] and R[7,6]
16 bit extraction from ACC 1 only:
R3.H = (A1 += R1.H * R2.L), A0 += R1.L * R2.L;
Multi-issue MAC Instruction Examples
Shifter Sample Instructions
• 2 operator register shifts
• 2 operator immediate shifts
• 3 op register shift
• 3 op immediate shift
• Arithmetic shift
Parallel Instruction Examples
• In general there are 16 and 32 bit versions of the arithmetic instructions
• Most of the 32 bit instructions can be executed in parallel with 2 x 16 bit memory/index operations
• Exceptions are DIVS, DIVQ and MULTIPLY with 32 bit operands
• || means parallel
• Examples:
– A1=R2.L*R1.L, A0=R2.H*R1.H || R2.H=W[I2++] || [I3++]=R3;
– R2=R2+|+R4, R4=R2-|-R4 || I0+=M0 || R1=[I0];