Promising low power reusable solutions: Application Specific Instruction-set Processors
Myung Hoon Sunwoo
Multimedia Comm. SoC Lab., Ajou University, Korea
Ajou Univ. SOC Lab.MultimediaCommunications1 / 75
Outline
What is ASIP? and Why ASIP?
SPOCS (Signal Processors for OFDM Communications)
  SPOCS Architecture for FFT and Bit Manipulation
  Performance Comparisons and Implementations
DASIP (Digital Audio Specific Instruction set Processor)
  Proposed Instructions and Coprocessor
  Proposed Inverse Quantization Algorithm
VSIP (Video Specific Instruction set Processor)
  Proposed Instructions and Coprocessors
  Performance Comparisons
Trends of recent ASIPs
  Applications of Low power ASIPs
  ASIP design technologies
Conclusions
What is ASIP?
DSP
  Advantages: Programmability, Flexibility
  Disadvantages: Low Performance / High Power Consumption
ASIC
  Advantages: Optimization, Low Power, High Performance
  Disadvantages: High Development Cost, Low Flexibility, Long Time to Market
ASIP = Advantages of ASIC + Advantages of DSP
Multi-Standard Multimedia & Communications: WLAN, 4G Wireless Communication, DVB, DAB, H.264/AVC, DMB
What is ASIP?
Changes of System Design Environment
  Short Time to Market
  Frequent Spec. Changes
  27% CAGR (Compound Annual Growth Rate) of DSP Market
[Figure: DSP market size in $B per year, 2002 to 2009]
Source: Forward Concepts, February 2005
Why ASIP?
Computational Efficiency and Flexibility
[Figure: flexibility vs. performance. From most flexible to highest performance: General Purpose Processors (StrongARM110, 0.4 MIPS/mW), Digital Signal Processors (TMS320C54x, 3 MIPS/mW), Application Specific Instruction set Processors, Application Specific ICs, Physically Optimized ICs]
Source: T. Noll, RWTH Aachen
Determine the Best Choice between Flexibility vs. Performance
High Performance and Flexibility System: Application Specific Instruction set Processors
What Resources in SOC
Digital signal processors
Microprocessors
ASIPs
Various Memories
Peripheral, Interface
Programmable Cores
A/D, D/A, Analog
RTOS
Middle Ware
Application SW
[Figure: SoC layers. Hardware-independent Software: Applications, User defined Interface, Libraries, Middleware. Hardware-Dependent Software: Operating Systems (Kernel), Device Drivers. Hardware: Analog, CPU Core, DSP, ROM, MPEG, Cache, DRAM, Logic, Etc.]
SOC Challenges
Reuse Technology
  Timing Driven Design Methodology
  Block Based Design Methodology
  Platform Based Design Methodology
[Figure: example SoC floorplans built from reusable blocks: µP core, ROM, SRAM, Data Cache, ATM, MPEG, RAM, Serial I/F, Soft I/F IP, Logic, Customer Defined Logic]
Cited from "Surviving the SOC Revolution," Chang et al., Kluwer Academic Publishers
Microprocessors vs. Digital Signal Processors
History of Microprocessors
  Converging
Microprocessors vs. Digital Signal Processors
History of DSPs
  Diverging
  Hundreds of DSPs (In-house)
Design flow of ASIP
  Target Application Selection
  Application Profiling
  H/W, S/W Partitioning
  Design Special Instructions and Architecture
  Design Hardware Accelerators
  Verification and Performance Comparison
  Chip Fabrication
ASIP     Target application    Special instructions / accelerators
SPOCS    WLAN                  FFT, Bit operation
DASIP    MPEG-2/4 AAC          IMDCT, Huffman decoding
VSIP     H.264/AVC             ME/MC, VLC
Verification and performance comparison: FPGA board, LISA simulator, C/Matlab program
Design flow using LISATek tools
[Figure: LISATek Processor Designer flow. A LISA 2.0 Description is used to Generate the Software Tools (C-Compiler, Assembler, Linker, Simulator); the Application is Built and Analyzed (Debugging & Profiling). Design goals met? No: Adjust the Architecture. Yes: RTL Generation produces the RTL Implementation (Verilog, VHDL, SystemC) and ConvergenSC SystemC Models.]
Software tool development
[Figure: screenshots of the LISATek Development Environment: Assembler / Linker and Simulator, with disassembly, assembly code, register, memory, and pipeline views]
HW/SW verification environment
Compare FPGA board, C / Matlab, Lisa simulator
Reduce the ASIP development time
Ex) Verification of IMDCT of DASIP
[Figure: C simulator, Lisa simulator, and FPGA results side by side: Matching]
Signal Processors for OFDM Communication Systems (SPOCS)
FFT calculation problem of General DSP
  Do/Loop instruction => additional cycle needed
  Inefficient Butterfly calculation (Fixed MAC structure)
SPOCS
  FFT #N (Instruction)
  Input data address decision
  Address generation (automatically)
  Reduce address generation time
[Figure: SPOCS block diagram: PCU (Program Control Unit), Program Memory, AGU (Address Generation Unit), FAGU (FFT AGU, Addr. offset), Data Memory, DPU (Data Processing Unit), BMU (Bit Manipulation Unit)]
FFT calculation cycle (N: FFT point)
  DSP           FFT calculation cycle
  Carmel DSP    (N+10)log2N + 5N/4 - 4
  TMS320C62X    (4N/2)log2N + 7log2N + N/4 + 9
  SPOCS         (2N/2)log2N + 9
SPOCS: application specific signal processor for OFDM communication systems [Jour. of signal proc., 2008].
Design of new DSP instructions and their hardware architecture for high-speed FFT [Jour. of VLSI signal proc., 2003].
SPOCS architecture
Proposed DPU Architecture (2MAC/1ALU)
[Figure: two multipliers (Mul, Mul) with product registers P1, P2, Switching Logic, Adder1, Adder2, Adder3 and accumulators Acc1, Acc2, Acc3]
Butterfly Calculation flow
  Cycle 1 (SBUTTERFLY)
  Cycle 2 (ABUTTERFLY)
SPOCS FFT Calculation
DPU Architecture
  Adds Switching Logic to the fixed MAC of existing DSPs: supports MUL-MUL-SUB(ADD) and ADD-SUB operations per cycle
FFT Instruction
  Existing DSP: many instructions (DO, ADD, SUB, Load, Store, MAC, etc.)
  SPOCS: FFT, SBUTTERFLY, ABUTTERFLY
Support Various Instructions
  51 instructions including the new instructions
SPOCS bit manipulation operations
Motivation
  Various communication systems have been developed, such as xDSL, WLAN, DMB, IMT2000, etc.
  These systems have similar bit manipulation functions.
[Figure: baseband chain. Transmit: Baseband Data, Scrambling, Convolutional Encoding/Puncturing, Interleaving, Modulation, Channel. Receive: Channel, Sync/Demodulation, Deinterleaving, Viterbi Decoding, Descrambling, Baseband Data]
Basic bit manipulation operations
Scrambling
  N-th output decided by XOR operations of the input bit and N-th shifted data according to the generator polynomial
  Generator Polynomial = X^7 + X^4 + 1
  Shift, XOR operations
  [Figure: shift register R0 to R6 with Input and Output]
Convolutional Encoding
  Outputs derived by XOR operations of bits in the shift register, decided by the encoder structure
  Shift, XOR operations
  [Figure: shift register R0 to R5 with Input, Output A, Output B]
Bit Stream Multiplexing
  Combining two bit streams in alternating order
  Inputs: A4 A3 A2 A1 A0 and B4 B3 B2 B1 B0
  Output: B7 A7 B6 A6 B5 A5 B4 A4 B3 A3 B2 A2 B1 A1 B0 A0
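The scrambling and bit stream multiplexing operations above reduce to shifts and XORs, which the following minimal Python sketch illustrates. It is not the SPOCS implementation: the register layout, the default seed, and the function names `scramble` and `multiplex` are assumptions made for illustration; only the generator polynomial X^7 + X^4 + 1 and the alternating output order come from the slide.

```python
def scramble(bits, state=0b1011101):
    """Additive scrambler for X^7 + X^4 + 1 (seed value is an arbitrary assumption)."""
    out = []
    for b in bits:
        # Feedback is the XOR of the register taps at delays 7 and 4.
        fb = ((state >> 6) ^ (state >> 3)) & 1
        # Shift the 7-bit register and insert the feedback bit.
        state = ((state << 1) | fb) & 0x7F
        out.append(b ^ fb)
    return out


def multiplex(a, b, width=8):
    """Merge two words bit by bit into the order ... B1 A1 B0 A0."""
    out = 0
    for i in range(width):
        out |= ((a >> i) & 1) << (2 * i)       # A bits go to even positions
        out |= ((b >> i) & 1) << (2 * i + 1)   # B bits go to odd positions
    return out
```

Because the scrambler is additive, running `scramble` twice with the same seed restores the input, which is why the same operation serves for descrambling.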
Basic bit manipulation operations
Puncturing
  Deletes some of the encoded bits according to patterns
  Bit Insert and Extract Operations
  [Figure: Input A and Input B merged into one punctured Output]
Interleaving
  Shuffling input bits
  Bit Insert and Extract Operations
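Puncturing and interleaving can be sketched in a few lines of Python. This is an illustrative model only: the keep/drop pattern and the row-by-column block interleaver are generic textbook forms chosen as assumptions, not the exact patterns of any standard named in the slides, and the function names are hypothetical.

```python
def puncture(bits, pattern):
    """Keep bit i only where the repeating pattern holds a 1."""
    return [b for i, b in enumerate(bits) if pattern[i % len(pattern)]]


def block_interleave(bits, rows, cols):
    """Write the bits row by row, then read them column by column."""
    if len(bits) != rows * cols:
        raise ValueError("block size must equal rows * cols")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]
```

For example, `puncture(list(range(6)), [1, 1, 1, 0, 0, 1])` drops the fourth and fifth elements, and `block_interleave(list(range(6)), 2, 3)` reorders the block column-wise.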
SPOCS bit manipulation Instructions
Puncturing, Interleaving
  Existing DSP: Input Data, Shift Left, Shift Right, OR Operation, Data Generation
  SPOCS: Input Data, Bit Extract (Programmable Switch), Bit Load Register (load the extracted bit), Data Generation
  SPOCS: 1 Cycle Operation
Scrambling, Convolution
  Existing DSP: Input Data, Shifter (Shift), ALU (XOR Operation)
  SPOCS: Input Data, BMU (a maximum of 9 data can be shifted and XORed)
  SPOCS: 1 Cycle Operation
FFT performance of SPOCS
Key Features
  Proposed Instructions for FFT Calculation: FFT, ABUTTERFLY, SBUTTERFLY
  FAGU: automatically generates data addresses (Very Fast FFT Operation)
  Reduce Program Memory Accesses (only three instructions) => Very Low Power
  Meet Various Communication Standards
  Standard         FFT point    Time limit (µs)    SPOCS time (µs)
  WLAN (54Mbps)    64           4                  1.4
  DAB              512          62                 16.5
  DAB              2048         256                80.5
  DVB-T            2048         231                80.5
  VDSL             4096         250                174.5
Implementation of application-specific DSP for OFDM systems [IEEE ISCAS2004].
FFT operating apparatus of programmable processors and operation method thereof [US/European patents].
Digital signal processor architecture with bit manipulation accelerator for communication systems [EURASIP JASP, 2005].
Bit manipulation operation circuit and method in programmable processor [US patents].
OFDM performance of SPOCS
Performance
                            Carmel DSP     TMS320C62X     SPOCS
  DSP Structure             VLIW           VLIW           Application Specific DSP
  Hardware Size             VLIW (N.A.)    VLIW (N.A.)    107,000 Gates + 12 Kbyte Memory
  DPU Structure             2MAC/2ALU      2MUL/6ALU      2MAC/1ALU
  Cycles/Butterfly          2              4              2
  Calculation Time (FFT)
    64-point                520            835            393
    256-point               2,452          4,225          2,057
    1024-point              11,616         20,815         10,249
    2048-point              25,194         45,654         22,537

                                                   StarCore SC140    TMS320C62X    SPOCS
  Operation                                        4 Shift / 4 Logical Operation   BMU
  Convolution (IS-95) (K=9, R=1/2, 192 bits)       463               N.A.          152
  Block Interleaving (802.11a) (16 * 6 bits)       414               N.A.          91
  Scrambling (802.11a) (12Mbit/s)                  N.A.              39 x 10^6     20 x 10^6
  Convolution (802.11a) (12Mbit/s)                 N.A.              77 x 10^6     12 x 10^6
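The FFT calculation times in the table can be cross-checked against the cycle formulas given earlier for the Carmel DSP, the TMS320C62X, and SPOCS. The short Python sketch below evaluates those formulas; the function names are hypothetical and N is assumed to be a power of two. Evaluating it reproduces the listed entries, with one exception: for the 256-point Carmel DSP case the formula yields 2,444 while the table lists 2,452.

```python
def log2_int(n):
    """Exact log2 for a power-of-two FFT size."""
    if n <= 0 or n & (n - 1):
        raise ValueError("FFT point count must be a power of two")
    return n.bit_length() - 1


def carmel_cycles(n):
    # (N+10)log2N + 5N/4 - 4
    return (n + 10) * log2_int(n) + 5 * n // 4 - 4


def c62x_cycles(n):
    # (4N/2)log2N + 7log2N + N/4 + 9
    return (4 * n // 2) * log2_int(n) + 7 * log2_int(n) + n // 4 + 9


def spocs_cycles(n):
    # (2N/2)log2N + 9
    return (2 * n // 2) * log2_int(n) + 9


for n in (64, 256, 1024, 2048):
    print(n, carmel_cycles(n), c62x_cycles(n), spocs_cycles(n))
```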
SPOCS implementation
SPOCS Core Design
  SEC 0.18um Synthesis (Synopsys)
  • Gate : 107,000
  • Program Memory : 4 Kbyte, Data Memory : 8 Kbyte
  • Frequency : 290MHz
  Special Instruction Set for FFT Operation and BMU Instructions
  Can meet OFDM Communication standards
FPGA Implementation
  iPROVE Xilinx xc2v6000
  Emulate IEEE 802.11a WLAN
SPOCS implementation
Macro Libraries for IEEE 802.11a
Scrambling (Descrambling)
  DO #end, @R3
  SCB GR7, #0x0c
  MOV2 @R1, ACC0 | @R4, ACC1
  PUNC ACC1, GR2L
  MOV2 @R2, ACC0 | @R5, ACC1
  PUNC ACC1, GR3L
  end:
Convolution Encoding
  DO #ENDDO, @R4
  MOVEC R5, GR7
  MOVE @R0+, GR2
  CONV GR0, GR2, GR3, GR4
  CONV GR1, GR2, GR3, GR5
  MOVE @R1+, ACC0
Interleaving (Deinterleaving)
  DO #loop1, @R5
  DO #label1, @R1
  MOVE @R0+, GR0
  label1: PUNC ACC1, GR0L, GR6
  ROL GR6, ACC0
  MOVEC GR4, R0
Mapping (Demapping): start of 64 QAM mapping
  MOVI #0x0000,R3     * Q-channel input
  MOVI #0x0050,R4     * I-channel input
  MOVI #0x0090,R1     * to loop
  MOVI #0x0030,GR7    * #48 looping
  MOVE GR7,@R1
  DO #endof1,@R1
  MOVE @R3,GR0        * to change value to two's complement
  MOVI #0x0003,ACC0   * make ACC0 011 to get last 2 bits of GR0
  AND GR0,ACC0        * get last 2 bits of GR0
  MOVEC ACC0,GR1      * store the value of ACC0
  MOVI #0x0002,ACC0   * make ACC0 010 to compare with GR1
IFFT (FFT)
  MOVI PSW 0x4000     -- PSW setting scale down
  MOVI M0 0x000A      -- Xmem base = 10
  MOVI R7 0x000A      -- Ymem base = 10
  IFFT #256
  SBUTTERFLY
  ABUTTERFLY
SPOCS implementation
HW/SW Verification Environment using FPGA, Matlab, Lisa simulator
Digital Audio Specific Instruction set Processor (DASIP)
Audio Applications
  DOLBY (AC3)
  DTS 96/24
  MPEG AAC
  MP3PRO
  OGG, WMA
ASIP for Audio Applications
  High Speed IMDCT
  High Speed Parallel Execution of Huffman Decoding
  High Performance AAC
  Application Specific Instruction Set for Audio Algorithm
Digital Audio Specific Instruction set Processor (DASIP)
Register files including 32 registers
Program control unit, data processing unit, address generation unit
Huffman accelerator for MPEG-2/4 AAC
2 ROM tables and 2 Data Memories
[Figure: DASIP block diagram: Program Control Unit, Program Memory, Data Processing Unit, Address Generation Unit, Register files, Huffman accelerator, two ROM TABLEs, two Data Memories]
Design of a high-quality audio-specific DSP core [Best Paper Award in IEEE SIPS 2005].
Computing circuits and method for running an MPEG-2 AAC or MPEG-4 AAC audio decoding algorithm on programmable processors [US and Korea patents].
Complexity of the MPEG-2 AAC decoding
High computational loads
  Filterbank: IMDCT (Inverse Modified DCT)
  Huffman decoding: Compare & Program controls
[Figure: pie chart of decoding load with the categories Filterbank, Huffman Decoding, Inv-Quant & scale, Etc.; shares shown: 48%, 33%, 16%, 4.1%]
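Fast IMDCT Algorithm
The fast algorithm efficiently reduces the computational loads of the overall system by a factor of about 10
Using N/4-point complex IFFT
Signal flow from X(k) to x(n):
  1. Form the complex input \( X(\tfrac{N}{2} - 2k - 1) + j \cdot X(2k) \)
  2. Multiply by \( e^{\,j \frac{2\pi}{N}\left(k + \frac{1}{8}\right)} \)
  3. N/4-point complex IFFT
  4. Multiply by \( e^{\,j \frac{2\pi}{N}\left(n + \frac{1}{8}\right)} \)
  5. Output x(n)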
Proposed instructions for IMDCT
  Stage                    Instructions
  X(k)
  Pre-processing           LDPRE, ST2
  N/4 IFFT
  Post-processing          LD4, ST2
  Data de-interleaving     LD4, ST2
  x(n)
LDPRE instruction
  • 4 data transfers (load)
  • IAMU
  • Support parallel loads
LD4 instruction
  • 4 data transfers (load)
  • High data bandwidth
  • Support parallel loads
ST2 instruction
  • 2 data transfers (store)
  • High data bandwidth
  • Support parallel stores
Huffman decoder
Specific Instructions for Huffman decoding
  HFMD GR0, GR1, Acc0, GR[n]
    GR0: index (9 bit) of [Acc0]
    GR1: code length (5 bit) of [Acc0]
<Special Feature>
  ▪ Gate Count : 3800 gates
  ▪ Index value directly loaded to Register
[Figure: HFMD data path: Bitstream parser, Accumulator, General Reg. (Huffman book select), Huffman decoder, General Reg. (index), General Reg. (code length)]
<Performance Comparisons of Huffman Decoding>
  Processor      Computation Cycle
  TMS320C62x     N.A. (Very large)
  Korean DSP     5 cycles
  ASIC           2.5 cycles
  Ajou ASIP      2 cycles
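Proposed inverse quantization algorithm
\[ X^{4/3} = \left(\frac{X}{8} \times 8\right)^{4/3} = \left(\frac{X}{8}\right)^{4/3} \times 16 \]
Here rem(.) is the Remainder Function and [.] is the Gauss Function.
① (1) from X = 1 to 256:
\[ X^{4/3} = LUT(X) \]
② (2) from X = 257 to 2047:
\[ X^{4/3} = 2\left( LUT\!\left(\left[\tfrac{X}{8}\right] + 1\right) - LUT\!\left(\left[\tfrac{X}{8}\right]\right) - \frac{401 - \left[\tfrac{X}{8}\right]}{2^{16}} \right) \times rem\!\left(\tfrac{X}{8}\right) + LUT\!\left(\left[\tfrac{X}{8}\right]\right) \times 2^{4} \]
③ (3) from X = 2048 to 8191, if rem(X/64) ≤ 32:
\[ X^{4/3} = 4\left( LUT\!\left(\left[\tfrac{X}{64}\right] + 1\right) - LUT\!\left(\left[\tfrac{X}{64}\right]\right) - \frac{218 - \left[\tfrac{X}{64}\right]}{2^{12}} \right) \times rem\!\left(\tfrac{X}{64}\right) + LUT\!\left(\left[\tfrac{X}{64}\right]\right) \times 2^{8} \]
④ (3) from X = 2048 to 8191, if rem(X/64) > 32:
\[ X^{4/3} = 4\left( LUT\!\left(\left[\tfrac{X}{64}\right] + 1\right) - LUT\!\left(\left[\tfrac{X}{64}\right]\right) + \frac{218 - \left[\tfrac{X}{64}\right]}{2^{12}} \right) \times \left( rem\!\left(\tfrac{X}{64}\right) - 64 \right) + LUT\!\left(\left[\tfrac{X}{64}\right]\right) \times 2^{8} \]
Features
  1. Requires a 256-entry LUT
  2. Consists of 4 stages
  3. No computation is required at the first stage
  4. All multiplications and divisions can be achieved by shift operations only
  5. The positive and negative errors have almost the same distribution (this can reduce error accumulation)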
Proposed architecture
EXTB instruction
  The rem(X/N) and the gauss [X/N] functions in one cycle
  Can reduce computational loads
The syntax and operation of the EXTB instruction
  Syntax         EXTB ACC0, GR0, #N
  Description    ACC <- rem( GR0 / 2^|N| ) when N<0
  Description    ACC <- [ GR0 / 2^N ] when N>0
Implementation Results (Instruction count)
  Processor                                ARM    TI 54X    DASIP
  Direct linear interpolation algorithm    29     27        21
  Tsai algorithm                           61     57        47
  Proposed algorithm                       49     46        38
T. H. Tsai and C. C. Yen, "A High Quality requantization quantization method for MP3 and MPEG-4 AAC audio coding," in Proc. IEEE Int. Symp. On Circuits and Syst., 2002, pp. 851-854
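The EXTB behaviour described above can be modelled with one shift or one mask, as in this small Python sketch. It is a behavioural model only: treating a negative N as "remainder modulo 2^|N|" is an interpretation of the slide, the operand is assumed to be a non-negative integer, and the function name `extb` simply mirrors the mnemonic.

```python
def extb(gr0, n):
    """Model of EXTB: quotient for n > 0, remainder for n < 0."""
    if gr0 < 0:
        raise ValueError("this sketch assumes a non-negative operand")
    if n > 0:
        return gr0 >> n                 # gauss function [GR0 / 2^n]
    if n < 0:
        return gr0 & ((1 << -n) - 1)    # rem(GR0 / 2^|n|)
    raise ValueError("n must be non-zero")
```

With X = 1234, `extb(1234, 3)` gives the LUT index [X/8] and `extb(1234, -3)` gives rem(X/8), the two values the interpolation stages need.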
Proposed inverse quantization algorithm
Error graph of the proposed IQ method
[Figure: error of the proposed method vs. the direct method]
  Error                     Direct Method (256)    Korean 256 (2001)    Taiwan 256 (2003)    Taiwan 128 (2003)    The proposed Algorithm 256
  Max. error (257-2048)     0.08728                0.04365              0.02538              0.03669              0.048115
  Max. error (2049-8191)    1.39655                0.69832              0.35389              0.58217              0.323076
  Average error             0.41979                -0.20990             0.03161              0.16233              0.0079631
Novel non-linear inverse quantization algorithm and its architecture for digital audio codecs [IEEE ISCAS 2007].
Video Specific Instruction set Processor (VSIP)
Video Applications
  JPEG 2000
  MPEG 2/4
  H.264/AVC
ASIP for Video Applications
  Special Features for Huffman
  ME/MC Coprocessor: Parameterized, Highly Parallel Architecture
  Optimized DALU for Integer DCT, Loop Filter
  Application Specific Instruction Set for Video Algorithm
Video Specific Instruction set Processor (VSIP)
[Figure: H.264 Decoding (%) breakdown: MC, In-Loop filter, VLC, Color converter, Inv. Transform/Q, Intra Prediction, Decode MV, Other]
[Figure: H.264 Encoding (%) breakdown: Motion Estimation, Intra Prediction, In-loop filter, CAVLC/UVLC, Transform/Q]
DSP Core with Specific Instructions
  PCU, DPU, AGU
  Program memory, Data memory
ME/MC Coprocessor
ASIP Instructions and their hardware architecture for H.264/AVC [Journal of Semiconductor Technology and Science, 2005.12]
H.264 computation characteristic
Deblocking filtering
  p'0 = (p2 + 2*p1 + 2*p0 + 2*q0 + q1 + 4) >> 3
  p'1 = (p2 + p1 + p0 + q0 + 2) >> 2
  p'2 = (2*p3 + 3*p2 + p1 + p0 + q0 + 4) >> 3

  p'0 = (2*p1 + p0 + q1 + 2) >> 2
  p'1 = p1
  p'2 = p2
Intra prediction
  – a is predicted by (A + 2B + C + I + 2J + K + 4) >> 3
  – b, e are predicted by (B + 2C + D + J + 2K + L + 4) >> 3
  – c, f, i are predicted by (C + 2D + E + K + 2L + M + 4) >> 3
  – d, g, j, m are predicted by (D + 2E + F + L + 2M + N + 4) >> 3
  – h, k, n are predicted by (E + 2F + G + M + 2N + O + 4) >> 3
  – l, o are predicted by (F + 2G + H + N + 2O + P + 4) >> 3
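The two sets of deblocking equations above are weighted sums followed by a rounding shift, which is the pattern the packed instructions target. A minimal Python sketch of the p-side update follows; the boolean `strong` flag that selects between the two equation sets and the function name `filter_p` are assumptions for illustration, since the slide does not state the selection condition.

```python
def filter_p(p, q, strong):
    """Return (p'0, p'1, p'2) for one edge position.

    p = (p0, p1, p2, p3) are samples on one side of the edge,
    q = (q0, q1) are the nearest samples on the other side.
    """
    p0, p1, p2, p3 = p
    q0, q1 = q
    if strong:
        # First equation set: all three p samples are modified.
        return ((p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3,
                (p2 + p1 + p0 + q0 + 2) >> 2,
                (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3)
    # Second equation set: only p0 is modified.
    return ((2 * p1 + p0 + q1 + 2) >> 2, p1, p2)
```

For example, `filter_p((40, 30, 20, 10), (50, 60), True)` evaluates the first set of equations on a simple ramp of pixel values.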
Proposed instruction
Packed Instruction
[Figure: registers split into four 8-bit lanes, comparing the existing packed instruction with the packed instruction required for H.264]
Integer transform
Integer transform matrix
\[
Y = (C\,X\,C^{T}) \otimes E =
\left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
X
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right)
\otimes
\begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
\]
\[ a = \tfrac{1}{2}, \quad b \cong \sqrt{\tfrac{2}{5}}, \quad d \cong \tfrac{1}{2} \]
Operation flow of 4x4 integer transform
[Figure: butterfly diagrams of the 1D Forward Transform (x(0)..x(3) to X(0), X(2), X(1), X(3), with factors of 2) and the 1D Inverse Transform (with factors of 1/2)]
Proposed instructions
fTRAN, iTRAN
  Forward / Backward Transform
  4 x 1 1D transform in 1 cycle
  2 input operands, 1 output operand
  Two modes: for 16x16 blocks and for others
Operation
  R4 = fTRAN (R0, mode)
    - mode 1 : 16x16
    - mode 2 : Others
Assembly
  ADD R0(0), R0(3), tmp0
  ADD R0(1), R0(2), tmp1
  SUB R0(1), R0(2), tmp2
  SUB R0(0), R0(3), tmp3
  ADD tmp0, tmp1, R4(0)
  ADD tmp2, tmp3<<1, R4(1)
  SUB tmp0, tmp1, R4(2)
  SUB tmp3, tmp2<<1, R4(3)
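The 4 x 1 forward transform that fTRAN performs in one cycle can be sketched in Python as two butterfly stages and checked against the rows of the forward transform matrix C. The function names are hypothetical, and the mode operand of fTRAN is not modelled here.

```python
# Rows of the 4x4 forward integer transform matrix C.
C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def ftran_1d(x):
    """1D forward transform using additions, subtractions, and shifts only."""
    t0 = x[0] + x[3]
    t1 = x[1] + x[2]
    t2 = x[1] - x[2]
    t3 = x[0] - x[3]
    return [t0 + t1, t2 + (t3 << 1), t0 - t1, t3 - (t2 << 1)]


def matrix_1d(x):
    """Reference result: plain matrix-vector product C * x."""
    return [sum(c * v for c, v in zip(row, x)) for row in C]


print(ftran_1d([1, 2, 3, 4]), matrix_1d([1, 2, 3, 4]))
```

Applying `ftran_1d` to every row of a 4x4 block, transposing, and applying it again gives the C X C^T core of the 2D transform, which matches the loop structure of a transform kernel built on a 1D instruction.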
Performance comparisons
Deblocking filtering performance
  [Figure: share of deblocking filtering: Edge Filtering (66 %), Others (34 %)]
  Edge filtering: 15 instructions on TMS320C64x vs. 9 instructions with the proposed instruction (Reduced 40 %)
  Improves 20~25 % of deblocking filtering performance
  64x
    LDW AX0, p
    LDW AX1, q
    LDW r1 #h'4
    LDW r2 #h'1
    LDW r3 #h'1222
    DOTPU4 r2, p
    DOTPU4 r3, q
    ADD2 acc0, acc1
    ADD2 acc0, r1
    SHFL acc0 3
    PACK acc0
    STDW acc0
  Proposed Instruction
    r0=M(a0)
    r1=M(a1)
    r3=#h'4
    r4=hadd(r0:0011.0001)
    r5=hadd(r1:0111.0011)
    Acc0=r4+r5
    acc0=(acc0+r3)>>3
    M(a3)=acc0
Integer transform performance
                   TMS320c55x SW    TMS320c55x HW    TMS320c64x SW    Proposed ASIP
  Required MIPS    12.8             2.8              1.0              1.2
Novel Instructions and Their Hardware Architecture for Video Signal Processing [IEEE ISCAS 2005].
ASIP Approach for Implementation of H.264/AVC [Journal of Signal Processing Systems, Jan. 2008]
VSIP implementation
Compare FPGA board, C / Matlab, Lisa simulator
Forward Integer Transform
  loop #16 lp
  R0=M(AR0,2)         - - - copy pixels to register
  R1=M(AR0,2)
  R2=M(AR0,2)
  R3=M(AR0,2)
  loop #2 ftran       - - - loop
  RF1=trans(RF0)      - - - transpose 4 x 4 matrix
  R0=ftran(R4,1)      - - - 1D integer transform
  R1=ftran(R5,1)
  R2=ftran(R6,1)
  R3=ftran(R7,1)
  ftran:              - - - ftran loop end
  nop
  nop
  M(AR1,2)=R0         - - - store pixels to memory
  M(AR1,2)=R1
  M(AR1,2)=R2
  M(AR1,2)=R3
  lp:
[Figure: chip photographs: VSIP ME Chip, VSIP MC Chip]
Further Research
ASIP for motion estimation
  Aims to support various Motion Estimation (ME) algorithms
  Tries to find a good balance between flexibility and performance
  Funded by Samsung Electronics
Research topics
  Development of ME ASIP
  Scalability of ME ASIP
  Optimal processor model
  Reconfigurable architecture
  Interconnection between core and H/W accelerator
  Program templates for ME algorithms
ASIP for Communications
MSC8156 Processor - Freescale Semiconductor
Feature
  Provides flexibility, integration, and cost efficiency for next generation wireless communication standards (3G-LTE, WiMAX, eHSPA, TDD-LTE, etc.)
  Supports the requirements of next generation base stations
    High speed processing and decreased latency
    Supports high data rates with the up-to-date OFDMA (Orthogonal Frequency Division Multiple Access) standard
[Figure: MSC8156 block diagram: SC3850 DSP cores, each with a 32 KB L1 I-Cache, a 32 KB L1 D-Cache, and a 512 KB L2 Cache/M2 Memory; CLASS; MAPLE-B with Dual RISC Processors, DFT/IDFT, Turbo/Viterbi, FFT/IFFT, CRC]
ASIPs for Multimedia (Video)
SSD1933 Multimedia Processor - Solomon Systech
Features
  Dual core architecture with ARM926EJ-S and AV-DSP
  High quality multimedia for mobile multimedia devices, navigation systems, mobile internet devices
[Figure: SSD1933 block diagram: CPU Subsystem (ARM926, D-Cache, I-Cache), Multimedia Subsystem (AV-DSP, L1-Cache, 3D-DMA), Multimedia Acceleration (2D Graphic Engine, Pre and Post Processing, PRISM), SRAM, Standard I/O, Connectivity, Human Interface, System control, Memory Interface, Multimedia Interface]
ASIPs for Multimedia (Audio)
ZSP800 processor - VeriSilicon
Features
  Supports the Z.Turbo accelerator: users can add instructions and accelerators
  High-definition audio DSP incorporates innovative features to provide the right balance between silicon cost and processing
ASIPs for Multimedia (Audio)
Z.Turbo accelerator of ZSP processor
Features
  User-definable, user-configurable
  Enables users to add their own accelerator or co-processor
  Accelerates special functions without burdening the main DSP core
  More power and cost efficient than just cranking up MHz or just adding more execution units
  Customers can differentiate using their own designs on top of the ZSP architecture
ASIP for FEC
FEC ASIP - IMEC
Features
  The world's first decoding of Turbo code and LDPC in one processor
  Using a multiprocessor with several SIMD architectures shows high performance and energy efficiency
  Handles scrambling of LDPC and interleaving of turbo code with rAGU (reconfigurable Address Generation Unit)
[Figure: FEC ASIP block diagram: Input/output interface (Input fifo, Output fifo), Control interface, Rotation engine, and three identical cores, each with AGU1, AGU2, Background Mem bank, Shuffler, Aligned scratchpad, N-way SIMD pipeline, VRF, SRF, LIFO mem, Control Unit, Program]
ASIPs for MPSOC system
Aachen Univ. - T. G. Noll team
Reconfigurable ASIP architecture using eFPGA (embedded FPGA)
  More application specific architecture than a typical FPGA
  Small area and low power architecture: optimized arithmetic operations
Performance update using a programming language like HDL
  Using configurable blocks, performance close to ASIC with low cost and time
[Figure: ASIP core with Instruction Memory, Control unit, register, eFPGA, and Configuration memory]
ASIPs for MPSOC system
Aachen Univ. - H. Meyr team
Reconfigurable ASIP architecture using CGRA (Coarse Grained Reconfigurable Architecture)
CGRA
  Includes arithmetic and logical operations or specific processing elements inside the core
  Instead of an FPGA, a CGRA implements the system using an architecture inside the core
  The reconfigurable block is also an application specific block
Although the flexibility of a CGRA is less than that of an FPGA, development is fast and low cost using an application specific CGRA
[Figure: CGRA PE architecture: configurable processing element with adder, shifter (>>), and register]
ASIPs for MPSOC system
ASIP should be specialized for a specific application
To optimize the MPSOC system
  Support the interface for communication among ASIPs inside the system
  Guarantee compatibility among compilers
Need a low power architecture for mobile devices
ASIP design technologies
Architecture Description Language (ADL) based design
  Maximize flexibility and efficiency, but significant design effort
  LISATek (CoWare), IP Designer (Target), ASIP Meister (ASIP Solutions, Inc.)
Configurable Processor Cores
  Use pre-designed and pre-verified core
  Efficiency via custom instruction set extensions
  Xtensa (Tensilica), CorExtend (MIPS), Configurable cores ARC600, ARC700 (ARC)
ADL based ASIP design
LISATek Processor Designer - CoWare
  Language for Instruction-set Architectures (LISA) is a powerful, representative instruction-set description language
  Generates a complete set of SW development tools, including an optimizing C-Compiler and a fast instruction-set simulator
ADL based ASIP design
IP Designer – Target Compiler Technologies
Retargetable tool suite for ASIP design
Define the ASIP architecture in the nML language (a hierarchical and highly structured architecture description language)
ADL based ASIP design
ASIP Meister – ASIP Solutions, Inc.
Generates dedicated processor hardware descriptions and software development tools automatically, based on target specifications
Operations of instructions can be defined easily using the Micro Operation description language provided by ASIP Meister
Configurable ASIPs
Xtensa LX3 - Tensilica
Architecture
  16-bit or 32-bit multiplier, single 16-bit MAC
  Supports multiprocessor systems
  Adopts multi-issue VLIW using the FLIX (Flexible Length Instruction eXtensions) architecture
  Selectable 5-stage or 7-stage pipeline
  Configurable over a wide range of pre-verified options
Configurable ASIPs
Xtensa LX3 - Tensilica
XPRES compiler – feature
  Analyzes C/C++ source code and a run-time application profile to automatically suggest configuration settings and new instructions
  Provides a useful starting point for further optimization by the designer
XPRES compiler – design flow
  Input: application code or functional specification in full C/C++ language
  The XPRES Compiler analyzes thousands of possible processor configurations in minutes
  Optimally tune TIE, or combine with manually generated or automatically generated TIE
  Select the optimal configuration
[Figure: XPRES design flow. C/C++ source code -> XPRES Compiler -> TIE (designer-defined instructions) and processor configuration -> Xtensa Processor Generator -> hardware (RTL), system models, complete software tools. The Xtensa Processor Generator builds a complete optimized hardware block and tool-chain in minutes.]
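The idea of scanning many processor configurations against an application profile can be illustrated with a toy design-space exploration. This is only a sketch of the general technique, not the XPRES algorithm: the option names, the `estimate` cost model, and every coefficient in it are invented for illustration.

```python
from itertools import product

# Hypothetical configuration options, loosely inspired by the options listed
# on the architecture slide; these are not actual Xtensa parameter names.
OPTIONS = {
    "multiplier_bits": (16, 32),
    "issue_width": (1, 2, 3),
}


def estimate(config, profile):
    """Toy cost model returning (cycles, area). All coefficients are invented."""
    mul_cost = 1 if config["multiplier_bits"] == 32 else 2
    cycles = (profile["mul_ops"] * mul_cost
              + profile["alu_ops"] // config["issue_width"])
    area = (10
            + (4 if config["multiplier_bits"] == 32 else 2)
            + 3 * config["issue_width"])
    return cycles, area


def explore(profile, area_budget):
    """Return (config, cycles, area) with the fewest cycles within the area budget."""
    best = None
    names = list(OPTIONS)
    for values in product(*(OPTIONS[n] for n in names)):
        config = dict(zip(names, values))
        cycles, area = estimate(config, profile)
        if area > area_budget:
            continue  # configuration does not fit the budget
        if best is None or cycles < best[1]:
            best = (config, cycles, area)
    return best


# Operation counts as they might come from a run-time application profile.
profile = {"mul_ops": 1000, "alu_ops": 6000}
best = explore(profile, area_budget=20)
```

A real tool would use measured profiles and calibrated hardware models and would also propose new instructions; the sketch only shows the enumerate, estimate, and select loop.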
Configurable ASIPs
CorExtend - MIPS
Features
  Allows SoC designers to add proprietary instructions and tightly coupled hardware
  As many instructions as an expert designer needs can be added
  MIPS32® 4KE, M4K, 4KSd Pro, MIPS32® 24K Pro, 24KE, MIPS32® 34K Pro, MIPS32® 74K, MIPS32® 1004K
Configurable ASIPs
Configurable cores ARC600, ARC700 - ARC
Features
  Enable designers to add the features they need and remove the features they do not need for their individual application
  Offer the flexibility to add instructions, registers, flags, and condition codes, creating a processor that is highly tuned for a specific application
Evolution of ASIPs
Future of ASIPs
  Higher performance
  Reconfigurable
  More specific application
  Low power consumption
  High flexibility
Conclusions
Proposed three ASIPs, for OFDM systems, audio, and video, that combine the high performance of an ASIC with the flexibility of a DSP
  Smaller hardware size than existing DSPs
  Support various standards
ASIP core for OFDM communication systems
  Special instructions and hardware architectures for FFT and bit manipulation
  Supports various OFDM and DMT modem systems
ASIP core for audio
  Special instructions for audio coding
  Accelerator for Huffman decoding
  Supports various high-quality audio codecs
ASIP core for video applications
  Special instructions for video coding
  Two coprocessors for ME/MC and VLC
  Supports various video codecs
Implemented ASIPs
[Figure: SPOCS, VSIP ME, VSIP MC, DASIP]
ASIP Specification
ASIP    Library       Operation Frequency   Gate Counts   Memory Size   Remarks
SPOCS   Sec 0.18 µm   280 MHz               107,000       12 Kbyte      -
VSIP    HSI 0.25 µm   160 MHz               141,260       24 Kbyte      ME/MC hardware accelerator
DASIP   Sec 0.18 µm   200 MHz               120,283       24 Kbyte      -
Implemented chips (1/2)
[Figure: four implemented chips]
DSP for wireless communication: 40 MHz
MDSP (1st version): 30 MHz
MDSP (2nd version): 60 MHz
MDSP (3rd version): 60 MHz
Figure annotations
  16-bit fixed-point DSP; instructions are compatible with the Motorola DSP56100
  Multimedia DSP + fixed-point DSP
  Mobile multimedia communication
  DCT (176 x 144): 168.64 fr/s
  BMA (352 x 240): 14 fr/s
Implemented chips (2/2)
[Figure: eight implemented chips]
PRML read channel filter
DOCSIS 2.0 cable modem
WLAN modem chip (IEEE 802.11)
LMDS DOCSIS
RS+Viterbi FEC decoder
Parallel image processor
FFT processor
S-DCME RS decoder
DVB-S2 System Chip Design
ETRI – Ajou University SoC Lab
DVB-S2 Receiver System Description
Standard of Satellite Digital Video Broadcasting
Characteristics
  Channel-adaptive transmitter algorithm using ACM (Adaptive Coding and Modulation) and VCM (Variable Coding and Modulation)
Three important signal processing blocks
  Synchronizer: timing and frequency synchronization and demodulation
  FEC: error detection and correction
  Mode de-adaptation: packet header decoding
[Figure: receiver block diagram. ADC -> DVB-S2 synchronizer (Ajou Univ.) -> FEC (LDPC+BCH) -> mode de-adaptation -> video signal, with MODCOD signalling]
ADC: Analog-to-Digital Converter
MODCOD: the code identifying the modulation method and code rate
BCH: Bose-Chaudhuri-Hocquenghem multiple-error-correcting binary block code
DVB-S2 Synchronizer Description
DVB-S2 synchronizer blocks
  STR: uses the Gardner algorithm
  Frame sync: adopts the GDPDI correlation scheme
  Freq. sync: coarse, fine, and phase estimation
  SNR estimator: uses the SNV algorithm
  Reed-Muller decoder: MODCOD decoding
  Demapper: QPSK, 8PSK, 16APSK, 32APSK demodulation
[Figure: synchronizer block diagram with ADC, STR, AGC, frame sync, descrambler, freq. sync, SNR estimator, demapper, and Reed-Muller decoder; signals include frame done, SNR, and MODCOD]
STR: Symbol Timing Recovery
AGC: Automatic Gain Controller
GDPDI: differential generalized post-detection integration
SNV: Squared Signal-to-Noise Variance
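The slide names the Gardner algorithm for symbol timing recovery without showing it. A minimal sketch of the textbook Gardner timing error detector follows, assuming two samples per symbol with symbol strobes on even indices; the function names are illustrative, and the loop filter and interpolator that a full STR block needs are omitted. Nothing here is taken from the actual chip design.

```python
def gardner_error(prev_strobe: complex, midpoint: complex, cur_strobe: complex) -> float:
    """Gardner timing error for one symbol interval.

    e = Re{ (y[k] - y[k-1]) * conj(y[k-1/2]) }
    With correct timing, the midpoint sample of a symbol transition is near
    zero, so the error vanishes; a timing offset makes it non-zero.
    """
    return ((cur_strobe - prev_strobe) * midpoint.conjugate()).real


def gardner_errors(samples):
    """Apply the detector to a stream sampled at 2 samples/symbol.

    Even indices are assumed to be symbol strobes, odd indices midpoints.
    """
    return [
        gardner_error(samples[i - 2], samples[i - 1], samples[i])
        for i in range(2, len(samples), 2)
    ]


# Alternating +1/-1 symbols with ideal timing: midpoints fall on zero crossings.
aligned = [1 + 0j, 0j, -1 + 0j, 0j, 1 + 0j]
# The same symbols with a timing offset: midpoints no longer sit at zero.
offset = [1 + 0j, 0.5 + 0j, -1 + 0j, -0.5 + 0j, 1 + 0j]

aligned_errors = gardner_errors(aligned)
offset_errors = gardner_errors(offset)
```

The error keeps the same sign for a given offset direction regardless of the transition polarity, which is what lets a loop filter steer the sampling instant. Which sign corresponds to early or late sampling depends on the convention chosen.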
DVB-S2 Test Environments
Test environments
  Uses VCM (Variable Coding and Modulation)
    QPSK: code rate 1/2
    8PSK: code rate 2/3
  SNR: 6 dB
  Sample rate: 9 Msymbol/s
  Carrier frequency: 21 GHz
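For orientation, the two test modes imply different nominal information rates. The sketch below assumes that the quoted 9 Msymbol/s is the symbol rate and uses only the nominal code rate, ignoring BCH, physical-layer framing, and pilot overhead, so the figures are upper-bound estimates rather than measured throughput. The function name is illustrative.

```python
from fractions import Fraction

# Bits carried per symbol for the constellations the demapper supports.
BITS_PER_SYMBOL = {"QPSK": 2, "8PSK": 3, "16APSK": 4, "32APSK": 5}


def nominal_bit_rate(symbol_rate: int, modulation: str, code_rate: Fraction) -> Fraction:
    """Nominal information bit rate = symbol rate x bits/symbol x code rate."""
    return symbol_rate * BITS_PER_SYMBOL[modulation] * code_rate


SYMBOL_RATE = 9_000_000  # 9 Msymbol/s, as quoted for the test environment

qpsk_rate = nominal_bit_rate(SYMBOL_RATE, "QPSK", Fraction(1, 2))   # 9 Mbit/s
psk8_rate = nominal_bit_rate(SYMBOL_RATE, "8PSK", Fraction(2, 3))   # 18 Mbit/s
```

Under these assumptions, switching from QPSK 1/2 to 8PSK 2/3 doubles the nominal rate at the same symbol rate, which is the trade-off VCM exploits.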
DVB-S2 Test Movie
Thank you!