Extensible Processors
ASIP
• Gain performance by:
Specialized hardware for the whole application (ASIC).
− Almost no flexibility.
− High cost.
Special hardware for customized instructions in a GP processor: instruction set extension.
Application-specific instruction set processors (ASIPs): customized to perform particularly well in a particular application area.
Can improve performance for particular problem instances while maintaining the flexibility of the overall system.
− Motivated by the application-specific nature of embedded systems.
ASIP
• Problems:
Substantial non-recurring engineering (NRE) costs.
− Each new ASIP must be verified from both the functionality and timing perspectives.
− A new mask set must be created to fabricate the chip.
− Software side: the compiler must be retargeted to each new processor.
− Any hand-written libraries must be migrated to the new platform.
Automation of some of these tasks may be possible;
− however, the majority of this work is still a manual process.
Difficult to adopt a new ASIP despite the potential advantages.
ASIP
• Advantages:
System is post-programmable and can tolerate modest changes to the application (little performance degradation)
− e.g., changes in a standard.
Computation-intensive portions of applications from the same domain (e.g., encryption) are often similar in structure.
− Customized instructions can often be generalized in small ways to make them more useful across a set of applications.
− Lower cost than an ASIC.
Xtensa Processor
• Xtensa from Tensilica [Gonzalez00]
A processor core which lets the system designer:
− select and size features for a given application,
− define new instructions.
Designer can use a standard ASIC design flow and tools to synthesize the processor.
− Xtensa is fully synthesizable.
Tensilica processor generator adds the application-specific functionality at the time the hardware is designed.
− Extensions are implemented in the same logic family as the rest of the processor.
− Cannot modify the extensions for other applications.
Xtensa Processor
• Designer specifies the characteristics in the TIE (Tensilica Instruction Extension) language and/or menus.
− Number of physical registers,
− Instruction cache size,
− Data cache size,
− Data RAM size,
− External bus width,
− Number of interrupts,
− Extended instructions (functional units).
• Tools generate synthesizable RTL code for the processor and generate software development tools:
− ANSI C/C++ compiler,
− Linker,
− Assembler,
− Code profiler,
− Instruction set simulator.
Xtensa Processor
Designer can analyze and identify bottlenecks in application performance.
Can work around the bottlenecks.
Can add instructions.
Xtensa Example
• Example: DES algorithm
Xtensa Example
• Characteristics:
Extensive bit permutations:
− inefficient in software
− efficient in hardware: simple renaming of wires
Rotation on 28-bit boundaries:
− in software: rotation instruction on 32-bit boundaries
Table look-ups
• Added 4 instructions
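The software cost behind the first two characteristics can be sketched in a few lines. This is only an illustration, not the code used in [Gonzalez00]: `rotl28` and `permute` are hypothetical helper names, and the permutation table is a toy one, not a real DES table. A 28-bit rotate has to be emulated with shifts, an OR and a mask on a wider word, and a permutation needs one extract-and-insert step per bit, whereas in hardware both reduce to wiring.

```python
MASK28 = (1 << 28) - 1  # the rotation works on 28-bit values, not full 32-bit words


def rotl28(value, shift):
    """Rotate a 28-bit value left using ordinary shifts and masks."""
    value &= MASK28
    shift %= 28
    # Two shifts, an OR and a mask: the work a single 28-bit rotate instruction would absorb.
    return ((value << shift) | (value >> (28 - shift))) & MASK28


def permute(value, table, in_width):
    """Bit permutation: output bit i (MSB first) is input bit table[i] (1-based, MSB first)."""
    out = 0
    for src in table:  # one loop iteration per output bit in software
        bit = (value >> (in_width - src)) & 1
        out = (out << 1) | bit
    return out


# Toy 4-bit table that reverses the bit order (an assumption for illustration only).
REVERSE4 = [4, 3, 2, 1]
```

For example, `rotl28(0x8000001, 1)` wraps the top bit around to bit 0 and gives `0x3`, and `permute(0b1000, REVERSE4, 4)` gives `0b0001`.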
Xtensa Speed-Up for DES
Xtensa Speed-Ups
• Speed-ups for some applications
Altera Nios, Xilinx MicroBlaze
• Soft extensible processor
Can define custom instructions
Can configure the processor
Uses FPGA resources (Altera for Nios, Xilinx for MicroBlaze)
− Lower performance
− Higher power consumption
Extensible Processors
• Major problems with ASIPs:
Not flexible
− For a new application: new masks, other NRE costs.
Large human effort required to identify and implement an efficient set of instruction set extensions.
• Major problem with soft processors:
Low performance
• Solution:
A GP processor with a reconfigurable FU.
Extensible Processors
• Custom Instruction (CI):
Instructions in the extended Instruction Set Architecture (ISA)
Can be implemented in the processor's datapath itself or as a separate co-processor.
− Usually in the processor datapath.
A fragment of the program's dataflow graph mapped onto a hardware Custom Functional Unit (CFU).
• Basic block:
A code fragment with single entry and exit points.
Load/Store cannot be in the BB
− Cannot predict after how many clocks the results are available to the next instructions
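The single-entry, single-exit definition can be made concrete with the classic "leaders" method for splitting a linear instruction list into basic blocks. The sketch below is a minimal illustration under assumptions: the `(label, opcode, target)` tuple format, the `BRANCH_OPS` set and the sample program are all hypothetical, not taken from the slides.

```python
BRANCH_OPS = {"beq", "bne", "jmp", "ret"}  # assumed set of control-transfer opcodes


def split_basic_blocks(instrs):
    """Split a list of (label, opcode, target) tuples into basic blocks.

    A block starts at a leader: the first instruction, any branch target,
    and any instruction that follows a branch.
    """
    labels = {lab: i for i, (lab, _, _) in enumerate(instrs) if lab is not None}
    leaders = {0} if instrs else set()
    for i, (_, op, target) in enumerate(instrs):
        if op in BRANCH_OPS:
            if target in labels:          # "ret" has no target, so nothing is added
                leaders.add(labels[target])
            if i + 1 < len(instrs):       # the fall-through instruction starts a block
                leaders.add(i + 1)
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [instrs[s:e] for s, e in zip(starts, ends)]


# Hypothetical program used only to exercise the function.
PROGRAM = [
    (None, "add", None),
    (None, "beq", "L1"),
    (None, "mul", None),
    (None, "jmp", "L2"),
    ("L1", "sub", None),
    ("L2", "xor", None),
    (None, "ret", None),
]
```

`split_basic_blocks(PROGRAM)` yields four blocks; every block is entered only at its first instruction and left only at its last, which is the region a CI search is restricted to.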
Custom Instructions Limitations
1. Number of Operands:
Imposed by the base architecture of the core processor.
− Length of a custom instruction increases with increasing number of operands.
− Number of input and output ports to the register file limits the number of input and output operands.
− Cost and energy consumption of a processor increase significantly with increasing number of register file ports.
2. Number of custom instructions:
Imposed by the format of the base ISA.
− If the base ISA supports 26 instructions with a fixed-length opcode (e.g., 5 bits, 32 encodings): room for only 6 more CIs.
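Both limits come down to simple arithmetic on the instruction encoding. The sketch below assumes a 5-bit fixed-length opcode, which is the width that makes the "26 instructions, 6 more CIs" figure work out; the slide does not state the width, and the function names are hypothetical.

```python
def free_opcode_slots(opcode_bits, base_instructions):
    """Number of unused encodings left for custom instructions."""
    total = 2 ** opcode_bits
    if base_instructions > total:
        raise ValueError("base ISA does not fit in the opcode field")
    return total - base_instructions


def operand_field_bits(num_operands, num_registers):
    """Bits needed to name every register operand of one instruction."""
    bits_per_register = (num_registers - 1).bit_length()  # 32 registers -> 5 bits
    return num_operands * bits_per_register


# Assumed 5-bit opcode: 32 encodings, 26 used by the base ISA.
SLOTS_LEFT = free_opcode_slots(5, 26)        # 6
# A CI with 4 inputs and 2 outputs on a 32-register machine.
WIDE_CI_BITS = operand_field_bits(6, 32)     # 30
```

The second number shows why operand count is constrained: six register operands alone would take 30 bits, leaving almost nothing of a 32-bit instruction word for the opcode.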
Custom Instructions Limitations
3. Area:
Important especially in embedded systems.
4. Control Flow:
Custom instruction identification is typically performed within basic block boundaries.
− Assumption: compiler cannot exploit instructions that cross basic block boundaries.
Instruction Set Extension (ISE)
• Automatic ISA extension generation consists of:
Custom Instruction Identification
− Identifies patterns meeting certain topology requirements
Custom Instruction Selection
− Selects the most important patterns under resource and other constraints.
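The two phases can be sketched on a toy candidate list. This is one possible heuristic under stated assumptions, not the algorithm of any particular tool: the enumeration of dataflow-graph patterns is not shown (candidates are assumed to be given), the only "topology requirement" checked is the register-port limit, and selection is a greedy pick by cycles saved per unit area. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    inputs: int        # register operands read
    outputs: int       # register operands written
    area: float        # hardware cost, arbitrary units, assumed > 0
    cycles_saved: int  # estimated gain over the whole application


def meets_port_limits(candidates, max_inputs, max_outputs):
    """Identification-side filter: keep patterns the register file can feed."""
    return [c for c in candidates if c.inputs <= max_inputs and c.outputs <= max_outputs]


def select_greedy(candidates, area_budget):
    """Selection: take the best gain-per-area candidates that still fit the budget."""
    chosen, used = [], 0.0
    for c in sorted(candidates, key=lambda c: c.cycles_saved / c.area, reverse=True):
        if used + c.area <= area_budget:
            chosen.append(c)
            used += c.area
    return chosen


# Hypothetical candidates for illustration.
CANDIDATES = [
    Candidate("mac", 3, 1, 2.0, 400),
    Candidate("perm", 1, 1, 1.0, 300),
    Candidate("wide", 5, 2, 1.5, 900),
    Candidate("rot28", 2, 1, 0.5, 80),
]

FEASIBLE = meets_port_limits(CANDIDATES, max_inputs=4, max_outputs=1)
SELECTED = select_greedy(FEASIBLE, area_budget=2.5)
```

With these numbers "wide" is rejected for needing too many ports, and the 2.5-unit budget admits "perm" and "rot28" but not "mac". A greedy pick is not guaranteed to be optimal; it only stands in for "selects the most important patterns under resource constraints".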
Automatic ISE
• To mimic the choices of an expert designer
• New concept of “Compiler”:
Retargetable compiler:
− Maintains a single piece of code for compiling to different machine targets.
− Reads the underlying machine description, then produces code for it.
More automation:
− Tuning the machine’s instruction set.
− Compiler: defines the machine and then produces code for it.
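A toy code generator shows what "reads the underlying machine description, then produces code for it" means in practice. Everything here is an assumption made for illustration: the dictionary-based machine description, the `mac` (multiply-accumulate) custom instruction and the three-address output format are hypothetical and much simpler than a real retargetable compiler.

```python
import itertools


def generate(expr, machine):
    """Emit code for an expression tree using only ops listed in the machine description.

    expr is a variable name (str) or a tuple (op, left, right).
    """
    code = []
    temps = itertools.count()

    def emit(node):
        if isinstance(node, str):
            return node  # a variable, assumed to be in a register already
        op, lhs, rhs = node
        # If the target offers "mac", (a * b) + c collapses into one instruction.
        if (op == "add" and "mac" in machine["ops"]
                and isinstance(lhs, tuple) and lhs[0] == "mul"):
            a, b, c = emit(lhs[1]), emit(lhs[2]), emit(rhs)
            dst = f"t{next(temps)}"
            code.append(f"mac {dst}, {a}, {b}, {c}")
            return dst
        if op not in machine["ops"]:
            raise ValueError(f"target {machine['name']} has no '{op}' instruction")
        a, b = emit(lhs), emit(rhs)
        dst = f"t{next(temps)}"
        code.append(f"{op} {dst}, {a}, {b}")
        return dst

    emit(expr)
    return code


# Two hypothetical machine descriptions: the base core and one extended with a CI.
BASE = {"name": "base", "ops": {"add", "mul"}}
EXTENDED = {"name": "extended", "ops": {"add", "mul", "mac"}}

EXPR = ("add", ("mul", "a", "b"), "c")  # a * b + c
```

The same `generate` code produces `mul` followed by `add` for `BASE` and a single `mac` for `EXTENDED`. The "more automation" step on the slide goes one further: the tool would also decide that adding `mac` to the description is worthwhile.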
References
[Gonzalez00] R. Gonzalez, “Xtensa: A configurable and extensible processor,” IEEE Micro, 2000.