The ORC Quest for Better Embedded Multimedia Performance Adding MIPS support to liborc (ELCE 2014)
The Orc Quest for Better Multimedia Performance: Adding MIPS support to liborc
Guillaume Emont - Software Engineer, Igalia
ELCE 2014 - Monday, October 13 2014
Some Context: What is SISD?
SISD: Single Instruction Single Data

ASM:

    lbu  $2, variable1    ; $2 <- variable1
    lbu  $3, variable2    ; $3 <- variable2
    addu $2, $2, $3       ; $2 <- $2 + $3
    sb   $2, variable3    ; variable3 <- $2

Pseudo-code:

    variable3 <- variable1 + variable2

2/28
Some Context: What is SIMD?
SIMD: Single Instruction Multiple Data

ASM:

    lw      $2, array1    ; $2 <- array1[0..3]
    lw      $3, array2    ; $3 <- array2[0..3]
    addu.qb $2, $2, $3    ; $2[i] <- $2[i] + $3[i] for i in 0..3
    sw      $2, array3    ; array3[0..3] <- $2

Pseudo-code:

    array3 <- array1 + array2

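To make the per-lane behaviour of the SIMD addition concrete, here is a minimal portable C sketch that emulates a four-lane byte add on a 32-bit word. The function name `add_quad_bytes` is invented for illustration; it only mimics the effect described for `addu.qb` above and is not how the hardware or Orc implements it.

```c
#include <stdint.h>

/* Add the four bytes packed in x to the four bytes packed in y.
 * Each byte lane wraps modulo 256 on its own: no carry crosses
 * from one lane into the next. (Illustrative name, not an Orc API.) */
uint32_t add_quad_bytes(uint32_t x, uint32_t y)
{
    /* Add the low 7 bits of every lane; the sum fits in 8 bits per lane. */
    uint32_t low7 = (x & 0x7f7f7f7fu) + (y & 0x7f7f7f7fu);
    /* Fold in the top bit of every lane without propagating a carry. */
    return low7 ^ ((x ^ y) & 0x80808080u);
}
```

For example, the lanes 0xFF + 0x01 wrap to 0x00 while the neighbouring lanes are left untouched, which is exactly what a plain 32-bit `addu` would get wrong.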
3/28
Some Context: SIMD use cases
- colorspace conversion (e.g. YUV -> RGB)
- image scaling
- blending two video feeds together
- various audio processing (volume, combining two sources, etc.)
4/28
Some Context: Orc
The Oil Runtime Compiler
OIL: Optimized Inner Loops
"Portable SIMD"
5/28
Some Context: Orc
This Orc code:

    .function simple_addb
    .dest 1 d1
    .source 1 s1
    .source 1 s2

    addb d1, s1, s2

will give you this C function:

    void simple_addb (orc_uint8 * ORC_RESTRICT d1,
        const orc_uint8 * ORC_RESTRICT s1,
        const orc_uint8 * ORC_RESTRICT s2,
        int n);

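As a rough picture of what the generated function computes, here is a plain C reference loop with the same signature. It is a sketch only: `simple_addb_ref` is a made-up name, and the `orc_uint8` typedef and `ORC_RESTRICT` macro below are local stand-ins so the snippet compiles without the Orc headers.

```c
#include <stdint.h>

/* Local stand-ins (assumptions) so this compiles without Orc headers. */
typedef uint8_t orc_uint8;
#define ORC_RESTRICT restrict

/* Reference semantics: d1[i] = s1[i] + s2[i], each byte wrapping
 * modulo 256, for n elements. */
void simple_addb_ref(orc_uint8 *ORC_RESTRICT d1,
                     const orc_uint8 *ORC_RESTRICT s1,
                     const orc_uint8 *ORC_RESTRICT s2, int n)
{
    for (int i = 0; i < n; i++)
        d1[i] = (orc_uint8)(s1[i] + s2[i]);
}
```

The point of Orc is that the backend replaces this one-byte-at-a-time loop with SIMD code for the target, while the caller sees the same prototype.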
6/28
Some Context: Orc

SH:

    $ orcc addb.orc --assembly --target mips -o addb-mips.s
    $ wc -l addb-mips.s
    414 addb-mips.s

7/28
MIPS port: MIPS DSPv2 ASE
DSPv2 ASE provides:
- saturating arithmetic
- simple SIMD instructions (32-bit registers)
- fixed-point arithmetic
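Saturating arithmetic, the first item listed for the DSPv2 ASE, differs from ordinary wrapping arithmetic only in what happens on overflow. The following C sketch contrasts the two for unsigned bytes; the function names are illustrative and do not correspond to any Orc or MIPS API.

```c
#include <stdint.h>

/* Wrapping add: 200 + 100 does not fit in a byte and wraps to 44. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Saturating add: a result that does not fit is clamped to 255. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}
```

Saturation is usually what multimedia code wants: adding brightness to an almost-white pixel should give white, not wrap around to near black.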
8/28
MIPS port: MIPS DSPv2 ASE
Port is only for the DSPv2 ASE, and not for:
- MIPS SIMD Architecture (MSA)
- MIPS-3D
- MDMX
9/28
MIPS port: MIPS DSPv2 ASE
I used a MIPS32 74Kc
10/28
MIPS port: What's in a port?
- simple instruction builder
- rules
- program manager
11/28
MIPS port: What's in a port?
Simple instruction builder, aka orcmips.{h,c}
Can generate assembly source code or binary code.
- enum of available registers
- orc_mips_emit_<instruction>()
- some logic to handle labels and jumps
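To give an idea of what "generating binary code" means for an instruction builder, here is a small C sketch that encodes one MIPS32 instruction, `addu rd, rs, rt`, as a 32-bit word. The helper `encode_addu` and the register constants are hypothetical and are not the actual contents of orcmips.{h,c}; only the R-type bit layout (opcode SPECIAL, function code 0x21) comes from the MIPS32 instruction set.

```c
#include <stdint.h>

/* Hypothetical register numbers; a real builder has its own enum. */
enum { REG_2 = 2, REG_3 = 3 };

/* Encode "addu rd, rs, rt" as a MIPS32 R-type word:
 * opcode SPECIAL (0) | rs | rt | rd | shamt (0) | funct 0x21. */
uint32_t encode_addu(unsigned rd, unsigned rs, unsigned rt)
{
    return ((uint32_t)(rs & 31u) << 21) |
           ((uint32_t)(rt & 31u) << 16) |
           ((uint32_t)(rd & 31u) << 11) |
           0x21u;
}
```

With this layout, `encode_addu(REG_2, REG_2, REG_3)` yields the word for `addu $2, $2, $3`. An emit function in this style would either append such a word to the code buffer or print the matching assembly text, depending on the output mode.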
12/28
MIPS port: What's in a port?
Rules (orcrules-mips.c): convert an Orc opcode to binary code.
Example:

    void
    mips_rule_addb (OrcCompiler *compiler, void *user, OrcInstruction *insn)
    {
      /* registers allocated for the two sources and the destination */
      int src1 = ORC_SRC_ARG (compiler, insn, 0);
      int src2 = ORC_SRC_ARG (compiler, insn, 1);
      int dest = ORC_DEST_ARG (compiler, insn, 0);

      /* addb maps directly onto addu.qb */
      orc_mips_emit_addu_qb (compiler, dest, src1, src2);
    }

13/28
MIPS port: What's in a port?
Program manager
Constructs the whole "program" (= binary function):
- push registers on the stack
- load constants and parameters into registers
- try to do some simple optimisations (load ordering)
- stride handling
- create the many loops that are needed
14/28
MIPS port: Many loops
Many loops:
- 2^(n-1) alignment cases (n = number of arrays)
- in case the array is big: loop unrolling
- the array end might not be aligned
15/28
MIPS port: Many loops
Alignment issues

    lw $t0, 0($a1)

This is only fine if $a1 is a multiple of 4.
If not:

    lwr $t0, 0($a1)
    lwl $t0, 3($a1)

We want to be in the first case as much as possible.
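The same distinction can be expressed in C. The sketch below, with invented helper names, tests whether an address is a multiple of 4 and shows a 32-bit load that is safe for any address. Going through `memcpy` leaves it to the compiler to pick a suitable instruction sequence; whether that ends up as an `lwr`/`lwl` pair on a given MIPS target is an assumption, not something this code guarantees.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero when the address is a multiple of 4, i.e. a plain
 * 32-bit load such as lw can be used. */
int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* Load 32 bits from any address, aligned or not. */
uint32_t load_u32_any(const uint8_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}
```

The unaligned path costs more per load, which is why the generated code tries to spend as much time as possible in loops where the cheap aligned form applies.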
16/28
MIPS port: Many loops

ORC:

    .function simple_addb
    .dest 1 d1
    .source 1 s1
    .source 1 s2

    addb d1, s1, s2

17/28
MIPS port: Many loops
Really:

    .function simple_addb
    .dest 1 d1
    .source 1 s1
    .source 1 s2
    .temp 1 t1
    .temp 1 t2

    loadb t1, s1
    loadb t2, s2
    addb t1, t1, t2
    storeb d1, t1

18/28
MIPS port: Many loops
Everything aligned:

    /* 0: loadb */
      lw $t7, 0($a3)
    /* 1: loadb */
      lw $t6, 0($a2)
    /* 2: addb */
      addu.qb $t6, $t6, $t7
    /* 3: storeb */
      sw $t6, 0($a1)

19/28
MIPS port: Many loops
If one is not aligned:

    /* 0: loadb */
      lwr $t7, 0($a3)
      lwl $t7, 3($a3)
    /* 1: loadb */
      lw $t6, 0($a2)
    /* 2: addb */
      addu.qb $t6, $t6, $t7
    /* 3: storeb */
      sw $t6, 0($a1)

20/28
MIPS port: Many loops
Strategy:
- Make sure we are aligned for at least d1
- One loop for each possible alignment configuration for (s1, s2)
21/28
MIPS port: Many loops
Loops we generate:
- byte by byte until d1 is aligned.

Then, loops for d1 aligned:
- s1 aligned, s2 not aligned.
- s1 not aligned, s2 aligned.
- everything aligned.
- neither s1 nor s2 aligned.

Finally:
- another byte-by-byte loop to finish processing if the arrays did not finish on an alignment border for d1.
22/28
MIPS port: Many loops
Loops we generate:
- byte by byte until d1 is aligned.

Then, loops for d1 aligned:
- s1 aligned, s2 not aligned. Unrolled 8 times
- s1 not aligned, s2 aligned. Unrolled 8 times
- everything aligned. Unrolled 8 times
- neither s1 nor s2 aligned. Unrolled 8 times

Finally:
- another byte-by-byte loop to finish processing if the arrays did not finish on an alignment border for d1.
23/28
MIPS port: Many loops
Overall algorithm of generated code
1. Load parameters
2. Calculate number of iterations needed to have d1 aligned
3. Loop until d1 is aligned
4. Check alignment of the other array pointers
5. Go to loop corresponding to our alignment configuration
6. Iterate in said loop
7. Iterate in "left-over" loop
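The overall shape of that algorithm can be sketched in portable C. This is an illustration under assumptions, not the generated code: `addb_sketch` and its helpers are invented names, there is no unrolling, and a single word loop built on `memcpy` stands in for the four alignment-specific loops. The return value only reports which of the four (s1, s2) configurations the dispatch step would have selected.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Four byte lanes added at once, no carry between lanes. */
static uint32_t add4(uint32_t x, uint32_t y)
{
    return ((x & 0x7f7f7f7fu) + (y & 0x7f7f7f7fu)) ^ ((x ^ y) & 0x80808080u);
}

/* 32-bit load that is valid for any alignment. */
static uint32_t load32(const uint8_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Returns the alignment configuration chosen in steps 4-5:
 * bit 0 set if s1 is unaligned, bit 1 set if s2 is unaligned. */
int addb_sketch(uint8_t *d1, const uint8_t *s1, const uint8_t *s2, size_t n)
{
    /* Steps 2-3: byte by byte until d1 is word aligned. */
    while (n > 0 && ((uintptr_t)d1 & 3u) != 0) {
        *d1++ = (uint8_t)(*s1++ + *s2++);
        n--;
    }

    /* Steps 4-5: classify the sources. The generated code jumps to
     * one of four specialised loops here; this sketch only records
     * the case and uses one generic loop. */
    int config = (((uintptr_t)s1 & 3u) ? 1 : 0) |
                 (((uintptr_t)s2 & 3u) ? 2 : 0);

    /* Step 6: main loop, four bytes per iteration. */
    while (n >= 4) {
        uint32_t r = add4(load32(s1), load32(s2));
        memcpy(d1, &r, sizeof r);
        d1 += 4; s1 += 4; s2 += 4;
        n -= 4;
    }

    /* Step 7: left-over bytes after the last full word. */
    while (n > 0) {
        *d1++ = (uint8_t)(*s1++ + *s2++);
        n--;
    }
    return config;
}
```

In the real backend each configuration gets its own loop so that aligned sources can use a plain `lw` while unaligned ones pay for `lwr`/`lwl`, which is where most of the generated lines come from.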
24/28
MIPS port: Many loops
Overall algorithm of generated code
1. Load parameters -> 5 lines
2. Calculate number of iterations needed to have d1 aligned -> 15 lines
3. Loop until d1 is aligned -> 15 lines
4. Check alignment of the other array pointers (counted together with step 5)
5. Go to loop corresponding to our alignment configuration -> 22 lines for steps 4 and 5
6. Iterate in said loop -> ~80 lines x 4 -> 320 lines
7. Iterate in "left-over" loop -> 15 lines

Total: ~392 lines
Add 22 lines of boilerplate and you know why it takes 414 lines of MIPS assembly to efficiently add two arrays byte by byte.
25/28
Some conclusions

SH:

    $ wc -l orcmips.{h,c} orcprogram-mips.c orcrules-mips.c
      209 orcmips.h
      961 orcmips.c
      872 orcprogram-mips.c
      733 orcrules-mips.c
     2775 total

26/28
Some conclusions
A lot is already handled by the core of Orc:
- register allocation
- (part of) label management
- parameter passing convention
- generation of wrapping function
27/28
Thank You!
http://www.igalia.com/
twitter: @guijemont
www: emont.org/blog
github: github.com/guijemont