Transcript
Page 1: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

IMPROVING $PORT PERFORMANCE ON $ARCH

PLATFORM-BASED PERFORMANCE TUNING OF WEBKIT (PORT=QT, ARCH=MIPS74KF)

Embedded Linux Conference

April 29 to May 1, 2014

Adrián Pérez de Castro

Page 3: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

THE CHALLENGE

MAKE A QTWEBKIT-BASED BROWSER USABLE ON LIMITED HARDWARE

MIPS 74Kf @ 500 MHz

RAM: 256 MB

No GPU

Page 4: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

MIPS74KF

“Classic” MIPS32

+FPU

+MMU

+DSP

Page 5: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

DSP?

No. Not really a DSP.

SIMD instructions suitable for signal processing.

Page 6: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

CAN WE USE THIS TO IMPROVE PERFORMANCE?

CHALLENGE ACCEPTED

Page 7: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

THE PLAN

PROFILE → OPTIMIZE → VALIDATE

Page 8: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

WHAT TO OPTIMIZE

Video/audio decoding.

Image operations.

Page 9: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

WHERE TO OPTIMIZE?

Can we improve the platform overall, not just WebKit?

Yes!

QtWebKit uses the Qt drawing functions.

A/V decoding uses GStreamer, which uses Orc.

Good candidates for SIMD code.

Page 10: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

LIMITATIONS

No Valgrind.

No GDB.

No perf.

No performance counters.

↓

qemu + gdbserver.

gperftools.

CLOCK_PROCESS_CPUTIME_ID
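
With no perf and no hardware counters, the process CPU-time clock is the measuring tool left on the target. The following is a minimal sketch of reading it through the POSIX clock_gettime() call. The function names processCpuSeconds, busyWork and measureBusyWork are hypothetical and do not come from the talk.

```cpp
#include <time.h>   // clock_gettime(), CLOCK_PROCESS_CPUTIME_ID (POSIX)

// Returns the CPU time consumed by this process so far, in seconds,
// or a negative value if the clock cannot be read.
double processCpuSeconds()
{
    struct timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return -1.0;
    return static_cast<double>(ts.tv_sec) + static_cast<double>(ts.tv_nsec) * 1e-9;
}

// Hypothetical workload, present only so there is something to measure.
unsigned long busyWork(unsigned long iterations)
{
    volatile unsigned long acc = 0;  // volatile keeps the loop from being optimized away
    for (unsigned long i = 0; i < iterations; ++i)
        acc = acc + i;
    return acc;
}

// Measures the CPU seconds spent inside busyWork().
double measureBusyWork(unsigned long iterations)
{
    const double start = processCpuSeconds();
    busyWork(iterations);
    return processCpuSeconds() - start;
}
```

Because the clock counts CPU time rather than wall time, the result is less sensitive to other processes running on the board, although it is still limited by the clock's real resolution.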

Page 11: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

ROLL YOUR OWN TOOLS

(WITH HELP FROM EXISTING ONES)

Page 12: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

GNU HAMMER^WTIME!

    # Use the full path to avoid the shell's time builtin.
    # One line per run with user/system time, page faults, exit status and command.
    /usr/bin/time -a -o timings.txt \
        -f '%U %S %F %x %C' $COMMAND

    # For example, measuring the qtdemux GStreamer component:
    /usr/bin/time -a -o timings.txt \
        -f '%U %S %F %x %C' gst-launch -q \
        filesrc location=file.mp4 ! qtdemux ! video/x-h264 ! fakesink
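
Each run appends one line to timings.txt in the '%U %S %F %x %C' layout: user seconds, system seconds, major page faults, exit status, and the command line. As an illustration of how such a line could be read back for later analysis, here is a small sketch. The TimingRecord struct and parseTimingLine function are hypothetical names, not part of GNU time or of the author's tooling.

```cpp
#include <sstream>
#include <string>

// One line of timings.txt as produced by -f '%U %S %F %x %C'.
struct TimingRecord {
    double userSeconds;      // %U
    double systemSeconds;    // %S
    long majorPageFaults;    // %F
    int exitStatus;          // %x
    std::string command;     // %C (rest of the line)
};

// Parses a single line; returns false if the four numeric fields are missing.
bool parseTimingLine(const std::string& line, TimingRecord& out)
{
    std::istringstream in(line);
    if (!(in >> out.userSeconds >> out.systemSeconds
             >> out.majorPageFaults >> out.exitStatus))
        return false;
    out.command.clear();
    std::getline(in >> std::ws, out.command);  // command may contain spaces
    return true;
}
```

Lines that do not start with the four numeric fields are rejected, so any stray diagnostic text in the file is skipped rather than misread.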

Page 13: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

TIMING

Beware of CLOCK_PROCESS_CPUTIME_ID's resolution!

    #define CLOCK_MAX_RESOLUTION_DELTA (10000.0 * 1e-9)

    bool usePosixClock()
    {
        static bool checked = false;
        static bool useposix;
        if (!checked) {
            if (posixClockAvailable()) {
                double res_theorical = posixClockTheoricalResolution();
                double res_empirical = posixClockEmpiricalResolution();
                useposix = fabs(res_theorical - res_empirical) <= CLOCK_MAX_RESOLUTION_DELTA;
            } else {
                useposix = false;
            }
            checked = true;
        }
        return useposix;
    }

clock.cc
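
The slide does not show how the two resolutions are obtained. One plausible approach, sketched here under that assumption, is to take the advertised value from clock_getres() and to estimate the empirical value as the smallest step observed between two differing readings of the clock. The names advertisedResolution and empiricalResolution are hypothetical and are not the helpers from clock.cc.

```cpp
#include <time.h>   // clock_getres(), clock_gettime() (POSIX)

// Resolution the kernel advertises for the process CPU-time clock,
// in seconds, or -1.0 on error.
double advertisedResolution()
{
    struct timespec ts;
    if (clock_getres(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return -1.0;
    return static_cast<double>(ts.tv_sec) + static_cast<double>(ts.tv_nsec) * 1e-9;
}

// Smallest non-zero step seen between consecutive readings, in seconds,
// or -1.0 on error or when samples <= 0.
double empiricalResolution(int samples)
{
    double best = -1.0;
    for (int i = 0; i < samples; ++i) {
        struct timespec a, b;
        if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &a) != 0)
            return -1.0;
        // Spin until the clock ticks; spinning burns CPU, so it always does.
        do {
            if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &b) != 0)
                return -1.0;
        } while (a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec);
        const double delta = static_cast<double>(b.tv_sec - a.tv_sec)
                           + static_cast<double>(b.tv_nsec - a.tv_nsec) * 1e-9;
        if (best < 0.0 || delta < best)
            best = delta;
    }
    return best;
}
```

A large gap between the two values suggests the clock is coarser than it claims, which is the situation the usePosixClock() check guards against.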

Page 14: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

WEBSNAP

    % g++ -DMAIN -o clock clock.cc
    % ./clock
    CLOCK_PROCESS_CPUTIME_ID is supported
    Resolution (advertised/empirical): 0.0000000010/0.0000002460s
    Sampled resolution: 0.0000005470s
    Printing the lines above took 0.0000483550s

    % LD_PRELOAD=/usr/lib/libprofiler.so \
        ./websnap http://igalia.com 1000 pprof
    Loading 100%
    Layout completed
    Load successful
    libprofile.so detected (0x7f77468e8f90, 0x7f77468e8fd0), output 'pprof'
    Profiling started, code: 0x1, timeout: 0
    PROFILE: interrupts/evictions/bytes = 634/537/22168
    http://igalia.com 1000 6.2709987870s

    % mkdir out && ./runtests 1000 < urls.txt

github.com/aperezdc/websnap

Page 15: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

...AND BEYOND

Ad-hoc Python/Bash scripts:

Fix library paths in profiler output.

Data munging.

Measurements comparison.

Generate CSV files.

Report generation.
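
The author's scripts were written in Python and Bash and are not shown. Purely as an illustration of the comparison and CSV steps, here is a sketch that computes a speedup as baseline time divided by optimized time and emits one CSV row per measurement. The Measurement struct, the column names and both functions are assumptions made for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One benchmark measured before and after an optimization.
struct Measurement {
    std::string name;
    double baselineSeconds;
    double optimizedSeconds;
};

// Speedup factor: values above 1.0 mean the optimized build is faster.
double speedup(const Measurement& m)
{
    return m.baselineSeconds / m.optimizedSeconds;
}

// Renders the measurements as CSV with a header row.
std::string toCsv(const std::vector<Measurement>& rows)
{
    std::ostringstream out;
    out << "name,baseline_s,optimized_s,speedup\n";
    for (const Measurement& m : rows)
        out << m.name << ',' << m.baselineSeconds << ','
            << m.optimizedSeconds << ',' << speedup(m) << '\n';
    return out.str();
}
```

A run that drops from 3 s to 2 s yields a speedup of 1.5 with this definition.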

Page 16: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

SOME RESULTS

(DETAILED)

Page 17: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

LATIN-1→UTF16: V0

    // "dst" array (uint16_t*)
    // "src" array (uint8_t*)

    while (len--)
        *dst++ = (uchar) *src++;

Page 18: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

LATIN-1→UTF16: V1

    ; a0: "dst" array (uint16_t*)
    ; a1: "src" array (uint8_t*)
    ; a2: "len"

    1:  lbu   t1, 0(a1)
        addiu a2, a2, -1        ; len--
        sh    t1, 0(a0)
        addiu a0, a0, 2         ; dst++
        bnez  a2, 1b
        addiu a1, a1, 1         ; src++ (branch delay slot)

Page 19: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

LATIN-1→UTF16: V2

    1:  lw    t1, (a1)          ; t1 = ABCD
        ;
        ; TODO: extract bytes from t1 to t2/t3, padding
        ;       them with zeroes: t2 = 0A0B, t3 = 0C0D
        ;
        addiu a1, a1, 4         ; src += 4
        addiu a2, a2, -4        ; len -= 4
        sw    t2, 0(a0)
        sw    t3, 4(a0)
        bnez  a2, 1b
        addiu a0, a0, 8         ; dst += 4 (8 bytes, branch delay slot)

Page 20: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

LATIN-1→UTF16: V3

    1:  lw    t1, (a1)          ; t1 = ABCD
        srl   t2, t1, 24        ; t2 = 000A
        sll   t2, t2, 16        ; t2 = 0A00
        sll   t1, t1, 8         ; t1 = BCD0
        srl   t4, t1, 24        ; t4 = 000B
        or    t2, t2, t4        ; t2 = 0A0B
        sll   t1, t1, 8         ; t1 = CD00
        srl   t1, t1, 16        ; t1 = 00CD
        andi  t4, t1, 0xFF00    ; t4 = 00C0
        sll   t4, t4, 8         ; t4 = 0C00
        andi  t1, t1, 0x00FF    ; t1 = 000D
        or    t3, t1, t4        ; t3 = 0C0D
        addiu a1, a1, 4         ; src += 4
        addiu a2, a2, -4        ; len -= 4
        sw    t2, 0(a0)
        sw    t3, 4(a0)
        bnez  a2, 1b
        addiu a0, a0, 8         ; dst += 4 (8 bytes, branch delay slot)
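
The shift-and-mask idea behind V3 can be checked on any machine with a portable model. The sketch below treats a 32-bit word as four bytes "ABCD" and spreads each pair into two zero-padded halfwords. The function names unpackHigh and unpackLow are hypothetical, and the expressions are a compact equivalent of the slide's instruction sequence, not a line-by-line translation.

```cpp
#include <cstdint>

// ABCD -> 0A0B: widen the two most significant bytes to 16 bits each.
std::uint32_t unpackHigh(std::uint32_t word)
{
    return ((word >> 24) << 16) | ((word >> 16) & 0xFFu);
}

// ABCD -> 0C0D: widen the two least significant bytes to 16 bits each.
std::uint32_t unpackLow(std::uint32_t word)
{
    return ((word & 0xFF00u) << 8) | (word & 0xFFu);
}
```

For the word 0x41424344 ("ABCD" in ASCII), unpackHigh gives 0x00410042 and unpackLow gives 0x00430044.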

Page 21: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

LATIN-1→UTF16: V4

    ; DSP instructions can unpack bytes directly :-)

    1:  lw    t1, (a1)          ; t1 = ABCD

        preceu.ph.qbl t2, t1    ; t2 = 0A0B
        preceu.ph.qbr t3, t1    ; t3 = 0C0D

        addiu a1, a1, 4         ; src += 4
        addiu a2, a2, -4        ; len -= 4
        sw    t2, 0(a0)
        sw    t3, 4(a0)
        bnez  a2, 1b
        addiu a0, a0, 8         ; dst += 4 (8 bytes, branch delay slot)
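
To validate a loop like V4 against the V0 reference on a development machine, the two DSP instructions can be emulated in portable code. The sketch below makes several assumptions: the emulation helpers preceuPhQbl and preceuPhQbr are hypothetical stand-ins named after the instructions, the word is assembled with src[0] in the most significant byte (matching the big-endian layout the slide's store order appears to assume), and a scalar tail loop is added because the slide's loop only terminates correctly when len is a non-zero multiple of four.

```cpp
#include <cstddef>
#include <cstdint>

// Emulates preceu.ph.qbl: ABCD -> 0A0B.
static std::uint32_t preceuPhQbl(std::uint32_t w)
{
    return ((w >> 8) & 0x00FF0000u) | ((w >> 16) & 0xFFu);
}

// Emulates preceu.ph.qbr: ABCD -> 0C0D.
static std::uint32_t preceuPhQbr(std::uint32_t w)
{
    return ((w << 8) & 0x00FF0000u) | (w & 0xFFu);
}

void latin1ToUtf16(std::uint16_t* dst, const std::uint8_t* src, std::size_t len)
{
    // Four characters per iteration, as in the V4 loop.
    while (len >= 4) {
        const std::uint32_t w = (static_cast<std::uint32_t>(src[0]) << 24)
                              | (static_cast<std::uint32_t>(src[1]) << 16)
                              | (static_cast<std::uint32_t>(src[2]) << 8)
                              |  static_cast<std::uint32_t>(src[3]);
        const std::uint32_t hi = preceuPhQbl(w);
        const std::uint32_t lo = preceuPhQbr(w);
        dst[0] = static_cast<std::uint16_t>(hi >> 16);
        dst[1] = static_cast<std::uint16_t>(hi & 0xFFFFu);
        dst[2] = static_cast<std::uint16_t>(lo >> 16);
        dst[3] = static_cast<std::uint16_t>(lo & 0xFFFFu);
        src += 4;
        dst += 4;
        len -= 4;
    }
    // Scalar tail for the remaining 0 to 3 characters (same as V0).
    while (len--)
        *dst++ = *src++;
}
```

Writing the halfwords individually keeps the model independent of the host's byte order, so the output can be compared element by element with the V0 loop.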

Page 22: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

LATIN-1 → UTF-16

Page 23: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

ALPHA BLENDING

Page 24: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

UTF-16 STRICMP()

Page 25: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

RESULTS

Speedup histogram

Page 26: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

UP TO 30% FASTER RENDERING

Thanks to:

Orc backend using MIPS DSP instructions

QImage composition operations

Color conversion (RGB16/888→ARGB32)

Alpha premultiplication and blending

String conversions and comparisons
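
For readers unfamiliar with the pixel operations listed above, here is a scalar sketch of two of them: widening an RGB16 (5-6-5) pixel to ARGB32, and premultiplying an ARGB32 pixel by its alpha. These are generic textbook formulations written for this example. The function names are hypothetical, and the exact rounding used by Qt's own routines may differ.

```cpp
#include <cstdint>

// RGB16 (5-6-5) -> opaque ARGB32. The top bits of each channel are
// replicated into the low bits so that full intensity maps to 0xFF.
std::uint32_t rgb16ToArgb32(std::uint16_t p)
{
    const std::uint32_t r5 = (p >> 11) & 0x1Fu;
    const std::uint32_t g6 = (p >> 5) & 0x3Fu;
    const std::uint32_t b5 = p & 0x1Fu;
    const std::uint32_t r8 = (r5 << 3) | (r5 >> 2);
    const std::uint32_t g8 = (g6 << 2) | (g6 >> 4);
    const std::uint32_t b8 = (b5 << 3) | (b5 >> 2);
    return 0xFF000000u | (r8 << 16) | (g8 << 8) | b8;
}

// Scales one 8-bit channel by alpha/255 with rounding to nearest.
static std::uint32_t scaleChannel(std::uint32_t c, std::uint32_t a)
{
    return (c * a + 127u) / 255u;
}

// ARGB32 -> premultiplied ARGB32: each color channel is scaled by alpha.
std::uint32_t premultiply(std::uint32_t p)
{
    const std::uint32_t a = p >> 24;
    const std::uint32_t r = scaleChannel((p >> 16) & 0xFFu, a);
    const std::uint32_t g = scaleChannel((p >> 8) & 0xFFu, a);
    const std::uint32_t b = scaleChannel(p & 0xFFu, a);
    return (a << 24) | (r << 16) | (g << 8) | b;
}
```

Both functions apply the same small arithmetic to every channel of every pixel, which is the kind of data-parallel work that packed SIMD instructions handle well.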

Page 27: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

UPSTREAM STATUS

Orc backend complete upstream

Initial work based on Qt 4.8

Most of the code is already in Qt 5.2

Rest in the next release

No backport to Qt 4.8

Page 28: Improving Performance of a WebKit Port MIPS Platform (ELC 2014)

THANK YOU FOR YOUR ATTENTION

perezdecastro.org

+AdrianPerezDeCastro

@aperezdc