Ultra Android: High Energy Efficiency Parallel Java Objects Processing via Object Request Broker on Heterogeneous Multi-core Processor

Takeshi Ohkawa* **, Yukoh Matsumoto**, Kenji Toda*

* National Institute of Advanced Industrial Science and Technology (AIST)

** TOPS Systems Corp.

COOL Chips XII, 2009/4/15-17 @ Yokohama

2009/4/16 1 TOPS&AIST


Acknowledgement

Joint Research Project: TOPS and AIST

Portions of this study were supported by the Industrial Technology Research Grant Program in 2007 from the New Energy and Industrial Technology Development Organization (NEDO) of Japan.

Object-Oriented Embedded Software Platform for Heterogeneous Multi-core Processor

Low-power Object Request Broker (ORB) engine for embedded systems

Proposal

ANDROID is:
◦ A software platform for mobile phones
◦ Proposed by the Open Handset Alliance in 2007
◦ Target platforms: ARM and x86 (in 2009)

Ultra-ANDROID (our proposal) is:
◦ A technology to reduce power consumption (to 1/10) or enhance performance (x10) by employing a Heterogeneous Multi-core Processor and Distributed-Object technology
◦ Runs ANDROID applications as they are

[Figure labels: CORBA on FPGA; Ultra-ANDROID]

Portions of this page are reproduced from or modifications based on work created and shared by the Android Open Source Project and used according to terms described in the Creative Commons 2.5 Attribution License.

Android Software Architecture

Portions of this page are reproduced from work created and shared by the Android Open Source Project and used according to terms described in the Creative Commons 2.5 Attribution License.

[Figure labels: LINUX; Dalvik Java VM; Java API; Java App]

Why choose ANDROID?

Target for Object-Oriented Software Platform for Heterogeneous Multi-core Processor
◦ To ease the DIFFICULT multi-core programming
◦ Programming model = Object-Oriented (not C)

ANDROID applications are written in Java
◦ There are many Java developers in the world

What happens if the Java code runs very fast on a heterogeneous multi-core processor without modifying the code?
◦ Independent of the Instruction Set Architecture
◦ A novel microprocessor architecture/instruction set would be accepted -> BIG impact

Object distribution on cores: Communication via ORB Engine

CONCEPT Proposal

Function level distribution: Single Core -> Heterogeneous Multicore

Each core has a different Instruction Set Architecture and data structure to optimize processing. Use a common representation of data to communicate between cores.

[Figure: WebPage, Image, Letter, Display and DCT shown on a single core and again spread over the cores of a heterogeneous multicore connected by the ORB Engine; label: Java Object]
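The idea of a common data representation between cores with different internal data structures can be sketched as below. This is only an illustration: the fixed big-endian layout, the class name `CommonRepresentation` and both method names are assumptions made for the example, not the ORB Engine's actual wire format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical example: the sender keeps pixels as int[], the receiver prefers short[]. */
final class CommonRepresentation {
    /** Sender side: encode the core-local int[] into an agreed big-endian layout. */
    static byte[] encode(int[] pixels) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * pixels.length).order(ByteOrder.BIG_ENDIAN);
        buf.putInt(pixels.length);         // element count first
        for (int p : pixels) {
            buf.putInt(p);                 // then each element as a 32-bit value
        }
        return buf.array();
    }

    /** Receiver side: decode into the structure this core prefers (here short[]). */
    static short[] decodeAsShorts(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message).order(ByteOrder.BIG_ENDIAN);
        short[] out = new short[buf.getInt()];
        for (int i = 0; i < out.length; i++) {
            out[i] = (short) buf.getInt(); // narrowing is this receiver's own choice
        }
        return out;
    }

    public static void main(String[] args) {
        short[] received = decodeAsShorts(encode(new int[] {10, 20, 30}));
        System.out.println(Arrays.toString(received));
    }
}
```

Neither side needs to know the other's in-memory structure; both only agree on the message layout.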

Key = ORB Engine + TOPSTREAM

“ORB Engine” is:
◦ A light-weight CORBA implementation in C (NEDO)
  Minimum functionality, small (12KB) and stand-alone
◦ CORBA is:
  Common Object Request Broker Architecture
  Remote method call between objects via messages
◦ In the context of object-oriented software
  Can connect Java/C/C++/Python/... anything

“TOPSTREAM” is:
◦ A Heterogeneous Multi-core Processor
  Which operates at very low frequency (typ. 50MHz) for high energy efficiency
  Rich inter-core communication resources
◦ Concurrent communication/processing by multi-bank registers
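A remote method call between objects via a message can be pictured with the following minimal sketch. It is not CORBA and not the ORB Engine: the text message format `object.operation:argument`, the class `TinyBroker` and all method names are hypothetical and exist only to show the marshal / dispatch split.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Hypothetical in-process broker; a real ORB would send the bytes to another core or host. */
final class TinyBroker {
    /** Registered operations, keyed by "objectName.operation" (assumed naming scheme). */
    private final Map<String, IntUnaryOperator> servants = new HashMap<>();

    void register(String objectName, String operation, IntUnaryOperator body) {
        servants.put(objectName + "." + operation, body);
    }

    /** Caller side: turn a method call into a message. */
    static byte[] marshal(String objectName, String operation, int argument) {
        return (objectName + "." + operation + ":" + argument).getBytes(StandardCharsets.UTF_8);
    }

    /** Callee side: parse the message, look up the target, run it, return the result. */
    int dispatch(byte[] request) {
        String text = new String(request, StandardCharsets.UTF_8);
        int colon = text.lastIndexOf(':');
        IntUnaryOperator body = servants.get(text.substring(0, colon));
        if (body == null) {
            throw new IllegalArgumentException("no such object/operation: " + text);
        }
        return body.applyAsInt(Integer.parseInt(text.substring(colon + 1)));
    }

    public static void main(String[] args) {
        TinyBroker broker = new TinyBroker();
        broker.register("dct", "scale", x -> x * 2);
        System.out.println(broker.dispatch(marshal("dct", "scale", 21)));
    }
}
```

Because the caller only produces bytes, the callee could be written in any language that understands the same message format.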

Proposal of OLP: “Object Level Parallelism”

Another buzzword

OLP = DLP + TLP
◦ DLP: Data Level Parallelism
◦ TLP: Thread Level Parallelism

Because Object = Data + Action (Thread = Execution)
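"Object = Data + Action" can be illustrated with a small sketch in which each object owns a slice of the data and carries its own action, so that handing the objects to threads gives data-level and thread-level parallelism at once. The class names and the summation workload are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ObjectLevelParallelism {
    /** One object = its data (slice) + its action (call). */
    static final class SumObject implements Callable<Long> {
        private final int[] slice;
        SumObject(int[] slice) { this.slice = slice; }

        @Override
        public Long call() {
            long sum = 0;
            for (int v : slice) { sum += v; }
            return sum;
        }
    }

    /** Splits data into up to `objects` slices and runs one SumObject per slice on a thread pool. */
    static long parallelSum(int[] data, int objects) {
        ExecutorService pool = Executors.newFixedThreadPool(objects);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = Math.max(1, (data.length + objects - 1) / objects);
            for (int start = 0; start < data.length; start += chunk) {
                int end = Math.min(start + chunk, data.length);
                parts.add(pool.submit(new SumObject(Arrays.copyOfRange(data, start, end))));
            }
            long total = 0;
            for (Future<Long> part : parts) { total += part.get(); }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(new int[] {1, 2, 3, 4, 5, 6, 7, 8}, 4));
    }
}
```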

Bottleneck of Parallel Computing

[Figure labels: Computation; Communication]

Measurement summary of various methods of inter-object call on the Android platform (ARM11 400MHz, Linux 2.6.25)

Call method | Latency (min.) | Latency (1K data) | Note
Local call via interface | 0.02ms | 0.2ms | For reference
AIDL | 2ms | 12ms | Android IDL
CORBA, UDP/IP with PC | 0.3ms | - | C lang. original ORB Engine
CORBA, protocol only | 0.05ms | - | C lang. original ORB Engine
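A latency figure for a local call via an interface could be obtained with a harness along these lines. This is a sketch of the general technique only, not the measurement code behind the numbers above; the `Echo` interface, the round count and the warm-up pass are assumptions.

```java
final class CallLatencyProbe {
    /** Hypothetical callee interface; stands in for the object being called. */
    interface Echo {
        byte[] echo(byte[] payload);
    }

    /** Returns the mean time of one call in nanoseconds, averaged over `rounds` calls. */
    static double meanLatencyNanos(Echo target, byte[] payload, int rounds) {
        long checksum = 0; // checked below so the calls cannot be optimized away
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            checksum += target.echo(payload).length;
        }
        long elapsed = System.nanoTime() - start;
        if (checksum != (long) payload.length * rounds) {
            throw new IllegalStateException("unexpected reply size");
        }
        return (double) elapsed / rounds;
    }

    public static void main(String[] args) {
        Echo local = payload -> payload.clone();      // local call via interface
        byte[] oneKilobyte = new byte[1024];          // the "1K data" case
        meanLatencyNanos(local, oneKilobyte, 10_000); // warm-up pass, result discarded
        double nanos = meanLatencyNanos(local, oneKilobyte, 10_000);
        System.out.printf("local call, 1K data: %.6f ms%n", nanos / 1_000_000.0);
    }
}
```

The same `Echo` interface could be backed by an AIDL or ORB proxy to compare call methods under identical payloads.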

Simple Scenario

[Figure: four configurations]
◦ Single: Core 1
◦ Homo4: Core 1 to Core 4, connected by a SwitchBox
◦ Homo8: Core 1 to Core 8, connected by a SwitchBox
◦ Hetero4: Core 1, Core 2 <<Specialized for B>>, Core 3 <<Specialized for B>>, Core 4 <<Specialized for C>>, connected by a SwitchBox

Example Pseudo Java Code

Data processing(Data first) {
    Data second = taskA.setup(first);       // A
    Data third = taskB.heavyCalc(second);   // B
    return taskC.summarize(third);          // C
}
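A compilable version of the pseudo code might look as follows. The slide does not define `Data` or the task classes, so `Data`, `TaskA`, `TaskB`, `TaskC` and their bodies (copy, square, sum) are placeholders invented for this sketch; only the three-step shape of `processing` comes from the slide.

```java
final class ProcessingExample {
    /** Placeholder payload type; the slide does not define Data. */
    static final class Data {
        final int[] values;
        Data(int[] values) { this.values = values; }
    }

    /** Hypothetical task A: light sequential setup (here: a defensive copy). */
    static final class TaskA {
        Data setup(Data in) { return new Data(in.values.clone()); }
    }

    /** Hypothetical task B: the heavy, parallelizable part (here: squaring each element). */
    static final class TaskB {
        Data heavyCalc(Data in) {
            int[] out = new int[in.values.length];
            for (int i = 0; i < out.length; i++) { out[i] = in.values[i] * in.values[i]; }
            return new Data(out);
        }
    }

    /** Hypothetical task C: summary step (here: the sum of all elements). */
    static final class TaskC {
        Data summarize(Data in) {
            int sum = 0;
            for (int v : in.values) { sum += v; }
            return new Data(new int[] {sum});
        }
    }

    private final TaskA taskA = new TaskA();
    private final TaskB taskB = new TaskB();
    private final TaskC taskC = new TaskC();

    /** Same three-step body as on the slide. */
    Data processing(Data first) {
        Data second = taskA.setup(first);
        Data third = taskB.heavyCalc(second);
        return taskC.summarize(third);
    }

    public static void main(String[] args) {
        Data result = new ProcessingExample().processing(new Data(new int[] {1, 2, 3}));
        System.out.println(result.values[0]);
    }
}
```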

Inter-core object call model: Data Parallel, Homogeneous multicore

[Figure: timing chart over Core 1 to Core 4 with tasks A, B1 to B4 and C, shown next to the sequential order A, B1, B2, B3, B4, C; Latency = Throughput]

Observation: parallelization works fine sometimes

Inter-core object call model: Data Parallel, Homogeneous multicore

[Figure: timing chart over Core 1 to Core 8 with tasks A, B1 to B8 and C; Latency = Throughput]

Observation: Many-core causes a communication bottleneck

Heterogeneous Multi-core Setting

Configuration of the parallel object tasks used in the estimation of parallel efficiency

Task | Cycles | Parallelism
A | 100 | sequential
B | 2000 (87%) | parallel
C | 200 | mixed
Total | 2300 |

Configuration of the heterogeneous four cores used in the estimation of parallel efficiency

Core# | Specialized for | Operation per cycle (speedup)
1 | <generic> | 1
2 | Task B | 10x only for task B
3 | Task B | 10x only for task B
4 | Task C | 2x only for task C
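The two tables can be turned into a rough computation-only estimate. The sketch below makes assumptions that are not stated on the slide: communication cost is ignored entirely, task C (marked "mixed") is treated as running on one core, and in the homogeneous case only B is split evenly across the cores. Under those assumptions the critical path is 800 cycles on four homogeneous cores, 550 on eight, and 300 on the heterogeneous four, against 2300 on a single core.

```java
final class ParallelEstimate {
    // Cycle counts taken from the task table.
    static final int A = 100;
    static final int B = 2000;
    static final int C = 200;

    /** Homogeneous cores: B is split evenly; A and C stay on one core (assumption). */
    static double homogeneousCycles(int cores) {
        return A + (double) B / cores + C;
    }

    /** Heterogeneous four cores: B split over two 10x cores, C on the 2x core. */
    static double heterogeneousCycles() {
        return A + (B / 2.0) / 10.0 + C / 2.0;
    }

    /** Speedup relative to running all 2300 cycles on one generic core. */
    static double speedup(double cycles) {
        return (A + B + C) / cycles;
    }

    public static void main(String[] args) {
        System.out.printf("Single : %.0f cycles%n", homogeneousCycles(1));
        System.out.printf("Homo4  : %.0f cycles, speedup %.2f%n",
                homogeneousCycles(4), speedup(homogeneousCycles(4)));
        System.out.printf("Homo8  : %.0f cycles, speedup %.2f%n",
                homogeneousCycles(8), speedup(homogeneousCycles(8)));
        System.out.printf("Hetero4: %.0f cycles, speedup %.2f%n",
                heterogeneousCycles(), speedup(heterogeneousCycles()));
    }
}
```

Adding a per-call communication term to each method would be the natural next step for comparing against the charts in the deck.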

Inter-core object call model: Task Parallel, Heterogeneous

[Figure: timing chart over Core 1, Core 2 <<for B>>, Core 3 <<for B>> and Core 4 <<for C>> with tasks A, B1, B2 and C; Latency = Throughput]

Observation: Heterogeneous reduces computation cycles

Inter-core object call model with Object Migration: Heterogeneous

[Figure: timing chart over Core 1, Core 2 <<for B>>, Core 3 <<for B>> and Core 4 <<for C>> with tasks A, B1, B2 and C; Latency = Throughput]

Observation: Object Migration reduces communication

Inter-core object call model with Object Migration: Heterogeneous + Pipelining

[Figure: timing chart over Core 1, Core 2 <<for B>>, Core 3 <<for B>> and Core 4 <<for C>>; the task sequence A, B1, B2, C appears twice; label: Latency]

Observation: Object Migration (OM) enables pipelining without modifying code
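Pipelining without touching the task code can be imitated on ordinary Java threads, as in the sketch below. It is an analogy only: one single-thread executor stands in for each core, and the class `StagePipeline` and its `run` method are assumptions for the example, not the object migration mechanism of the deck. The stage functions themselves are passed in unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class StagePipeline {
    /** Sends every input through stages a, b, c; each stage runs on its own thread ("core"). */
    static List<Integer> run(List<Integer> inputs,
                             Function<Integer, Integer> a,
                             Function<Integer, Integer> b,
                             Function<Integer, Integer> c) {
        ExecutorService coreA = Executors.newSingleThreadExecutor();
        ExecutorService coreB = Executors.newSingleThreadExecutor();
        ExecutorService coreC = Executors.newSingleThreadExecutor();
        try {
            List<CompletableFuture<Integer>> pending = new ArrayList<>();
            for (Integer input : inputs) {
                // While stage B works on one item, stage A can already start the next one.
                pending.add(CompletableFuture
                        .supplyAsync(() -> a.apply(input), coreA)
                        .thenApplyAsync(b, coreB)
                        .thenApplyAsync(c, coreC));
            }
            List<Integer> results = new ArrayList<>();
            for (CompletableFuture<Integer> f : pending) {
                results.add(f.join()); // collected in input order
            }
            return results;
        } finally {
            coreA.shutdown();
            coreB.shutdown();
            coreC.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3), x -> x + 1, x -> x * 10, x -> x - 1));
    }
}
```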

Homogeneous vs. Heterogeneous Multi-core Processors

Total computation cycles = 2300
Task size = 100, 2000 (500, 250), 200
Communication = 1/10 Computation

[Figure: charts labeled Latency, Throughput and Speedup]

Observation: Homogeneous many-core causes a communication bottleneck

Pipeline Processing by Object Migration

Total computation cycles = 2300
Task size = 100, 2000 (500), 200
Communication = 1/10 Computation

[Figure: charts labeled Effective Latency and Speedup]

Observation: Pipelining reaches maximum performance without modifying code
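The notion of effective latency in a pipeline can be made concrete with the usual timing formula for a linear pipeline: the first item needs the sum of all stage times, and every further item adds only the slowest stage time. The sketch below applies it to stage times of 100, 500 and 200 cycles; reading the task sizes as three pipeline stages, ignoring communication and assuming unlimited buffering between stages are all assumptions of this example, not statements from the slide.

```java
final class PipelineTiming {
    /** Cycles needed to push `items` inputs through a linear pipeline with the given stage times. */
    static long totalCycles(long[] stageCycles, int items) {
        long sum = 0;
        long slowest = 0;
        for (long s : stageCycles) {
            sum += s;
            slowest = Math.max(slowest, s);
        }
        // The first item pays the full sum; each later item is limited by the slowest stage.
        return items == 0 ? 0 : sum + (long) (items - 1) * slowest;
    }

    /** Average cycles per item; `items` must be positive. */
    static double effectiveLatency(long[] stageCycles, int items) {
        return (double) totalCycles(stageCycles, items) / items;
    }

    public static void main(String[] args) {
        long[] stages = {100, 500, 200}; // assumed stage times for A, B (one 500-cycle piece), C
        System.out.println(totalCycles(stages, 1));
        System.out.println(totalCycles(stages, 10));
        System.out.println(effectiveLatency(stages, 10));
    }
}
```

As the number of items grows, the effective latency approaches the slowest stage time, which is why balancing stage times matters more than adding stages.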

Conclusion

1. The method call latency dominates the results. Increasing the number of cores increases the communication delay.

2. IPC has a greater impact than the number of cores
◦ IPC: Instructions Per Cycle
◦ Heterogeneous multi-core: specialized instruction set

3. Pipeline parallelization is effective, and object migration enables pipelining

Ultra-ANDROID technology enables 10x higher energy efficiency than ANDROID, by means of a Heterogeneous Multi-core Processor and Distributed Java Objects

Cool System Innovator

Ultra-ANDROID
