1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM...

14
1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research

Transcript of 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM...

Page 1: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

1

NoMali: Understanding the Impact of Software Rendering Using a Stub

GPUAndreas Sandberg

ARM Research

Page 2: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

2

What can be modeled in gem5?Subsystems that can be modeled in gem5 No GPU model!

Note: gem5 models the subsystems above, not the actual products.

Simple models exist

Page 3: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

3

What a real system does

Modern mobile systems contain a GPU

Even watches have GPUs nowadays! The GPU is obviously used for 3D … but also used for 2D:

Composition & alpha blending Rotation & scaling

Driver stack is complex: Easily 100k+ lines of code Contains an optimizing compiler Can contribute to around 10M

instructions/frame for complex workloads

CPU

L1D

L1I

LPDDR3

GPU

Android

Workload

CPU

L1D

L2

L1I

Display Controller

GPU drivers

Page 4: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

4

What we normally model

Software renderer instead of a real GPU

Optimization friendly code Can be vectorized Easy-to-predict branches Large memory foot print

Workload and software renderer compete for resources

Can significantly skew core behavior

Affects 2D applications and 3D applications

CPU

L1D

L1I

LPDDR3

Android

Workload

CPU

L1D

L2

L1I

Display Controller

SW Renderer

Page 5: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

5

What about simulating the GPU?

Pros: Golden reference – everything looks

like a real system! Captures memory system

interactions Graphics output

Cons: GPU models add a lot of simulation

overhead Realistic models not available to the

research community

Solution: Don’t simulate the GPU!

CPU

L1D

L1I

LPDDR3

GPU

Android

Workload

CPU

L1D

L2

L1I

Display Controller

GPU drivers

Page 6: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

6

Introducing NoMali

Looks like a GPU Provides the same register interface Simulates interrupts

Runs the full driver stack Pretends to run rendering jobs

Doesn’t render anything Signals job completion immediately

Available to the community

… but doesn’t produce any display output

CPU

L1D

L1I

LPDDR3

NoMali

Android

Workload

CPU

L1D

L2

L1I

Display Controller

GPU drivers

Page 7: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

7

Mali GPU overview

CPU

SharedMemory

MidgardHardware

Application

Driver

Job Descriptor

s

Input Data

Output Data

Display Processor

Shader Cores

Intermediate Data

Job Manager

Registers &Interrupts

Memory

See AnandTech for a good architecture overview

Page 8: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

8

Mali GPU overview: The Job Manager

Abstracts the underlying µ-architecture

Controls most aspects of the GPU: Job scheduling Interupts Address translation Caches …

Job submission through a register interface

Job parameters in main memory: Job Descriptor

Interrupt on job completion

Application

Driver

Job Descriptor

s

Input Data

Job Manager

Page 9: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

9

NoMali overview

CPU

SharedMemory

MidgardHardware

Application

Driver

Job Descriptor

s

Input Data

Output Data

Display Processor

Shader Cores

Intermediate Data

Job Manager

Registers &Interrupts

Memory

No output data

No rendering

NoMaliJob

Manager

Blank screen

Page 10: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

10

Comparing simulation strategies

In-struc-tions

IPC BP Miss Ratio

DL1 Miss Ratio

IL1 Miss Ratio

L2 Miss Ratio

DRAM Read BW

0%

10%

20%

30%

40%

50%

Relative Error

Software Rendering NoMali

Three experiments: Software rendering, NoMali, GPU

reference

Experimental setup: Android 4.4 (KitKat ) BBench Identical disk images in all

experiments

Software rendering results in useless CPU performance

NoMali is within 2% for CPU performance

… and 35% faster!

73%103% 135% 54%

Page 11: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

11

Model limitation: GPU bandwidth

In-struc-tions

IPC BP Miss Ratio

DL1 Miss Ratio

IL1 Miss Ratio

L2 Miss Ratio

DRAM Read BW

0%

10%

20%

30%

40%

50%

Relative Error

Software Rendering NoMali

GPU memory system interactions not simulated

Could potentially be faked using traffic generators

Absolute difference for bbench ~75 MiB/s

Not likely to be a problem for CPU-centric studies

Graphics workloads would experience a larger impact from the GPU

NoMali was never intended for that use case.

73%103% 135% 54%

Page 12: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

12

gem5 Issue: inefficient uncacheable memory

Mali Midgard-series GPUs are IO coherent

GPU snoops into CPU caches CPUs can’t snoop into the GPU’s

caches

The driver disables caching for many regions used by both the GPU and CPU

Wasn’t handled efficiently by gem5

Uncacheable accesses were always strictly ordered

Resulted in CPIs ~50 (should’ve been ~2)

Fix committed in early May 2015

CPU

L1D

L1I

LPDDR3

Mali GPU

Android

Workload

CPU

L1D

L2

L1I

GPU drivers

GPU Memory System

Page 13: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

13

NoMali Model available on GitHub https://github.com/ARM-software/nomali-model

gem5 integration on Review Board [RB2867, RB2869]

Requires drivers Will make use of drivers available from MaliDeveloper Requires a recent Android version (KitKat or LolliPop)

Android KitKat build instructions will be on the Wiki shortly

gem5 integration plan

Page 14: 1 NoMali: Understanding the Impact of Software Rendering Using a Stub GPU Andreas Sandberg ARM Research.

14

Questions?