Image, Vision, and AI Processor For Mobile Devices Google's Fully … · 2018-08-20 · Pixel...

Pixel Visual Core:Google's Fully Programmable Image, Vision, and AI Processor For Mobile DevicesJason Redgrave, Albert Meixner, Nathan Goulding-Hotta, Artem Vasilyev, and Ofer Shacham

1

Motivation: The Vision

2

No HDR+ HDR+

Software Motivation

3

● Imaging, Vision, and AI are fast evolving fields● Productizing new algorithms is difficult

○ ASIC design adds long delay into Research→Product cycle○ CPU efficiency limits complexity in mobile space○ Reliance on external vendors limits changes to the stack

● Software wants control, flexibility, and efficiency

Hardware Motivation

4

CPU

Ene

rgy

per O

p (p

J/O

p)

Performance (Op/sec)

GPU

ASICPVC’s IPU

10-20x

10-20x

10x

Not-programmable<1 pJ/Op>1 Tera Ops/Sec

Programmable>100 pJ/Op~1-10 Giga Ops/Sec

Programmable Image Processing Unit (IPU)<1 pJ/Op>1 Tera Ops/Sec

Domain-specific Software

● High-level programming model is Halide○ Domain-specific language for Image Processing

● IPU supports a subset of Halide language○ No floating point○ Limits on available memory access patterns

● Halide backend generates kernels and all API calls○ Proprietary API for resource allocation and execution control

5

Compilation

● Halide backend generates high-level, virtual ISA (vISA)○ RISC ISA with streaming-friendly memory model○ Cross-generational & Architecture-independent

● Final compilation into physical ISA (pISA)○ Compilation can be offline or on-device○ Generation-specific VLIW ISA○ All memory movements are explicit (no caches)

6

Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis non erat sem7

Conceptual View of Hardware: DAG of Kernels

LineBufferLineBuffer

DRAM

2D Stencil Processor

LineBuffer

LineBuffer


LineBuffer



DRAM

Camera

LineBuffer

LineBuffer

● Many cores, each fully programmable● Configurable DAG topology

Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis non erat sem

Hardware resources view

8

Line Buffer 0

Line Buffer 1

Line Buffer 2

Line Buffer N

From DRAM

To DRAM

Stencil processor 0

Shift Reg Shift Reg

Bus Ifc

Stencil processor M-1

Shift Reg Shift Reg

Bus Ifc


Pixel Visual Core Architecture

9

IPU Core 2

IPU Core 1

IPU Core 4

IPU Core 3

IPU Core 6

IPU Core 5

IPU Core 8

IPU Core 7

PCIe

MIPIA53

LPD

DR

4

IPU IO Block● A53● LPDDR4● MIPI● PCIe● IPU

DRAM IPU SoC

Chip Specs: TSMC 28nm 6.0 x 7.2 mm 426 MHz 512 MB DRAM Power <4500 mW


IPU Architecture

10

● 8x Cores○ STP○ LBP

● I/O○ DMA○ MMU○ SSP (Buffer)

● Ring NoC


Stencil Processor (STP)

11

● Scalar lane○ Instruction RAM

● SHG (Load/Store)● 2D Array

○ 256 Compute lanes○ 144 Halo lanes○ Shared RAMs○ Shift network


Compute Lane

12

● Single cycle!● Dual 16b ALU● 16b x 16b Multiply-Add● Memory

○ Register File○ Shared RAM (L0)


Read Neighbor Network

13

● Output pixel depends on neighboring inputs

● One vertical or horizontal shift per cycle

● Shifts are circular (toroidal)

● 1/2/3/4 hops in each direction

ALU ALU

ALU ALU


Line Buffer Pool (LBP)

14

● Data storage● Synchronization● 2D Line Buffer abstraction

(2D FIFO)


Virtually Tall Line Buffer

15

Sliding buffer memory (sb_memory)

extended fullwidth buffer

fb_rows

sb_rows

sb_cols

img_width

Read Sheet

full width buffer memory (fb_memory)

fb_rows

Reused from last pass

Will be reused in the next pass

Used only by the current pass

● Memory saving over full buffer

● Elasticity set by SB width● FB height set by kernel

reuse

Results

16

Align Merge Finish

2.8x 2.8x

5.8xPerformance

Align Merge Finish

6.9x 7.1x

16.7xEnergy Efficiency

7-16x more energy-efficient despite 3-generation process gap!

Mobile SoC (10nm) Pixel Visual Core (28nm)

17

T.J. Alumbaugh, Ke Bai, Fabrizio Basso, Trevor Bunker, Dinesh Chandrasekaran, Ed Chang, Robert Chapman, Arun Chauhan, Albert Chen, Chien-Yu Chen, Matt Cockrell, Jamison Collins, Joe Dao, Neeti Desai, Dusty DeWeese, Andrea Di Blas, Daniel Finchelstein, Arnd Geis, Filippo Gioachin, Richard Goe, Nathan Goulding-Hotta, Ben Gribstad, Cheng Gu,

Penny Gur, Nico Hailey, Ashok Halambi, Adam Hampson, Mahdi Hamzeh, Vince Harron, Sam Hasinoff, Zhijun He, Viresh Hingarh, Sean Howarth, Richard Hsu, John Keen, Asif Khan, Ji Yun Kim, Timothy Knight, Victor Leung, Marc Levoy, Yuan

Lin, Joshua Litt, Chenjie Luo, Bill Mark, Albert Meixner, Jon Michelson, Rob Mohr, Michael Moreno, Ben Mossawir, Rolf Mueller, Sean O'Boyle, Sarang Padalkar, Hyunchul Park, David Patterson, Alexander Perez, Karthika Periyathambi, Bob

Pflederer, Theodore Popp, Todd Poynor, Jing Pu, Shahriar Rabii, Ramesh Ramarao, Jason Redgrave, Masumi Reynders, Shac Ron, Sabarish Sankaranarayanan, Vaidyanathan Seetharaman, Ofer Shacham, Dillon Sharlet, Sean Silva, Saravana Soundararajan, Don Stark, Eino-Ville Talvala, Pei Zhao Tang, David Tolnay, Michelle Tomasko, Artem Vasilyev, Jaakko

Ventela, Nate Voorhies, David Warren, Kevin Watts, Huachang Xu, Catherine Yu, Rong Zhou, Qiuling Zhu

Questions?

18

Image, Vision, and AI Processor For Mobile Devices Google's Fully … · 2018-08-20 · Pixel...

Documents

Transcript of Image, Vision, and AI Processor For Mobile Devices Google's Fully … · 2018-08-20 · Pixel...