2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to...

38
Scanner: Ecient Video Analysis at Scale Will Crichton Alex Poms Pat Hanrahan Kayvon Fatahalian

Transcript of 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to...

Page 1: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Scanner: Efficient Video Analysis at Scale

Will Crichton Alex Poms

Pat Hanrahan Kayvon Fatahalian

Page 2: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Emerging space of video-driven applications

Page 3: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

CMU Panoptic studioCapture of 480 640x480 video streams 147 MPixel capture at 24 fps (3.5 GPixel/sec)

[Image Credit: Joo et al. 2015]

Page 4: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

3D human pose reconstruction40-second sequence (captured human social interactions) Reconstruction time: hand-coded solution by MS grad student — 7 hours on a 4-GPU machine [Cao 2016]

[Joo et al. 2015]

Page 5: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Example: human pose detection

▪ Attempt #1: do it in Python

▪ Attempt #2: import multiprocessing

▪ Attempt #3: rewrite it in C++

▪ Attempt #4: use multiple GPUs (6 month checkpoint)

▪ Attempt #5: scale across a cluster

Page 6: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Video application challenges

▪ Video is huge! - Requires compression, decode (efficient pixel delivery) - Varying access patterns - 3-25x blowup in size for explode to frames - 1000x vs. raw pixels

▪ Execution requires heterogeneous hardware

▪ Stateful/stenciled computation patterns

Page 7: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

No existing system well suited for the job

▪ Distributed data-analytics frameworks [MapReduce, Spark]

▪ Distributed machine learning frameworks [TensorFlow, MxNet]

▪ Array/raster databases [SciDB, RasDaMan, GIS databases]

Page 8: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Goal: design a system that makes it easy to rapidly author video analysis pipelines that scale.

Page 9: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Scanner: design goals / principlesDesign principle 1: keep it simple - Enable non-expert programmers - Focus on video applications

Design principle 2: be efficient - “Near-HW-peak single-node perf” then scale out - Utilize heterogenous hardware

Non goal: - Do not be a new kernel description language - Do not be a general purpose dataflow/streaming engine

Page 10: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Setupmyvideos/cam000.mp4myvideos/cam001.mp4myvideos/cam002.mp4…myvideos/cam479.mp4

I have a list of videos in a directory…

And I have a library of parallel pixel processing kernels for CPUs and GPUs:

Image crop/rescale (Halide)

Optical flow (OpenCV)

Video Tracker (CUDA)

Depth from disparity (Halide)

Eigen (C)

Caffe DNN EvalNVIDIA cuDNN

Face detection network

Human Pose estimation

Object detector

Depth/normal estimator

[Cao16]

[Redmon16]

[Bansal17]

[Hu17]

Page 11: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Represent videos as relations (tables)myvideos/cam000.mp4myvideos/cam001.mp4myvideos/cam002.mp4…myvideos/cam479.mp4

Scanner dataset: capture_session

table: cam000.mp4frame_id image

table: cam001.mp4frame_id image

table: cam479.mp4frame_id image

table: cam002.mp4frame_id image

Ingest into Scanner…

Page 12: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Computations: DAGs of image processing operations

Pipeline kernels map to heterogeneous resources: CPUs, GPUs, ASICs

......... ...Image Resize DNN Eval Estimate pose

Halide code Caffe lib C++ code

Resource: GPU

Resource: GPU

Resource: 4 CPU cores

640x480 frames

496x368 frames

62x46x15 activations

(x,y) pos for body parts

Page 13: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Scanner maps pipelines onto a stream of video frames from tables

Pipeline:

Input:

Output: pose_000

cam_000id frame

pose_000id 2dpose

taskPipeline: Input: cam_000 [0, 1, 2, 3, ...]Output: pose_000

0 1 2 3 ...

pose_000id 2dpose

task0 1 2 3 ...

pose_000id 2dpose

task0 1 2 3 ...

Page 14: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Sampling/joining input frames

cam_000id frame

pose_000id 2dpose

taskPipeline: Input: cam_000 [0, 2, 4, 6, ...]Output: pose_000

0 2 4 6...

cam_000id frame

depth_0_1id depth

taskPipeline: Input: cam_000 [0, 1, 2, 3, ...]

Output: depth_0_1 cam_001id frame

cam_001 [0, 1, 2, 3, ...]

0 1 2 3 ...

0 1 2 3 ...

Stride, gather, range, join, …

Stride by 2:

Join two tables:

cam_000id frame

pose_000id 2dpose

taskPipeline: Input: cam_000 [0, 2, 4, 6, ...]Output: pose_000

0 2 4 6...

cam_000id frame

depth_0_1id depth

taskPipeline: Input: cam_000 [0, 1, 2, 3, ...]

Output: depth_0_1 cam_001id frame

cam_001 [0, 1, 2, 3, ...]

0 1 2 3 ...

0 1 2 3 ...

Page 15: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Distributing tasks across a cluster

I/O

GPU HW Decoder

Resize Instance 0

DNN Eval Instance 0

Estimate Pose Instance 0

Node 0: multi-core CPU + GPU

GPU 0

Node 1: multi-core CPU + 2 GPUs

GPU 1

GPU 2

I/O

I/O

I/O

I/O

GPU HW Decoder

Resize Instance 1

DNN Eval Instance 1

Estimate Pose Instance 1I/O

I/O

I/O

GPU HW Decoder

Resize Instance 2

DNN Eval Instance 2

Estimate Pose Instance 2

Cluster

cam_000id frame

pose_000id 2dpose

taskPipeline: Input: cam_000 [0, 1, 2, 3, ...]Output: pose_000

0 1 2 3 ...

Page 16: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Demo

Page 17: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Video challenges

Page 18: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Storing video collections efficiently

table: cam_000frame_id frame

H.264 encoded bytestreamcam_000:frames

cam_000::meta0 864b 54 5478b 240 8194b

iframe # offset

Stored RepresentationLogical relation

Page 19: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Background: h264 Encoded Video

H.264 encoded bytestream

Stored Representation

cam_000:frames

Page 20: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Background: h264 Encoded Video

H.264 encoded bytestream

Stored Representation

GOP GOP G.. GOP GOP GOP

cam_000:frames

Page 21: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Background: h264 Encoded Video

H.264 encoded bytestream

Stored Representation

GOP GOP G.. GOP GOP GOPI P P B P

GOP

cam_000:frames

Page 22: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Background: h264 Encoded Video

H.264 encoded bytestream

Stored Representation

GOP GOP G.. GOP GOP GOPI P P B P

GOP

0 864b 54 5478b 240 8194b …. …

iframe # offset

cam_000:frames

cam_000::meta

Page 23: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Maintain keyframe index to assist parallel decode

frame 0 (keyframe)

byte 0

frame 120 (keyframe)

byte 4840

frame 270 (keyframe)

byte 6796

frame 340 (keyframe)

byte 12480

frame 310 (keyframe)

byte 11284

130 131 192 320

Page 24: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Kernels need more than one frame

▪ Goal #1: Temporal extent

▪ Goal #2: Sampling

▪ Goal #3: Batching

▪ Goal #4: Modularity

Page 25: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Two-level hierarchy on stream elements

Pipeline Instance 0

Pipeline Instance 1

span 0 frames 0-1023

span 1 frames 1024-2048

batch 0 batch 1

Pipeline batch size: Consecutive elements:

4 1024

Page 26: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Two-level hierarchy on stream elements

Pipeline Instance 0

Pipeline Instance 1

span 0 frames 0-1023

span 1 frames 1024-2048

batch 0 batch 1

Pipeline batch size: Consecutive elements:

4 1025

Warmup size = 1

Page 27: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Two-level hierarchy on stream elements

Pipeline Instance 0

Pipeline Instance 1

span 0 frames 0-1023

span 1 frames 1024-2048

batch 0 batch 1

Pipeline batch size: Consecutive elements:

4 1024

Warmup size = 1

cam_000id frame

pose_000id 2dpose

taskPipeline: Input: cam_000 [0, 2, 4, 6, ...]Output: pose_000

0 2 4 6...

Stride by 2:

Page 28: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Stenciling patterns Stencil

Stencil & stride

Stride & stencil

Page 29: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Stenciling patterns 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

1 2 3 4 5 6 7 8 9 10 11 12 1 6 11

1 2 3 4 5 6 7 8 9 10 11 12 1 2 6 7 11 12

Page 30: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Solution: change kernel interface

1 2 3 4 5 6 7 8 9 10 11 12 1 2 6 7 11 12

1 6 6 111 2 3 4 5 6 7 8 9 10 11 12

Page 31: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Evaluation

Page 32: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Scanner has low overheadTh

roug

hput

HistogramCPU GPU

Optical Flow DNN Eval GatherCPU GPU CPU GPUGPU

1

0

ScannerBaseline

Single Node: Throughput Relative to Hand-Tuned Pipelines

Page 33: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Scaling to dense GPU nodesTh

roug

hput

(re

lativ

e to e

xper

t-tun

ed)

HistogramCPU GPU

Optical Flow DNN Eval GatherCPU GPU CPU GPUGPU

1

0

2

3

Scanner (4-GPU)

Page 34: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Multi-node scaling

Num nodes

Thro

ughp

ut (f

ps)

4K

8K

16K

12K

01 2 4 168 Num nodes

150

300

450

0Thro

ughp

ut (f

ps)

1 2 4 168

1.5K

3K

6K

4.5K

0

Num nodes1 2 4 168Num nodes

Thro

ughp

ut (f

ps)

4K

8K

16K

12K

01 2 4 168

Scanner (CPU only)Scanner (GPU only)

Histogram Optical Flow

DNN Eval

Page 35: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

3D pose reconstruction

Grad student hand-tuned:Scanner:Scanner on cluster:

2.6 hrs (4 GPU node)38 mins (4 node x 4 GPU/node cluster)

7 hrs (4 GPU node)

Approaching viability for marker less motion capture.

Processing 40 seconds of video from CMU Panoptic studio

Page 36: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Shot segmentation (cinematography analysis)

608 feature length films (2.4 TB) 103M frames Histogram-based shot segmentation of all films: 4.7 hrs (4 node cluster, 4 GPUs/node)

Page 37: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

What’s next?

▪ Scale out with Google Cloud

▪ 3 PB Internet Archive dataset

▪ Youtube 8M face database

▪ Esper frontend for Magic Grant

Page 38: 2017-04-18 Scanner PlatformLab Talks/Will_Crichton.pdfGoal: design a system that makes it easy to rapidly author video analysis pipelines that scale. Scanner: design goals / principles

Thank you!https://github.com/scanner-research/scanner