Video Analytics on Hadoop webinar victor fang-201309

A NEW PLATFORM FOR A NEW ERA

Video Analytics on Hadoop webinar. Presented by Pivotal Data Science Team 201309.

Page 1: Video Analytics on Hadoop webinar victor fang-201309

A NEW PLATFORM FOR A NEW ERA

Page 2: Video Analytics on Hadoop webinar victor fang-201309

© Copyright 2013 Pivotal. All rights reserved.

What You Can Do With Hadoop Webinar Series

Unstructured Data – Video Analytics
September 6, 2013

Dr. Chunsheng (Victor) Fang, Sr. Data Scientist
Annika Jimenez, Global Head of Data Science Services
Nikesh Shah, Sr. Product Marketing Manager

Page 3: Video Analytics on Hadoop webinar victor fang-201309

What You Will Learn

Pivotal Data Science Lab Services

New Emerging Trends for Unstructured Data 

Video Analytics on Hadoop

Analytics with SQL

Page 4: Video Analytics on Hadoop webinar victor fang-201309

Pivotal Platform

Cloud Storage

Virtualization

Data & Analytics Platform

Cloud Application Platform

Data-Driven Application Development

Pivotal Data Science Labs

Page 5: Video Analytics on Hadoop webinar victor fang-201309

Pivotal Data Science

Page 6: Video Analytics on Hadoop webinar victor fang-201309

Data Science Value Chain

Stages: Instrumentation, Logs, Capture, Store, Transform and Prepare, Access, Model Development, Deploy, Applications, Process Change

Roles: Product Engineer, Platform Engineer, DBA, Data Engineer/Programmer, Data Engineer, Data Scientist, Platform Engineer, Application Developer, PMO

Page 7: Video Analytics on Hadoop webinar victor fang-201309

How We Help Our Customers

1. Data Science Strategy Definition

2. Point Proof-of-Value Model Development

3. Multiple Model Development + Apps

4. DSIC Transformation to “Predictive Enterprise”

5. Also:
– Algorithm development
– Pushing the envelope in problem-solving

Pivotal Data Science Labs

Page 8: Video Analytics on Hadoop webinar victor fang-201309

Pivotal Data Science Knowledge Development

Page 9: Video Analytics on Hadoop webinar victor fang-201309

Pivotal Data Science Dream Team

Derek Lin – Network Security, Fraud Detection, Speech and Language Processing (Principal Scientist at RSA, M.S. in Signal Processing, USC)

Hulya Farinas – Optimization, Resource Allocation in Healthcare (Modeler at M-Factor, IBM, Ph.D. in Operations Research, University of Florida)

Kaushik Das – Mathematical Modeling in Energy, Retail and Telco (Director of Analytics at M-Factor, M.S. in Mineral Engineering, UC Berkeley)

Sarah Aerni – Genomics and Machine Learning (Ph.D. in Biomedical Informatics, Stanford)

Mariann Micsinai – Next Generation Sequencing (Market Risk Management Associate at Lehman Brothers, Ph.D. in Computational Biology, NYU and Yale)

Victor Fang – Imaging and Graph Analytics, Machine Learning (Sr. Scientist at Riverain Medical, SDE at Amazon.com, Ph.D. in Computer Sciences, University of Cincinnati)

Emily Kawaler – Clinical Informatics and Machine Learning (M.S. in Computer Sciences, University of Wisconsin-Madison)

Anirudh Kondaveeti – Trajectory Data Mining and Machine Learning (Ph.D. in Computing & Dec. Systems Eng, Arizona State University)

Hong Ooi – Insurance and Finance Risk Modeling (Statistician at ANZ, Ph.D. in Statistics, Australian National University)

Michael Brand – Text, Speech and Video Research for Retail, Finance and Gaming (Chief Scientist at Verint Systems, M.S. in Applied Mathematics, Weizmann Institute)

Kee Siong Ng – Data Mining in Healthcare (Sr. Data Miner at Medicare Australia, Ph.D. in Computer Science, and Postdoctoral Fellow, Australian National University)

Noelle Sio – Digital Media Analytics and Mathematical Modeling (Sr. Analyst at eHarmony, Fox Interactive Media (Myspace), M.S. in Applied Mathematics, Cal Poly Pomona)

Jin Yu – Stochastic Optimization, Robust Statistics in Machine Learning, Computer Vision (Research Associate at U of Adelaide, Ph.D. in Machine Learning, Australian National University)

Rashmi Raghu – Computational Methods and Analysis (Ph.D. in Mechanical Engineering, Stanford)

Woo Jung – Bayesian Inference and Demand Analysis (Sr. Statistician at M-Factor, M.S. in Statistics, Stanford)

Jarrod Vawdrey – Marketing Analytics & SAS (Analytics Consultant at Aspen Marketing, B.S. in Mathematics, Kennesaw State University)

Niels Kasch – Text Analytics and NLP (Ph.D. in Computer Science, UMBC)

Vivek Ramamurthy – Online Learning, Stochastic Modeling, Convex Optimization (Ph.D. in Operations Research, UC Berkeley)

Srivatsan Ramanujam – NLP and Text Mining (Natural Language Scientist at Sony, Salesforce.com, M.S. in Computer Sciences, UT Austin)

Alexander Kagoshima – Time Series, Statistics and Machine Learning (M.S. in Economics/Computer Science, TU Berlin)

Page 10: Video Analytics on Hadoop webinar victor fang-201309

Data Science Labs: Packaged Services

LAB PRIMER (2-Week Strategy)
• Customized Analytics Roadmap
• 1-day Moderated Brainstorming Session
• Prioritized Opportunities
• Architectural Recommendations

LAB 600 (6-Week Lab)
• Prof. Services (Data Load)
• Data Science Model Building
• Project Management
• Ready-to-Deploy Model(s)

LAB 1200 (12-Week Lab)
• Prof. Services (Data Load)
• Data Science Model Building
• Project Management
• Ready-to-Deploy Model(s)

LAB 100 (2-Week Lab)
• On-site Pivotal Analytics Training
• Rapid Model/Insight Build on Customer Data (2 weeks)

Page 11: Video Analytics on Hadoop webinar victor fang-201309

Approach: Data Science Lab 1200

Timeline chart, weeks 1 to 12, with these labels: Data Exploration; Features Building; Model Development; Code QA and Scoring; Model Optimization & Validation; Data Loaded; Data Review; Feature Review; Preliminary Model Review; Insights Presentation; Training; Documentation

Page 12: Video Analytics on Hadoop webinar victor fang-201309

Program Management; Data Architecture and Engineering; Data Scientists; Training and Skills Development

Facilitate data loading processes from source systems to Pivotal Data Fabric

Coordinate data needs with Data Scientists

Best practice education for analytics performance

Data migration to support new applications

Oversight and communication plans

Organizational alignment

Risk mitigation

Resource planning

Prioritize deliverables

Socialize progress of overall initiative

Instill data collaboration culture

Execute Data Science Lab engagements around revenue generation or cost saving efforts

Hands on education with new data analysis techniques

Introduce new analytics tools and methodologies

Identify candidates for deeper data science training

Create training curriculum

Recruiting Methodology

Parallel computing techniques defined and demonstrated

Build institutional knowledge for client data science team

Data Science Innovation Center (DSIC)

Key Principles
• Building a predictive enterprise is, first and foremost, about building a human infrastructure.
• Analytics is an iterative knowledge discovery process and needs to be managed as such.
• Discovery starts from asking the right questions – that can be as important as finding answers to those questions.

Page 13: Video Analytics on Hadoop webinar victor fang-201309

Large Scale Video Analytics Platform on Hadoop
Dr. Chunsheng (Victor) Fang, Sr. Data Scientist

Page 14: Video Analytics on Hadoop webinar victor fang-201309

Pivotal Video Analytics Taskforce

Chunsheng (Victor) Fang, Ph.D. – Sr. Data Scientist

Regunathan Radhakrishnan, Ph.D. – Sr. Data Scientist

Derek Lin – Principal Data Scientist

Sameer Tiwari – Hadoop Architect

Kenneth Dowling & Michael Nemesh – DCA Admin

Page 15: Video Analytics on Hadoop webinar victor fang-201309

Industry Use Case
Surveillance Video Anomaly Detection

Page 16: Video Analytics on Hadoop webinar victor fang-201309

Anomaly Detection in Surveillance Video

Detect anomalous objects in a restricted perimeter.

A typical large enterprise collects terabytes of video per day.

Hadoop MapReduce runs computer vision algorithms in parallel and captures violation events.

Post-Incident monitoring enabled by Hadoop / HAWQ.

Page 17: Video Analytics on Hadoop webinar victor fang-201309

Unstructured Video Data Workflow

Unstructured data as input

ETL: Distributed Video Transcoder

Analytics: Distributed Video Analytics

Structured Insights in relational database for advanced analytics

Unstructured Data → ETL → Analytics → Structured Insights

Page 18: Video Analytics on Hadoop webinar victor fang-201309

Real World Video Data Benchmark

Surveillance Videos (i-LIDS) from United Kingdom Home Office
– Library of HiDef CCTV video footage based around ‘scenarios’ central to the government’s requirements.
– The footage accurately represents real operating conditions and potential threats.

Anomaly Detection: Sterile zone dataset

Sample frames: Night, Day

Page 19: Video Analytics on Hadoop webinar victor fang-201309

Most Common Video Standards

MPEG & ITU: responsible for many video standards

MPEG-2 (1995): Widely adopted, DVDs, Digital TV broadcast, set-top boxes

Page 20: Video Analytics on Hadoop webinar victor fang-201309

Intro to MPEG Standard

MPEG standard encodes video frames
– Redundancy in time: inter-frame encoding
– Redundancy in space: intra-frame encoding

Motion compensation
– I-frame: (Key frame) intra-frame encoding
– P-frame: (Predicted frame) Predicting regions of current frame from previous frame
– B-frame: (Bi-predictive frame) Predicting regions of current frame using both previous and next frame
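The I-frame / P-frame idea can be illustrated with a toy sketch. This is not the MPEG codec: real MPEG predicts motion-compensated blocks and transform-codes the residual, and B-frames are left out here. Frames are assumed to be plain lists of pixel intensities, and the function names encode_gop and decode_gop are hypothetical.

```python
def encode_gop(frames):
    """Encode frames as one I-frame plus P-frame residuals (toy model)."""
    # The first frame is stored whole: intra-frame ("I") encoding.
    encoded = [("I", list(frames[0]))]
    # Every later frame is stored as its difference from the previous frame.
    for prev, cur in zip(frames, frames[1:]):
        residual = [c - p for p, c in zip(prev, cur)]
        encoded.append(("P", residual))
    return encoded


def decode_gop(encoded):
    """Rebuild the frames; decoding can only start at an I-frame."""
    kind, data = encoded[0]
    if kind != "I":
        raise ValueError("decoding must start at an I-frame")
    frames = [list(data)]
    for _, residual in encoded[1:]:
        # A P-frame is the previous decoded frame plus its residual.
        frames.append([p + r for p, r in zip(frames[-1], residual)])
    return frames
```

The ValueError shows why a decoder dropped into the middle of a stream has to find the next key frame before it can produce any pictures.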

Page 21: Video Analytics on Hadoop webinar victor fang-201309

Distributed Video Transcoder on Hadoop
Distributed MapReduce MPEG Transcoder

Page 22: Video Analytics on Hadoop webinar victor fang-201309

Motivation of Distributed Video Transcoding

Can we decode the individual frames from an arbitrary block in Hadoop File System (HDFS)?

HDFS splits every file into fixed-size blocks (64MB or 128MB by default, depending on the Hadoop version; the size is configurable).

Each block can be processed in parallel by a customized MapReduce function.

Most video file standards are not Hadoop-friendly.
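A quick back-of-the-envelope sketch of what block splitting means for a large video file. The 5 GiB file size is a hypothetical value chosen for illustration, and hdfs_block_count is not a Hadoop API, only a helper for the arithmetic.

```python
MIB = 1024 * 1024


def hdfs_block_count(file_size_bytes, block_size_bytes):
    """Number of blocks needed to store a file (the last block may be partial)."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-file_size_bytes // block_size_bytes)


video_size = 5 * 1024 * MIB  # hypothetical 5 GiB video file
for block_mib in (64, 128):
    blocks = hdfs_block_count(video_size, block_mib * MIB)
    print(f"{block_mib} MiB blocks: {blocks}")
```

Each of those blocks starts at an arbitrary byte offset inside the video bitstream, which is why a mapper cannot simply begin decoding at the start of its block.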

Page 23: Video Analytics on Hadoop webinar victor fang-201309

Decoding MPEG-2 with MapReduce

Two key observations
– Video header information: available only at the header in the bitstream
– Group of Pictures (GOP) header repeats

Steps to decode arbitrary blocks
– Step 1: Configure each mapper to extract the header information from each file;
  ▪ Totals ~20 videos at 5GB
– Step 2: Start searching for GOP header in each block in parallel;
– Step 3: Decode frames into a suitable image format (JPEG, BMP, etc.);
– Step 4: Consolidate all time-stamped frames into Hadoop Sequence File.
  ▪ Reduces to sequence file at 500MB

Transcoding MPEG-2 video into Hadoop-friendly format
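Step 2 can be sketched as a byte-pattern search. In the MPEG-2 video bitstream the GOP header begins with the start code 0x000001B8. The sketch below scans one block for that code; it is an illustration, not the presenters' implementation. The function name find_gop_offsets is hypothetical, and a real mapper would also have to handle a start code that straddles two HDFS blocks, which is ignored here.

```python
GOP_START_CODE = b"\x00\x00\x01\xb8"  # MPEG-2 group_start_code


def find_gop_offsets(block):
    """Return the byte offsets of every GOP start code inside one block."""
    offsets = []
    pos = block.find(GOP_START_CODE)
    while pos != -1:
        offsets.append(pos)
        # Continue searching just past the match we found.
        pos = block.find(GOP_START_CODE, pos + 1)
    return offsets


# Made-up payload bytes around two start codes, for illustration only.
sample_block = b"\xff\xff" + GOP_START_CODE + b"abc" + GOP_START_CODE + b"\x00"
print(find_gop_offsets(sample_block))
```

A mapper would start decoding at the first offset it finds, using the header information extracted in Step 1, and leave the bytes before it to the mapper that owns the previous block.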

Page 24: Video Analytics on Hadoop webinar victor fang-201309

Distributed Video Analytics Platform on Hadoop

Page 25: Video Analytics on Hadoop webinar victor fang-201309

Object Detection with Gaussian Mixture Model

Video data is much noisier than we realize.

You don’t notice it because your visual cortex can denoise.

A computer needs a good statistical model (e.g. GMM) to be robust to this noise.

Distribution of pixel intensities over time
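To show the idea of a per-pixel statistical background model, here is a deliberately simplified sketch. It uses a single running Gaussian per pixel, not the full Gaussian Mixture Model named on the slide (a GMM keeps several weighted Gaussians per pixel). The class name, learning rate, threshold and variance floor are all assumptions chosen for illustration.

```python
MIN_VARIANCE = 4.0  # assumed floor so the model never becomes over-confident


class PixelBackgroundModel:
    """Running single-Gaussian model of one pixel's intensity over time."""

    def __init__(self, initial, variance=25.0, learning_rate=0.05, threshold=2.5):
        self.mean = float(initial)
        self.var = float(variance)
        self.lr = learning_rate
        self.k = threshold  # foreground if farther than k standard deviations

    def is_foreground(self, value):
        return abs(value - self.mean) > self.k * self.var ** 0.5

    def update(self, value):
        """Classify the new intensity, then adapt the model if it is background."""
        foreground = self.is_foreground(value)
        if not foreground:
            delta = value - self.mean
            self.mean += self.lr * delta
            self.var = (1 - self.lr) * self.var + self.lr * delta * delta
            self.var = max(self.var, MIN_VARIANCE)
        return foreground


model = PixelBackgroundModel(initial=100)
flags = [model.update(v) for v in [101, 99, 100, 102, 100, 200, 100]]
print(flags)
```

Small fluctuations around 100 are absorbed as noise, while the jump to 200 is flagged as foreground and does not contaminate the background statistics.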

Page 26: Video Analytics on Hadoop webinar victor fang-201309

Typical Video Analytics Workflow

Video/image data are highly unstructured

Hadoop has proven excellent at extracting structured insights from Big Data

A typical workflow:

Workflow stages: Background Stat Model; Foreground Extraction; Feature Extraction/Classification; Visual Key; Composite Key ((Key, Time), Loc); ANALYTIC RESULT

Page 27: Video Analytics on Hadoop webinar victor fang-201309

Use Case 1: Anomaly Detection

Extracting structured info from Unstructured data

Computer vision algorithms fit into Mapper/Reducer framework

Intermediate (Key, Value)
– (RestrictedArea, IntrusionEvent(Time, ViolatorImage))

Data flow: HDFS → Map (×3, in parallel) → Reduce (×2) → HDFS / GPDB

Example event timestamp: 2012-09-01 07:00:00
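The (RestrictedArea, IntrusionEvent(Time, ViolatorImage)) key/value pair can be sketched with an in-memory stand-in for MapReduce. This is plain Python, not Hadoop code. The frame layout, the "gate" area and its rectangle, and the string placeholders for cropped images are all hypothetical; object detection is assumed to have run already.

```python
from collections import defaultdict
from typing import NamedTuple


class IntrusionEvent(NamedTuple):
    time: str
    violator_image: str  # placeholder for the cropped image of the violator


# Hypothetical restricted areas: name -> (x0, y0, x1, y1) in pixels.
RESTRICTED_AREAS = {"gate": (0, 0, 50, 50)}


def mapper(frame):
    """Emit (area, event) for every detected object inside a restricted area."""
    for x, y, crop in frame["objects"]:
        for area, (x0, y0, x1, y1) in RESTRICTED_AREAS.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                yield area, IntrusionEvent(frame["time"], crop)


def reducer(area, events):
    """Collect all events for one area in time order."""
    return area, sorted(events)


def run_job(frames):
    """Minimal shuffle: group mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for frame in frames:
        for key, value in mapper(frame):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


frames = [
    {"time": "2012-09-01 07:00:01", "objects": [(10, 20, "crop_b"), (300, 300, "crop_c")]},
    {"time": "2012-09-01 07:00:00", "objects": [(5, 5, "crop_a")]},
]
print(run_job(frames))
```

Because each frame is mapped independently, the map phase needs no shared state, which is what lets it run in parallel across blocks.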

Page 28: Video Analytics on Hadoop webinar victor fang-201309

Use Case 2: Trajectory Analysis

Tracking multiple objects in Big Data video archives

Building high level summarization e.g. moving trajectory time series

Example frames at times T1, T2, T3, T4, T5, T6

Page 29: Video Analytics on Hadoop webinar victor fang-201309

Use Case 2: Trajectory Analysis “Map”

Map stages: Background Stat Model; Foreground Extraction; Feature Extraction/Classification; Visual Key; Composite Key ((VisKey, time), loc); Emit(K,V)

Page 30: Video Analytics on Hadoop webinar victor fang-201309

Use Case 2: Trajectory Analysis “Reduce”

Reduce stages: 2nd Sort on Composite key ((VisKey, time), loc); Aggregate; User-defined Trajectory model; output (Object, Trajectory)
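The secondary sort on the composite key can be sketched in plain Python. In Hadoop this is done by the shuffle with a custom partitioner and grouping comparator; here sorted and itertools.groupby stand in for it. The record values and the function name build_trajectories are hypothetical, and the "trajectory model" is reduced to a time-ordered list of locations.

```python
from itertools import groupby


def build_trajectories(records):
    """Turn ((vis_key, time), loc) records into {vis_key: [(time, loc), ...]}."""
    # Sorting on the composite key orders records by object first, then by time.
    ordered = sorted(records, key=lambda record: record[0])
    trajectories = {}
    # Grouping on the first key component hands each "reducer" one object,
    # with its locations already in time order.
    for vis_key, group in groupby(ordered, key=lambda record: record[0][0]):
        trajectories[vis_key] = [(time, loc) for (_, time), loc in group]
    return trajectories


records = [
    (("red_car", 2), (3, 4)),
    (("blue_van", 1), (0, 0)),
    (("red_car", 1), (1, 1)),
]
print(build_trajectories(records))
```

Sorting before the reduce step means the reducer never has to buffer and re-sort an object's locations itself.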

Page 31: Video Analytics on Hadoop webinar victor fang-201309

Video Analytics Platform

Supports Video ETL
– Support standard formats: MPG, AVI, MP4.
– Sequence file in HDFS

Image Processing Toolkit
– Support standard formats (e.g. JPEG, BMP, PNG)
– Color space conversion
– Edge/key point detection
– Morphological processing
– Filtering: convolutional, median, etc.
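As an example of the kind of filtering listed above, here is a minimal 3x3 median filter sketch. It is not the platform's toolkit code. The image is assumed to be a list of rows of grayscale intensities, border pixels are simply copied unchanged, and the function name is hypothetical.

```python
from statistics import median


def median_filter_3x3(image):
    """Replace each interior pixel with the median of its 3x3 neighbourhood."""
    height, width = len(image), len(image[0])
    output = [row[:] for row in image]  # borders keep their original values
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [
                image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            output[y][x] = median(window)
    return output


noisy = [
    [10, 10, 10],
    [10, 255, 10],  # a single "salt" noise pixel in the centre
    [10, 10, 10],
]
print(median_filter_3x3(noisy))
```

A median filter removes isolated outlier pixels like this one while preserving edges better than an averaging filter would.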

PHD MapReduce for scalable computer vision algorithms

HAWQ SQL for high level analytics

Page 32: Video Analytics on Hadoop webinar victor fang-201309

Video Analytics Demo

Page 33: Video Analytics on Hadoop webinar victor fang-201309

Performance Quick Facts

Processing one 720x576 video frame takes 103 milliseconds (near real time, even in Java)

Detection algorithm: scales linearly with the number of processors

Impacts:
• Enhance public security
• Improve security officers’ productivity
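The quick facts above translate into throughput with simple arithmetic. The sketch assumes the linear scaling stated on the slide holds exactly, and the 25 frames-per-second source rate is an assumption for illustration, not a figure from the presentation.

```python
import math

FRAME_TIME_S = 0.103  # per-frame processing time quoted on the slide


def frames_per_second(processors):
    """Aggregate throughput, assuming perfectly linear scaling."""
    return processors / FRAME_TIME_S


def processors_for_realtime(source_fps):
    """Smallest processor count that keeps up with one live stream."""
    return math.ceil(source_fps * FRAME_TIME_S)


print(round(frames_per_second(1), 2))  # single-processor throughput
print(processors_for_realtime(25))     # assumed 25 fps source
```

One processor handles a little under 10 frames per second, so a full-rate stream needs a few processors working in parallel.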

Page 34: Video Analytics on Hadoop webinar victor fang-201309

Querying the Analytics Results

Average speed of the red car yesterday, using window functions

SELECT sqrt(power(avg(abs(x_diff)), 2) + power(avg(abs(y_diff)), 2)) * FPS_MPS_FACTOR
FROM (
    SELECT X - lag(X, 1) OVER (ORDER BY TIME) AS x_diff,
           Y - lag(Y, 1) OVER (ORDER BY TIME) AS y_diff
    FROM SANMATEO
    WHERE TARGET =   -- target value is missing in the source slide
      AND TIME > (CURRENT_TIMESTAMP - INTERVAL '1' DAY)
      AND TIME < CURRENT_TIMESTAMP
) x_tmp;

RESULT:

7.2 mph
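The same computation can be sketched in plain Python to show what the window function does. Rows are assumed to be (time, x, y) tuples for a single target, the sample values are made up, and factor stands in for FPS_MPS_FACTOR, whose real value is not given in the slides.

```python
import math


def average_speed(rows, factor=1.0):
    """Mirror the SQL: average |dx| and |dy| between consecutive rows, then combine."""
    ordered = sorted(rows)  # ORDER BY TIME (time is the first tuple element)
    # lag(X, 1) pairs every row with its predecessor; the first row has none
    # and is skipped, just as avg() skips the NULL produced by lag().
    pairs = list(zip(ordered, ordered[1:]))
    if not pairs:
        raise ValueError("need at least two rows")
    avg_dx = sum(abs(b[1] - a[1]) for a, b in pairs) / len(pairs)
    avg_dy = sum(abs(b[2] - a[2]) for a, b in pairs) / len(pairs)
    return math.hypot(avg_dx, avg_dy) * factor


sample_rows = [(0, 0, 0), (2, 6, 8), (1, 3, 4)]  # made-up positions
print(average_speed(sample_rows))
```

Note that the query combines the averaged per-axis displacements, which matches the mean per-step distance only when the object moves in a roughly constant direction.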

Page 35: Video Analytics on Hadoop webinar victor fang-201309

More Use Cases

Most computer vision algorithms are embarrassingly parallel

No data sharing between processes
– Feature extraction
– Object detection/classification

Video Categorization for user generated content
– Find trending topics in YouTube videos by topic modeling

Object Detection
– Detect known categories of objects, e.g. face, bar code, vehicle.

Object Search
– Given a known object, use template matching to locate the object

Haar-like + AdaBoost Cascade Face Detector
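Object search by template matching can be sketched with a brute-force sum of absolute differences (SAD). This is one common matching score, chosen here for simplicity; the slides do not say which score the platform uses. Images are assumed to be lists of rows of grayscale intensities, and the function name is hypothetical.

```python
def match_template(image, template):
    """Return ((top, left), score) of the best SAD match of template in image."""
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best_pos, best_score = None, None
    # Slide the template over every position where it fits entirely.
    for top in range(image_h - tmpl_h + 1):
        for left in range(image_w - tmpl_w + 1):
            score = sum(
                abs(image[top + r][left + c] - template[r][c])
                for r in range(tmpl_h)
                for c in range(tmpl_w)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (top, left), score
    return best_pos, best_score


template = [[9, 8], [7, 6]]
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 6, 0],
    [0, 0, 0, 0, 0],
]
print(match_template(image, template))
```

Each candidate position is scored independently, so the search over frames, or over regions of one frame, splits naturally across mappers.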

Page 36: Video Analytics on Hadoop webinar victor fang-201309

Summary

Hadoop: a great tool for data scientists to crunch Unstructured Big Data!

Hadoop extracts Structured insights from Unstructured video with customized computer vision algorithms.

Scalable framework with ease of experimenting, developing, deploying!

Pivotal HD demonstrates large scale video analytics use cases:
– Anomaly detection
– Trajectory analysis
– More …

Page 37: Video Analytics on Hadoop webinar victor fang-201309

Q&A

Page 38: Video Analytics on Hadoop webinar victor fang-201309

More Information

Pivotal Blog Site August 12, 2013

Large Scale Video Analytics

Contact the Data Science Lab Services

[email protected]

Page 39: Video Analytics on Hadoop webinar victor fang-201309

Thank You

Page 40: Video Analytics on Hadoop webinar victor fang-201309

A NEW PLATFORM FOR A NEW ERA