Video Analytics on Hadoop webinar victor fang-201309
Transcript of Video Analytics on Hadoop webinar victor fang-201309
A NEW PLATFORM FOR A NEW ERA
© Copyright 2013 Pivotal. All rights reserved.
What You Can Do With Hadoop Webinar Series
Unstructured Data – Video Analytics
September 6, 2013
Dr. Chunsheng (Victor) Fang, Sr. Data Scientist
Annika Jimenez, Global Head of Data Science Services
Nikesh Shah, Sr. Product Marketing Manager
What You Will Learn
Pivotal Data Science Lab Services
New Emerging Trends for Unstructured Data
Video Analytics on Hadoop
Analytics with SQL
Pivotal Platform
Cloud Storage
Virtualization
Data & Analytics Platform
Cloud Application Platform
Data-Driven Application Development
Pivotal Data Science Labs
Pivotal Data Science
Data Science Value Chain
Stages: Instrumentation, Logs Capture, Store, Transform and Prepare, Access, Model Development, Deploy, Applications, Process Change
Roles: Product Engineer, Platform Engineer, DBA, Data Engineer/Programmer, Data Engineer, Data Scientist, Platform Engineer, Application Developer, PMO
How We Help Our Customers
1. Data Science Strategy Definition
2. Point Proof-of-Value Model Development
3. Multiple Model Development + Apps
4. DSIC Transformation to “Predictive Enterprise”
5. Also:
– Algorithm development
– Pushing the envelope in problem-solving
Pivotal Data Science Labs
Pivotal Data Science Knowledge Development
Pivotal Data Science Dream Team
Derek Lin – Network Security, Fraud Detection, Speech and Language Processing (Principal Scientist at RSA, M.S. in Signal Processing, USC)
Hulya Farinas – Optimization, Resource Allocation in Healthcare (Modeler at M-Factor, IBM, Ph.D. in Operations Research, University of Florida)
Kaushik Das – Mathematical Modeling in Energy, Retail and Telco (Director of Analytics at M-Factor, M.S. in Mineral Engineering, UC Berkeley)
Sarah Aerni – Genomics and Machine Learning (Ph.D. in Biomedical Informatics, Stanford)
Mariann Micsinai – Next Generation Sequencing (Market Risk Management Associate at Lehman Brothers, Ph.D. in Computational Biology, NYU and Yale)
Victor Fang – Imaging and Graph Analytics, Machine Learning (Sr. Scientist at Riverain Medical, SDE at Amazon.com, Ph.D. in Computer Sciences, University of Cincinnati)
Emily Kawaler – Clinical Informatics and Machine Learning (M.S. in Computer Sciences, University of Wisconsin-Madison)
Anirudh Kondaveeti – Trajectory Data Mining and Machine Learning (Ph.D. in Computing & Dec. Systems Eng, Arizona State University)
Hong Ooi – Insurance and Finance Risk Modeling (Statistician at ANZ, Ph.D. in Statistics, Australian National University)
Michael Brand – Text, Speech and Video Research for Retail, Finance and Gaming (Chief Scientist at Verint Systems, M.S. in Applied Mathematics, Weizmann Institute)
Kee Siong Ng – Data Mining in Healthcare (Sr. Data Miner at Medicare Australia, Ph.D. in Computer Science, and Postdoctoral Fellow, Australian National University)
Noelle Sio – Digital Media Analytics and Mathematical Modeling (Sr. Analyst at eHarmony, Fox Interactive Media (Myspace), M.S. in Applied Mathematics, Cal Poly Pomona)
Jin Yu – Stochastic Optimization, Robust Statistics in Machine Learning, Computer Vision (Research Associate at U of Adelaide, Ph.D. in Machine Learning, Australian National University)
Rashmi Raghu – Computational Methods and Analysis (Ph.D. in Mechanical Engineering, Stanford)
Woo Jung – Bayesian Inference and Demand Analysis (Sr. Statistician at M-Factor, M.S. in Statistics, Stanford)
Jarrod Vawdrey – Marketing Analytics & SAS (Analytics Consultant at Aspen Marketing, B.S. in Mathematics, Kennesaw State University)
Niels Kasch – Text Analytics and NLP (Ph.D. in Computer Science, UMBC)
Vivek Ramamurthy – Online Learning, Stochastic Modeling, Convex Optimization (Ph.D. in Operations Research, UC Berkeley)
Srivatsan Ramanujam – NLP and Text Mining (Natural Language Scientist at Sony, Salesforce.com, M.S. in Computer Sciences, UT Austin)
Alexander Kagoshima – Time Series, Statistics and Machine Learning (M.S. in Economics/Computer Science, TU Berlin)
Data Science Labs: Packaged Services
LAB PRIMER (2-Week Strategy)
• Customized Analytics Roadmap
• 1-day Moderated Brainstorming Session
• Prioritized Opportunities
• Architectural Recommendations
LAB 600 (6-Week Lab)
• Prof. Services (Data Load)
• Data Science Model Building
• Project Management
• Ready-to-Deploy Model(s)
LAB 1200 (12-Week Lab)
• Prof. Services (Data Load)
• Data Science Model Building
• Project Management
• Ready-to-Deploy Model(s)
LAB 100 (2-Week Lab)
• On-site Pivotal Analytics Training
• Rapid Model/Insight Build on Customer Data (2 weeks)
Approach: Data Science Lab 1200
Week: 1 2 3 4 5 6 7 8 9 10 11 12
Data Exploration
Features Building
Model Development
Code QA and Scoring
Model Optimization & Validation
Data Loaded
Insights Presentation
Training
Preliminary Model Review
Feature Review
Data Review
Documentation
Program Management | Data Architecture and Engineering | Data Scientists | Training and Skills Development
Facilitate data loading processes from source systems to Pivotal Data Fabric
Coordinate data needs with Data Scientists
Best practice education for analytics performance
Data migration to support new applications
Oversight and communication plans
Organizational alignment
Risk mitigation
Resource planning
Prioritize deliverables
Socialize progress of overall initiative
Instill data collaboration culture
Execute Data Science Lab engagements around revenue generation or cost saving efforts
Hands on education with new data analysis techniques
Introduce new analytics tools and methodologies
Identify candidates for deeper data science training
Create training curriculum
Recruiting Methodology
Parallel computing techniques defined and demonstrated
Build institutional knowledge for client data science team
Data Science Innovation Center (DSIC)
Key Principles
• Building a predictive enterprise is, first and foremost, about building a human infrastructure.
• Analytics is an iterative knowledge discovery process and needs to be managed as such.
• Discovery starts from asking the right questions – that can be as important as finding answers to those questions.
Large Scale Video Analytics Platform on Hadoop
Dr. Chunsheng (Victor) Fang, Sr. Data Scientist
Pivotal Video Analytics Taskforce
Chunsheng (Victor) Fang, Ph.D. – Sr. Data Scientist
Regunathan Radhakrishnan, Ph.D. – Sr. Data Scientist
Derek Lin – Principal Data Scientist
Sameer Tiwari – Hadoop Architect
Kenneth Dowling & Michael Nemesh – DCA Admin
Industry Use Case
Surveillance Video Anomaly Detection
Anomaly Detection in Surveillance Video
Detect anomalous objects in a restricted perimeter.
A typical large enterprise collects terabytes of video per day.
Hadoop MapReduce runs computer vision algorithms in parallel and captures violation events.
Post-incident monitoring enabled by Hadoop / HAWQ.
Unstructured Video Data Workflow
Unstructured data as input
ETL: Distributed Video Transcoder
Analytics: Distributed Video Analytics
Structured Insights in relational database for advanced analytics
Unstructured Data → ETL → Analytics → Structured Insights
Real World Video Data Benchmark
Surveillance Videos (i-LIDS) from United Kingdom Home Office
– Library of HiDef CCTV video footage based around ‘scenarios’ central to the government’s requirements.
– The footage accurately represents real operating conditions and potential threats.
Anomaly Detection: Sterile zone dataset
Night Day
Most Common Video Standards
MPEG & ITU: responsible for many video standards
MPEG-2 (1995): Widely adopted, DVDs, Digital TV broadcast, set-top boxes
Intro to MPEG Standard
MPEG standard encodes video frames
– Redundancy in time: inter-frame encoding
– Redundancy in space: intra-frame encoding
Motion compensation
– I-frame: (Key frame) intra-frame encoding
– P-frame: (Predicted frame) Predicting regions of current frame from previous frame
– B-frame: (Bi-predictive frame) Predicting regions of current frame using both previous and next frame
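The frame types above determine which frames must already be decoded before a given frame can be reconstructed. A minimal sketch of that dependency, assuming a closed GOP written in display order as a string such as "IBBPBBP", and assuming P- and B-frames reference the nearest I- or P-frames (the function name `reference_frames` is hypothetical):

```python
def reference_frames(gop):
    """Map each frame index in a display-order GOP string to the
    indices of the frames it is predicted from."""
    # I- and P-frames are the anchors other frames may reference.
    anchors = [i for i, kind in enumerate(gop) if kind in "IP"]
    refs = {}
    for i, kind in enumerate(gop):
        earlier = [a for a in anchors if a < i]
        later = [a for a in anchors if a > i]
        if kind == "I":
            refs[i] = []                        # intra-coded, no references
        elif kind == "P":
            refs[i] = earlier[-1:]              # nearest previous anchor
        elif kind == "B":
            refs[i] = earlier[-1:] + later[:1]  # previous and next anchor
        else:
            raise ValueError(f"unknown frame type: {kind!r}")
    return refs


if __name__ == "__main__":
    for index, needed in reference_frames("IBBPBBP").items():
        print(index, needed)
```

Only the I-frame can be decoded on its own, which is why a decoder dropped into the middle of a stream has to find the start of a GOP first.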
Distributed Video Transcoder on Hadoop
Distributed MapReduce MPEG Transcoder
Motivation of Distributed Video Transcoding
Can we decode the individual frames from an arbitrary block in Hadoop File System (HDFS)?
HDFS splits any file into fixed-size blocks, typically 64MB or 128MB.
Each block can be processed in parallel by a customized MapReduce function.
Most video file standards are not Hadoop-friendly.
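To see why this matters, it helps to look at where the block boundaries fall. The sketch below computes the byte range of each block for a file of a given size; the function name `block_ranges` and the 64MB default are illustrative assumptions, not Hadoop API calls.

```python
MB = 1024 * 1024


def block_ranges(file_size, block_size=64 * MB):
    """Return (start, end) byte offsets of each fixed-size block.
    The last block may be shorter than block_size."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    return [(start, min(start + block_size, file_size))
            for start in range(0, file_size, block_size)]


if __name__ == "__main__":
    for start, end in block_ranges(150 * MB):
        print(start // MB, "MB to", end // MB, "MB")
```

The boundaries depend only on byte counts, so a block will usually begin in the middle of a frame, far from the video header at the start of the file.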
Decoding MPEG-2 with MapReduce
Two key observations
– Video header information: available only at the header in the bitstream
– Group of Pictures (GOP) header repeats
Steps to decode arbitrary blocks
– Step 1: Configure each mapper to extract the header information from each file;
  ▪ Totals ~20 videos at 5GB
– Step 2: Start searching for GOP header in each block in parallel;
– Step 3: Decode frames into a suitable image format (JPEG, BMP, etc);
– Step 4: Consolidate all time-stamped frames into Hadoop Sequence File.
  ▪ Reduces to sequence file at 500MB
Transcoding MPEG-2 video into Hadoop-friendly format
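Step 2 can be sketched as a byte search. The example below assumes the input is a raw MPEG-2 video elementary stream, in which a GOP header begins with the start code 0x000001B8 and the sequence header with 0x000001B3. The function name `find_gop_offsets` is hypothetical, and the sketch ignores start codes that straddle a block boundary, which a real transcoder would have to handle.

```python
GOP_START_CODE = b"\x00\x00\x01\xb8"        # MPEG-2 group_start_code
SEQUENCE_HEADER_CODE = b"\x00\x00\x01\xb3"  # MPEG-2 sequence_header_code


def find_gop_offsets(block, base_offset=0):
    """Return the file offsets of every GOP start code inside one block.

    block       -- the raw bytes of one HDFS block
    base_offset -- position of the block's first byte within the file
    """
    offsets = []
    pos = block.find(GOP_START_CODE)
    while pos != -1:
        offsets.append(base_offset + pos)
        pos = block.find(GOP_START_CODE, pos + 1)
    return offsets


if __name__ == "__main__":
    sample = b"junk" + GOP_START_CODE + b"abc" + GOP_START_CODE + b"x"
    print(find_gop_offsets(sample, base_offset=100))
```

Each mapper can run this search on its own block independently; decoding then starts at the first offset found, using the header information gathered in Step 1.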
Distributed Video Analytics Platform on Hadoop
Object Detection with Gaussian Mixture Model
Video data is much noisier than we realize.
You don’t notice it because your visual cortex denoises what you see.
A computer needs good statistical models (e.g. GMM) for robustness.
Distribution of pixel intensities over time
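One common way to apply a GMM here is to keep a small online mixture per pixel and flag intensities that no background component explains. The sketch below does this for a single grayscale pixel. It is a simplified illustration, not the platform's implementation: the class name `PixelGMM` and all parameter values are assumptions, and one fixed learning rate `alpha` is used for weights, means and variances.

```python
import math


class PixelGMM:
    """Online Gaussian mixture over the intensity history of one pixel."""

    def __init__(self, k=3, alpha=0.05, init_var=225.0, min_var=4.0,
                 match_sigmas=2.5, bg_weight=0.7):
        self.k = k                        # maximum number of components
        self.alpha = alpha                # learning rate
        self.init_var = init_var          # variance given to a new component
        self.min_var = min_var            # floor that keeps variances positive
        self.match_sigmas = match_sigmas  # match threshold in standard deviations
        self.bg_weight = bg_weight        # share of weight treated as background
        self.components = []              # entries are [weight, mean, variance]

    def update(self, x):
        """Fold intensity x into the model; return True if x is foreground."""
        matched = None
        for comp in self.components:
            if abs(x - comp[1]) <= self.match_sigmas * math.sqrt(comp[2]):
                matched = comp
                break
        if matched is None:
            # No component explains x: add one, or replace the weakest.
            new = [self.alpha, float(x), self.init_var]
            if len(self.components) < self.k:
                self.components.append(new)
            else:
                weakest = min(range(self.k),
                              key=lambda i: self.components[i][0])
                self.components[weakest] = new
        else:
            delta = x - matched[1]
            matched[1] += self.alpha * delta
            matched[2] = max(self.min_var,
                             (1 - self.alpha) * matched[2]
                             + self.alpha * delta * delta)
        # Reward the matched component, decay the others, renormalise.
        for comp in self.components:
            hit = 1.0 if comp is matched else 0.0
            comp[0] = (1 - self.alpha) * comp[0] + self.alpha * hit
        total = sum(comp[0] for comp in self.components)
        for comp in self.components:
            comp[0] /= total
        # Heavy, narrow components are the most background-like.
        self.components.sort(key=lambda c: c[0] / math.sqrt(c[2]),
                             reverse=True)
        if matched is None:
            return True
        cumulative = 0.0
        for comp in self.components:
            if comp is matched:
                return False  # matched a background component
            cumulative += comp[0]
            if cumulative > self.bg_weight:
                break
        return True
```

Feeding the model a steady intensity with small noise and then a sudden jump should flag only the jump, which is the robustness to noise that the slide asks for.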
Typical Video Analytics Workflow
Video/image data are highly unstructured
Hadoop has proven excellent at extracting structured insights from Big Data
A typical workflow:
ANALYTIC RESULT
Foreground Extraction
Background Stat Model
Visual Key
Composite Key
Feature Extraction/Classification
((Key, Time), Loc)
Use Case 1: Anomaly Detection
Extracting structured info from Unstructured data
Computer vision algorithms fit into Mapper/Reducer framework
Intermediate (Key, Value)
– (RestrictedArea, IntrusionEvent(Time, ViolatorImage))
MapReduce data flow: HDFS → Map (in parallel) → Reduce → HDFS / GPDB
2012-09-01 07:00:00
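The intermediate (Key, Value) pair above can be illustrated with a small in-memory imitation of the map and reduce phases. This is a sketch only: the frame record layout, the `RESTRICTED_AREAS` rectangle and the function names are hypothetical, and a real job would run the detection step inside the mapper and use the Hadoop API.

```python
from collections import defaultdict, namedtuple

IntrusionEvent = namedtuple("IntrusionEvent", ["time", "violator_image"])

# Hypothetical restricted areas as (x0, y0, x1, y1) pixel rectangles.
RESTRICTED_AREAS = {"sterile_zone": (100, 100, 300, 250)}


def map_frame(frame):
    """Emit (RestrictedArea, IntrusionEvent) for each detection in an area."""
    for detection in frame["detections"]:
        x, y = detection["centroid"]
        for area, (x0, y0, x1, y1) in RESTRICTED_AREAS.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                yield area, IntrusionEvent(frame["time"], detection["image"])


def reduce_area(area, events):
    """Collect the events of one restricted area in time order."""
    return area, sorted(events, key=lambda event: event.time)


def run_job(frames):
    """Imitate the shuffle: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for frame in frames:
        for key, value in map_frame(frame):
            grouped[key].append(value)
    return dict(reduce_area(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    frames = [
        {"time": "2012-09-01 07:00:00",
         "detections": [{"centroid": (200, 200), "image": "crop_a.jpg"}]},
    ]
    print(run_job(frames))
```

Because each frame is mapped independently, the mappers need no shared state; only the reducer sees all events for one restricted area.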
Use Case 2: Trajectory Analysis
Tracking multiple objects in Big Data video archives
Building high-level summaries, e.g. moving-trajectory time series
Time steps shown: T1, T2, T3, T4, T5, T6
Use Case 2: Trajectory Analysis “Map”
Map
Foreground Extraction
Background Stat Model
Visual Key
Composite Key
Feature Extraction/Classification
((VisKey, time), loc)
Emit(K,V)
Use Case 2: Trajectory Analysis “Reduce”
Reduce
Aggregate
User defined Trajectory model
(Object, Trajectory)
2nd Sort on Composite key
((VisKey, time), loc)
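The secondary sort on the composite key can be imitated in memory: sort the records by (VisKey, time), then group by VisKey so each object's locations arrive in time order. The sketch below assumes records shaped like ((VisKey, time), loc). The function name `reduce_trajectories` is hypothetical, and the "trajectory model" here is simply the ordered list of points.

```python
from itertools import groupby


def reduce_trajectories(records):
    """Turn ((vis_key, time), loc) records into (object, trajectory) pairs."""
    # Secondary sort: order by visual key first, then by time.
    ordered = sorted(records, key=lambda record: record[0])
    for vis_key, group in groupby(ordered, key=lambda record: record[0][0]):
        # Within one group the records are already in time order.
        yield vis_key, [(time, loc) for (_, time), loc in group]


if __name__ == "__main__":
    records = [
        (("red_car", 2), (14, 7)),
        (("person", 1), (3, 3)),
        (("red_car", 1), (10, 5)),
    ]
    for obj, trajectory in reduce_trajectories(records):
        print(obj, trajectory)
```

In Hadoop the framework performs this ordering during the shuffle, so the reducer can stream through one object's points without buffering and sorting them itself.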
Video Analytics Platform
Supports Video ETL
– Support standard formats: MPG, AVI, MP4.
– Sequence file in HDFS
Image Processing Toolkit
– Support standard formats (e.g. JPEG, BMP, PNG)
– Color space conversion
– Edge/key point detection
– Morphological processing
– Filtering: convolutional, median, etc.
PHD MapReduce for scalable computer vision algorithms
HAWQ SQL for high level analytics
Video Analytics Demo
Performance Quick Facts
Processing each 720x576 video frame takes 103 milliseconds (near real time even in Java)
Detection algorithm: scales linearly with the number of processors
Impacts:
• Enhance public security
• Improve security officers’ productivity
Querying the Analytics Results
Average speed of the red car yesterday, using a window function
SELECT sqrt(power(avg(abs(x_diff)),2) + power(avg(abs(y_diff)),2)) * FPS_MPS_FACTOR
FROM (SELECT X - lag(X,1) OVER (ORDER BY TIME) AS x_diff,
             Y - lag(Y,1) OVER (ORDER BY TIME) AS y_diff
      FROM SANMATEO
      WHERE TARGET =
        AND TIME > (CURRENT_TIMESTAMP - INTERVAL '1' DAY)
        AND TIME < (CURRENT_TIMESTAMP)
) x_tmp;
RESULT:
7.2 mph
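The arithmetic behind the query on this slide can be checked on a few rows. The sketch below repeats the same steps in Python: difference each position from the previous one in time order, average the absolute differences per axis, combine them with the Pythagorean formula, and scale. The function name `average_speed` is hypothetical, and `fps_mps_factor` stands in for the query's FPS_MPS_FACTOR, whose value and exact meaning are not given in the slides.

```python
import math


def average_speed(track, fps_mps_factor):
    """track is a list of (time, x, y) rows for one target."""
    ordered = sorted(track)  # same role as ORDER BY TIME
    pairs = list(zip(ordered, ordered[1:]))
    if not pairs:
        return None  # fewer than two rows: no differences to average
    # Equivalent of X - lag(X,1) and Y - lag(Y,1), taken as absolute values.
    x_diffs = [abs(cur[1] - prev[1]) for prev, cur in pairs]
    y_diffs = [abs(cur[2] - prev[2]) for prev, cur in pairs]
    mean_dx = sum(x_diffs) / len(x_diffs)
    mean_dy = sum(y_diffs) / len(y_diffs)
    return math.hypot(mean_dx, mean_dy) * fps_mps_factor


if __name__ == "__main__":
    rows = [(0, 0, 0), (1, 3, 4), (2, 6, 8)]
    print(average_speed(rows, fps_mps_factor=2.0))
```

Note that the formula averages the per-axis displacements before combining them, so it measures net per-axis motion rather than the mean of per-step distances.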
More Use Cases
Most computer vision algorithms are embarrassingly parallel
No data sharing between processes
– Feature extraction
– Object detection/classification
Video Categorization for user-generated content
– Find trending topics in YouTube videos by topic modeling
Object Detection
– Detect known categories of objects, e.g. face, bar code, vehicle.
Object Search
– Given a known object, use template matching to locate it
Haar-like + AdaBoost Cascade Face Detector
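For the Object Search item, a minimal template matching sketch: slide the template over the image and keep the position with the smallest sum of squared differences (SSD). The function name `match_template` is hypothetical, images are plain nested lists of grayscale values, and SSD is only one of several similarity measures in use.

```python
def match_template(image, template):
    """Return (row, col, ssd) of the best template position in the image."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    if tpl_h > img_h or tpl_w > img_w:
        raise ValueError("template must not be larger than the image")
    best = None
    for row in range(img_h - tpl_h + 1):
        for col in range(img_w - tpl_w + 1):
            # Sum of squared differences over the overlapping window.
            ssd = sum((image[row + i][col + j] - template[i][j]) ** 2
                      for i in range(tpl_h) for j in range(tpl_w))
            if best is None or ssd < best[2]:
                best = (row, col, ssd)
    return best


if __name__ == "__main__":
    image = [[0] * 5 for _ in range(4)]
    image[1][2], image[1][3] = 9, 8
    image[2][2], image[2][3] = 7, 6
    print(match_template(image, [[9, 8], [7, 6]]))
```

Each frame can be searched independently of the others, which fits the "no data sharing between processes" point above.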
Summary
Hadoop: a great tool for data scientists to crunch Unstructured Big Data!
Hadoop extracts Structured insights from Unstructured video with customized computer vision algorithms.
Scalable framework with ease of experimenting, developing, deploying!
Pivotal HD demonstrates large scale video analytics use cases:
– Anomaly detection
– Trajectory analysis
– More …
Q&A
More Information
Pivotal Blog Site August 12, 2013
Large Scale Video Analytics
Contact the Data Science Lab Services
Thank You
A NEW PLATFORM FOR A NEW ERA