deep learning icme detection - CUHK Electronic Engineering ...xgwang/detection.pdf · 2 “How toto...
Deep Learning in Object Detection
Wanli Ouyang
Department of Electronic Engineering, The Chinese University of Hong Kong
Deep learning applications: face alignment, object detection, human pose estimation, pedestrian detection
Outline
• General object detection
• Pedestrian detection
• Human part localization
2 “How to”s
• How to effectively train a deep model
– Data augmentation
– Label more data
– Pre-train on large-scale related data (RCNN)
– Layerwise pre-training + fine tuning (Multi-stage)
• How to formulate a vision problem with deep learning
– Tune hyper-parameters, e.g. number of hidden nodes, filter size, number of layers, activation function, dropout …
– Make use of experience and insights obtained in CV research
– Sequential design/learning vs joint learning
– Contextual information (Multi-stage, face, human pose)
– Background clutter removal (SDN)
– Short and long range temporal relationship (Action recognition)
Object detection
• Pascal VOC: ~20 object classes; training: ~5,700 images; testing: ~10,000 images
• ImageNet ILSVRC: ~200 object classes; training: ~395,000 images; testing: ~40,000 images
SIFT, HOG, LBP, DPM …
[Regionlets. Wang et al. ICCV’13] [SegDPM. Fidler et al. CVPR’13]
With CNN features
R-CNN: regions + CNN features
Pipeline: input image → extract region proposals (~2k/image) → compute CNN features → 2-class linear SVM
• Region proposals: Selective Search [van de Sande, Uijlings et al.]; 91.6%/98% recall rate on ImageNet/PASCAL
• CNN features: Krizhevsky, Sutskever & Hinton, NIPS 2012 (also called “AlexNet”)
• SVM: Liblinear
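The R-CNN pipeline on this slide (region proposals, then features, then a linear SVM per class) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `score_regions`, `crop` and `mean_intensity_features` are hypothetical names, the toy feature function stands in for the CNN, and the hand-written proposals stand in for Selective Search.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), end-exclusive
Image = List[List[float]]        # grayscale image as rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by box out of the image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def mean_intensity_features(patch: Image) -> List[float]:
    """Toy stand-in for CNN features: mean pixel value and patch area."""
    pixels = [p for row in patch for p in row]
    return [sum(pixels) / len(pixels), float(len(pixels))]


def score_regions(
    image: Image,
    proposals: Sequence[Box],
    extract_features: Callable[[Image], List[float]],
    svms: Dict[str, Tuple[List[float], float]],
) -> List[Tuple[Box, Dict[str, float]]]:
    """Score each proposal with one linear SVM (weights, bias) per class."""
    results = []
    for box in proposals:
        feats = extract_features(crop(image, box))
        scores = {
            cls: sum(w * f for w, f in zip(weights, feats)) + bias
            for cls, (weights, bias) in svms.items()
        }
        results.append((box, scores))
    return results


# Tiny example: a bright 2x2 blob in the middle of a dark 4x4 image.
image = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
proposals = [(0, 0, 2, 2), (1, 1, 3, 3)]   # assumed proposals, not Selective Search
svms = {"object": ([1.0, 0.0], -5.0)}      # assumed SVM weights and bias
detections = score_regions(image, proposals, mean_intensity_features, svms)
```

A real system would replace the feature function with a forward pass through the network and would typically suppress overlapping detections afterwards; neither step is shown here.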
CNN training
• Train a SuperVision CNN* for the 1000-way ILSVRC image classification task (1.2 million images)
• Fine-tune the CNN for detection: transfer the representation learned from ILSVRC classification to PASCAL (or ImageNet) detection
Pipeline: ImageNet Cls (pre-train) → Det (fine-tune)
* Network from Krizhevsky, Sutskever & Hinton, NIPS 2012; also called “AlexNet”
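The pre-train then fine-tune recipe amounts to reusing the learned lower layers and replacing the classification layer with one sized for the detection task. The sketch below shows only that weight transfer, not the training itself. The layer sizes are tiny placeholders, the function names are hypothetical, and the extra background output is an assumption in the spirit of R-CNN-style fine-tuning rather than something stated on the slide.

```python
import random


def init_layer(n_in, n_out, rng):
    """Small random weight matrix with n_out rows and n_in columns."""
    return [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]


def init_network(n_in, n_hidden, n_classes, seed=0):
    """Two-layer network: a shared hidden layer and a task-specific output layer."""
    rng = random.Random(seed)
    return {
        "hidden": init_layer(n_in, n_hidden, rng),
        "output": init_layer(n_hidden, n_classes, rng),
    }


def transfer_for_detection(pretrained, n_det_classes, seed=1):
    """Copy the pre-trained hidden layer; re-initialise the output layer.

    The new output layer has n_det_classes + 1 rows (assumed: one extra
    row for a background class).
    """
    rng = random.Random(seed)
    n_hidden = len(pretrained["hidden"])
    return {
        "hidden": [row[:] for row in pretrained["hidden"]],
        "output": init_layer(n_hidden, n_det_classes + 1, rng),
    }


# Stand-in for the 1000-way ILSVRC classifier, then a 20-class detector.
pretrained = init_network(n_in=8, n_hidden=4, n_classes=1000)
detector = transfer_for_detection(pretrained, n_det_classes=20)
```

Fine-tuning would then continue gradient descent on all layers using detection data, usually with a smaller learning rate than in pre-training.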
OverFeat [Sermanet et al. 2014]
• More considerations on deep model design
– Multi-resolution, dense pooling
• Sliding window (not region proposal)
• Does not use ImageNet-Cls for pretraining
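To make the contrast with region proposals concrete, the sketch below enumerates sliding windows over an image at several scales. It is a simplified illustration: `sliding_windows` and its parameters are hypothetical, and it rescales the window rather than the image, which is a shortcut and not a description of how OverFeat itself is implemented.

```python
def sliding_windows(width, height, window, stride, scales=(1.0,)):
    """Yield square (x1, y1, x2, y2) windows over the image at each scale."""
    for scale in scales:
        size = int(round(window * scale))
        if size > width or size > height:
            continue  # window does not fit at this scale
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                yield (x, y, x + size, y + size)


# 8x8 image, 4-pixel base window, stride 2, two scales.
windows = list(sliding_windows(8, 8, window=4, stride=2, scales=(1.0, 2.0)))
```

Every window is scored by the classifier, so the number of evaluations grows quickly with image size and the number of scales, whereas a proposal method fixes the budget (about 2k regions per image in R-CNN).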
Experimental results on ILSVRC 2013
Pedestrian Detection
Improve the state-of-the-art average miss detection rate on the largest Caltech dataset from 63% to 38%
Publications: CVPR’12, CVPR’13, ICCV’13, CVPR’14
Joint Deep Learning
What if we treat an existing deep model as a black box in pedestrian detection?
ConvNet−U−MS
– P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun, “Pedestrian Detection with Unsupervised Multi-Stage Feature Learning,” CVPR 2013.
Figure panels: results on Caltech Test; results on ETHZ
• N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. CVPR, 2005. (6000 citations)
• P. Felzenszwalb, D. McAllester, and D. Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR, 2008. (2000 citations)
• W. Ouyang and X. Wang. A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling. CVPR, 2012.
Our Joint Deep Learning Model
Modeling Part Detectors
• Design the filters in the second convolutional layer with variable sizes
Figure panels: part models learned from HOG; learned filters at the second convolutional layer
Deformation Layer
Visibility Reasoning with Deep Belief Net
Experimental Results
• Caltech Test dataset (largest, most widely used)
Figure: average miss rate (%) versus year, 2000 to 2014, with marked values 95%, 68%, 63% (state-of-the-art), 53% and 39% (best performing); improve by ~20%
• W. Ouyang, X. Zeng and X. Wang, “Modeling Mutual Visibility Relationship in Pedestrian Detection,” CVPR 2013.
• W. Ouyang and X. Wang, “Single-Pedestrian Detection aided by Multi-pedestrian Detection,” CVPR 2013.
• X. Zeng, W. Ouyang and X. Wang, “A Cascaded Deep Learning Architecture for Pedestrian Detection,” ICCV 2013.
• W. Ouyang and X. Wang, “A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling,” CVPR 2012.
• W. Ouyang and X. Wang, “Joint Deep Learning for Pedestrian Detection,” ICCV 2013.
Figure panels: results on Caltech Test; results on ETHZ
Compared variants: DN-HOG, UDN-HOG, UDN-HOGCSS, UDN-CNNFeat, UDN-DefLayer
Multi‐Stage Contextual Deep Learning
Motivated by Cascaded Classifiers and Contextual Boost
• The classifier of each stage deals with a specific set of samples
• The score map output by one classifier can serve as contextual information for the next classifier
Conventional cascaded classifiers for detection:
• Only pass one detection score to the next stage
• Classifiers are trained sequentially
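The conventional cascade described above can be sketched as a loop in which each stage sees the sample plus the single score handed over by the previous stage, and rejects early when its own score falls below a threshold. The function name, the stage signature and the toy classifiers are assumptions for illustration only.

```python
def run_cascade(sample, stages):
    """Run a conventional cascade.

    stages is a list of (classifier, threshold) pairs. Each classifier
    takes (sample, previous_score) and returns a new score, so only one
    detection score is passed from stage to stage.
    Returns (accepted, final_score, stages_run).
    """
    score = 0.0
    for index, (classifier, threshold) in enumerate(stages, start=1):
        score = classifier(sample, score)
        if score < threshold:
            return False, score, index  # rejected early at this stage
    return True, score, len(stages)


# Two toy stages over a two-feature sample.
stages = [
    (lambda x, prev: x[0], 0.5),          # stage 1 looks at feature 0
    (lambda x, prev: prev + x[1], 1.0),   # stage 2 adds feature 1 to the passed score
]
rejected = run_cascade([0.25, 0.9], stages)
accepted = run_cascade([0.75, 0.5], stages)
```

Because each stage only handles the samples that survived earlier stages, every classifier ends up specialised on a specific subset of samples, which is the first bullet above.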
• Our deep model keeps the score map output by the current classifier, and it serves as contextual information to support the decision at the next stage
• Cascaded classifiers are jointly optimized instead of being trained sequentially
• To avoid overfitting, a stage-wise pre-training scheme is proposed to regularize optimization
• Simulate the cascaded classifiers by mining hard samples to train the network stage-by-stage
Training Strategies
• Unsupervised pre-train W_{h,i+1} layer-by-layer, setting W_{s,i+1} = 0, F_{i+1} = 0
• Fine-tune all the W_{h,i+1} with supervised BP
• Train F_{i+1} and W_{s,i+1} with BP stage-by-stage
• A correctly classified sample at the previous stage does not influence the update of parameters
• Stage-by-stage training can be considered as adding regularization constraints to parameters, i.e. some parameters are constrained to be zeros in the early training stages
Log error function:
Gradients for updating parameters:
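One way to read the rule that a correctly classified sample at the previous stage does not influence the update is as a filter on the training set of the next stage. The sketch below illustrates that reading only. The function name, the 0.5 decision threshold and the use of output probabilities are assumptions, and the slides' own error function and gradients are not reproduced here.

```python
def select_hard_samples(probabilities, labels, threshold=0.5):
    """Return indices of samples the previous stage got wrong.

    probabilities: previous-stage outputs in [0, 1]
    labels: ground-truth labels, 1 for pedestrian and 0 for background
    Only these samples would contribute to the next stage's update.
    """
    hard = []
    for index, (p, y) in enumerate(zip(probabilities, labels)):
        predicted = 1 if p >= threshold else 0
        if predicted != y:
            hard.append(index)
    return hard


# Samples 2 and 3 are misclassified by the previous stage.
hard_indices = select_hard_samples([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1])
```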
Experimental Results
Figure panels: Caltech; ETHZ
DeepNetNoneFilter
Comparison of Different Training Strategies
• Network-BP: use back propagation to update all the parameters without pre-training
• PretrainTransferMatrix-BP: the transfer matrices are pre-trained without supervision, and then all the parameters are fine-tuned
• Multi-stage: our multi-stage training strategy
Switchable Deep Network for Pedestrian Detection
• Background clutter and large variations of pedestrian appearance.
• Proposed solution: a Switchable Deep Network (SDN) for learning the foreground map.
Switchable Deep Network for Pedestrian Detection
• Switchable Restricted Boltzmann Machine
Figure labels: foreground; background
Switchable Deep Network for Pedestrian Detection
(a) Performance on Caltech Test (b) Performance on ETH
Human part localization
• Facial keypoint detection (CVPR’13)
• Human pose estimation (CVPR’14)
Facial Keypoint Detection
• Y. Sun, X. Wang and X. Tang, “Deep Convolutional Network Cascade for Facial Point Detection,” CVPR 2013
Comparison with Belhumeur et al. [4], Cao et al. [5] on LFPW test images.
1. http://www.luxand.com/facesdk/
2. http://research.microsoft.com/en-us/projects/facesdk/
3. O. Jesorsky, K. J. Kirchberg, and R. Frischholz. Robust face detection using the Hausdorff distance. In Proc. AVBPA, 2001.
4. P. N. Belhumeur, D. W. Jacobs, D. J. Kriegman, and N. Kumar. Localizing parts of faces using a consensus of exemplars. In Proc. CVPR, 2011.
5. X. Cao, Y. Wei, F. Wen, and J. Sun. Face alignment by explicit shape regression. In Proc. CVPR, 2012.
6. L. Liang, R. Xiao, F. Wen, and J. Sun. Face alignment via component-based discriminative search. In Proc. ECCV, 2008.
7. M. Valstar, B. Martinez, X. Binefa, and M. Pantic. Facial point detection using boosted regression and graph models. In Proc. CVPR, 2010.
Figure panels: Validation; BioID; LFPW
Benefits of Using Deep Model
• Take the full face as input to make full use of texture context information over the entire face to locate each keypoint
• The first network that takes the whole face as input needs deep structures to extract high-level features
• Since the networks are trained to predict all the keypoints simultaneously, the geometric constraints among keypoints are implicitly encoded
• Global geometric constraints among keypoints can also be explicitly encoded by deep model.
Human pose estimation
• W. Ouyang, X. Wang and X. Tang, “Multi-source Deep Learning for Human Pose Estimation,” CVPR 2014.
Multiple information sources
• Appearance
• Appearance mixture type
• Deformation
Multi-source deep model
![Page 54: deep learning icme detection - CUHK Electronic Engineering ...xgwang/detection.pdf · 2 “How toto s”s ... W. Ouyang and Xiaogang Wang, “Joint Deep Learning for Pedestrian Detection,”](https://reader036.fdocuments.us/reader036/viewer/2022070708/5eb96f12a28e0367ab1c0404/html5/thumbnails/54.jpg)
Experimental results

PARSE

| Method | Torso | U.leg | L.leg | U.arm | L.arm | head | Total |
|---|---|---|---|---|---|---|---|
| Yang&Ramanan [58] | 82.9 | 68.8 | 60.5 | 63.4 | 42.4 | 82.4 | 63.6 |
| Ours | 89.3 | 78.0 | 72.0 | 67.8 | 47.8 | 89.3 | 71.0 |

UIUC People

| Method | Torso | U.leg | L.leg | U.arm | L.arm | head | Total |
|---|---|---|---|---|---|---|---|
| Yang&Ramanan [58] | 81.8 | 65.0 | 55.1 | 46.8 | 37.7 | 79.8 | 57.0 |
| Ours | 89.1 | 72.9 | 62.4 | 56.3 | 47.6 | 89.1 | 65.6 |

LSP

| Method | Torso | U.leg | L.leg | U.arm | L.arm | head | Total |
|---|---|---|---|---|---|---|---|
| Yang&Ramanan [58] | 82.9 | 70.3 | 67.0 | 56.0 | 39.8 | 79.3 | 62.8 |
| Ours | 85.8 | 76.5 | 72.2 | 63.3 | 46.6 | 83.1 | 68.6 |

Up to 8.6 percent accuracy improvement with global geometric constraints
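The 8.6 figure can be checked from the Total columns of the three result tables; it matches the gain on UIUC People. A small sketch of that arithmetic follows (the dictionary layout and variable names are illustrative only, and the numbers are copied from the tables):

```python
# Total accuracy (%) per dataset, copied from the result tables.
totals = {
    "PARSE": {"Yang&Ramanan": 63.6, "Ours": 71.0},
    "UIUC People": {"Yang&Ramanan": 57.0, "Ours": 65.6},
    "LSP": {"Yang&Ramanan": 62.8, "Ours": 68.6},
}

# Absolute improvement in percentage points, rounded to one decimal.
gains = {name: round(t["Ours"] - t["Yang&Ramanan"], 1) for name, t in totals.items()}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: +{gain}")
print(f"largest gain: {best} (+{gains[best]})")
```

The gains are absolute differences in percentage points: 7.4 on PARSE, 8.6 on UIUC People, and 5.8 on LSP.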
![Page 55: deep learning icme detection - CUHK Electronic Engineering ...xgwang/detection.pdf · 2 “How toto s”s ... W. Ouyang and Xiaogang Wang, “Joint Deep Learning for Pedestrian Detection,”](https://reader036.fdocuments.us/reader036/viewer/2022070708/5eb96f12a28e0367ab1c0404/html5/thumbnails/55.jpg)
Experimental results
[Figure: qualitative comparison of Ours and Yang&Ramanan; panel labels: Left, Right]
![Page 56: deep learning icme detection - CUHK Electronic Engineering ...xgwang/detection.pdf · 2 “How toto s”s ... W. Ouyang and Xiaogang Wang, “Joint Deep Learning for Pedestrian Detection,”](https://reader036.fdocuments.us/reader036/viewer/2022070708/5eb96f12a28e0367ab1c0404/html5/thumbnails/56.jpg)
Conclusion ‐ 2 "How to"s
• How to effectively train a deep model
  - Data augmentation
  - Label more data
  - Pre‐train on large‐scale related data (RCNN)
  - Layerwise pre‐training + fine tuning (Multi‐stage)
• How to formulate a vision problem with deep learning?
  - Tune hyper‐parameters, e.g. number of hidden nodes, number of layers, activation function, dropout
  - Make use of experience and insights obtained in CV research
  - Sequential design/learning vs joint learning
  - Contextual information (Multi‐stage, face, human pose)
  - Background clutter removal (SDN)
  - Short and long range temporal relationship (Action recognition)
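To make the data augmentation point concrete, one common recipe for detection windows is a random horizontal flip followed by a small random crop. The sketch below is a generic NumPy example of that recipe, not the procedure used in the talk; the helper name `augment`, the `max_shift` margin, and the window size are assumptions chosen for illustration.

```python
import numpy as np

def augment(window, rng, max_shift=4):
    """Return a randomly flipped and cropped copy of an H x W x C window.

    Hypothetical helper: flips left-right with probability 0.5, then crops
    a region of size (H - max_shift, W - max_shift) at a random offset.
    """
    h, w = window.shape[:2]
    out = window[:, ::-1] if rng.random() < 0.5 else window
    dy = rng.integers(0, max_shift + 1)  # vertical offset in [0, max_shift]
    dx = rng.integers(0, max_shift + 1)  # horizontal offset in [0, max_shift]
    return out[dy:dy + h - max_shift, dx:dx + w - max_shift].copy()

rng = np.random.default_rng(0)
window = np.arange(84 * 28 * 3, dtype=np.float32).reshape(84, 28, 3)
samples = [augment(window, rng) for _ in range(8)]
print(samples[0].shape)  # (80, 24, 3)
```

Each call yields a slightly different view of the same labeled window, which enlarges the effective training set without new annotation.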
![Page 57: deep learning icme detection - CUHK Electronic Engineering ...xgwang/detection.pdf · 2 “How toto s”s ... W. Ouyang and Xiaogang Wang, “Joint Deep Learning for Pedestrian Detection,”](https://reader036.fdocuments.us/reader036/viewer/2022070708/5eb96f12a28e0367ab1c0404/html5/thumbnails/57.jpg)
Reference
• R. Girshick, J. Donahue, T. Darrell, J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," CVPR, 2014.
• P. Sermanet, et al., "Overfeat: Integrated recognition, localization and detection using convolutional networks," arXiv preprint arXiv:1312.6229, 2013.
• W. Ouyang and X. Wang, "Joint Deep Learning for Pedestrian Detection," in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2013.
• X. Zeng, W. Ouyang and X. Wang, "Multi‐Stage Contextual Deep Learning for Pedestrian Detection," in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2013.
• P. Luo, Y. Tian, X. Wang, and X. Tang, "Switchable Deep Network for Pedestrian Detection," IEEE Conf. on Computer Vision and Pattern Recognition, June 2014.
![Page 58: deep learning icme detection - CUHK Electronic Engineering ...xgwang/detection.pdf · 2 “How toto s”s ... W. Ouyang and Xiaogang Wang, “Joint Deep Learning for Pedestrian Detection,”](https://reader036.fdocuments.us/reader036/viewer/2022070708/5eb96f12a28e0367ab1c0404/html5/thumbnails/58.jpg)
Reference
• W. Ouyang, X. Zeng and X. Wang, "Modeling Mutual Visibility Relationship with a Deep Model in Pedestrian Detection," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3222‐3229, 2013.
• W. Ouyang and X. Wang, "A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3258‐3265, 2012.
• Y. Sun, X. Wang and X. Tang, "Deep Convolutional Network Cascade for Facial Point Detection," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3476‐3483, 2013.
• W. Ouyang, X. Chu, and X. Wang, "Multi‐source Deep Learning for Human Pose Estimation," IEEE Conf. on Computer Vision and Pattern Recognition, June 2014.
![Page 59: deep learning icme detection - CUHK Electronic Engineering ...xgwang/detection.pdf · 2 “How toto s”s ... W. Ouyang and Xiaogang Wang, “Joint Deep Learning for Pedestrian Detection,”](https://reader036.fdocuments.us/reader036/viewer/2022070708/5eb96f12a28e0367ab1c0404/html5/thumbnails/59.jpg)
Q&A
mmlab.ie.cuhk.edu.hk/ www.ee.cuhk.edu.hk/~xgwang/ www.ee.cuhk.edu.hk/~wlouyang/