
Machine Learning Methods and Their

Applications to Autonomous Driving and

e-Healthcare using Video Sensor Data

Prof. Irene Y.H. Gu

Signal Processing Group, Dept. of Signals and Systems

Chalmers University of Technology, Göteborg, Sweden

April 14, 2016

Acknowledgement

Chalmers

ZH Khan, K Fu, Y Yun,

M Bolbat, S Haner, P Strandström,

MH Changrampadi, M Emami, D Moro, DP Kumar,

H Fundin, A Johannesson, P Shams, G Sowulewski

Other collaborators

L Li (I2R, Singapore),

H Aghajan (Stanford, USA),

J Yang, X Long (Shanghai Jiao Tong Univ., China),

M Thordstein, A Flisberg (SU, Sweden)

Contents

1. Why use video/image-based techniques?

2. Applications

3. Demo: our results

4. Our recently developed ML methods

5. Conclusion

1. Why ML using visual sensors

Data from visual sensors (RGB-D and IR

images/videos) provide important information.

Machine learning (ML) using visual data:

* important in its own right for theoretical studies

* wide applications

Our methods focus on video/image information analysis and modeling for machine learning

2. Applications we address

ML for autonomous driving / traffic analysis / driver assistance:

- road traffic monitoring: speed, frequency, lane, traffic light, surrounding vehicles …

- traffic sign detection and recognition

- driver status: attention, sleepiness, actions, identification

ML for activity recognition and e-healthcare

ML for online learning in tracking/surveillance systems

Dynamic videos: dynamic background in addition to moving objects

Static videos: static background (though the background still changes in images due to lighting, camera jitter, occlusion …)

Camera types we use: visual (RGB), thermal IR, near IR, depth

Cameras can be mounted in different ways:


3. Demo: Examples from Our

Experimental Results

Demo-1: Traffic applications

+ vehicle tracking

+ road traffic analysis

+ traffic sign recognition

Results: Automatic traffic sign detection and recognition

Results of tracking vehicles

static camera


Results: traffic monitoring

Demo-2:

video tracking applications

Tracking under a range of complex scenarios

by using single/multiple cameras

+ non-planar (out-of-plane) changes

(using single camera video)

+ partial/full occlusions

(using single/multiple camera videos)


Tracking using multi-camera

videos

(PETS 2006, Scenario 7,

3-cameras)


Tracking using multi-camera

videos

(TUG dataset, hard scenario,

3-cameras)

Demo-3: face tracking and analysis

+ tracking: human faces (single camera)

+ analyze / classify: eye states (sleepy, blink, open, closed …)

IR:


Detect eye states for early warning: too sleepy to drive?

Demo-4: Point feature-based tracking

+ limb movement

+ video object tracking (with occlusions+intersection)

Tracking limb-movement for analyzing abrupt

movement related to infant neurological dysfunctions

Demo-5:

Identification of human activities …

by fusion, classification from RGB-D videos


Identify human activities from videos (for healthcare, assisted living)

Chalmers RGB-D video dataset: falling down, lying down, eating, drinking, reading, using a laptop, sitting down, walking … (each activity: 500 videos by 19 subjects)

Identify human activities from images (can be used to identify a driver’s status: using a cell phone, reading, chatting, eating …)


4. Our recently developed ML methods

4.1. Domain-shift online learning /classification on Riemannian

manifolds

4.2. Enhanced ML from salient object /region detection

4.3. Applications:

Traffic sign recognition

Human fall detection and activity classification

4.1. Domain-Shift learning/classification

Riemannian manifold learning of large-size video objects with out-of-plane pose changes

What is a manifold?

- A set of all low-dimensional subspaces in a high-dimensional space: $\{\mathbb{R}^k\} \subset \mathbb{R}^n$, $k < n$
- Nonlinear, e.g. a curved space
- Locally Euclidean, but not globally
- Geometry, topology, and some essential properties of signals are maintained
- A set of metrics, or calculus, may be defined on a manifold
- Differential/smooth manifolds are particularly attractive, e.g. Riemannian, Grassmann

Motivation: manifold learning and classification

Characterize dynamic signals/objects whose statistics evolve in time and do not lie in a single vector space.

=> employ smoothly shifting domains to characterize such dynamic objects

e.g., a dynamic object in images with out-of-plane pose changes does not lie in one vector space; rather, it lies in a set of subspaces (or, on a manifold).

Efficiently represent a signal by a set of low-dimensional subspaces

e.g., a 1D curve embedded in 3D space; “walking” is cyclic on a manifold.

Domain-shift characterization: dynamic objects

Object in each image frame → a manifold point

Dynamic video object → a curve on the manifold

Riemannian manifolds: some notations

Geodesic: shortest curve on the manifold

Geodesic distance on a Riemannian manifold (under the log-Euclidean metric; p, q: covariance matrices)

Metrics: Riemannian metrics are inner products on manifolds (preserve geodesic distance, symmetric positive), e.g. log-Euclidean metric, affine-invariant metric

Mapping functions (under the log-Euclidean metric): exponential mapping, logarithmic mapping

Means: Riemannian (extrinsic) mean, Karcher (intrinsic) mean
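The formulas on this slide did not survive extraction. As a minimal sketch, assuming the standard log-Euclidean definitions for symmetric positive-definite (covariance) matrices, the geodesic distance is the Frobenius norm of the difference of matrix logarithms, and a mean can be computed in closed form in the log domain. The function names below are illustrative and are not taken from the slides.

```python
import numpy as np

def spd_log(p):
    """Matrix logarithm of a symmetric positive-definite (SPD) matrix."""
    vals, vecs = np.linalg.eigh(p)
    return (vecs * np.log(vals)) @ vecs.T

def spd_exp(s):
    """Matrix exponential of a symmetric matrix (inverse of spd_log)."""
    vals, vecs = np.linalg.eigh(s)
    return (vecs * np.exp(vals)) @ vecs.T

def log_euclidean_distance(p, q):
    """Distance under the log-Euclidean metric: ||log(p) - log(q)||_F."""
    return float(np.linalg.norm(spd_log(p) - spd_log(q), ord="fro"))

def log_euclidean_mean(mats):
    """Mean under the log-Euclidean metric: exp of the average of the logs."""
    return spd_exp(np.mean([spd_log(m) for m in mats], axis=0))

# Toy covariance matrices (assumed values, for illustration only).
p = np.eye(2)
q = np.exp(2.0) * np.eye(2)
dist = log_euclidean_distance(p, q)   # log(q) = 2*I, so the norm is 2*sqrt(2)
mean = log_euclidean_mean([p, q])     # average log is I, so the mean is e*I
```

Working in the log domain keeps every intermediate result symmetric, so the mean is again a valid covariance matrix.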

Main issues: ML and classification on Riemannian manifolds.

Domain-shift learning method for visual object tracking, with main novelties:

- Sequential Bayesian online learning and tracking on the manifold;
- A dynamic nonlinear (NL) model for object appearance on the manifold: both the manifold point and its velocity are included in the state vector;
- Particle filters extended to the manifold;
- Domain-shift online learning with occlusion handling.

The method is particularly attractive for tracking large-size deformable objects with significant out-of-plane pose changes.


NL dynamic modeling on the Manifold

state vector:

piecewise geodesic:

constant velocity:

Sequential Bayesian estimation of the state vector by employing a particle filter on the manifold

PF-1 on the manifold for online learning:

Likelihood between new observation and predicted manifold point

MMSE estimate: (expected value of weighted particles)

geodesic:

PF-1 weights:

new observation from the tracking

Posterior reference object

before the occlusion handling
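The weight and MMSE formulas are missing from the extracted slide. The sketch below is one plausible reading under stated assumptions: particles are SPD matrices, the likelihood is a Gaussian of the log-Euclidean geodesic distance to the new observation (the Gaussian form and `sigma` are assumptions), and the MMSE estimate is the weighted mean computed in the log domain. It is not the authors' exact PF-1.

```python
import numpy as np

def spd_log(p):
    """Matrix logarithm of an SPD matrix."""
    vals, vecs = np.linalg.eigh(p)
    return (vecs * np.log(vals)) @ vecs.T

def spd_exp(s):
    """Matrix exponential of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(s)
    return (vecs * np.exp(vals)) @ vecs.T

def particle_weights(particles, observation, sigma=0.5):
    """Normalized weights from an assumed Gaussian likelihood of geodesic distance."""
    log_obs = spd_log(observation)
    dists = np.array([np.linalg.norm(spd_log(x) - log_obs) for x in particles])
    w = np.exp(-0.5 * (dists / sigma) ** 2)
    return w / w.sum()

def mmse_estimate(particles, weights):
    """Expected value of the weighted particles, taken in the log domain."""
    logs = np.array([spd_log(x) for x in particles])
    return spd_exp(np.tensordot(weights, logs, axes=1))

# Three toy particles and an observation that coincides with the middle one.
particles = [np.eye(2), 2.0 * np.eye(2), 4.0 * np.eye(2)]
observation = 2.0 * np.eye(2)
weights = particle_weights(particles, observation)
estimate = mmse_estimate(particles, weights)
```

In this toy case the two outer particles are equally far from the observation, so the estimate coincides with the middle particle.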

Occlusion handling

Main challenge: ambiguity in appearance changes, which may be due to:

- out-of-plane pose changes of the target object?

- other occluding objects / background clutter?

Rationale: similarity between the candidate and the reference object:

- An occluding object or clutter is generally less similar to the target

- A target with slightly changed views is more similar

Similarity measure: a short geodesic distance between the target and the reference object

Strategy adopted: perform reference object learning only when occlusion is unlikely!

Geodesic distance: between the reference object at (t-1) and the a posteriori estimated reference object at t
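The gating strategy can be sketched as a simple threshold test. This is an illustration only: the threshold value, the function name and the scalar stand-in for the geodesic distance are all assumptions, since the slide does not give the decision rule.

```python
def gated_reference_update(reference, posterior, distance_fn, threshold):
    """Accept the posterior as the new reference only if it stays close to the old one.

    A large distance is treated as a sign of occlusion, so learning is skipped.
    """
    dist = distance_fn(reference, posterior)
    occlusion_likely = dist > threshold
    new_reference = reference if occlusion_likely else posterior
    return new_reference, occlusion_likely

# Toy example: scalar "appearances" with absolute difference standing in
# for the geodesic distance on the manifold.
scalar_distance = lambda a, b: abs(a - b)
ref = 1.0
ref, occluded_a = gated_reference_update(ref, 1.1, scalar_distance, threshold=0.5)
ref, occluded_b = gated_reference_update(ref, 3.0, scalar_distance, threshold=0.5)
```

The first update is accepted (small change, likely a pose change); the second is rejected, so the reference is not contaminated by the occluder.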

Proposed Riemannian manifold online learning

and tracking (learning and tracking in alternation)

Results: with (red) /without (blue) online learning

Results: with/without occlusion handling

Euclidean distance between the 4 corners of the box

Evaluation: average tracking accuracy (ATA)

Larger ATA values indicate better performance
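The slides do not define either measure. As a hedged sketch, the code below assumes the common overlap-based form of ATA for a single object (the mean intersection-over-union between tracked and ground-truth boxes over all frames) and the mean Euclidean distance between corresponding box corners. Boxes are assumed to be axis-aligned `(x_min, y_min, x_max, y_max)` tuples.

```python
import math

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_tracking_accuracy(tracked, truth):
    """Mean per-frame overlap between tracked and ground-truth boxes."""
    return sum(iou(t, g) for t, g in zip(tracked, truth)) / len(truth)

def corner_distance(a, b):
    """Mean Euclidean distance between the 4 corresponding corners of two boxes."""
    corners = lambda r: [(r[0], r[1]), (r[2], r[1]), (r[2], r[3]), (r[0], r[3])]
    return sum(math.dist(p, q) for p, q in zip(corners(a), corners(b))) / 4.0

# Two frames: a perfect hit, then a box shifted by one unit in x.
truth = [(0, 0, 2, 2), (0, 0, 2, 2)]
tracked = [(0, 0, 2, 2), (1, 0, 3, 2)]
ata = average_tracking_accuracy(tracked, truth)
shift = corner_distance(tracked[1], truth[1])
```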


4.2. Enhanced ML from Salient Object / Region Detection

Motivations

Detect attention-grabbing objects/regions from image scenes

Improve ML by using segmented objects/regions

Applications may be found in:

- Object detection and recognition
- Video summarization
- Content-based image editing
- Image retrieval
- Object enhancement through regression
- …

Method-A: Saliency Detection by Geodesic Propagation

Estimate the saliency map by propagating saliency regions from a coarse map, based on geodesic distance (a geodesic-based filtering/regression).

[Block diagram stages: original image, superpixels, global contrast, Harris convex hull, coarse saliency, geodesic propagation, merging]

Update saliency values based on geodesic propagation:

Define: geodesic distance for $R_i$ (shortest path on the graph)

[Equation labels: area of a soft region; connectivity measure in [0,1]; coarse energy]
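The geodesic distance is described as a shortest path on the superpixel graph. A minimal sketch of that step using Dijkstra's algorithm is shown below; the graph, node ids and edge costs are hypothetical, and the slide's actual edge cost (built from the connectivity measure and coarse energy) is not recoverable from the extracted text.

```python
import heapq

def geodesic_distances(graph, source):
    """Shortest-path (geodesic) distance from `source` to every node.

    `graph` maps a node to a dict {neighbor: non-negative edge cost}.
    """
    dist = {node: float("inf") for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Four superpixels; edge costs stand in for appearance differences.
graph = {
    0: {1: 0.1, 2: 0.9},
    1: {0: 0.1, 2: 0.2},
    2: {0: 0.9, 1: 0.2, 3: 0.5},
    3: {2: 0.5},
}
dist = geodesic_distances(graph, source=0)
```

Here superpixel 2 is reached more cheaply through superpixel 1 than over the direct edge, which is the behaviour that lets saliency propagate along smooth paths inside an object.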

Results and performance

[Figure: precision-recall curves (x-axis: recall, y-axis: precision) comparing CA, IT, SR, FT, LC, HC, RC, SF, GS and ours]

[Figure panels: global contrast, convex hull, coarse saliency, geodesic saliency propagation]

Method-B: Ncut-based saliency detection by adaptive multi-level region merging (TIP 2015)

Main ideas: apply Ncut to salient region detection, and induce a saliency map from the Ncut eigenvectors for visual clustering.

Method description:

[Block diagram: input image; superpixel graph; legend: a node belonging to the salient object, a node belonging to the background]

Graph $G_1 = (V, E, W)$ from the image, using superpixels as the basic components

Graph connections follow the 2-ring graph topology: green connections (immediate neighbor superpixels) + blue connections (2nd-layer neighbor superpixels) + brown connections (boundary superpixels)

Define graph edge weight: affinity (dissimilarity), from superpixel color differences and intervening edge magnitude

Intervening edge magnitude may help delineate object vs. background, e.g. when object and background have similar colors but different textures!

E(p):

- the line connects 2 superpixels
- intensity of the edge point on the line

Saliency computation: Ncut + adaptive region merging

Ncut generates a partition that minimizes the cut cost:

where: $\mathrm{assoc}(A_i, V) = \sum_{v_m \in A_i,\, v_j \in V} w_{mj}$


Graph spectral analysis to obtain clustering information

Ncut: solve from $G_1$ (generalized eigen-decomposition)

Pick nvec (e.g. nvec = 8) eigenvectors with the smallest non-zero eigenvalues (TIP 2015)

Update the graph edge weight $e_{ij}$ in graph-2, $G_2 = (V, E, \mathbf{e})$:

Remark: the eigenvectors are soft indicator vectors for the Ncut; $e_{ij}$ is a measure of the inter-cluster distance of nodes

[Figure: reconstructed graph edges from Ncut; cluster information from $G_1$ is gradually discovered in $G_2$]

Saliency computation by multilevel adaptive merging of graph-2 nodes

1) Merging starts from the initial superpixels
2) At level l, two regions are merged if (…) ≤ Th
3) At the next level l+1:
4) Continue merging 2)-3) until convergence
5) Final saliency map: sum of the saliency maps in all levels + smoothing, $\mathbf{f} = (\mathbf{D} - \beta \mathbf{W})^{-1} \mathbf{s}$
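The generalized eigen-decomposition step can be sketched as follows, assuming the standard normalized-cut formulation $(D - W)\,y = \lambda D\,y$ solved through the symmetric normalized Laplacian. The toy affinity matrix and the function name are illustrative; the slide's own weight definition is not reproduced here.

```python
import numpy as np

def ncut_eigenvectors(w, nvec):
    """Eigenvectors of (D - W) y = lambda * D y with the smallest non-zero eigenvalues."""
    d = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    lap_sym = np.eye(len(w)) - (d_inv_sqrt[:, None] * w) * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(lap_sym)           # ascending eigenvalues
    keep = np.where(vals > 1e-8)[0][:nvec]         # skip the trivial zero eigenvalue
    return vals[keep], d_inv_sqrt[:, None] * vecs[:, keep]   # y = D^{-1/2} z

# Toy graph: two tight groups of 3 nodes joined by one weak edge.
w = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                w[i, j] = 1.0
w[2, 3] = w[3, 2] = 0.01

vals, vecs = ncut_eigenvectors(w, nvec=1)
labels = vecs[:, 0] > 0   # the sign of the first kept eigenvector splits the groups
```

The eigenvector acts as a soft indicator vector: its entries are nearly constant within each group and change sign across the weak edge.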

Evaluation: precision-recall curve + F-measure + MAE

Results and performance

Results: quantitative evaluation

Method-C: Manifold Diffusion-based Saliency

Detection by Adaptive Graph Weight Construction

Problems: existing methods use fixed Gaussian bandwidths to measure graph affinity, which does not always reach the optimum for images with different foreground/background (FG/BG) contrast.

Diffusion: diffused saliency values = diffusion matrix × seed vector

The diffusion matrix A* depends strongly on the graph edge weights:

[Figure: saliency seeds and saliency maps for high-contrast and low-contrast images]

Apply MPD to saliency detection in a 2-stage manner

2 graphs ($G_1$: node smoothness, $G_2$: local linear embedding, LLE)

[Block diagram labels: input image, superpixel graph, manifold smoothness and manifold reconstruction (between nodes $v_i$ and $v_j$), MPD, two-stage detection scheme, MPDS, saliency map]
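To illustrate why the bandwidth matters, the sketch below implements a fixed-bandwidth diffusion of the kind the slide criticizes, not the proposed MPD. It assumes a Gaussian affinity on node features and the closed form $\mathbf{f} = (\mathbf{D} - \beta \mathbf{W})^{-1}\mathbf{s}$ that appears elsewhere in this deck; the feature values, `sigma` and `beta` are made up.

```python
import numpy as np

def gaussian_affinity(features, sigma):
    """Affinity W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero on the diagonal."""
    diff = features[:, None, :] - features[None, :, :]
    w = np.exp(-np.sum(diff ** 2, axis=2) / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    return w

def diffuse(w, seeds, beta=0.99):
    """Diffused saliency f = (D - beta * W)^{-1} s for seed vector s."""
    d = np.diag(w.sum(axis=1))
    return np.linalg.solve(d - beta * w, seeds)

# Four nodes in two feature clusters; the seed sits on node 0.
features = np.array([[0.0], [0.1], [1.0], [1.1]])
seeds = np.array([1.0, 0.0, 0.0, 0.0])

f_narrow = diffuse(gaussian_affinity(features, sigma=0.2), seeds)
f_wide = diffuse(gaussian_affinity(features, sigma=5.0), seeds)

# Contrast between a node in the seed's cluster and one in the other cluster.
contrast_narrow = f_narrow[1] / f_narrow[2]
contrast_wide = f_wide[1] / f_wide[2]
```

With a wide bandwidth all nodes look alike and the seed leaks across clusters; with a narrow one the two clusters stay separated. A single fixed value cannot suit both high- and low-contrast images, which is the motivation for adaptive weights.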


Proposed diffusion MPD: manifold assumptions + adaptive weight construction

1. Compute the weights for edges (affinity matrix W) and the reconstruction matrix A (minimum error by local linear embedding, LLE)

2. The diffusion energy is formulated as follows (given G = (V, E), and the adaptively estimated W and A):

(not nb’s: not neighbours)

3. Generate the saliency map by using BG seeds, FG seeds and the Harris convex hull, and applying MPD.

Performance

Comparison with Yang’13 (fixed-bandwidth diffusion)

Quantitative evaluation: precision-recall curve + MAE + F-measure

[Figure: results compared with fixed-bandwidth diffusion]

4.3. Application:

(a) Enhanced ML for automatic detection and recognition of saliency-segmented traffic signs

Aims: automatic traffic sign recognition (ATR)

- Recognize traffic signs captured from street-view images/videos
- Enhance performance through salient object detection and classification

Applications:

- Advanced driver assistance systems
- Intelligent autonomous driving
- Road/highway maintenance
- Sign inventory

[Figure: Google self-driving car]

Recognition of signs from street-view images

What are street-view images/videos?

Street images are captured by multiple cameras mounted on the top of a moving vehicle.

Images captured in different orientations are stitched to generate a 360-degree full street scene.

Publicly available online street-view data:

- Google (covers many countries in the world)
- Tencent (covers main cities in China)

[Figure: a Google street-view car]

Main challenges

- Appearance distortions of signs due to, e.g., lighting, view angle changes, image compression, scale changes, occlusion, motion blur …
- Background noise, e.g., advertisements, logos, dirt/clutter, partial occlusion …
- Similarity within and across sign categories


Saliency-enhanced coarse-to-fine learning/classification

a) Coarse step: detect sign categories

- Sliding-window candidate detection
- Integral channel features (Dollar’09) + discrete AdaBoost
- Non-maximum suppression across categories

[Block diagram, coarse classification: training samples and learning for category detectors #1, #2 and #3; sliding windows; non-maximum suppression among categories]

[Figure: sample street-view images #1 and #2 (960×640) with the detected signs for each image]

b) Fine step: saliency-based segmentation and classification of sign classes within each category

[Block diagram, fine classification in categories: robust segmentation of salient sign regions (ROIs); feature extraction and dimension reduction]

[Figure: sign recognition of images #1 and #2]
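The non-maximum suppression step of the coarse stage can be sketched as a greedy procedure that pools the detections of all category detectors and suppresses overlapping windows regardless of category. The overlap threshold, the box format `(x_min, y_min, x_max, y_max)` and the example scores are assumptions, not values from the slides.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms_across_categories(detections, iou_threshold=0.5):
    """Greedy NMS over pooled detections: the best-scoring window wins an overlap."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        if all(iou(det["box"], k["box"]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept

# Two detectors fire on nearly the same window; a third fires elsewhere.
detections = [
    {"box": (0, 0, 10, 10), "score": 0.9, "category": "prohibitory"},
    {"box": (1, 1, 11, 11), "score": 0.8, "category": "warning"},
    {"box": (20, 20, 30, 30), "score": 0.7, "category": "indication"},
]
kept = nms_across_categories(detections)
kept_categories = [d["category"] for d in kept]
```

Pooling across categories means a window is assigned to one category only, which is what the fine step needs before classifying the sign class within that category.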

Experiments and Results

Study traffic signs in 3 categories

Performance:

a) Classification of sign categories

b) Classification of signs within each category

C is the within-category confusion matrix

Average classification rate (overall classification accuracy):
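The formula itself is missing from the extracted slide. A common definition, assumed here, is the trace of the confusion matrix divided by the sum of all its entries, with rows as actual classes and columns as classified classes. The sketch also computes class-averaged precision and recall; skipping classes with no samples (or no predictions) is an assumption about how such averages are formed.

```python
def classification_summary(c):
    """Overall accuracy, mean precision and mean recall from a confusion matrix.

    c[i][j] = number of samples of actual class i classified as class j.
    """
    n = len(c)
    total = sum(sum(row) for row in c)
    accuracy = sum(c[i][i] for i in range(n)) / total
    precisions, recalls = [], []
    for i in range(n):
        row_sum = sum(c[i])                        # samples of actual class i
        col_sum = sum(c[r][i] for r in range(n))   # samples classified as class i
        if row_sum:
            recalls.append(c[i][i] / row_sum)
        if col_sum:
            precisions.append(c[i][i] / col_sum)
    return accuracy, sum(precisions) / len(precisions), sum(recalls) / len(recalls)

# Small made-up 3-class confusion matrix.
c = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 5],
]
accuracy, mean_precision, mean_recall = classification_summary(c)
```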

Performance: Classification of 11 classes within “indication” category on signs (894/905) from the testset (3237 images)

Confusion matrix (rows: actual classes; columns: classified classes, numbered 1-12 in the same order as the rows):

1 'min100': 42 1 0 0 0 0 0 0 0 0 0 0
2 'min110': 3 21 0 2 0 1 0 0 0 0 0 0
3 'min50': 0 0 0 3 0 0 0 0 0 0 0 0
4 'min60': 1 0 0 524 0 0 0 0 1 0 0 1
5 'min70': 0 0 0 0 7 2 6 0 0 0 0 1
6 'min80': 1 0 0 16 0 83 1 0 0 0 0 0
7 'min90': 1 1 0 5 0 1 77 0 0 0 0 0
8 'must-horn': 0 0 0 0 0 0 0 0 0 0 0 0
9 'must-left': 0 0 0 0 0 0 0 0 7 0 0 1
10 'must-right': 0 0 0 0 0 0 0 0 0 27 0 0
11 'must-straight': 0 0 0 0 0 0 0 0 0 0 1 0
12 'unknown': 0 0 0 4 0 0 0 0 0 0 0 7

Average precision = 0.917958

Average recall = 0.842536

Total No signs (with/without Unknown class) = 894/905

Performance: Classification of 33 classes within “Warning” category on signs (536/698) from the testset (3237 images)

Confusion matrix (rows: actual classes; columns: classified classes, numbered 1-18 in the same order as the rows):

1 'unknown': 105 5 2 1 8 0 1 0 4 2 2 1 4 3 14 6 1 0
2 'warn-accident': 0 17 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
3 'warn-construct': 0 0 37 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
4 'warn-cross': 0 0 0 6 0 0 0 0 0 1 0 0 0 0 0 0 0 1
5 'warn-danger': 0 0 0 0 4 0 1 0 0 0 0 0 0 0 0 0 0 0
6 'warn-go-right': 1 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0
7 'warn-human': 0 0 1 0 1 0 7 0 0 0 0 0 0 0 0 0 0 0
8 'warn-kids': 10 0 0 0 0 0 0 13 0 0 0 0 0 1 0 2 0 0
9 'warn-left-T': 2 0 0 0 0 0 2 0 7 0 0 0 0 0 0 0 0 0
10 'warn-left-turn': 0 0 0 0 0 0 0 0 0 16 0 0 0 0 0 0 0 0
11 'warn-narrow': 1 0 0 0 0 0 0 0 0 0 59 0 5 0 0 0 0 0
12 'warn-railway': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 'warn-right-Lane': 0 0 0 0 2 0 0 0 0 1 3 0 247 0 0 0 0 0
14 'warn-right-T': 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 0
15 'warn-right-turn': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 25 0 0 0
16 'warn-slow': 1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 17 0 0
17 'warn-tunnel': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0
18 'warn-zzz': 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 11

Average precision = 0.754189

Average recall = 0.849863

Total No signs (with/without Unknown class) = 536/698

Performance: Classification of 33 classes within “Prohibitory” category on signs (3494/3549) from the testset (3237 images)

Confusion matrix (rows: actual classes; columns: classified classes, numbered 1-34 in the same order as the rows):

1 'SpLim:10': 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 'SpLim:100': 0 573 0 5 0 0 0 0 3 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 3
3 'SpLim:110': 0 0 36 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
4 'SpLim:120': 0 6 0 454 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
5 'SpLim:20': 0 0 0 12 47 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
6 'SpLim:30': 0 0 0 1 0 38 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
7 'SpLim:40': 0 0 0 1 0 0 315 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 'SpLim:50': 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 'SpLim:60': 0 1 0 1 0 0 0 0 324 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
10 'SpLim:70': 0 0 0 0 0 0 1 0 0 60 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 'SpLim:80': 0 2 1 1 0 0 0 0 1 0 337 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
12 'SpLim:90': 0 2 0 0 0 0 1 0 2 0 1 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 'SpLm:5': 0 0 0 0 0 0 0 0 0 0 0 0 54 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
14 'combination': 0 0 0 0 0 1 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
15 'enable-overtake': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 'no-U-turn': 0 0 0 6 0 0 0 0 0 0 0 0 0 1 0 522 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5
17 'no-bike': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
18 'no-bus': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 2
19 'no-car': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
20 'no-entry': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21 'no-explosive': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0
22 'no-horn': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 1 0 0 0 0 0 0 0 3
23 'no-left-turn': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 2
24 'no-motor-bike': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 1
25 'no-overtake': 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 60 0 1 0 0 0 0 0 0 0
26 'no-parking': 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 164 13 0 0 0 0 0 0 4
27 'no-pass': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 18 0 0 0 0 0 0 2
28 'no-phone': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29 'no-right-turn': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
30 'no-stopping': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31 'no-tractor': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0
32 'no-truck': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 1 0 0 0 0 52 0 1
33 'no-walking': 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 4
34 'unknown': 0 2 0 3 1 1 1 1 0 2 4 1 0 1 0 6 0 2 0 0 0 0 0 0 0 0 1 0 2 1 0 0 0 25

Average precision = 0.916468

Average recall = 0.885022

Total No signs (with/without Unknown class) = 3494/3549


Comparison: German traffic sign recognition benchmark

(preliminary: without optimizing parameters in our training)

http://benchmark.ini.rub.de/?section=gtsrb&subsection=results

Team            Method               All signs
[3] IDSIA       Committee of CNNs    99.46%
[1] INI-RTCV    Human performance    98.84%
[4] Sermanet    Multi-scale CNNs     98.31%
[2] CAOR        Random forests       96.14%
Ours            Saliency-enhanced    95.80%
[6] INI-RTCV    LDA on HOG 2         95.68%
[5] INI-RTCV    LDA on HOG 1         93.18%
[7] INI-RTCV    LDA on HOG 3         92.34%

4.3. Application:

(b) ML for privacy-preserving fall detection and activity classification using RGB-D videos

Addressed problem

- Fall detection and activity recognition from RGB-D videos
- Privacy preserving: using low-resolution video, or depth video only

Applications: healthcare, assisted living …

Motivations

- Automatically detect falls and trigger alarms
- Detect falls using a single camera view
- Exploit spatio-temporal features of shape, pose + appearance
- Privacy-preserving issue

Main issues in focus:

Effective spatio-temporal features:

- Global shape + motion from RGB videos
- Local shape + motion from depth videos

Combine different features for fall detection (classify fall vs. lie-down)

Study the contribution of each individual feature component to the overall performance

Exploit low-resolution RGB-D videos for privacy-preserving fall detection

Riemannian manifold classification of a list of activities (ongoing)

The big picture (ongoing work)

Manifold-based

video activity

classification

RGB videos: motion and appearance features

Normalize ROI size: maintain the object aspect ratio (filling with BG)

(w, h) ⇒ max(w, h) ⇒ (λ, λ)

Appearance feature is represented by HOG

Motion feature is represented by HOG-OF
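The ROI normalization (w, h) ⇒ max(w, h) ⇒ (λ, λ) can be sketched as padding the ROI to a square with a background value and then resampling to λ × λ. Centering the object, the fill value and nearest-neighbour resampling are assumptions made for this illustration; the image is a plain list of rows to keep the example dependency-free.

```python
def normalize_roi(roi, lam, fill=0):
    """Pad an h x w ROI to a max(w, h) square, then resample to lam x lam.

    Padding with `fill` (background) keeps the object's aspect ratio intact.
    """
    h, w = len(roi), len(roi[0])
    side = max(w, h)
    top, left = (side - h) // 2, (side - w) // 2
    square = [[fill] * side for _ in range(side)]
    for y in range(h):
        for x in range(w):
            square[top + y][left + x] = roi[y][x]
    # Nearest-neighbour resampling from (side, side) to (lam, lam).
    return [
        [square[(i * side) // lam][(j * side) // lam] for j in range(lam)]
        for i in range(lam)
    ]

# A wide 2 x 4 ROI becomes a 4 x 4 square, then a 2 x 2 patch.
roi = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
padded = normalize_roi(roi, lam=4)
small = normalize_roi(roi, lam=2)
```

Descriptors such as HOG are then computed on patches of a fixed size, so feature vectors from different ROIs have the same length.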


Depth videos: dynamic shape/shape features

- Shape dynamic features
- Shape features
- Time-dependent features

Time-dependent feature matrix:

Conclusion

ML methods: several of our recent ML methods (smooth manifold-based ML, saliency-enhanced ML) are presented.

ML applications: several applications (traffic sign recognition, activity classification, visual object tracking) are presented.

RGB-D video data from camera sensors are shown to contain important information and are useful for ML.

More ML research attention should be directed to image/video analysis techniques for ML.