Feature Extraction for lifelog management
September 25, 2008
Sung-Bae Cho
1
• Feature Extraction
– Temporal Feature Extraction
– Spatial Feature Extraction
• Feature Extraction Example
– Tracking
• Summary & Review
Agenda
Feature Extraction: Motivation
• Data compression: Efficient storage
• Data characterization
– Data understanding: analysis
• Discovering data characteristics
– Clustering: unknown labels
– Classification: known labels
– Pre-processing for further analysis
• Tracking
• Visualization: reduction of visual clutter
• Comparison
• Search: large collections of data sets
• Database management: efficient retrieval
– Data characterization
• Data simulation: synthesis
• Modeling data
• Model selection
• Model parameter estimation
• Prediction
• Feature forecast
• Raw data forecast
3
Features
• Features are confusable
• Regions of overlap represent the classification error
• Error rates can be computed with knowledge of the joint probability distributions
• Context can be used to reduce overlap
• In real problems, features are confusable and represent actual variation in the data
• The traditional role of the signal processing engineer has been to develop better features
4
An Example (1)
• Problem: Sorting fish
– Incoming fish are sorted according to species using optical sensing (sea bass or salmon?)
• Problem Analysis:
– Set up sensors and take some sample images to extract features
– Consider features
• Length
• Lightness
• Width
• Number and shape of fins
• Position of mouth
• …
5
An Example (2)
• Length is a poor discriminator
• We can select the lightness feature
• We can also combine features
• Lightness is a better feature than length because it reduces the misclassification error
6
Feature: Definition
• Feature or attribute: Usually physical measurement or category associated with spatial location and temporal instance
– Continuous, e.g., elevation
– Categorical, e.g., forest label
• Every domain has a different definition for features, regions of interest, or objects
• A feature is a cluster or a boundary/region of points that satisfy a set of pre-defined criteria
– The criteria can be based on any quantities, such as shape, time, similarity, orientation, and spatial distribution
7
Feature Categories (1)
• Statistical features
– Density distribution of spatially distributed measurements
• e.g., nests of eagles and hawks, tree types
– Statistical central moments per region computed from raster measurements over region definitions
• e.g., average elevation of counties
• Temporal features
– Temporal rate of spatial propagation
• e.g., AIDS spreading from large cities
– Seasonal spatially-local changes
• e.g., precipitation changes
8
Feature Categories (2)
• Geometrical features
– Distance, e.g., Optical Character Recognition (OCR)
– Circular, e.g., SAR scattering centers
– Arcs, e.g., semiconductor wafers
– Linear, e.g., roads in aerial photography
– Curvilinear, e.g., isocontours in DEM
– Complex, e.g., map symbols & annotations
• Spectral features
– Areas with a defined spectral structure (morphology)
• Areas with homogeneous measurements (color, texture)
9
Feature Extraction
• Feature extraction
– Transforming the input data into a set of features that still describes the data with sufficient accuracy
– In pattern recognition and image processing, feature extraction is a special form of dimensionality reduction
• Why?
– When the input data to an algorithm is too large to be processed and it is suspected to be redundant (much data, but not much information)
– Analysis with a large number of variables generally requires a large amount of memory and computation power, or a classification algorithm that overfits the training sample and generalizes poorly to new samples
Need to transform input into a reduced representation set of features
10
Goal of Feature Extraction
• Transform measurements from one space into another space in order to (a) compress data or (b) characterize data
• Examples:
– Data compression:
• Noise removal: filtering
• Data representation: raster → vector
• Information redundancy removal: multiple band de-correlation
– Data characterization:
• Similarity and dissimilarity analysis
• Statistical, geometrical and spectral analysis
11
Feature Extraction Methods
• Dimensionality reduction techniques
– Principal components analysis (PCA): A vector space transform used to reduce multidimensional data sets to lower dimensions for analysis
– Multifactor dimensionality reduction (MDR): Detecting and characterizing combinations of attributes that interact to influence a dependent or class variable
– Nonlinear dimensionality reduction: Assuming the data of interest lies on an embedded non-linear manifold within the higher-dimensional space
– Isomap: Computing a quasi-isometric, low-dimensional embedding of a set of high-dimensional data points
• Latent semantic analysis (LSA): Analyzing relationships between a set of documents and terms by producing a set of concepts related to them
• Partial least squares (PLS-regression): Finding a linear model describing some predicted variables in terms of other observable variables
• Feature Selection Methods: feature selection is a kind of feature extraction
12
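A minimal sketch of the PCA transform described above, using only NumPy (the `pca` helper and the synthetic data are illustrative, not from the slides):

```python
import numpy as np

def pca(X, n_components):
    """Project data onto its top principal components.

    X: (n_samples, n_features) array; returns (n_samples, n_components).
    """
    # Center the data so the covariance is taken about the mean
    Xc = X - X.mean(axis=0)
    # Eigen-decomposition of the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    # eigh returns ascending eigenvalues; keep the largest n_components
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order]

# Reduce 5-dimensional samples to 2 dimensions
X = np.random.default_rng(0).normal(size=(100, 5))
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```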
Feature Selection Methods
• Search approaches
– Exhaustive
– Best first
– Simulated annealing
– Genetic algorithm
– Greedy forward selection
– Greedy backward elimination
• Filter metrics
– Correlation
– Mutual information
– Entropy
– Inter-class distance
– Probabilistic distance
13
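As an illustration of a filter metric, a hypothetical `correlation_filter` helper that ranks features by absolute Pearson correlation with the class label and keeps the top k:

```python
import numpy as np

def correlation_filter(X, y, k):
    """Rank features by |Pearson correlation| with the label and
    return the indices of the top k (a simple filter method)."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200).astype(float)
X = rng.normal(size=(200, 4))
X[:, 2] += 3 * y              # feature 2 carries the class signal
print(correlation_filter(X, y, 1))  # [2]
```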
Spatial Feature Extraction Example
• Distance features
• Mutual point distance features
• Density features
• Orientation features
14
Temporal Feature Extraction Example
• Temporal features from point data
– Deformation changes over time
• Extracted features: Horizontal, Vertical, Diagonal
• Temporal features from raster data
– Precipitation changes over time
• Example: Image subtraction to obtain features that can be clustered
15
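The image-subtraction idea above can be sketched as follows (hypothetical 2×2 raster frames standing in for precipitation data):

```python
import numpy as np

# Two raster frames at consecutive time steps (illustrative values)
frame_t0 = np.array([[1.0, 1.0], [2.0, 2.0]])
frame_t1 = np.array([[1.5, 1.0], [4.0, 2.0]])

# Image subtraction yields a per-pixel change map
change = frame_t1 - frame_t0

# Pixels whose change exceeds a threshold become a binary feature
# map that can then be clustered into regions of significant change
mask = np.abs(change) > 0.5
print(change)
print(mask)
```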
Feature Extraction Applications
• Activity recognition
• Place tracking
• Face recognition
• Remote sensing
• Bioinformatics
• Structural engineering
• Robotics
• Biometrics
• GIS (Geographic information system)
• Semiconductor defect analysis
• Earthquake engineering
• Plant biology
• Medicine
• Sensing
• …
16
• Feature Extraction
– Temporal Feature Extraction
– Spatial Feature Extraction
• Feature Extraction Example
– Tracking
• Summary & Review
Agenda
Tracking
• A well-known research area using temporal feature extraction methods
• Observing persons or objects on the move and supplying a timely ordered sequence of respective location data to a model
– e.g., capable of depicting the motion on a display
• When processing a video sequence, finding the location of an object in the scene in each frame
• Tracking example
– Human/object tracking: e.g., GPS sensor based car position tracking
– Tracking a body part: e.g., accelerometer-based hand/leg movement tracking
– Eye tracking: analyzing eye images
– Object tracking in camera images
18
An Example of Tracking
• Tracking of human behavior
– Recognize behaviors performed in a cricket game
– Reference:
• M. Ko, G. West, S. Venkatesh, and M. Kumar, Using dynamic time warping for online temporal fusion in multisensor systems, Information Fusion, 2007
• Used tracking method
– DTW (dynamic time warping)
• An algorithm for measuring similarity between two sequences which may vary in time or speed
– e.g., Automatic speech recognition coping with different speaking speeds
• Any data which can be turned into a linear representation can be analyzed with DTW
19
Motivation
• We need a method for temporal fusion of raw data or feature data
– Fusion levels: raw, feature, decision
• Requirements for a temporal fusion method for multiple sensors
– Variable types: multi-dimensional, temporal, discrete, and continuous sensors
– Variable length of data
• Proposition: Multi-sensor fusion using DTW
– Expanding DTW algorithm
• Considering end-point
• Supporting fusion of diverse heterogeneous sensory data
20
Used Sensor Data
• Sensor: ADXL202: 3-axis, ±2g, 150Hz accelerometer
– 2 sensors for each wrist
– 6 channel data
• Data
– 4 Human subjects & 65 (20 + 15 * 3) samples
– 12 gestures in Cricket game: Cancel call, dead ball, last hour, …
21
Behavior System Structure based on DTW
• Sliding window: Transmit data units of a specified size
• Data pre-processing: Convert raw data into test template
• DTW recognizer: Measure similarity between test & class template
• Decision module: Select a behavior of best matching template
22
Preprocessing
• Input data
– online : streaming sensor values
– offline : segmented sensor values
• Preprocessing methods
– Signal filter: noise & outlier elimination
– Normalization
• Preprocessing for temporal data
– Sliding window
– End point detection based on DTW
23
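A minimal sketch of the sliding-window segmentation used in preprocessing (the `sliding_windows` helper and the window/overlap values are illustrative):

```python
def sliding_windows(stream, size, overlap):
    """Segment a signal into fixed-size windows that overlap by
    `overlap` samples (online preprocessing of streaming data)."""
    step = size - overlap
    return [stream[i:i + size]
            for i in range(0, len(stream) - size + 1, step)]

signal = list(range(10))
print(sliding_windows(signal, size=4, overlap=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```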
• Minimum warping path:
– NF : Normalization factor
• Distance table (D):
Dynamic Time Warping (1)
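The formulas on this slide were lost in extraction; a standard formulation consistent with the NF, d(i, j), and warping-path definitions on the neighboring slides would be:

```latex
% Minimum warping path, normalized by NF:
\mathrm{DTW}(C, T) = \frac{1}{NF} \min_{W} \sum_{q=1}^{Q} d\bigl(i(q),\, j(q)\bigr)

% Distance table D filled by the dynamic-programming recurrence:
D(i, j) = d(i, j) + \min\bigl\{ D(i-1, j),\; D(i, j-1),\; D(i-1, j-1) \bigr\}
```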
[Figure: worked example of the distance table computed between an input sample and a class template]
24
Dynamic Time Warping (2)
• Local distance:
– C : class template with length I
– T : test template with length J
– d(i, j) : distance between elements of the class & test templates
• Warping path (W) definition
– i(q) ∈ {1, …, I}, j(q) ∈ {1, …, J}
– Constraints
• Continuity
• End-point
• Monotonicity
25
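A minimal DTW sketch following the recurrence above, with NF = I + J as one plausible normalization factor (the slides do not specify NF, so that choice is an assumption):

```python
import numpy as np

def dtw_distance(C, T):
    """Distance-table DTW between class template C and test template
    T (1-D sequences). End-point, continuity, and monotonicity are
    enforced by the recurrence; the result is normalized by I + J."""
    I, J = len(C), len(T)
    D = np.full((I + 1, J + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            d = abs(C[i - 1] - T[j - 1])       # local distance d(i, j)
            D[i, j] = d + min(D[i - 1, j],     # only these three moves
                              D[i, j - 1],     # satisfy the path
                              D[i - 1, j - 1]) # constraints
    return D[I, J] / (I + J)                   # NF = I + J (assumed)

a = [0.0, 1.0, 2.0, 3.0]
b = [0.0, 0.0, 1.0, 2.0, 3.0]   # same shape, different speed
print(dtw_distance(a, b))  # 0.0: the warp absorbs the timing shift
```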
Class Template Selection
• Class template selection method
– Random selection
– Normal selection
– Minimum selection
– Average selection
– Multiple selection
– Random, minimum, multiple selection
• End region
– Band-DP (E = E2 − E1)
• Rejection threshold
26
Distance Measurement
• Distance calculation in DTW
– Extended Euclidean distance
– Cosine correlation coefficient
– where
• Multi sequence of class template : C( I x V )
• Multi sequence of test template : T( I x V )
• V : num. of variables
• WV : positive weight vector
27
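Sketches of the two local-distance options, assuming per-frame rows of the C(I×V) and T(I×V) templates and a hypothetical positive weight vector WV:

```python
import numpy as np

def weighted_euclidean(c_row, t_row, wv):
    """Extended Euclidean local distance between one frame of a class
    template and one frame of a test template, with a per-variable
    weight vector (the weights here are illustrative)."""
    return np.sqrt(np.sum(wv * (c_row - t_row) ** 2))

def cosine_similarity(c_row, t_row):
    """Cosine correlation coefficient between two frames."""
    return np.dot(c_row, t_row) / (np.linalg.norm(c_row) *
                                   np.linalg.norm(t_row))

c = np.array([1.0, 0.0, 2.0])   # V = 3 sensor variables
t = np.array([1.0, 1.0, 2.0])
wv = np.array([1.0, 0.5, 1.0])  # down-weight the noisier variable
print(weighted_euclidean(c, t, wv))  # sqrt(0.5) ≈ 0.707
print(cosine_similarity(c, t))
```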
Decision Module
• Nearest neighbor algorithm
– Normal, minimum, average selection
– where
• N : no. of class templates, 1 <= n <= N
• Cn : class template, Dn : distance table
• Method: kNN
– Multiple selection : Cn,m
– M : no. of selected class templates, k : 1 <= k <= M
28
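The decision rule can be sketched as a k-nearest-neighbor vote over DTW distances to the class templates (the gesture names and distances below are illustrative):

```python
import numpy as np

def knn_decision(distances, labels, k):
    """Pick the behavior class by majority vote among the k class
    templates with the smallest DTW distances; k = 1 reduces to the
    nearest-neighbor rule used with normal/minimum/average selection."""
    order = np.argsort(distances)[:k]
    votes = [labels[i] for i in order]
    return max(set(votes), key=votes.count)

# Distances from a test template to N = 5 class templates
distances = [0.9, 0.2, 0.4, 0.3, 0.8]
labels = ["cancel_call", "dead_ball", "dead_ball",
          "last_hour", "cancel_call"]
print(knn_decision(distances, labels, k=1))  # dead_ball
print(knn_decision(distances, labels, k=3))  # dead_ball (2 of 3 votes)
```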
Experimental Setup
• Environments
– Pentium 4 3.2 GHz, 1 GB RAM, Windows XP
• Comparison
– HMM
• Experiments
– Off-line temporal fusion
– On-line temporal fusion
– Sensor based
• Gesture recognition based on accelerometer
• Scenario recognition based on diverse sensors
29
Experiment: Sensor Data
• W : sliding window size, O : overlap size, F : features
30
Experiment: Results (DTW vs. HMM)
• Performance of DTW was better
– Raw data: Data in – decision out
– Filtered data: Feature in – decision out
Data          | HMM        | DTW
Raw data      | 85.7~86.5% | 97.9%
Filtered data | 87.8~88.1% | 92.5~96.4%
W=50, O=30    | 73.9~78.8% | 96~98%
31
Experiment: Results (Online) (1)
• Class template selection methods comparison
• Min-1 : Minimum selection, Min-4 : Minimum + multiple selection
• RD-1 : Random selection, RD-4 : Random + multiple selection
K : parameter for kNN, NF : normalization factor
32
Experiment: Results (Online) (2)
• Gesture recognition
– 12 gestures
– Minimum distance comparison between sample & class
33
Experiment 2: Setup
• Multiple sensor fusion
• Sensors
– 3-axis Accelerometer
– Light
– Temperature
– Humidity
– Microphone
– …
• Data: J. Mantyjarvi et al., 2004
– 5 scenarios, 5 times each
• 1 ~ 5 min.
– 32 sensor data
– 46,045 samples
34
Experiment 2: Results (Offline)
• DTW classification rate
• HMM classification rate
– With randomly selected training data
• T1: 20 samples, 75.1~88.1%
• T2: minimal selection, 72.5~78%
35
Experiment 2: Results (Online)
• Classification rate
36
• Feature Extraction
– Temporal Feature Extraction
– Spatial Feature Extraction
• Feature Extraction Example
– Tracking
• Summary & Review
Agenda
Summary
• Feature extraction
– Data sources
– Feature categories
– Applications
• Review
– Why is feature extraction important?
– How would you extract important features from data?
– What features would you recommend for tracking from sensor data?
38
Further Information
• Feature Selection for Knowledge Discovery and Data Mining (book)
• An Introduction to Variable and Feature Selection (survey)
• Toward Integrating Feature Selection Algorithms for Classification and Clustering (survey)
• JMLR Special Issue on Variable and Feature Selection
• Searching for Interacting Features
• Feature Subset Selection Bias for Classification Learning
• M. Hall, Correlation-based Feature Selection for Machine Learning, 1999
• H.C. Peng, F. Long, and C. Ding, "Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226-1238, 2005.
39