Sparsity Control for Robustness and Social Data Analysis
Gonzalo Mateos, ECE Department, University of Minnesota
Acknowledgments: Profs. Georgios B. Giannakis, M. Kaveh, G. Sapiro, N. Sidiropoulos, and N. Waller; MURI (AFOSR FA9550-10-1-0567) grant
Minneapolis, MN
December 9, 2011
Learning from "Big Data"
"Data are widely available, what is scarce is the ability to extract wisdom from them" (Hal Varian, Google's chief economist)
Big data are big, fast, productive, revealing, ubiquitous, smart, and messy [K. Cukier, "Harnessing the data deluge," Nov. 2011]
Social-Computational Systems
Complex systems of people and computers
The vision: preference measurement (PM), analysis, and management to understand and engineer SoCS
The means: leverage the dual role of sparsity, for complexity control through variable selection and for robustness to outliers
Conjoint analysis
Marketing, healthcare, psychology [Green-Srinivasan'78]; success story [Wind et al.'89] with attributes such as room size, TV options, restaurant, and transportation
Goal: learn the consumer's utility function from preference data
Strategy: describe products by a set of attributes, their "parts"; linear utilities answer "How much is each part worth?"
Payoff: optimal design and positioning of new products
Modeling preliminaries
Respondents (e.g., consumers) rate profiles, each comprising several attributes
Linear utility: ratings obey $y_i \approx \mathbf{x}_i^\top \mathbf{w}$; the goal is to estimate the vector of partworths $\mathbf{w}$
Conjoint data collection formats: (M1) metric ratings; (M2) choice-based conjoint data
Online SoCS-based preference data grows exponentially; inconsistent/corrupted/irrelevant data give rise to outliers
Robustifying PM
Least-trimmed squares [Rousseeuw'87]:
(LTS) $\hat{\mathbf{w}}_{\text{LTS}} = \arg\min_{\mathbf{w}} \sum_{i=1}^{s} r_{[i]}^2(\mathbf{w})$
where $r_{[i]}^2(\mathbf{w})$ is the $i$-th order statistic among the squared residuals $r_1^2(\mathbf{w}), \ldots, r_n^2(\mathbf{w})$; the $n-s$ largest residuals are discarded
Q: How should we go about minimizing the nonconvex (LTS)?
A: Try all subsets of size $s$, solve, and pick the best; simple but intractable beyond small problems
Near-optimal solvers [Rousseeuw'06]; RANSAC [Fischler-Bolles'81]
G. Mateos, V. Kekatos, and G. B. Giannakis, "Exploiting sparsity in model residuals for robust conjoint analysis," Marketing Sci., Dec. 2011 (submitted).
Modeling outliers
Introduce an outlier variable per datum: $o_i \neq 0$ if datum $i$ is an outlier, $o_i = 0$ otherwise
Nominal ratings obey (M1); outliers obey something else: $y_i = \mathbf{x}_i^\top \mathbf{w} + o_i + \varepsilon_i$
Both $\mathbf{w}$ and the outlier vector $\mathbf{o}$ are unknown, but $\mathbf{o}$ is typically sparse!
A natural (but intractable) nonconvex estimator follows; cf. $\epsilon$-contamination [Fuchs'99] and the Bayesian model of [Jin-Rao'10]
LTS as sparse regression
Lagrangian form:
(P0) $\min_{\mathbf{w}, \mathbf{o}} \; \|\mathbf{y} - \mathbf{X}\mathbf{w} - \mathbf{o}\|_2^2 + \lambda_0 \|\mathbf{o}\|_0$
The tuning parameter $\lambda_0$ controls the sparsity of $\mathbf{o}$, i.e., the number of outliers
Proposition 1: If $\{\hat{\mathbf{w}}, \hat{\mathbf{o}}\}$ solves (P0) with $\lambda_0$ chosen s.t. $\|\hat{\mathbf{o}}\|_0 = n - s$, then $\hat{\mathbf{w}} = \hat{\mathbf{w}}_{\text{LTS}}$ in (LTS).
Ties sparse regression with robust estimation; formally justifies the preference model and its estimator (P0)
Just relax!
(P0) is NP-hard, so relax $\|\mathbf{o}\|_0$ to the convex $\ell_1$-norm, e.g., [Tropp'06]:
(P1) $\min_{\mathbf{w}, \mathbf{o}} \; \|\mathbf{y} - \mathbf{X}\mathbf{w} - \mathbf{o}\|_2^2 + \lambda_1 \|\mathbf{o}\|_1$
(P1) is convex, and thus efficiently solved; the role of the sparsity-controlling parameter $\lambda_1$ is central
Q: Does (P1) yield robust estimates $\hat{\mathbf{w}}$? A: Yes! The Huber estimator is a special case
Lassoing outliers
It suffices to solve a Lasso problem [Tibshirani'94] for the outlier vector
Proposition 2: The minimizers of (P1) are obtained from this Lasso solution
Data-driven methods are available to select $\lambda_1$; Lasso solvers return the entire robustification path (RP) of outlier coefficients as $\lambda_1$ decreases
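The alternating route to (P1) is easy to prototype: with $\mathbf{w}$ fixed, the $\mathbf{o}$-update is a closed-form soft-thresholding of the residuals. A minimal sketch, not the paper's exact solver (data, $\lambda_1$, and iteration counts are illustrative):

```python
import numpy as np

def robust_regression(X, y, lam, n_iter=100):
    """Block coordinate descent on (P1):
    min_{w,o} ||y - X w - o||_2^2 + lam * ||o||_1."""
    n, _ = X.shape
    o = np.zeros(n)
    for _ in range(n_iter):
        # w-step: ordinary least squares on outlier-compensated data
        w, *_ = np.linalg.lstsq(X, y - o, rcond=None)
        # o-step: soft-threshold the residuals (closed form)
        r = y - X @ w
        o = np.sign(r) * np.maximum(np.abs(r) - lam / 2, 0.0)
    return w, o
```

Entries of `o` that survive the thresholding flag the outliers; sweeping `lam` downward traces out the robustification path.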
Nonconvex regularization
Nonconvex penalty terms approximate the $\ell_0$-norm of (P0) better than the $\ell_1$-norm
Options: SCAD [Fan-Li'01] or sum-of-logs [Candes et al.'08]
Iterative linearization-minimization of the concave penalty around the current iterate
Initialize with the (P1) solution; yields bias reduction (cf. adaptive Lasso [Zou'06])
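The linearization-minimization loop amounts to reweighted soft-thresholding. In the sketch below the first outer round is the plain $\ell_1$ problem (the (P1) initialization), and the weights $\lambda\epsilon/(|o_i|+\epsilon)$ are one common scaling of the sum-of-logs linearization, chosen here for illustration:

```python
import numpy as np

def robust_regression_log(X, y, lam, eps=1e-2, n_outer=5, n_inner=50):
    """Iteratively reweighted version of (P1) targeting the
    sum-of-logs penalty lam * sum(log(|o_i| + eps))."""
    n = len(y)
    o = np.zeros(n)
    w_pen = np.full(n, lam)  # round 1: uniform weights = plain l1
    for _ in range(n_outer):
        for _ in range(n_inner):
            beta, *_ = np.linalg.lstsq(X, y - o, rcond=None)
            r = y - X @ beta
            o = np.sign(r) * np.maximum(np.abs(r) - w_pen / 2, 0.0)
        # relinearize the concave penalty around the current o:
        # large |o_i| -> tiny weight -> bias reduction on flagged outliers
        w_pen = lam * eps / (np.abs(o) + eps)
    return beta, o
```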
Comparison with RANSAC
Simulated i.i.d. data with separate nominal and outlier generating models
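For reference, a textbook RANSAC [Fischler-Bolles'81] baseline for linear regression; hyperparameters are illustrative, not those of the reported experiment:

```python
import numpy as np

def ransac_regression(X, y, n_iter=200, thresh=0.5, seed=0):
    """Basic RANSAC: repeatedly fit on a random minimal subset and
    keep the model with the largest consensus (inlier) set."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_w, best_count = None, -1
    for _ in range(n_iter):
        idx = rng.choice(n, size=p, replace=False)   # minimal subset
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        inliers = np.abs(y - X @ w) < thresh
        if inliers.sum() > best_count:
            best_count = inliers.sum()
            # refit on the full consensus set
            best_w = np.linalg.lstsq(X[inliers], y[inliers], rcond=None)[0]
    return best_w
```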
Nonparametric regression
If one trusts data more than any parametric model, go nonparametric: the utility $f$ lives in a space of "smooth" functions
This is an ill-posed problem; the workaround is regularization [Tikhonov'77], [Wahba'90]: an RKHS with kernel $K$ and norm $\|f\|_{\mathcal{H}}$
Motivation: interactions among attributes are not captured by linear utilities and are driven by complex mechanisms that are hard to model
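Combining the RKHS fit with sparse outlier variables gives the same alternating structure as before: a kernel ridge step on outlier-compensated data, then residual soft-thresholding. A sketch under illustrative assumptions (Gaussian kernel, hand-picked $\lambda$, $\mu$, $\gamma$):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row-sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def robust_kernel_smoother(x, y, lam, mu, gamma=1.0, n_iter=50):
    """Alternating minimization of
    min_{alpha,o} ||y - K alpha - o||^2 + mu * alpha'K alpha + lam * ||o||_1."""
    K = rbf_kernel(x, x, gamma)
    n = len(y)
    o = np.zeros(n)
    for _ in range(n_iter):
        # f-step: kernel ridge regression on outlier-compensated data
        alpha = np.linalg.solve(K + mu * np.eye(n), y - o)
        # o-step: soft-threshold the residuals
        r = y - K @ alpha
        o = np.sign(r) * np.maximum(np.abs(r) - lam / 2, 0.0)
    return alpha, o
```

The fitted function is $f(\cdot)=\sum_i \alpha_i K(\cdot, x_i)$, and nonzero entries of `o` mark rejected points.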
Function approximation
Panels: true function, nonrobust predictions, robust predictions, refined predictions
Effectiveness in rejecting outliers is apparent
G. Mateos and G. B. Giannakis, "Robust nonparametric regression via sparsity control with application to load curve data cleansing," IEEE Trans. Signal Process., 2012.
Load curve data cleansing
Load curve: electric power consumption recorded periodically; example: Uruguay's power consumption (MW)
Reliable data are key to realizing the smart grid vision [Hauser'09]
Outlier sources: faulty meters, communication errors, unscheduled maintenance, strikes, sport events
B-splines for load curve prediction and denoising [Chen et al.'10]
NorthWrite data
Energy consumption of a government building ('05-'10); robust smoothing spline estimator
Outliers: "building operational transition shoulder periods"; no manual labeling of outliers [Chen et al.'10]
Data: courtesy of NorthWrite Energy Group, provided by Prof. V. Cherkassky
Principal Component Analysis
Motivation: (statistical) learning from high-dimensional data, e.g., DNA microarrays and traffic surveillance
PCA [Pearson 1901]: extraction of low-dimensional data structure; data compression and reconstruction
PCA is non-robust to outliers [Jolliffe'86]
Our goal: robustify PCA by controlling outlier sparsity
Our work in context
Robust PCA: robust covariance matrix estimators [Campbell'80], [Huber'81]; computer vision [Xu-Yuille'95], [De la Torre-Black'03]; low-rank matrix recovery from sparse errors, e.g., [Wright et al.'09]
Contemporary applications tied to SoCS: anomaly detection in IP networks [Huang et al.'07], [Kim et al.'09]; video surveillance, e.g., [Oliver et al.'99]; matrix completion for collaborative filtering, e.g., [Candes et al.'09]
PCA formulations
Training data: $\{\mathbf{y}_n\}_{n=1}^{N}$
Equivalent views: minimum reconstruction error (a compression operator followed by a reconstruction operator) and maximum variance of the projected data
Component analysis model: $\mathbf{y}_n = \mathbf{U}\mathbf{q}_n + \mathbf{e}_n$
Solution: $\mathbf{U}$ spans the dominant eigenspace of the sample covariance (computable via the SVD of the data matrix)
Robustifying PCA
Outlier-aware model: $\mathbf{y}_n = \mathbf{U}\mathbf{q}_n + \mathbf{o}_n + \mathbf{e}_n$, where $\mathbf{o}_n \neq \mathbf{0}$ only for outlying data
Interpretation: a blind preference model with latent profiles
(P2) $\min_{\mathbf{U}, \{\mathbf{q}_n\}, \mathbf{O}} \; \sum_{n} \|\mathbf{y}_n - \mathbf{U}\mathbf{q}_n - \mathbf{o}_n\|_2^2 + \lambda \sum_{n} \|\mathbf{o}_n\|_2$
An $\ell_0$-norm counterpart is tied to (LTS PCA); (P2) subsumes an optimal (vector) Huber estimator; $\ell_1$-norm regularization handles entry-wise outliers
G. Mateos and G. B. Giannakis, "Robust PCA as bilinear decomposition with outlier sparsity regularization," IEEE Trans. Signal Process., Nov. 2011 (submitted).
Alternating minimization of (P2)
$\mathbf{U}$-update: SVD of the outlier-compensated data
$\mathbf{O}$-update: row-wise vector soft-thresholding
Proposition 3: Alg. 1's iterates converge to a stationary point of (P2).
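These two updates fit in a few lines of NumPy. The following is a sketch of the alternating scheme, not the paper's Alg. 1; rank, $\lambda$, and iteration count are illustrative:

```python
import numpy as np

def robust_pca(Y, rank, lam, n_iter=50):
    """Alternate: low-rank step = truncated SVD of outlier-compensated
    data; outlier step = row-wise vector soft-thresholding, solving
    min_{L low-rank, O} ||Y - L - O||_F^2 + lam * sum_n ||o_n||_2."""
    O = np.zeros_like(Y)
    for _ in range(n_iter):
        # low-rank update: best rank-r fit to Y - O
        U, s, Vt = np.linalg.svd(Y - O, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # outlier update: shrink each residual row toward zero
        R = Y - L
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        scale = np.maximum(1.0 - lam / (2.0 * np.maximum(norms, 1e-12)), 0.0)
        O = scale * R
    return L, O
```

Rows of `O` with nonzero norm are the flagged outlying data.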
Video surveillance
Panels: original frames, PCA, robust PCA, and the recovered `outliers'
Data: http://www.cs.cmu.edu/~ftorre/
Big Five personality factors
Five dimensions of personality traits [Goldberg'93], [Costa-McRae'92], discovered through factor analysis on WEIRD subjects
The Big Five Inventory (BFI) measures the Big Five: a short questionnaire (44 items) rated 1-5, e.g., "I see myself as someone who ... is talkative," "... is full of energy"
Handbook of personality: Theory and research, O. P. John, R. W. Robins, and L. A. Pervin, Eds. New York, NY: Guilford Press, 2008.
BFI data
Eugene-Springfield community sample [Goldberg'08]: subjects, item responses, and factors
Robust PCA identifies 8 outlying subjects, validated via `inconsistency' scores, e.g., VRIN [Tellegen'88]
Data: courtesy of Prof. L. Goldberg, provided by Prof. N. Waller
Online robust PCA
Motivation: real-time data and memory limitations; exponentially-weighted robust PCA
At time $t$, past outlier estimates are not re-estimated
Online PCA in action
Simulated stream with nominal and outlying samples
Robust kernel PCA
Kernel (K)PCA [Scholkopf'97]
Challenge: the feature space may be very high- or infinite-dimensional
Kernel trick: work with inner products of the mapped data (the Gram matrix) instead of explicit coordinates, passing from input space to feature space
Related to spectral clustering
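A plain (non-robust) KPCA sketch via the kernel trick: double-center the Gram matrix and eigendecompose it, never forming feature-space coordinates. With a linear kernel this reproduces ordinary PCA scores:

```python
import numpy as np

def kernel_pca(K, n_components):
    """Kernel PCA: eigendecompose the double-centered Gram matrix
    in place of the (possibly infinite-dimensional) feature covariance."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                          # center the mapped data
    vals, vecs = np.linalg.eigh(Kc)         # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # principal-component scores of the mapped points
    return vecs * np.sqrt(np.maximum(vals, 0.0))
```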
Unveiling communities
Network: NCAA football teams (nodes), Fall 2000 games (edges); kernel-based partitioning
Identified exactly: Big 10, Big 12, ACC, SEC, Big East; outliers: independent teams
Adjusted Rand index: ARI = 0.8967
Data: http://www-personal.umich.edu/~mejn/netdata/
Spectrum cartography
Idea: distributed sensors collaborate to form a spatial map of the spectrum
Goal: find a map $\Phi(\mathbf{x}, f)$ giving the spectrum at any position $\mathbf{x}$
Approach: basis expansion model for $\Phi$, estimated via nonparametric basis pursuit
Figure: original vs. estimated spectrum map
J. A. Bazerque, G. Mateos, and G. B. Giannakis, "Group-Lasso on splines for spectrum cartography," IEEE Trans. Signal Process., Oct. 2011.
Technical approaches: consensus-based in-network operation in ad hoc WSNs; distributed optimization using alternating-direction methods; online learning of statistics using stochastic approximation; performance analysis via stochastic averaging
Distributed adaptive algorithms
Figure: learning curves (error vs. time t) for Centralized-LMS, D-LMS, D-LMS with noisy links, Local-LMS, and Diffusion LMS, against the minimum cost Jmin
Issues and significance: fast-varying (non-)stationary processes; unavailability of statistical information; online incorporation of sensor data; noisy communication links
Improved learning through cooperation among wireless sensors
G. Mateos, I. D. Schizas, and G. B. Giannakis, "Distributed recursive least-squares for consensus-based in-network adaptive estimation," IEEE Trans. Signal Process., Nov. 2009.
Unveiling network anomalies
Approach: flag anomalies across flows and time via sparsity and low rank
Payoff: enhanced detection capabilities; high performance, QoS, and security in IP networks
M. Mardani, G. Mateos, and G. B. Giannakis, "Unveiling network anomalies across flows and time via sparsity and low rank," IEEE Trans. Inf. Theory, Dec. 2011 (submitted).
At the intersection: outlier-resilient estimation, signal processing, and the Lasso
Concluding summary
Research issues addressed: sparsity control for robust metric and choice-based PM; kernel-based nonparametric utility estimation; robust (kernel) principal component analysis; scalable distributed real-time implementations
Common thread: control sparsity in model residuals for robust learning
Application domains: preference measurement and conjoint analysis; psychometrics and personality assessment; video surveillance; social and power networks
Experimental validation with GPIPP personality ratings (~6M)
Gosling-Potter Internet Personality Project (GPIPP): http://www.outofservice.com