A Generalized Linear Model for Principal Component Analysis of Binary Data
January 6, 2003
Andrew I. Schein, Lyle H. Ungar and Lawrence K. Saul
Department of Computer and Information Science
The University of Pennsylvania, Philadelphia, PA
Ninth International Workshop on AI and Statistics
PCA and LPCA
Principal Component Analysis is commonly called PCA.
PCA is a widely-used dimensionality reduction technique.
PCA handles real-valued data through a Gaussian assumption.
Today we will explore a different assumption for binary data.
Logistic PCA is to (Linear) PCA
as
Logistic Regression is to Linear Regression
Schein et al. AI & Statistics 2003, p.1
Talk Outline
1. Linear PCA of Real-Valued Data (review)
2. Logistic PCA of Binary Data
Our Contributions:
3. Model Fitting by Alternating Least Squares
4. Experimental Results on 4 Natural Data Sets
Multivariate Binary Data
[Figure: person/movie co-occurrence matrix (who rated what); rows are people, columns are movies.]
Visualizing PCA
[Figure: scatter of two-dimensional data with its principal axes; both the x and y axes span −1 to 1.]
Applications of PCA
• Noise Removal
• Dimensionality Reduction
• Data Compression
• Visualization
• Exploratory Data Analysis
• Feature Extraction
PCA as Least-Squares Decomposition
Error(U, V) = ∑_{n=1}^{N} || X_n − U_n V ||²
X = The Data: N × D
U = Latent Coordinates: N × L
V = Orthogonal Latent Axes: L × D
L << D, L is the dimensionality of the latent space.
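As a concrete reference point (our own numpy sketch, not from the slides; data and variable names are illustrative), the rank-L minimizer of the least-squares objective above is given by the truncated SVD of the centered data:

```python
import numpy as np

# Sketch: the rank-L minimizer of Error(U, V) is the truncated SVD
# (Eckart-Young); the data, sizes, and names here are our own illustration.
rng = np.random.default_rng(0)
N, D, L = 50, 10, 2
X = rng.normal(size=(N, D))
X = X - X.mean(axis=0)                  # center the data first

W, sing, Vt = np.linalg.svd(X, full_matrices=False)
U = W[:, :L] * sing[:L]                 # latent coordinates, N x L
V = Vt[:L]                              # orthogonal latent axes, L x D

error = np.sum((X - U @ V) ** 2)        # the objective Error(U, V)
```

By Eckart-Young, `error` equals the sum of the squared discarded singular values, and the rows of `V` are orthonormal, matching the "orthogonal latent axes" on the slide.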
Gaussian Interpretation of PCA
When σ is known there is an equivalent model:
X_nd ∼ N((UV)_nd, σ²)
Maximum Likelihood Objective = Least Squares Loss
So PCA assumes a Gaussian distribution on X.
Generalized Principal Component Analysis (GPCA)
Collins et al. (2001) propose a generalized scheme for PCA.
Define a constrained decomposition of the natural parameter
Θ_nd = (UV)_nd, dim(U) = N × L, dim(V) = L × D.
N = Number of Observations
D = Dimensionality of Data
L = Dimensionality of Latent Space
L(U, V) = − ∑_n ∑_d log P(X_nd | Θ_nd)
Insert your favorite exponential family distribution to instantiate P.
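To make the "insert your favorite distribution" step concrete, here is a sketch (function names are ours) of the GPCA loss for two exponential-family choices under the same decomposition Θ = UV:

```python
import numpy as np

# Sketch (our own names, not from the paper): the same decomposition
# Theta = U V paired with two different exponential-family likelihoods.

def gpca_loss_gaussian(X, U, V):
    # Gaussian: the natural parameter is the mean; the loss is least squares
    Theta = U @ V
    return 0.5 * np.sum((X - Theta) ** 2)

def gpca_loss_bernoulli(X, U, V):
    # Bernoulli: the natural parameter is the log-odds; the loss is logistic
    Theta = U @ V
    B = 2 * X - 1                       # map {0,1} to {-1,+1}
    return np.sum(np.logaddexp(0.0, -B * Theta))
```

The Gaussian choice recovers linear PCA's least-squares loss; the Bernoulli choice is the Logistic PCA loss of the next slides.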
The Logistic Function
σ(θ) = 1 / (1 + exp(−θ))
[Figure: plot of σ(θ) for θ from −4 to 4, rising from 0 to 1.]
Logistic PCA Model
Inserting Bernoulli Distribution We Get Log-Likelihood:
L = ∑_{n,d} [ X_nd log σ(Θ_nd) + (1 − X_nd) log σ(−Θ_nd) ]

subject to the constraint

Θ_nd = ∑_l U_nl V_ld

N = Number of Observations
D = Dimensionality of Data
L = Dimensionality of Latent Space
How to Fit LPCA?
Collins et al. (2001) propose a general strategy for fitting GPCA.
Applying this framework to the LPCA case appears hard, if not intractable.
We take the approach of fitting LPCA through specialized strategies.
Our methods exploit bounds on the logistic function.
Defining the Auxiliary Function for LPCA
A useful fact:
log σ(θ) = − log 2 + θ/2 − log cosh(θ/2)
We exploit a bound:
log cosh(θ/2) ≤ log cosh(θ0/2) + (θ² − θ0²) · tanh(θ0/2) / (4θ0)

[Jaakkola and Jordan, 1997; Tipping, 1999]
The bound is concave and quadratic in the parameter θ.
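The identity and the bound are easy to check numerically; the following sketch (ours, not from the slides) evaluates both on a grid:

```python
import numpy as np

# Numerical check (our own, not from the slides) of the identity
# log sigma(theta) = -log 2 + theta/2 - log cosh(theta/2) and of the
# quadratic bound on log cosh(theta/2) around theta0.
theta = np.linspace(-6.0, 6.0, 241)
lhs = -np.logaddexp(0.0, -theta)                          # log sigma(theta)
rhs = -np.log(2.0) + theta / 2.0 - np.log(np.cosh(theta / 2.0))

theta0 = 2.0
bound = (np.log(np.cosh(theta0 / 2.0))
         + (theta**2 - theta0**2) * np.tanh(theta0 / 2.0) / (4.0 * theta0))
gap = bound - np.log(np.cosh(theta / 2.0))                # >= 0, zero at ±theta0
```

The gap is nonnegative everywhere and vanishes at θ = ±θ0, which is where the bound touches the function.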
Visualizing the Approximation of σ
[Figure: σ(θ) overlaid with its approximation at θ0 = 2, for θ from −4 to 4.]
Model Fitting by Alternating Least Squares
We develop a model fitting strategy that alternates between two steps:
• Fix V , find the least squares solution for U rows.
• Fix U , find the least squares solution for V columns.
Each iteration guarantees an improvement in log-likelihood.
Whether the likelihood has local maxima in addition to the global maximum is unknown.
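The two alternating steps can be sketched as follows (a minimal numpy implementation of ours, using the quadratic bound above; the initialization, names, and iteration count are our own assumptions, not the paper's):

```python
import numpy as np

# Minimal ALS sketch for Logistic PCA using the Jaakkola-Jordan bound.
# Each half-step is a weighted least-squares solve; the bound guarantees
# the log-likelihood does not decrease.

def lam(t):
    # tanh(t/2) / (4t), the coefficient from the quadratic bound;
    # its limit at t = 0 is 1/8
    out = np.full_like(t, 0.125)
    nz = np.abs(t) > 1e-8
    out[nz] = np.tanh(t[nz] / 2.0) / (4.0 * t[nz])
    return out

def log_likelihood(X, U, V):
    # L = sum_nd [ X log sigma(Theta) + (1 - X) log sigma(-Theta) ]
    B = 2.0 * X - 1.0                           # map {0,1} to {-1,+1}
    return -np.sum(np.logaddexp(0.0, -B * (U @ V)))

def logistic_pca_als(X, L, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    B = 2.0 * X - 1.0
    U = rng.normal(scale=0.1, size=(N, L))
    V = rng.normal(scale=0.1, size=(L, D))
    for _ in range(n_iter):
        # fix V: weighted least-squares solve for each row of U
        Lam = lam(U @ V)
        for n in range(N):
            A = (V * (2.0 * Lam[n])) @ V.T      # L x L system
            U[n] = np.linalg.solve(A, V @ (B[n] / 2.0))
        # fix U: weighted least-squares solve for each column of V
        Lam = lam(U @ V)
        for d in range(D):
            A = (U * (2.0 * Lam[:, d:d + 1])).T @ U
            V[:, d] = np.linalg.solve(A, U.T @ (B[:, d] / 2.0))
    return U, V
```

Each solve maximizes the quadratic lower bound on the log-likelihood with the other factor held fixed, which is what makes the per-iteration improvement guarantee possible.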
A Related Use of the Bound
Normal Linear Factor Analysis (NLFA) is a generative cousin of PCA.
[Tipping, 1999] uses the bound to fit a logit/normit factor analysis.
Logit/normit factor analysis is a type of factor analysis for binary data.
LPCA is binary PCA, while logit/normit factor analysis is binary factor analysis.
We follow Tipping’s factor analysis strategy in fitting LPCA.
Logit/Normit Factor Analysis
Maximize:
L = ∑_{n,d} [ X_nd log σ(Θ_nd) + (1 − X_nd) log σ(−Θ_nd) ]

subject to the constraint

Θ_nd = ∑_l U_nl V_ld, where l indexes the latent dimensions,

and the prior U_n ∼ N(0, I)
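The generative reading of this model is easy to state in code; the following sketch (sizes and names are ours) draws binary data from it:

```python
import numpy as np

# Sketch (sizes and names are our own): sampling from the logit/normit
# generative model, with the Gaussian prior U_n ~ N(0, I) on the latent rows.
rng = np.random.default_rng(0)
N, D, L = 100, 20, 3
V = rng.normal(size=(L, D))               # latent axes (parameters)
U = rng.normal(size=(N, L))               # U_n ~ N(0, I)
Theta = U @ V                             # natural parameters (log-odds)
P = 1.0 / (1.0 + np.exp(-Theta))          # sigma(Theta)
X = (rng.random((N, D)) < P).astype(int)  # Bernoulli draws
```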
Logit/Normit Factor Analysis
Un ∼ N (0, I)
Fitting the U_n in logit/normit factor analysis is harder than in LPCA.
It requires an additional variational approximation and iterative process.
Model fitting improves the lower bound on the log-likelihood.
...not necessarily the log-likelihood itself.
In contrast, ALS for LPCA guarantees an increase in the log-likelihood.
Example in 3 Dimensions
[Figure: three-dimensional example; the X, Y and Z axes span 0 to 1.]
Example in 3 Dimensions
[Figure: three-dimensional example; the X, Y and Z axes span 0 to 1.]
Example in 3 Dimensions
[Figure: three-dimensional example; the X, Y and Z axes span 0 to 1.]
Empirical Evaluation: Data Reconstruction
Original Data → Compressed Data → Reconstructed Data

X = Original Data, R = Reconstructed Data
N = Number of Observations, D = Dimensionality of the Data
Error = ( ∑_{n=1}^{N} ∑_{d=1}^{D} |X_nd − R_nd| ) / (N · D)
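The error metric above is a one-liner in practice; in this sketch (ours), how the binary reconstruction R is produced from the compressed representation is model-specific and left open:

```python
import numpy as np

# Sketch of the per-entry reconstruction error defined above; producing the
# binary reconstruction R from the compressed data is model-specific.
def reconstruction_error(X, R):
    N, D = X.shape
    return np.abs(X - R).sum() / (N * D)
```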
Microsoft Web Log Reconstruction Results
The web log records URL visits by anonymized users.
Data set: a matrix of users and the URLs they clicked on.
N = 32711 (each observation is a session)
D = 285 (URLs clicked)
Density = 0.011
Data Set Task: Build a recommender system for URLs.
Our Task: Data Reconstruction
Microsoft Web Log Reconstruction Results
Error Rates (%)
L    Linear PCA    Logistic PCA
1    1.52          1.28
2    1.41          1.15
4    1.36          0.760
8    1.11          0.355

1 LPCA dimension ≈ 6 PCA dimensions
Advertising Data Reconstruction Results
A UC Irvine data set of web linked images and surrounding features.
N = 3279 (image links)
D = 1555 (context features)
Density = 0.072
Data Set Task: Predict whether an image is an advertisement
Our Task: Data Reconstruction
Features include phrases in the anchor text and around image:
microsoft.com, toyotaofroswell.com, home+page
Advertising Data Reconstruction Results
Error Rates (%)
L    Linear PCA    Logistic PCA
1    2.68          1.97
2    2.39          1.20
4    2.17          0.626
8    1.76          0.268

1 LPCA dimension ≈ 7 PCA dimensions
Other Data Sets (in paper)
• Microarray Gene Expression Data
– Observations are genes
– Attributes are environmental conditions
– Binary values indicate whether genes are expressed or not
• MovieLens Movie Ratings Data
– Observations are users
– Attributes are movies
– Binary values indicate whether a user rated a movie or not
Related Models
Both of these models share a decomposition:
Θnd = (UV )nd
• Factor Analysis: A generative relative of PCA
• Multinomial PCA (MPCA): A multinomial, generative variant of PCA
MPCA is represented in the proceedings: [Buntine and Perttu, 2003].
Summary
• We derive and implement the ALS algorithm for fitting LPCA.
• In data reconstruction experiments, LPCA outperforms PCA.
• LPCA is well suited for smoothed probability models of binary data:
– People and the URLs they click
– Phrase features surrounding image links
• Future work will explore LPCA in other traditional PCA tasks.
– Feature extraction
– Machine learning