Random Projections - Healthiness and Stochastic Simulation · Stochastic Simulation of Multiple...

Post on 13-Jul-2020


RandomProjections

Joao Brazuna

Defining RandomProjections

How Do RandomProjections Work?

How to ApplyRandomProjections?

Multiple LinearRegression Model

StochasticSimulation ofMultiple LinearRegression Models

Multiple LogisticRegression Model

DiagnosingLeukaemia

Other Applications

Random Projections

Healthiness and Stochastic Simulation

João Brazuna

Statistical Methods in Data Mining, Instituto Superior Técnico

November 29, 2016


Contents

Defining Random Projections

How Do Random Projections Work?

How to Apply Random Projections?

Multiple Linear Regression Model

Stochastic Simulation of Multiple Linear Regression Models

Multiple Logistic Regression Model

Diagnosing Leukaemia

Other Applications


Information Era

Characterized by:

- High-dimensional data;
- Difficult to process.

Random Projections' Goal:

Efficiently reduce the data dimension.

Some Applications of Random Projections

- Classification;
- Clustering;
- Regression.


“Dimensionality Curse”

It affects data analysis in two different ways:

- A lot of features and samples;
- A lot of features and few samples.

An Efficient Solution: Random Projections


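Notation

- Sample size $n$;
- For the $i$-th sample, with $i \in \{1, \dots, n\}$, we observe $p$ features;
- Vector with the $p$ features for sample $i$:

\[ x_i = (x_{i1}, \dots, x_{ip}) \in \mathbb{R}^p, \quad \forall i \in \{1, \dots, n\}; \]

- We join the $n$ vectors (by rows) in an $n \times p$ matrix:

\[
X = \begin{bmatrix} x_1^t \\ \vdots \\ x_n^t \end{bmatrix}
  = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix}
  \in \mathbb{R}^{n \times p}.
\]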


Random Projections' Goal

What do we have?
$n$ vectors in $\mathbb{R}^p$.

What do we want?
$n$ vectors in $\mathbb{R}^k$, with $k < p$ and all the squared distances preserved by a $(1 \pm \varepsilon)$ factor.

So, we keep the sample size but we significantly reduce the number of features.

What do we need?
A function $f : \mathbb{R}^p \to \mathbb{R}^k$ such that, for any $u, v \in \{x_1, \dots, x_n\}$,

\[ (1-\varepsilon)\,\|f(u) - f(v)\|^2 \le \|u - v\|^2 \le (1+\varepsilon)\,\|f(u) - f(v)\|^2. \]


Random Projections’ Goal

Figure: Example of our goal. Source: [1]


Principal Component Analysis vs. Random Projections

PCA's Goal
To preserve data variability:

- 1st Principal Component ⇒ direction of maximum variability;
- 2nd Principal Component ⇒ direction of maximum variability that is orthogonal to the 1st;
- ...

RP's Goal
To preserve distances between vectors: for all rows $u, v$ of $X$, the squared distance between the projected values $f(u)$ and $f(v)$ is similar to the original squared distance between $u$ and $v$: it is between $(1-\varepsilon)\|u - v\|^2$ and $(1+\varepsilon)\|u - v\|^2$.


Some Questions on Random Projections

- How can we define that function?
- Can we use it regardless of the values of $n$ or $p$?
- Is there any restriction on $k$? How can we determine the smallest possible $k$ (dimension of the space where we want to project the $n$ vectors of $\mathbb{R}^p$)?


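How Do Random Projections Work?

Lemma (Johnson-Lindenstrauss)

Let:

- $X$ be a matrix in $\mathbb{R}^{n \times p}$, whose rows are denoted by $x_i \in \mathbb{R}^p$, $\forall i \in \{1, \dots, n\}$:

\[
X = \begin{bmatrix} x_1^t \\ \vdots \\ x_n^t \end{bmatrix}
  = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix}
  \in \mathbb{R}^{n \times p};
\]

- $\varepsilon \in \left]0, 1\right[$ arbitrary;
- $k \in \mathbb{N}$ such that $\left\lceil \frac{24}{3\varepsilon^2 - 2\varepsilon^3} \log n \right\rceil \le k < p$.

Then, there exists $f : \mathbb{R}^p \to \mathbb{R}^k$ such that for any $u, v \in \{x_1, \dots, x_n\}$ we have that

\[ (1-\varepsilon)\,\|f(u) - f(v)\|^2 \le \|u - v\|^2 \le (1+\varepsilon)\,\|f(u) - f(v)\|^2. \]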
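The lower bound on k in the lemma is easy to evaluate numerically. The following is a minimal sketch rather than part of the original slides: the function name `min_projection_dim` is hypothetical, and `log` is assumed to be the natural logarithm.

```python
import math


def min_projection_dim(n: int, eps: float) -> int:
    """Smallest k allowed by the lemma: ceil(24 / (3 eps^2 - 2 eps^3) * ln n).

    Assumption: the logarithm in the lemma is the natural logarithm.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in the open interval (0, 1)")
    return math.ceil(24.0 / (3.0 * eps**2 - 2.0 * eps**3) * math.log(n))


# Print the bound for a few (n, eps) pairs.
for n, eps in [(10, 0.5), (5000, 0.5), (5000, 0.999), (327, 0.999)]:
    print(n, eps, min_projection_dim(n, eps))
```

Under these assumptions the function returns 409 for n = 5000 with ε = 0.5 and 139 for n = 327 with ε = 0.999. The bound depends on n and ε only, never on p, which enters solely through the requirement k < p.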


The Answers

It is not possible to randomly project an arbitrary set of vectors. We must have a value of $p$ of at least $\left\lceil \frac{24}{3\varepsilon^2 - 2\varepsilon^3} \log n \right\rceil + 1$, because $k < p$.

Figure: Smallest value of k vs. ε and n


Conclusions

- Fixing $\varepsilon$, $k$ slowly increases (logarithmically) as $n$ increases;
- Fixing $n$, $k$ quickly decreases as $\varepsilon$ increases.

Figure: Smallest value of k vs. ε and n


Conclusions

We can produce a table considering $n$ and $k$ as integer values.

| n \ ε | 0.0001 | 0.001 | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | 0.999 | 0.9999 |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 1.85 × 10^9 | 1.85 × 10^7 | 1974 | 256 | 111 | 71 | 57 | 56 | 56 |
| 10^2 | 3.69 × 10^9 | 3.69 × 10^7 | 3948 | 512 | 222 | 141 | 114 | 111 | 111 |
| 10^3 | 5.53 × 10^9 | 5.53 × 10^7 | 5921 | 768 | 332 | 212 | 171 | 166 | 166 |
| 10^6 | 1.11 × 10^10 | 1.11 × 10^8 | 11842 | 1536 | 664 | 423 | 342 | 332 | 332 |
| 10^9 | 1.66 × 10^10 | 1.66 × 10^8 | 17763 | 2303 | 995 | 635 | 512 | 498 | 498 |
| 10^12 | 2.22 × 10^10 | 2.22 × 10^8 | 23684 | 3071 | 1327 | 846 | 683 | 664 | 664 |

Table: Smallest value of k for fixed ε and n

For instance, 1 million vectors with dimension 10 million can be projected, considering $\varepsilon = 0.5$, in $\mathbb{R}^{664}$.
We obtain 1 million vectors with dimension 664 each, with all the squared distances preserved by a $1 \pm \varepsilon$ factor.


How to Apply Random Projections?

- The tool is obtained from the proof of the Johnson-Lindenstrauss Lemma;
- Given a data matrix $X \in \mathbb{R}^{n \times p}$ with $n$ samples containing $p$ features each, our goal is to find a projection matrix $R$ such that $E = XR$ is the projection of matrix $X$;
- Let $f : \mathbb{R}^p \to \mathbb{R}^k$ be given by $f(u) = \frac{1}{\sqrt{k}} A u$, where $A \in \mathbb{R}^{k \times p}$ is a matrix with $A_{ij} \overset{\text{i.i.d.}}{\sim} N(0, 1)$ for all $i, j$;
- The map preserves all the squared distances by a $1 \pm \varepsilon$ factor when repeatedly applied $O(n)$ times (it occurs almost surely);
- Take $R$ such that $R^t = \frac{1}{\sqrt{k}} A$ and the projection is given by $E = XR$.
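The map $f(u) = \frac{1}{\sqrt{k}} A u$ can be tried out on a pair of vectors. This is a minimal sketch, assuming only the Python standard library; the helper names (`gaussian_matrix`, `project`, `squared_distance`) are hypothetical, and the toy sizes p = 200 and k = 50 are chosen for speed, not taken from the bound in the lemma.

```python
import math
import random


def gaussian_matrix(k, p, rng):
    """Return a k x p matrix (list of rows) with i.i.d. N(0, 1) entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(k)]


def project(u, A):
    """Apply f(u) = A u / sqrt(k), where k is the number of rows of A."""
    scale = 1.0 / math.sqrt(len(A))
    return [scale * sum(a * x for a, x in zip(row, u)) for row in A]


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


rng = random.Random(0)   # fixed seed, only for reproducibility
p, k = 200, 50           # toy dimensions (assumption, not from the slides)
A = gaussian_matrix(k, p, rng)
u = [rng.uniform(0.0, 1.0) for _ in range(p)]
v = [rng.uniform(0.0, 1.0) for _ in range(p)]

# Ratio of projected to original squared distance; values near 1 mean
# the distance between u and v was roughly preserved by this draw of A.
ratio = squared_distance(project(u, A), project(v, A)) / squared_distance(u, v)
print(f"projected / original squared distance: {ratio:.3f}")
```

The printed ratio varies from draw to draw; the lemma only guarantees the $1 \pm \varepsilon$ band when k satisfies its lower bound.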


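How to Apply Random Projections?

1. Fix $\varepsilon \in \left]0, 1\right[$;
2. Choose $k \in \mathbb{N}$ such that $\left\lceil \frac{24}{3\varepsilon^2 - 2\varepsilon^3} \log n \right\rceil \le k < p$;
3. Build a matrix $R = \frac{1}{n\sqrt{k}} A^t$, where $A = \sum_{l=1}^{n} A_l$ and the $A_l$ are $n$ real $k \times p$ matrices with standard normal entries, which means that $(A_l)_{ij} \overset{\text{i.i.d.}}{\sim} N(0, 1)$, $\forall i \in \{1, \dots, k\}$, $\forall j \in \{1, \dots, p\}$;
4. Get the projection of matrix $X$ by computing $E = XR$.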
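The four steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the code used for the slides: the function names are hypothetical, the logarithm is assumed natural, and the sizes (n = 5, p = 100) are toy values picked so that k < p still holds.

```python
import math
import random


def min_k(n, eps):
    """Step 2: smallest k allowed by the lemma (natural logarithm assumed)."""
    return math.ceil(24.0 / (3.0 * eps**2 - 2.0 * eps**3) * math.log(n))


def build_projection_matrix(n, k, p, rng):
    """Step 3: R = (1 / (n sqrt(k))) * A^t, with A the sum of n Gaussian k x p matrices."""
    A = [[0.0] * p for _ in range(k)]
    for _ in range(n):                      # accumulate A_1 + ... + A_n
        for i in range(k):
            for j in range(p):
                A[i][j] += rng.gauss(0.0, 1.0)
    scale = 1.0 / (n * math.sqrt(k))
    # Transpose while scaling, so that R has p rows and k columns.
    return [[scale * A[i][j] for i in range(k)] for j in range(p)]


def matmul(X, R):
    """Step 4: plain-Python matrix product E = X R."""
    columns = list(zip(*R))
    return [[sum(x * r for x, r in zip(row, col)) for col in columns] for row in X]


rng = random.Random(1)                      # fixed seed for reproducibility
n, p, eps = 5, 100, 0.5                     # Step 1: fix eps (toy sizes)
k = min_k(n, eps)
if k >= p:
    raise ValueError("the lemma requires k < p")
X = [[rng.uniform(0.0, 1.0) for _ in range(p)] for _ in range(n)]
R = build_projection_matrix(n, k, p, rng)
E = matmul(X, R)
print(len(E), len(E[0]))                    # rows and columns of E
```

The projected matrix keeps the n rows of X and has k columns, one per new feature.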


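Interpreting Projected Features

The data matrix $X$ is a sample of size $n$ of $p$ features:

\[
X = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix} \in \mathbb{R}^{n \times p}.
\]

After the projection, the new matrix is the product of $X$ by a matrix $R \in \mathbb{R}^{p \times k}$:

\[
E = XR
  = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix}
    \begin{bmatrix} r_{11} & \cdots & r_{1k} \\ \vdots & \ddots & \vdots \\ r_{p1} & \cdots & r_{pk} \end{bmatrix}
  = \begin{bmatrix} \sum_{j=1}^{p} x_{1j} r_{j1} & \cdots & \sum_{j=1}^{p} x_{1j} r_{jk} \\ \vdots & \ddots & \vdots \\ \sum_{j=1}^{p} x_{nj} r_{j1} & \cdots & \sum_{j=1}^{p} x_{nj} r_{jk} \end{bmatrix}
  \in \mathbb{R}^{n \times k}.
\]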


Interpreting Projected Features

Each new feature is a linear combination of all the original features where the coefficients are the elements of matrix $R$, which are

\[ r_{ij} = \frac{1}{n\sqrt{k}} \sum_{l=1}^{n} (A_l)_{ji} \]

with $(A_l)_{ji} \sim N(0, 1)$.
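The distribution of each coefficient follows from this formula. The short derivation below assumes that the $n$ matrices $A_l$ are mutually independent; the slides state independence of the entries within each matrix, so independence across matrices is an assumption made here.

```latex
\sum_{l=1}^{n} (A_l)_{ji} \sim N(0, n)
\quad\Longrightarrow\quad
r_{ij} = \frac{1}{n\sqrt{k}} \sum_{l=1}^{n} (A_l)_{ji}
\sim N\!\left(0, \frac{n}{n^{2} k}\right)
= N\!\left(0, \frac{1}{nk}\right).
```

For comparison, the coefficients of the single-draw map $f(u) = \frac{1}{\sqrt{k}} A u$ have variance $1/k$.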


Multiple Linear Regression Model

Let us consider the general linear model with Gauss-Markov structure:

\[ y = X_D \beta + \varepsilon \iff Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i \]

- $y = (Y_1, \dots, Y_n)$ is the $n$-dimensional vector containing the values of the response variable $Y$;
- $X_D = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix}$ is the $n \times (p + 1)$ design matrix;


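Multiple Linear Regression Model

\[ y = X_D \beta + \varepsilon \]

- $\beta = (\beta_0, \dots, \beta_p)$ is the vector of the $p + 1$ regression parameters;
- $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)$ is the vector of random errors such that:
  - $E(\varepsilon) = 0 \iff E(\varepsilon_i) = 0$, $\forall i \in \{1, \dots, n\}$;
  - $\mathrm{Var}(\varepsilon) = \sigma^2 I \iff \mathrm{Var}(\varepsilon_i) = \sigma^2$, $\forall i \in \{1, \dots, n\}$;
  - $\mathrm{Corr}(\varepsilon_i, \varepsilon_j) = 0$, $\forall i \neq j$.

\[ E(Y \mid x) = X_D \beta \]

To make inferences, we additionally suppose that

\[ \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2). \]

So the fitted values are

\[ \hat{y} = X_D \hat{\beta} \iff \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_p x_{ip}. \]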


Simulating a Linear Multiple Regression Model - Ideal Case

We generated in R:

- $n = 5000$ samples;
- $p = 1000$ features:
  - 400 features from the distribution $N(20, 25)$;
  - 500 features from the distribution $\mathrm{Unif}(5, 95)$;
  - 100 features from the distribution $\mathrm{Bin}(100, 0.5)$;
- $X \in \mathbb{R}^{5000 \times 1000}$ is the concatenation of all 5000 samples of the 1000 generated features;
- $\beta_k \sim N(0, 100)$, $\forall k \in \{0, 1, \dots, p\}$ are the true regression parameters;
- $\sigma = 2.055656$ is the constant standard deviation of the random errors;
- $\varepsilon_i \sim N(0, \sigma^2)$, $\forall i \in \{1, \dots, n\}$ are the random errors, subject only to the restriction imposed by the model!


Simulating a Linear Multiple Regression Model - Ideal Case

With all this data, we generate the response variable vector:

\[ y = X_D \beta + \varepsilon. \]

As a check, we can estimate the parameters $\beta_0, \beta_1, \dots, \beta_p$ and verify that they are similar to the original ones.
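The data-generating process described for the ideal case can be sketched as follows. The slides used R; this Python version is only an illustration with hypothetical names and a scaled-down size (50 samples, 10 features in the same 40/50/10 proportion). It assumes that the second parameter of $N(\cdot, \cdot)$ is a variance, so $N(20, 25)$ has standard deviation 5 and $N(0, 100)$ has standard deviation 10.

```python
import random


def simulate(n, n_norm, n_unif, n_binom, sigma, rng):
    """Generate X (n rows), beta (intercept first) and y = X_D beta + eps."""
    p = n_norm + n_unif + n_binom
    X = []
    for _ in range(n):
        row = [rng.gauss(20.0, 5.0) for _ in range(n_norm)]       # N(20, 25), sd = 5
        row += [rng.uniform(5.0, 95.0) for _ in range(n_unif)]    # Unif(5, 95)
        # Bin(100, 0.5) drawn as a sum of 100 Bernoulli(0.5) trials.
        row += [sum(rng.random() < 0.5 for _ in range(100)) for _ in range(n_binom)]
        X.append(row)
    beta = [rng.gauss(0.0, 10.0) for _ in range(p + 1)]           # N(0, 100), sd = 10
    y = [
        beta[0] + sum(b * x for b, x in zip(beta[1:], row)) + rng.gauss(0.0, sigma)
        for row in X
    ]
    return X, beta, y


rng = random.Random(2016)                   # fixed seed for reproducibility
X, beta, y = simulate(n=50, n_norm=4, n_unif=5, n_binom=1, sigma=2.055656, rng=rng)
print(len(X), len(X[0]), len(beta), len(y))
```

Calling `simulate(5000, 400, 500, 100, 2.055656, rng)` would match the sizes on the slide, at the cost of a much longer run time in pure Python.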


Applying Random Projections to the Generated Model - Ideal Case

Using $\varepsilon = 0.5$ (the factor from random projections, not the linear model), we reduce the number of features from $p = 1000$ to $k = 409$.
Taking $\varepsilon = 0.999$, we can get $k = 205$, which is the smallest integer $k$ that can be used!

| ε | p | k | $R_a^2$ | VIF | AIC | PRESS |
|---|---|---|---|---|---|---|
| 0.5 | 1000 | 409 | 71.23% | < 5 | 95433 | 5.7 × 10^10 |
| 0.999 | 1000 | 205 | 37.58% | < 5 | 98976 | 1.2 × 10^10 |

Model assumptions still seem to be verified in both cases.
We obtain good results using $\varepsilon = 0.5$, but they are not as good when we take $\varepsilon = 0.999$.


Simulating a Linear Multiple Regression Model - Non-Ideal Case

- $n = 5000$ samples as before;
- $p = 10000$ features, 10 times more, keeping the proportion and the distributions:
  - 4000 from the distribution $N(20, 25)$;
  - 5000 from the distribution $\mathrm{Unif}(5, 95)$;
  - 1000 from the distribution $\mathrm{Bin}(100, 0.5)$;
- $X \in \mathbb{R}^{5000 \times 10000}$ is the concatenation of all 5000 samples of the 10000 generated features;
- $\beta_k \sim N(0, 100)$, $\forall k \in \{0, 1, \dots, p\}$ are the regression parameters;
- $\sigma = 2.055656$ is the constant standard deviation of the random errors;
- $\varepsilon_i \sim N(0, \sigma^2)$, $\forall i \in \{1, \dots, n\}$ are the random errors, subject only to the restriction imposed by the model;
- $y = X_D \beta + \varepsilon$.

There are more features than samples, so we cannot estimate the regression parameters!
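One way to see why estimation fails is through the rank of the design matrix. The argument below assumes that the parameters are estimated by ordinary least squares; the estimator is not named on the slide.

```latex
\begin{aligned}
\hat{\beta} &= (X_D^t X_D)^{-1} X_D^t\, y
  \quad \text{requires } X_D^t X_D \in \mathbb{R}^{(p+1) \times (p+1)} \text{ to be invertible, but} \\
\operatorname{rank}(X_D^t X_D) &= \operatorname{rank}(X_D) \le \min(n,\, p + 1) = 5000 < 10001 = p + 1 .
\end{aligned}
```

So $X_D^t X_D$ is singular and the least squares problem has no unique solution.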


Applying Random Projections to the Generated Model - Non-Ideal Case

Using $\varepsilon = 0.5$ (the factor from random projections, not the linear model), we reduce the number of features from $p = 10000$ to $k = 409$ as before.
Taking $\varepsilon = 0.999$, we can also get $k = 205$, which is the smallest integer $k$ that can be used!

| ε | p | k | $R_a^2$ | VIF | AIC | PRESS |
|---|---|---|---|---|---|---|
| 0.5 | 10000 | 409 | 8% | < 5 | 112323 | 1.7 × 10^12 |
| 0.999 | 10000 | 205 | 4% | < 5 | 112379 | 1.7 × 10^12 |

Model assumptions still seem to be verified in both cases.
We do not obtain good results using $\varepsilon = 0.5$ and they are even worse when we choose $\varepsilon = 0.999$.


Conclusions

- Model assumptions are verified;
- The portion of data variability which is explained by the linear model seems to decrease, with AIC increasing and $R_a^2$ decreasing;
- The total number of influential observations seems to increase as we reduce the dimension $k$ of the space where we want to project our data, with PRESS getting larger or, at least, staying at the same order of magnitude;
- Interpreting the regression parameters is more difficult.


Interpreting the Regression Parameters

In General:
The coefficient $\beta_k$ tells us how much a fitted value increases or decreases when the $k$-th explanatory variable (after projection) increases by one unit, keeping all the other terms fixed.

\[ Y = \beta_0 + \beta_1 x_1^* + \cdots + \beta_p x_p^* \]

When Applying Random Projections:
The coefficient $\beta_k$ tells us how much a fitted value increases or decreases when a linear combination $x_k^*$ of all the original explanatory variables (before projection) increases by one unit, keeping all the other terms fixed. But those other terms also depend on the model structure considering the original variables...


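Multiple Logistic Regression Model

For the next dataset, our response variable will be binary.
When the response variable is categorical, we should use logistic regression.
Using logistic functions, we can obtain the model

\[ E(Y_i \mid x_{i1}, \dots, x_{ip}) = \pi_i = \frac{e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}}{1 + e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}} \]

that can be linearised by the logit function

\[ \pi_i^* = \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \]

which is continuous, linear in the parameters and takes values in $\mathbb{R}$.

\[
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}, \qquad
X_i = \begin{bmatrix} 1 \\ X_{i,1} \\ \vdots \\ X_{i,p} \end{bmatrix}
\]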


Multiple Logistic Regression Model

Figure: Example of a Logistic Function


Multiple Logistic Regression Model

The Model

\[ \pi(x) = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}} = \frac{e^{x^t \beta}}{1 + e^{x^t \beta}} = \left[1 + e^{-x^t \beta}\right]^{-1} \]

where the logit function is given by

\[ \pi^*(x) = \log\left[\frac{\pi(x)}{1 - \pi(x)}\right] = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p \]

Estimating the Parameters

It is possible to estimate $\beta$ using maximum likelihood, but the likelihood equations are non-linear in every coefficient $\beta_k$, so we need to apply numerical methods. R uses the Fisher scoring algorithm.
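The logistic function and its inverse, the logit, can be written directly from the formulas above. This is a minimal sketch with hypothetical function names; it evaluates the model for given coefficients and does not implement the Fisher scoring estimation.

```python
import math


def logistic(eta):
    """pi = 1 / (1 + exp(-eta)), written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def logit(pi):
    """Inverse of the logistic function: log(pi / (1 - pi)), for 0 < pi < 1."""
    return math.log(pi / (1.0 - pi))


def predict_probability(beta, x):
    """pi(x) for coefficients beta = (beta_0, ..., beta_p) and features x = (x_1, ..., x_p)."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return logistic(eta)


# The linear predictor is recovered by applying the logit to the probability.
print(logit(predict_probability([0.5, -1.0, 2.0], [1.0, 0.25])))
```

Because the logit undoes the logistic function, the printed value equals the linear predictor $\beta_0 + \beta_1 x_1 + \beta_2 x_2$ up to floating point error.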


Another Application - Clinical Data From Leukaemia Diagnosis

Data from clinical experiments at St. Jude Children's Research Hospital, Memphis, Tennessee, USA, was published in 2002.
It is a microarray dataset for diagnosing acute lymphoblastic leukaemia.

- $n = 327$ samples - 327 patients from that hospital;
- $p = 12625$ explanatory variables - 12625 genes;
- $Y$ binary response variable:

\[ Y = \begin{cases} 1, & \text{if the patient is ill} \\ 0, & \text{otherwise.} \end{cases} \]


Another Application - Clinical Data From Leukaemia Diagnosis

Figure: Construction of a Microarray


Building a Regression Model

Ideal Number of Samples:
Between 5 and 10 samples per explanatory variable.

What do we have?
More explanatory variables than samples (0.026 samples per explanatory variable). It is not possible to estimate the regression parameters!

What can we do?
Apply random projections to the explanatory variables. There are $n = 327$ vectors in $\mathbb{R}^p$, with $p = 12625$, that can be projected in $\mathbb{R}^k$, with $k < p$. By the Johnson-Lindenstrauss Lemma, the smallest $k$ in those conditions, picking $\varepsilon = 0.999$, is 139.

After Projection:
327 samples and 139 explanatory variables. It is not ideal, but at least we can now build a model.
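The two numbers quoted on this slide can be checked by hand. The arithmetic below assumes the natural logarithm in the bound of the lemma; intermediate values are rounded.

```latex
\frac{n}{p} = \frac{327}{12625} \approx 0.026,
\qquad
k = \left\lceil \frac{24}{3(0.999)^2 - 2(0.999)^3}\, \ln 327 \right\rceil
  \approx \left\lceil 24.00007 \times 5.78996 \right\rceil
  = \left\lceil 138.96 \right\rceil = 139 .
```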


Building a Regression Model

The response variable is binary, so we should apply logistic regression. We will apply both methods: linear and logistic regression.


Applying Multiple Linear Regression

We estimate the $p + 1 = 140$ regression parameters in R (with $p = 139$ explanatory variables after projection), without considering the response variable as categorical.

Classification Rule
We want to classify the patients as being ill or not.
If a fitted value $\hat{y}_i$ is larger than 0.5, we classify the $i$-th patient as being ill.
Otherwise, we do not consider that patient as ill.
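The classification rule amounts to thresholding the fitted values at 0.5. A minimal sketch, with hypothetical function names and made-up fitted values used only to show the mechanics:

```python
def classify(fitted_values, threshold=0.5):
    """Label a patient as ill (1) when the fitted value exceeds the threshold, else 0."""
    return [1 if value > threshold else 0 for value in fitted_values]


def count_labels(labels):
    """Return (number classified as ill, number classified as not ill)."""
    ill = sum(labels)
    return ill, len(labels) - ill


# Illustrative fitted values (not taken from the leukaemia data).
example_fitted = [0.92, 0.48, 1.07, 0.50, 0.51]
labels = classify(example_fitted)
print(labels, count_labels(labels))
```

A fitted value exactly equal to 0.5 is not classified as ill, since the rule requires the value to be strictly larger than the threshold.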

Multiple Linear Regression Model

- $R_a^2 \simeq 16\%$;
- Several variables with a very high p-value on their t-tests;
- F-test with a small p-value: 0.8%;
- VIF > 10 on 4 variables, VIF < 5 on 92 variables;
- AIC $\simeq$ 20;
- PRESS $\simeq$ 27.


Applying Multiple Linear Regression

Model Assumptions
When we check the model assumptions, we easily see that they do not hold, so the multiple linear regression model does not fit the data.

Prediction
There are 327 patients, 19 of them not ill and 308 ill.
Applying the previous classification rule, we get

- 312 ill patients;
- 15 healthy patients.

Also, we did not split the data into a training set and a test set...


Applying Multiple Logistic Regression

An Additional Problem
We have a sample of 19 healthy patients and 308 ill ones in a total of 327 observations, so the data is not balanced. This leads to convergence problems while estimating the parameters...

One Possible Solution
Allowing a maximum error of 0.1 in the parameter estimation and up to 100000 iterations.

Applying Multiple Logistic Regression

- Very high standard deviation for each coefficient (order of $10^3$);
- AIC $\simeq$ 280;
- PRESS $\simeq$ 4.62.


Applying Multiple Logistic Regression

Prediction
It is in prediction that we have the most important improvement.
Applying the classification rule defined above, we predict

- 308 ill patients;
- 19 healthy patients,

which correspond to the real data.
Once again, we did not split the data into a training set and a test set...


Comparing Random Projections and PCA

Advantages of PCA

- Better adjusted coefficient of determination (consistent with PCA's goal);
- Faster if we extract few principal components.

Advantages of RP

- Better predictions;
- Faster if we need to extract a lot of principal components when using PCA.


Other Applications - Linear Regression with Compressive Ordinary Least Squares

Figure: Application to Detection of Musical Patterns with $n = 2000$, $p = 10^6$ and Very Sparse Data. Source: [5]


Other Applications - Noise Detection on Images

Figure: Application to Noise Detection on Images. Source: [7]


Other Applications - Noise Detection on Images

Figure: Impact on required floating point operations after noise detection on images, using MATLAB. Source: [7]

Bibliography

Aditya Krishna Menon. Random Projections and Applications to Dimensionality Reduction. School of Information Technologies, University of Sydney, Australia, 2007.

Lopez-Paz and Duvenaud. Random Projections. School of Engineering and Applied Sciences, Harvard University, USA, 2013.

Michael Mahoney. The Johnson-Lindenstrauss Lemma. School of Engineering, Stanford University, USA, 2009.

Sanjoy Dasgupta and Anupam Gupta. An Elementary Proof of a Theorem of Johnson and Lindenstrauss. New Jersey, USA, 2001.

Robert J. Durrant and Ata Kaban. Random Projections for Machine Learning and Data Mining: Theory and Applications. University of Birmingham, United Kingdom, 2012.

Conceição Amado. Regressão Logística - Uma Introdução. Instituto Superior Técnico, Universidade de Lisboa, Portugal, 2010.