BLiNQ MEDIA Praneeth Vepakomma Senior Data Scientist

BLiNQ MEDIA Praneeth Vepakomma Senior Data Scientist Generalization in Supervised Machine Learning


Transcript of BLiNQ MEDIA Praneeth Vepakomma Senior Data Scientist

Page 1: BLiNQ  MEDIA Praneeth Vepakomma Senior Data Scientist

BLiNQ MEDIA
Praneeth Vepakomma
Senior Data Scientist

Generalization in Supervised Machine Learning

Page 2:

Hypothetical Knapsack of Coins:

Copper and Gold Coins
Total number of coins is fixed and is a large sample.
Capture-Recapture
What is the proportion of Gold coins?

Copper and Gold Coins
Total number of coins is variable and is a large sample.
Capture-Recapture
What is the proportion of Gold coins?
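A minimal sketch of the two questions on this slide, under stated assumptions: the knapsack contents, the sample sizes, and the helper names `sample_proportion` and `lincoln_petersen` are hypothetical and not from the talk. The proportion of gold coins is estimated from a random draw, and the total count (when it is not known) is estimated with the classical Lincoln-Petersen capture-recapture formula.

```python
import random


def sample_proportion(sample, kind="gold"):
    """Fraction of coins in the sample that are of the given kind."""
    return sum(1 for coin in sample if coin == kind) / len(sample)


def lincoln_petersen(marked, caught, recaptured):
    """Estimate the population size from a mark-and-recapture experiment.

    marked:     coins marked and put back after the first draw
    caught:     coins taken in the second draw
    recaptured: marked coins that show up again in the second draw
    """
    if recaptured == 0:
        raise ValueError("no marked coins recaptured; estimate is undefined")
    return marked * caught / recaptured


rng = random.Random(0)
# Hypothetical knapsack: 10,000 coins, 30% of them gold.
knapsack = ["gold"] * 3000 + ["copper"] * 7000
first_draw = set(rng.sample(range(len(knapsack)), 500))   # mark 500 coins
second_draw = rng.sample(range(len(knapsack)), 500)       # draw 500 again
recaptured = sum(1 for i in second_draw if i in first_draw)

print("estimated gold share:", sample_proportion([knapsack[i] for i in second_draw]))
print("estimated total coins:", lincoln_petersen(500, 500, recaptured))
```

Both estimates come from a sample only, which is the same situation a learned model is in when it is judged on data it has not seen.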

Page 3:

BASIC ML/STAT TERMINOLOGY:

Page 4:

190 years after Gauss, the core problem of prediction remains an active problem:

Then:

Now:

Page 5:
Page 6:

190 years after Gauss, the core problem of prediction remains an active problem:

Find a mapping# from the features to the labels.

# Approximation

The mapping is described by a list of parameters, which is required to represent the function.

Page 7:

Existing Features

Known Labels

Unavailable Features

Unknown Labels

Loss Function

Loss Function

Assumptions

What is Supervised Learning?

Page 8:

Evaluating the Learned Function:

Loss Function quantifies the error in the approximation.

Learn a mapping by optimizing the loss.
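As a minimal sketch of these two statements, assuming a squared-error loss and a straight-line mapping (the data points and the function names `squared_loss`, `mean_loss` and `fit_line` are hypothetical, not taken from the slides):

```python
def squared_loss(y_true, y_pred):
    """Error of a single prediction."""
    return (y_true - y_pred) ** 2


def mean_loss(xs, ys, slope, intercept):
    """Average squared loss of the line y = slope * x + intercept on the data."""
    return sum(squared_loss(y, slope * x + intercept) for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys):
    """Slope and intercept that minimize the mean squared loss (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical features and labels.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

slope, intercept = fit_line(xs, ys)
print("fitted parameters:", slope, intercept)
print("loss at fitted parameters:", mean_loss(xs, ys, slope, intercept))
print("loss at other parameters: ", mean_loss(xs, ys, 1.0, 0.0))
```

Any other choice of slope and intercept gives a loss at least as large on this data, which is what "learning by optimizing the loss" means here.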

Example:

Page 9:

Predictions with varying parameters:

Page 10:

Predictions with varying parameters:

Page 11:

How do we generalize?

Page 12:

Generalization and Predictability

Empirical Risk Minimization:

True Risk Minimization:

Empirical Risk is the average loss on seen data.

True Risk is the expected loss under the process generating the (X, Y) pairs.
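In standard notation the two quantities can be written as follows. The symbols are assumptions, not taken from the slides: f is the mapping, L the loss function, (x_i, y_i) the n seen pairs, and P the process generating the pairs.

```latex
\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr),
\qquad
R(f) = \mathbb{E}_{(X, Y) \sim P}\bigl[\, L\bigl(Y, f(X)\bigr) \bigr]
```

Empirical risk minimization picks the f with the smallest \hat{R}_n(f); generalization asks how close that choice comes to the smallest R(f).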

Page 13:
Page 14:

PARAMETRIC CHARACTERIZATION OF THE MAPPING:

2d-Linear function: Slope, Intercept
Cubic Spline: Number of knots, Location of knots
Nearest-Neighbor regression: Number of neighbors
Lasso / Elastic Net: L1 and L2 penalty weights
Support Vector Machines: Kernel width, Margin length
Random Forests: Resampling sample size

Page 15:
Page 16:
Page 17:
Page 18:
Page 19:

Long list of available Supervised Learning Techniques.

Most of the techniques have tuning parameters.

We can minimize out-of-sample error by tuning the technique with optimal parameters.

Tuning can be performed by cross-validation over a discrete grid of parameter combinations.
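A minimal sketch of such a grid search, assuming nearest-neighbor regression as the technique and the number of neighbors as the only tuning parameter. The data, the grid and the function names `knn_predict` and `cv_error` are hypothetical.

```python
import random


def knn_predict(train, x, k):
    """Average the labels of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def cv_error(data, k, folds=5):
    """Mean squared error of k-NN regression, estimated by cross-validation."""
    total = 0.0
    for fold in range(folds):
        held_out = data[fold::folds]                       # every folds-th point
        train = [p for i, p in enumerate(data) if i % folds != fold]
        total += sum((y - knn_predict(train, x, k)) ** 2 for x, y in held_out)
    return total / len(data)


rng = random.Random(0)
# Hypothetical data: a noisy quadratic on [-1, 1].
xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
data = [(x, x * x + rng.gauss(0.0, 0.1)) for x in xs]

grid = [1, 3, 5, 10, 25, 50]                               # discrete parameter grid
errors = {k: cv_error(data, k) for k in grid}
best_k = min(errors, key=errors.get)

for k in grid:
    print(k, round(errors[k], 4))
print("selected number of neighbors:", best_k)
```

With several tuning parameters the grid becomes the set of all combinations, and the same loop runs once per combination.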

Page 20:
Page 21:

CURSE OF DIMENSIONALITY - Flat World vs. 10D World:

Page 22:

CURSE OF DIMENSIONALITY - Flat World vs. 10D World:

Page 23:

CURSE OF DIMENSIONALITY - Flat World vs. 10D World:

Page 24:

CURSE OF DIMENSIONALITY - Let us validate:
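One way to check the effect numerically, as a sketch under assumptions: data spread uniformly in a unit cube, 2 dimensions standing in for the flat world and 10 for the 10D world. The function names `edge_length` and `nearest_neighbor_distance` are hypothetical.

```python
import math
import random


def edge_length(fraction, dim):
    """Side of a sub-cube that holds `fraction` of uniform data in a unit cube."""
    return fraction ** (1.0 / dim)


def nearest_neighbor_distance(dim, n_points, rng):
    """Distance from the cube's centre to the nearest of n uniform random points."""
    centre = [0.5] * dim
    return min(
        math.dist(centre, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    )


rng = random.Random(0)
for dim in (2, 10):
    side = edge_length(0.01, dim)                    # neighborhood holding 1% of the data
    nearest = nearest_neighbor_distance(dim, 1000, rng)
    print(f"{dim}D: side for 1% of data = {side:.3f}, nearest of 1000 points = {nearest:.3f}")
```

In 2D a neighborhood holding 1% of the data has side 0.1, so it is local. In 10D the same 1% needs a side above 0.6, so the "neighbors" are spread over most of each axis.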

Page 25:

Structural Risk Minimization via Regularization:
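A minimal sketch of the idea, assuming ridge (squared L2) regularization on a one-parameter linear model without intercept. This is one common instance of regularization, not necessarily the one used in the talk; the data and the name `ridge_slope` are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Slope w minimizing sum((y - w * x) ** 2) + lam * w ** 2.

    Setting the derivative to zero gives w = sum(x * y) / (sum(x * x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical features and labels.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for lam in (0.0, 1.0, 10.0, 100.0):
    print("penalty weight", lam, "-> slope", round(ridge_slope(xs, ys, lam), 3))
```

With a penalty weight of zero this is plain empirical risk minimization. Larger weights shrink the slope toward zero, trading fit on the seen data for a simpler function; the weight itself is a tuning parameter.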

Page 27:

Brief Description

Technology Overview

Hiring (What we’re looking for): http://blinqmedia.com/contact/job-openings/

Page 28:

Let's work with Abalone

Page 29:

Thank You!