Introduction to Model-Based Machine Learning for Transportation

Post on 07-Jan-2017


Outline: Introduction | Probabilistic Programming | Case Study

Introduction to Model-Based Machine Learning

A Webinar for the TRB ADB40 Big Data Initiative

by

Daniel Emaasit

Ph.D. Student, Department of Civil and Environmental Engineering
University of Nevada, Las Vegas, USA (emaasit@unlv.nevada.edu)

September 27, 2016


Introduction


Current Challenges in Adopting Machine Learning

Generally, current challenges in adopting ML include:
- an overwhelming number of traditional ML methods to learn
- deciding which algorithm to use, and why
- some custom problems may not fit any existing algorithm


What is Model-Based Machine Learning?

- A different viewpoint on machine learning, proposed by Bishop (2013)¹ and Winn et al. (2015)²
- Goal: provide a single development framework that supports the creation of a wide range of bespoke models
- Core idea: all assumptions about the problem domain are made explicit in the form of a model

¹ Bishop, C. M. (2013). Model-Based Machine Learning. Philosophical Transactions of the Royal Society A, 371, pp. 1–17.
² Winn, J., Bishop, C. M., Diethe, T. (2015). Model-Based Machine Learning. Microsoft Research Cambridge. http://www.mbmlbook.com.


What is a Model in MBML?

A model:
- is a set of assumptions, expressed in mathematical/graphical form
- expresses all parameters and variables as random variables
- shows the dependencies between variables

Figure 1: Description of a model


Key Ideas of MBML

MBML is built upon three key ideas:
- the use of Probabilistic Graphical Models (PGMs)
- the adoption of Bayesian machine learning
- the application of fast, deterministic inference algorithms


Key Idea 1: Probabilistic Graphical Models

- Combine probability theory with graphs (e.g., factor graphs)
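To make the factor-graph idea concrete, here is a minimal stdlib-Python sketch (not from the webinar; the variables and probabilities are invented for illustration): a joint distribution over two binary variables written as a product of local factors, with a marginal recovered by enumeration.

```python
import itertools

# A tiny factor graph over two binary variables: rain -> wet road.
# Each factor maps an assignment of its variables to a nonnegative value;
# the joint distribution is the normalized product of all factors.
factors = [
    (("rain",), {(0,): 0.8, (1,): 0.2}),                    # prior p(rain)
    (("rain", "wet"), {(0, 0): 0.9, (0, 1): 0.1,            # p(wet | rain)
                       (1, 0): 0.2, (1, 1): 0.8}),
]
variables = ["rain", "wet"]

def joint(assignment):
    """Unnormalized joint: product of factor values for this assignment."""
    p = 1.0
    for vars_, table in factors:
        p *= table[tuple(assignment[v] for v in vars_)]
    return p

def marginal(var, value):
    """Brute-force marginal p(var = value) by summing out the others."""
    total = num = 0.0
    for values in itertools.product([0, 1], repeat=len(variables)):
        a = dict(zip(variables, values))
        p = joint(a)
        total += p
        if a[var] == value:
            num += p
    return num / total

print(marginal("wet", 1))  # 0.8*0.1 + 0.2*0.8 = 0.24
```

Enumeration is exponential in the number of variables; the graph structure is what lets real PGM software replace it with something far cheaper (see Key Idea 3).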


Key Idea 2: Bayesian Machine Learning

- Everything follows from two simple rules of probability theory: the sum rule and the product rule
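Those two rules are the sum rule, p(x) = Σ_y p(x, y), and the product rule, p(x, y) = p(y | x) p(x). A small worked example (hypothetical numbers, not from the webinar) applying both to obtain a posterior:

```python
# Hypothetical scenario: a detector that flags 90% of congested days
# and 10% of free-flow days, with a 30% base rate of congestion.
p_congested = 0.3
p_flag_given_congested = 0.9
p_flag_given_free = 0.1

# Product rule: p(flag, state) = p(flag | state) * p(state)
joint_cong = p_flag_given_congested * p_congested        # 0.27
joint_free = p_flag_given_free * (1 - p_congested)       # 0.07

# Sum rule: p(flag) = sum of the joint over all states
p_flag = joint_cong + joint_free                         # 0.34

# Bayes' theorem (the product rule rearranged): p(congested | flag)
posterior = joint_cong / p_flag
print(round(posterior, 4))  # 0.7941
```

Every inference in MBML is, at bottom, a mechanized version of these two steps.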


Key Idea 3: Inference Algorithms

- The application of fast, approximate inference algorithms by local message passing:
  - Variational Bayes
  - Belief Propagation, Loopy Belief Propagation
  - Expectation Propagation

Figure 2: MCMC vs approximate methods. (a) Learning by local message passing; (b) inference algorithms
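The "local message passing" idea can be shown on a tiny chain-structured model. This stdlib-Python sketch (numbers invented for illustration) computes an exact marginal with forward sum-product messages instead of enumerating every joint state, which is where the speed of these methods comes from:

```python
# Sum-product message passing on a chain x1 - x2 - x3 of binary variables.
# Instead of summing over all 2^3 joint states, each node passes a local
# message (a length-2 vector) to its neighbour.
prior = [0.6, 0.4]                      # p(x1)
trans = [[0.7, 0.3], [0.2, 0.8]]        # p(x_{k+1} = j | x_k = i)

def forward_message(msg, table):
    """Sum out the sender: new_msg[j] = sum_i msg[i] * table[i][j]."""
    return [sum(msg[i] * table[i][j] for i in range(2)) for j in range(2)]

m12 = forward_message(prior, trans)     # message x1 -> x2, equals p(x2)
m23 = forward_message(m12, trans)       # message x2 -> x3, equals p(x3)
print(m23)                              # [0.45, 0.55]
```

On tree-structured graphs this is exact belief propagation; on graphs with loops, the same local updates give the approximate "loopy" variant listed above.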


Stages of MBML

The three stages of MBML:
- Build the model: the joint probability distribution of all relevant variables (e.g., as a graph)
- Incorporate the observed data
- Perform inference to learn the latent variables
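The three stages can be sketched end-to-end on a toy model (hypothetical, not from the webinar): does a commuter cycle on a given day? With a Beta prior and Bernoulli observations, stage 3 even has a closed form:

```python
# Stage 1: build the model.
# Latent cycling rate p ~ Beta(a, b); each day_n ~ Bernoulli(p).
a, b = 1.0, 1.0            # Beta(1, 1) = uniform prior over p

# Stage 2: incorporate the observed data (1 = cycled, 0 = did not).
observations = [1, 1, 0, 1, 1, 0, 1]

# Stage 3: perform inference. Beta is conjugate to Bernoulli, so the
# posterior is Beta(a + successes, b + failures) in closed form.
successes = sum(observations)
failures = len(observations) - successes
a_post, b_post = a + successes, b + failures
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, round(posterior_mean, 4))  # 6.0 3.0 0.6667
```

In the general case stage 3 has no closed form, which is exactly where the message-passing algorithms of Key Idea 3 come in.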


Special Cases of MBML

Figure 3: Special cases of models. (a) Special cases; (b) for sequential data


Benefits of MBML

Potential benefits of this approach:
- Provides a systematic process for creating ML solutions
- Allows incorporation of prior knowledge
- Handles uncertainty in a principled manner
- Does not suffer from overfitting
- Custom solutions are built for specific problems
- Allows quick building of several alternative models
- Makes it easy to compare those alternatives
- It is general purpose: no need to learn the thousands of existing ML algorithms
- Separates the model from the inference/training code


Probabilistic Programming


What is Probabilistic Programming?

- A software package that takes the model and automatically generates inference routines (even source code!) to solve a wide variety of models
- Takes a programming language and adds support for:
  - random variables
  - constraints on variables
  - inference
- Examples of PP software packages:
  - Infer.NET (C#, C++)
  - Stan (R, Python, C++)
  - BUGS
  - Church
  - PyMC (Python)
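A rough feel for what such packages automate, as a stdlib-only sketch (hypothetical model and data, not the webinar's code): the model is written as a generative program plus a likelihood, and a generic routine (here, simple likelihood weighting) turns it into a posterior. Real packages like PyMC or Infer.NET replace this naive routine with far more efficient inference.

```python
import math
import random

random.seed(0)

def sample_prior():
    """Generative part of the program: draw a latent average travel time."""
    return random.gauss(12.0, 3.0)

def likelihood(mean, datum, sd=2.0):
    """Gaussian likelihood of one observation given the sampled mean."""
    return math.exp(-0.5 * ((datum - mean) / sd) ** 2)

data = [14.1, 13.7, 14.5]   # hypothetical observed travel times (minutes)

# Generic inference routine: likelihood weighting (importance sampling).
samples, weights = [], []
for _ in range(20000):
    m = sample_prior()
    w = 1.0
    for d in data:
        w *= likelihood(m, d)
    samples.append(m)
    weights.append(w)

posterior_mean = sum(m * w for m, w in zip(samples, weights)) / sum(weights)
print(round(posterior_mean, 2))  # between the prior mean (12) and the data mean (~14.1)
```

Note the separation the slides emphasize: the model (`sample_prior`, `likelihood`) changes per problem, while the inference loop stays generic.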


How Probabilistic Programming Works

Figure 4: How Infer.NET works


Case Study


A Bicyclist’s Daily Travel

- Analysing the distribution of an individual cyclist’s daily travel time to work
- Identify the variables of interest:
  - tt_n: travel time on the nth day
  - at: average travel time
  - tu: uncertainty

[Graphical model: at and tu are parents of tt_n, which sits inside a plate over the N days]


A Bicyclist’s Daily Travel

- Specify the relationships between the variables
- The joint distribution is given by

  p(tt, at, tu) = p(at) p(tu) × ∏_{n=1}^{N} p(tt_n | at, tu)

  where p(at) p(tu) are the priors and the product term is the likelihood


A Bicyclist’s Daily Travel

- How should we define the likelihood p(tt_n | at, tu)?
  - the distribution’s mean is the cyclist’s average travel time
  - the distribution’s variance determines how much the travel time varies from day to day (e.g., variations in traffic conditions)
- What distributions should p(at) and p(tu) have?
  - conjugate priors! (well, at least most of the time...)


A Bicyclist’s Daily Travel

- The likelihood is given by

  p(tt_n | at, tu) = N(tt_n | at, tu)

- We now know what distributional forms to assign to the priors:

  p(at) = N(at | µ, σ²)
  p(tu) = InvGamma(tu | α, β)

- Set the hyper-parameters based on our prior knowledge of the domain (e.g., µ = 12, σ² = 10, α = 2.0, β = 1.0)
- The choice of the prior’s initial parameters is significant only when the number of observations is small
- As the number of observations increases, the influence of the initial prior on the posterior declines
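The model above can be fitted with a brute-force grid approximation, a pedagogical stand-in for the inference a PP package would generate automatically. The observed travel times below are hypothetical, everything else follows the slide's priors and hyper-parameters:

```python
import math

mu, sigma2 = 12.0, 10.0      # hyper-parameters of p(at) = N(mu, sigma2)
alpha, beta = 2.0, 1.0       # hyper-parameters of p(tu) = InvGamma(alpha, beta)
data = [13.2, 12.5, 14.0, 13.6, 12.9]   # hypothetical daily travel times (minutes)

def log_normal(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def log_invgamma(x, a, b):
    return a * math.log(b) - math.lgamma(a) - (a + 1) * math.log(x) - b / x

# Evaluate prior x likelihood on a grid of (at, tu); tu plays the role of
# the likelihood's variance, as on the slide.
ats = [8.0 + 0.05 * i for i in range(161)]    # at in [8, 16]
tus = [0.1 + 0.05 * j for j in range(99)]     # tu in [0.1, 5.0]
post, total = {}, 0.0
for at in ats:
    for tu in tus:
        lp = log_normal(at, mu, sigma2) + log_invgamma(tu, alpha, beta)
        lp += sum(log_normal(t, at, tu) for t in data)
        p = math.exp(lp)
        post[(at, tu)] = p
        total += p

# Posterior mean of the average travel time at.
post_mean_at = sum(at * p for (at, tu), p in post.items()) / total
print(round(post_mean_at, 2))   # close to the sample mean of the data (13.24)
```

With only five observations the posterior already sits near the data mean, illustrating the slide's final point: the prior's influence fades as observations accumulate.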
