Bayesian Regression System for Interval-valued data


Description

Final Degree Project (Proyecto Fin de Carrera) by Rubén Salgado, directed by Carlos Maté, consisting of a Bayesian Regression System applied to Interval-valued Data.

Transcript of Bayesian Regression System for Interval-valued data

Page 1: Bayesian Regression System for Interval-valued data

Delivery of the project by the student is authorized:

Rubén Salgado Fernández

THE PROJECT DIRECTOR

Carlos Maté Jiménez

Signed: Date: 12/06/2007

APPROVAL (Vº Bº) OF THE PROJECT COORDINATOR

Claudia Meseguer Velasco

Signed: Date: 12/06/2007

Page 2: Bayesian Regression System for Interval-valued data

UNIVERSIDAD PONTIFICIA DE COMILLAS

ESCUELA TÉCNICA SUPERIOR DE INGENIERÍA (ICAI)

INGENIERO EN ORGANIZACIÓN INDUSTRIAL

PROYECTO FIN DE CARRERA (FINAL DEGREE PROJECT)

Bayesian Regression System

for Interval-Valued Data.

Application to the Spanish Continuous

Stock Market

AUTHOR: Salgado Fernández, Rubén

MADRID, June 2007

Page 3: Bayesian Regression System for Interval-valued data

Acknowledgements

Firstly, I would like to thank my director, Carlos Maté Jiménez, PhD, for giving me the chance to carry out this project. With him I have learnt not only about Statistics and research, but also how to enjoy them.

Special thanks to my parents. Their love and everything they have taught me in this life have made me the person I am today.

Thanks to my brothers, my sister and the rest of my family for their support and for the stolen time.

Thanks to Charo for putting up with my bad moods in the hard moments, for supporting me and for giving me the inspiration to go ahead.

Madrid, June 2007


Page 4: Bayesian Regression System for Interval-valued data

Resumen

In recent years, Bayesian methods have spread and been used successfully in many varied fields such as marketing, medicine, engineering, econometrics and financial markets. The main characteristic that distinguishes Bayesian data analysis from other alternatives is that it takes into account not only the objective information coming from the data of the event under study, but also the knowledge available before it. The benefits obtained from this approach are many: the greater the knowledge of the situation, the more reliable and accurate the decisions that can be taken. But it has not always been all advantages. Until a few years ago, Bayesian data analysis presented a series of difficulties that limited its development by researchers. Although the Bayesian methodology has existed as such for quite a long time, it did not begin to be employed in a generalized way until the 1990s. This expansion has been favoured to a large extent by advances in computing and by the improvement and refinement of different calculation methods, such as Markov chain Monte Carlo methods.

In particular, this methodology has proved extraordinarily useful when applied to regression models, which are widely adopted. In practice, situations frequently arise in which the relationship between two quantitative variables needs to be analyzed. The two fundamental objectives of this analysis are, on the one hand, to determine whether the variables are associated and in what sense the association occurs (that is, whether the values of one variable tend to increase, or decrease, as the values of the other increase); and, on the other hand, to study whether the values of one variable can be used to predict the value of the other. A regression model tries to provide information about one or several events through their relation with the behaviour of others. The Bayesian methodology makes it possible to incorporate the researcher's knowledge into the analysis, making the results more precise, since they are not restricted to the data of one particular sample.


Page 5: Bayesian Regression System for Interval-valued data


On the other hand, it is beginning to be accepted that, in the field of statistics, the twenty-first century will be the century of the "statistics of knowledge", in contrast to the previous one, which was that of the "statistics of data". The basic concept on which such statistics is built is the symbolic datum, and statistical methods have been developed for some types of symbolic data.

Nowadays the demands of the market and, in general, of the world keep growing. This implies an ever greater desire to predict the occurrence of an event or to control the behaviour of certain quantities with the least possible error, in order to offer better products, obtain greater benefits or scientific advances, and achieve better results.

Against this background, this project tries to respond to those needs by providing extensive documentation on several of the most widely used and leading techniques of today, such as Bayesian data analysis, regression models and symbolic data, and by proposing different regression techniques. Likewise, a tool will be developed that puts all the acquired knowledge into practice. This application will be aimed at the Spanish stock market and will allow the user to operate it in a simple and friendly way. For its development, one of the newest languages with the greatest future projection will be employed: R.

It is, therefore, a project that combines the newest techniques with the greatest projection both in theory, such as Bayesian regression applied to interval-valued data, and in practice, such as the use of the R language.

Page 6: Bayesian Regression System for Interval-valued data

Abstract

In recent years, Bayesian methods have spread and been successfully used in many different fields such as Marketing, Medicine, Engineering, Econometrics or Financial Markets. The main characteristic that makes Bayesian data analysis remarkable compared with other alternatives is that it takes into account not only the objective information coming from the analyzed event, but also the prior knowledge about it. The benefits obtained from this approach are considerable: the more knowledge of the situation one has, the more reliable and accurate the decisions that can be taken. However, although the Bayesian methodology was established a long time ago, it was not applied in a general way until the 1990s because of computational difficulties. Its expansion has been mainly favoured by the advances in that field and by the improvement of different computational methods, such as Markov chain Monte Carlo methods.

In particular, the Bayesian methodology has proved extraordinarily useful when applied to regression models, which are widely adopted. There are many occasions in real life in which it is necessary to analyse the relationship between two quantitative variables. The two main objectives of this analysis are, on the one hand, to determine whether such variables are associated and in what sense that association comes about (that is, whether the values of one of the variables tend to rise, or to decrease, when the values of the other increase); and, on the other hand, to study whether the values of one variable can be used to predict the value of the other. A regression model offers information about one or more events through their relationship with the behaviour of others. With the Bayesian methodology it is possible to add the researcher's knowledge to the analysis, thus making the results more accurate, since they are not restricted to the data of one particular sample.

On the other hand, in the field of Statistics it is increasingly accepted that the twenty-first century will be the century of the "Statistics of knowledge", in contrast to the last one, which was the


Page 7: Bayesian Regression System for Interval-valued data


one of the "Statistics of data". The basic concept on which such Statistics is built is symbolic data; furthermore, statistical methods have already been developed for some types of symbolic data.

Nowadays, the requirements of the market, and the demands of the world in general, are growing. This implies a continuously increasing desire to predict the occurrence of an event or to control the behaviour of certain quantities with the minimum error, with the aim of offering better products, obtaining more benefits or scientific advances, and achieving better outcomes.

Under this frame, this project tries to respond to such needs by offering extensive documentation on several of the most widely applied and leading techniques of today, such as Bayesian data analysis, regression models and symbolic data, and by suggesting different regression techniques. Likewise, a tool has been developed that allows the reader to put all the acquired knowledge into practice. This application is aimed at the Spanish Continuous Stock Market and lets the user operate it easily. As far as the development of this tool is concerned, one of the most innovative and promising languages of the moment has been used: R.

The project is, therefore, a combination of the most innovative and promising techniques, both in theoretical questions, such as Bayesian regression applied to interval-valued data, and in practical questions, such as the employment of the R language.

Page 8: Bayesian Regression System for Interval-valued data

List of Figures

1.1 Project Work Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.1 Univariate Normal Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

6.1 Interval time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

7.1 Classical Regression with single values in training set . . . . . . . . . . . . . . . . 73

7.2 Classical Regression with single values in testing set . . . . . . . . . . . . . . . . . 74

7.3 Classical Regression with interval-valued data . . . . . . . . . . . . . . . . . . . . 75

7.4 Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . . 75

7.5 Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . . 76

7.6 Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . . 77

7.7 Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . . 78

7.8 Bayesian Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . 80

7.9 Classical Regression with single values in training set . . . . . . . . . . . . . . . . 81

7.10 Classical Regression with single values in testing set . . . . . . . . . . . . . . . . . 81

7.11 Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . . 82

7.12 Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . . 83

7.13 Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . . 85

7.14 Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . . 85

7.15 Bayesian Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . 87

9.1 BARESIMDA MDI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

10.1 Interface between BARESIMDA and R . . . . . . . . . . . . . . . . . . . . . . . . 104

10.2 Interface between BARESIMDA and Excel . . . . . . . . . . . . . . . . . . . . . . 105

10.3 Logical Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105


Page 9: Bayesian Regression System for Interval-valued data


C.1 Load Data Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

C.2 Select File Dialog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

C.3 Display Loaded Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

C.4 Define New Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

C.5 Enter New Variable Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

C.6 Display New Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

C.7 Edit Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

C.8 Select Variable to Be Edited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

C.9 Enter New Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

C.10 Confirmation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

C.11 New Row data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

C.12 Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

C.13 Look And Feel Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

C.14 Look And Feel Styles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

C.15 New Look And Feel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

C.16 Type Of User Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

C.17 Select Type Of User . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

C.18 Non-Symbolic Classical Regression Menu . . . . . . . . . . . . . . . . . . . . . . . 131

C.19 Select Non-Symbolic Variables in Simple Regression . . . . . . . . . . . . . . . . . 131

C.20 Brief Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

C.21 Analysis Options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . 132

C.22 New Prediction in Non-Symbolic Classical Simple Regression . . . . . . . . . . . . 133

C.23 Graphics options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . 133

C.24 Save options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . . . . 134

C.25 Non-Symbolic Classical Multiple Regression Menu . . . . . . . . . . . . . . . . . . 134

C.26 Select Variables in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . 134

C.27 Analysis options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . 135

C.28 Graphics options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . 135

C.29 Save options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . . . 136

C.30 Intercept in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . . . . . 136

C.31 Non-Symbolic Bayesian Simple Regression Menu . . . . . . . . . . . . . . . . . . . 136

C.32 Select Variables in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . . 137

C.33 Analysis Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . 137

C.34 Graphics Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . 138

Page 10: Bayesian Regression System for Interval-valued data


C.35 Save Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . . . 138

C.36 Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regression . . . 138

C.37 Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression . . 139

C.38 Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression . . . 139

C.39 Non-Symbolic Bayesian Multiple Regression menu . . . . . . . . . . . . . . . . . . 139

C.40 Analysis Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . 140

C.41 Graphics Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . 140

C.42 Save Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . . . 140

C.43 Model Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . . 141

C.44 Symbolic Classical Simple Regression Menu . . . . . . . . . . . . . . . . . . . . . 141

C.45 Select Variables in Symbolic Classical Simple Regression . . . . . . . . . . . . . . . 141

C.46 Analysis Options in Symbolic Classical Simple Regression . . . . . . . . . . . . . . 142

C.47 Graphics Options in Symbolic Classical Simple Regression . . . . . . . . . . . . . . 142

C.48 Symbolic Classical Multiple Regression Menu . . . . . . . . . . . . . . . . . . . . . 143

C.49 Select Variables in Symbolic Classical Multiple Regression . . . . . . . . . . . . . . 143

C.50 Analysis Options in Symbolic Classical Multiple Regression . . . . . . . . . . . . . 144

C.51 Graphics Options in Symbolic Classical Multiple Regression . . . . . . . . . . . . . 144

C.52 Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . . . . . . . . . . . . 145

C.53 Select Variables in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . 145

C.54 Analysis Options in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . 145

C.55 Graphics Options in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . 146

C.56 Model Options in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . . 147

C.57 Symbolic Bayesian Multiple Regression Menu . . . . . . . . . . . . . . . . . . . . 147

C.58 Select Variables in Symbolic Bayesian Multiple Regression . . . . . . . . . . . . . . 147

C.59 Graphics Options in Symbolic Bayesian Multiple Regression . . . . . . . . . . . . . 148

Page 11: Bayesian Regression System for Interval-valued data

List of Tables

2.1 Distributions in Bayesian Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2 Comparison between Univariate and Multivariate Normal . . . . . . . . . . . . . . . 15

2.3 Conjugate distributions for other likelihood distributions . . . . . . . . . . . . . . . 16

4.1 Bayes Factor Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.2 Sensitivity Summary I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.3 Sensitivity Summary II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

5.1 Multiple and Simple Regression Comparison . . . . . . . . . . . . . . . . . . . . . 40

5.2 Sensitivity analysis of parameter β . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

5.3 Sensitivity analysis of parameter σ2 . . . . . . . . . . . . . . . . . . . . . . . . . . 46

5.4 Classical and Bayesian regression comparison . . . . . . . . . . . . . . . . . . . . . 48

5.5 Main Prior Distributions Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

5.6 Main Posterior Distributions Summary . . . . . . . . . . . . . . . . . . . . . . . . . 58

5.7 Prior and Posterior Parameters Summary . . . . . . . . . . . . . . . . . . . . . . . . 59

5.8 Main Posterior Predictive Distributions Summary . . . . . . . . . . . . . . . . . . . 60

6.1 Multivalued Data Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

6.2 Modal-multivalued Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

7.1 Error Measures for Classical Regression with single values . . . . . . . . . . . . . . 74

7.2 Error Measure for Centre Method (2000) . . . . . . . . . . . . . . . . . . . . . . . . 76

7.3 Error Measure for Centre Method (2002) . . . . . . . . . . . . . . . . . . . . . . . . 77

7.4 Error Measures for Centre and Radius Method . . . . . . . . . . . . . . . . . . . . . 78

7.5 Error Measures in Bayesian Centre and Radius Method . . . . . . . . . . . . . . . . 80

7.6 Error Measures for Classical Regression with single values . . . . . . . . . . . . . . 82


Page 12: Bayesian Regression System for Interval-valued data


7.7 Error Measure for Centre Method (2000) . . . . . . . . . . . . . . . . . . . . . . . . 83

7.8 Error Measure for Centre Method (2002) . . . . . . . . . . . . . . . . . . . . . . . . 84

7.9 Error Measures for Centre and Radius Method . . . . . . . . . . . . . . . . . . . . . 84

7.10 Error Measures in Bayesian Centre and Radius Method . . . . . . . . . . . . . . . . 86

11.1 Estimated material costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

11.2 Amortization Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

11.3 Summarized Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

Page 13: Bayesian Regression System for Interval-valued data

Contents

Acknowledgements i

Resumen ii

Abstract iv

List of Figures vi

List of Tables x

Contents xvi

1 Introduction 1
1.1 Project Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Bayesian Data Analysis 6
2.1 What is Bayesian Data Analysis? . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Bayesian Analysis for Normal and other distributions . . . . . . . . . . . . . . . . . 10

2.2.1 Univariate Normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2.2 Multivariate Normal distribution . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2.3 Other distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.3 Hierarchical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.4 Nonparametric Bayesian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3 Posterior Simulation 20
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20


Page 14: Bayesian Regression System for Interval-valued data


3.2 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.3 Monte Carlo Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.4 Gibbs sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.5 Metropolis-Hastings sampler and its special cases . . . . . . . . . . . . . . . . . . . 25

3.5.1 Metropolis-Hastings sampler . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.5.2 Metropolis sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.5.3 Random-walk sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.5.4 Independence sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.6 Importance sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4 Sensitivity Analysis 28
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.2 Bayes Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.3 Alternative Stats to Bayes Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.4 Highest Posterior Density Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.5 Model Comparison Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

5 Regression Analysis 35
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

5.2 Classical Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

5.3 The Bayesian Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5.4 Normal Linear Regression Model subject to inequality constraints . . . . . . . . . . 48

5.5 Normal Linear Regression Model with Independent Parameters . . . . . . . . . . . . 49

5.6 Normal Linear Regression Model with Heteroscedasticity and Correlation . . . . . . 51

5.6.1 Heteroscedasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

5.6.2 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

5.7 Models Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

6 Symbolic Data 61
6.1 What is symbolic data analysis? . . . . . . . . . . . . . . . . . . . . . . . . . . 61

6.2 Interval-valued variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

6.3 Classical regression analysis with Interval-valued data . . . . . . . . . . . . . . . . . 67

6.4 Bayesian regression analysis with Interval-valued data . . . . . . . . . . . . . . . . 70

Page 15: Bayesian Regression System for Interval-valued data


7 Results 72
7.1 Spanish Continuous Stock Market data sets . . . . . . . . . . . . . . . . . . . . 72

7.2 Direct Relation between Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

7.3 Uncorrelated Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

8 A Guide to Statistical Software Today 88
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

8.2 Commercial Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

8.2.1 The SAS System for Statistical Analysis . . . . . . . . . . . . . . . . . . . . 89

8.2.2 Minitab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

8.2.3 BMDP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

8.2.4 SPSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

8.2.5 S-PLUS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

8.2.6 Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

8.3 Public License Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

8.3.1 R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

8.3.2 BUGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

8.4 Analysis Packages with Statistical Libraries . . . . . . . . . . . . . . . . . . . . . . 94

8.4.1 Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

8.4.2 Mathematica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

8.4.3 Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

8.5 Some General Languages with Statistical Libraries . . . . . . . . . . . . . . . . . . 95

8.5.1 Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

8.5.2 C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

8.6 Developed Software Tool: BARESIMDA . . . . . . . . . . . . . . . . . . . . . . . 96

9 Software Requirements Specification 98
9.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

9.2 Intended Audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

9.3 Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

9.3.1 Classical Regression with crisp data . . . . . . . . . . . . . . . . . . . . . . 99

9.3.2 Classical Regression with interval-valued data . . . . . . . . . . . . . . . . 99

9.3.3 Bayesian Regression with crisp data . . . . . . . . . . . . . . . . . . . . . . 100

9.3.4 Bayesian Regression with interval-valued data . . . . . . . . . . . . . . . . 100

Page 16: Bayesian Regression System for Interval-valued data


9.3.5 Data Manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

9.3.6 Portability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

9.3.7 Maintainability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

9.4 External Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

9.4.1 User Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

9.4.2 Software Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

10 Software Architecture Study 103
10.1 Hardware/Software Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 103

10.2 Logical Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

11 Project Budget 106
11.1 Engineering Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

11.2 Investment and Elements Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

11.2.1 Summarized Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

12 Conclusions 110
12.1 Bayesian Regression applied to Symbolic Data . . . . . . . . . . . . . . . . . . 110

12.2 BARESIMDA Software Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

12.3 Future Developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

12.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

A Probability Distributions 113
A.1 Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

A.1.1 Binomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

A.1.2 Geometric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

A.1.3 Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

A.2 Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

A.2.1 Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

A.2.2 Univariate Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

A.2.3 Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

A.2.4 Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

A.2.5 Inverse-Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

A.2.6 Chi-square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

A.2.7 Inverse-Chi-square and Scaled Inverse-Chi-square . . . . . . . . . . . . . 118

Page 17: Bayesian Regression System for Interval-valued data


A.2.8 Univariate Student-t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

A.2.9 Beta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

A.2.10 Multivariate Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

A.2.11 Multivariate Student-t . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

A.2.12 Wishart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

A.2.13 Inverse-Wishart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

B Installation Guide 122
B.1 From source folder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

B.2 From installer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

C User's Guide 123
C.1 Data Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

C.1.1 Loading an Excel file . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

C.1.2 Defining a new variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

C.1.3 Editing an existing variable . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

C.1.4 Deleting an existing variable . . . . . . . . . . . . . . . . . . . . . . . . . . 127

C.1.5 Typing in a new data row . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

C.1.6 Deleting an existing data row . . . . . . . . . . . . . . . . . . . . . . . . . . 128

C.1.7 Modifying existing data . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

C.2 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

C.2.1 Setting the look & feel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

C.2.2 Selecting the type of user . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

C.3 Non Symbolic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

C.3.1 Simple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 131

C.3.2 Multiple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . 133

C.3.3 Simple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 136

C.3.4 Multiple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . 139

C.4 Symbolic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

C.4.1 Simple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 140

C.4.2 Multiple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . 143

C.4.3 Simple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 144

C.4.4 Multiple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . 146

Page 18: Bayesian Regression System for Interval-valued data


D Obtaining and Installing R 149
D.1 Binary distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

D.2 Installation from source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

D.3 Package installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

E Obtaining and installing Java Runtime Environment 152
E.1 Microsoft Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

E.2 Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

E.2.1 Installation of Self-Extracting Binary . . . . . . . . . . . . . . . . . . . . . 153

E.2.2 Installation of RPM File . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

E.3 UNIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

Bibliography 157

Page 19: Bayesian Regression System for Interval-valued data

Chapter 1

Introduction

1.1 Project Motivation

Statistics is primarily concerned with the analysis of data, either to assist in arriving at an improved understanding of some underlying mechanism, or as a means for making informed rational decisions. Both these aspects generally involve some degree of uncertainty. The statistician's task is then to explain such uncertainty, and to reduce it to the extent that this is possible. Problems of this type occur throughout all the physical, social and other sciences. One way of looking at statistics stems from the perception that, ultimately, probability is the only appropriate way to describe and systematically deal with uncertainty, as if it were the language for the logic of uncertainty. Thus, inference statements are precisely framed as probability statements on the possible values of the unknown quantities of interest (parameters or future observations), conditional on the observed, available data. The scientific discipline based on this understanding is called Bayesian Statistics. Moreover, the increasingly needed and sophisticated models used to describe available data, often hierarchical models, are typically too complex for conventional statistics to handle, but can be tackled within Bayesian Statistics. In principle, Bayesian Statistics is designed to handle all situations where uncertainty is found. Since some uncertainty is present in most aspects of life, it may be argued that Bayesian Statistics should be appreciated and used by everyone. It is the logic of contemporary society and science. According to [Rupp04], whether to apply Bayesian methodology is no longer discussed; the question is when it has to be done.

Bayesian methods have matured and improved in several ways during the last fifteen years. They are becoming increasingly attractive to researchers, and successful applications of Bayesian


Page 20: Bayesian Regression System for Interval-valued data


data analysis have appeared in many different fields, including Actuarial Science, Biometrics, Finance, Market Research, Marketing, Medicine, Engineering and Social Science. It is not only that the Bayesian approach produces appropriate answers to many current important problems; there is also an evident need for it, given the inapplicability of conventional statistics to many of them. Thus, the main characteristic offered by Bayesian data analysis is the possibility of incorporating the researcher's knowledge about the problem at hand. The more precise that prior knowledge is, the better and more reliable the results obtained. But Bayesian Statistics was restrained until the mid-1990s by its computational complexity. Since then, it has expanded greatly, favoured by the development and improvement of different computational methods in this field, such as Markov chain Monte Carlo.

This methodology has proved extremely useful in its application to regression models, which are widely accepted. Let us remember that the general purpose of regression analysis is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. The Bayesian methodology lets the researcher incorporate his or her knowledge into the analysis, improving the results, since they no longer depend only on the sampling data.

On the other hand, datasets are increasingly so large that they must be summarized in some fashion, so that the resulting summary dataset is of a more manageable size while still retaining as much knowledge inherent to the entire dataset as possible. One consequence of this situation is that data may no longer be formatted as single values, as is the case for classical data, but rather may be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. This kind of data also lets us better represent the knowledge and beliefs we hold in our minds, which are difficult to capture with classical Statistics. According to [Bill02], this responds to the current need of changing from a Statistics of data in the past century to a Statistics of knowledge in the twenty-first century.

Market and demand requirements are increasing continuously over time. This implies a need for better and more accurate methods to forecast new situations and to control different quantities with the minimum error, in order to supply better products and to obtain higher incomes, scientific advances and better results.

Dealing with this outlook, this project is intended to respond to those requirements by providing a


Page 21: Bayesian Regression System for Interval-valued data


wide and exhaustive documentation about some of the most widely used and advanced techniques today, including Bayesian data analysis, regression models and symbolic data. Different examples related to the Spanish Continuous Stock Market are explained throughout this text, making clear the advantages of employing the described methods. Likewise, a software tool with a user-friendly graphical interface has been developed to practice and check all the acquired knowledge.

Therefore, this is a project that combines the most recent techniques with major future implications in theoretical issues, such as Bayesian regression applied to interval-valued data, with a technological part dealing with the problem of interconnecting two software programs: one used to show the graphical user interface and the other one employed to make computations.

Regarding a more personal motivation, when accepting this project several factors were taken into consideration by the author:

• A great challenge: it is an ambitious project of high technical complexity in both its theoretical and its technological basis. This represents a very good letter of introduction for entering the labour market.

• Good timing: the project was designed to be finished before June 2007, which meant being able to finish the degree in June and to join the labour market in September.

• Some very interesting issues: on the one hand, it deals with the ever-present need to forecast and model observations and situations in order to get the best possible results. On the other hand, it focuses on the Stock Market, which matches my personal hobbies.

• A new programming language: the possibility of learning in depth a new and relatively recent programming language, such as R, was an extra motivating factor.

• The project director: Carlos Maté is considered a demanding and very competent director by the students of the university.

• A research scholarship: the possibility of working in the Industrial Organization department of the University, learning from people such as the director mentioned above and other highly recognized professors, was a great factor.


Page 22: Bayesian Regression System for Interval-valued data


1.2 Objectives

This project pursues the following aims.

• To provide wide-ranging and rigorous documentation about the following issues: Bayesian data analysis, regression models and symbolic data. Building on this, documentation about Bayesian regression will be developed and the software tool designed.

• To build a software tool to fit Bayesian regression models to interval-valued data, finding out the most efficient way to design the graphical user interface. This must be as user-friendly as possible.

• To find out the most efficient way to offer that system to future clients, based on the tests carried out with the application.

• To design a survey to measure the quality of the tool and users’ satisfaction.

• To explore the possibility of writing an article for a scientific journal.

1.3 Methodology

As the title of the project indicates, the ultimate purpose is the development of an application aimed at stock markets and based on a Bayesian regression system; therefore, some previous knowledge is required.

The first stage is familiarization with Bayesian data analysis, regression models applied to the Bayesian methodology, and symbolic data.

Within this phase, Bayesian data analysis will be studied first, trying to synthesize and extract the most important elements. Special attention will be given to posterior simulation and computational algorithms. Then regression models will be treated, quickly reviewing the classical approach before going deeper into the different Bayesian regression models, applying a great part of what was explained about the Bayesian methodology. Finally, this first stage will be completed with the application to symbolic data, paying special attention to interval-valued data.

The second stage concerns the development of the software application, employing an incremental methodology for programming and testing iterative prototypes. This methodology has been


Page 23: Bayesian Regression System for Interval-valued data


considered the most suitable for this project, since it lets us introduce successive models into the application.

The following figure shows the structure of the work packages into which the project is divided:

Figure 1.1: Project Work Packages


Page 24: Bayesian Regression System for Interval-valued data

Chapter 2

Bayesian Data Analysis

2.1 What is Bayesian Data Analysis?

Statistics can be defined as the discipline that provides us with a methodology to collect, organize, summarize and analyze a set of data.

Regarding data analysis, it can be divided into two modes of analysis: exploratory data analysis and confirmatory data analysis. The former is used to represent, describe and analyze a set of data through simple methods in the first stages of statistical analysis. The latter is applied to make inferences from data, based on probability models.

In the same way, confirmatory data analysis is divided into two branches depending on the adopted approach. The first one, known as frequentist, makes inferences from the data resulting from a sampling through classical methods. The second branch, known as Bayesian, goes further in the analysis and adds to those data the prior knowledge which the researcher has about the problem treated. Since it is not worthwhile to explain the frequentist approach in full here, a more extensive review of the different classical methods related to it can be found in [Mont02].

Data Analysis
• Exploratory
• Confirmatory
    – Frequentist
    – Bayesian


Page 25: Bayesian Regression System for Interval-valued data


As far as Bayesian analysis is concerned and according to [Gelm04], the process can be divided

into the following three steps:

• To set up a full probability model, through a joint probability distribution for all observable and

unobservable quantities in a problem.

• To condition on observed data, obtaining the posterior distribution.

• Finally, to evaluate the fit of the model and the implications of the resulting posterior distribution.

$f(\theta, \vec{y})$, known as the joint probability distribution (or $f(\vec{\theta}, \vec{y})$ if there are several parameters $\vec{\theta}$), is obtained by means of

$$f(\theta, \vec{y}) = f(\vec{y}\mid\theta)\,f(\theta) \qquad (\text{resp. } f(\vec{\theta}, \vec{y}) = f(\vec{y}\mid\vec{\theta})\,f(\vec{\theta})) \tag{2.1}$$

where $\vec{y}$ is the set of sampled data. This distribution is thus the product of two densities, referred to as the sampling distribution $f(\vec{y}\mid\theta)$ (resp. $f(\vec{y}\mid\vec{\theta})$) and the prior distribution $f(\theta)$ (resp. $f(\vec{\theta})$).

The sampling distribution, as its name suggests, is the probability model that the researcher assigns to the statistic (resp. set of statistics) to be studied after the data have been observed. Here, an important problem stands out in relation to the parametric approach: the probability model that the researcher chooses might not be adequate. The nonparametric approach overcomes this inconvenience, as will be seen later.

When $\vec{y}$ is considered fixed, so that it is a function of $\theta$ (resp. $\vec{\theta}$), the sampling distribution is called the likelihood function and obeys the likelihood principle, which states that, for a given sample of data, any two probability models $f(\vec{y}\mid\theta)$ (resp. $f(\vec{y}\mid\vec{\theta})$) with the same likelihood function yield the same inference for $\theta$ (resp. $\vec{\theta}$).

The prior distribution does not depend upon the data. Accordingly, it contains the information and the knowledge that the researcher has about the situation or problem to be solved. When there is no previous significant population from which the engineer can take his knowledge, that is, when the researcher has no prior information about the problem, a non-informative prior distribution must be used in the analysis in order to let the data speak for themselves. Hence, it is assumed that the prior knowledge will have very little importance in the results. But most non-informative priors


Page 26: Bayesian Regression System for Interval-valued data


are "improper", in that they do not integrate to 1, and this fact can cause problems. In these cases it is necessary to be sure that the posterior distribution is proper. Another possibility is to use an informative prior distribution but with an insignificant weight (around zero) associated with it.

Though the prior distribution can take any form, it is common to choose particular classes of priors that make computation and interpretation easier. These are the conjugate priors. A conjugate prior distribution is one which, when combined with the likelihood function, gives a posterior distribution that falls in the same class of distributions as the prior. Furthermore, according to [Koop03], a natural conjugate prior has the additional property of having the same form as the likelihood. But it is not always possible to find this kind of distribution, and the researcher then has to manage many distributions to be able to express his prior knowledge about the problem. This is another handicap that the nonparametric approach reduces.
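To make the idea of conjugacy concrete, here is a minimal sketch in R (the language adopted later in this project), using the Beta-Binomial pair for brevity rather than the Normal family treated below; all values are illustrative:

```r
# Conjugacy sketch: a Beta prior combined with a Binomial likelihood
# gives a Beta posterior, i.e. the posterior stays in the prior's class.
a0 <- 2; b0 <- 2     # prior Beta(a0, b0): mild belief centred on 0.5
y  <- 7; n  <- 10    # illustrative data: 7 successes in 10 trials

a1 <- a0 + y         # posterior is Beta(a1, b1): same family as the prior,
b1 <- b0 + n - y     # with parameters updated simply by the observed counts

c(prior.mean = a0 / (a0 + b0), posterior.mean = a1 / (a1 + b1))
```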

In relation to the prior, what distribution should be chosen? There are three different points of

view corresponding to different styles of Bayesians:

• Classical Bayesians consider that the prior is a necessary evil and priors that interject the least

information possible should be chosen.

• Modern parametric Bayesians consider that the prior is a useful convenience and that priors with desirable properties such as conjugacy should be chosen. They remark that, given a distributional choice, prior hyper-parameters that interject the least information possible should be chosen.

• Subjective Bayesians give essential importance to the prior, in the sense that they consider it a summary of old beliefs. So prior distributions based on previous knowledge (either the results of earlier studies or non-scientific opinion) should be chosen.

Returning to the Bayesian data analysis process, simply conditioning on the observed data $\vec{y}$ and applying Bayes' Theorem, the posterior distribution, namely $f(\theta\mid\vec{y})$ (resp. $f(\vec{\theta}\mid\vec{y})$), is obtained:

$$f(\theta\mid\vec{y}) = \frac{f(\theta,\vec{y})}{f(\vec{y})} = \frac{f(\theta)f(\vec{y}\mid\theta)}{f(\vec{y})} \qquad \left(\text{resp. } f(\vec{\theta}\mid\vec{y}) = \frac{f(\vec{\theta},\vec{y})}{f(\vec{y})} = \frac{f(\vec{\theta})f(\vec{y}\mid\vec{\theta})}{f(\vec{y})}\right) \tag{2.2}$$

where

$$f(\vec{y}) = \int_0^\infty f(\theta)f(\vec{y}\mid\theta)\,d\theta \qquad \left(\text{resp. } f(\vec{y}) = \int_0^\infty\!\!\int_0^\infty f(\vec{\theta})f(\vec{y}\mid\vec{\theta})\,d\vec{\theta}\right) \tag{2.3}$$


Page 27: Bayesian Regression System for Interval-valued data


is known as the prior predictive distribution, since it is not conditional upon a previous observation of the process and is applied to an observable quantity.

An equivalent form of the posterior distribution displayed above omits the prior predictive distribution, since it does not involve $\theta$ (resp. $\vec{\theta}$) and the interest lies in learning about $\theta$ (resp. $\vec{\theta}$). So, with $\vec{y}$ fixed, it can be said that the posterior distribution is proportional to the joint probability distribution $f(\theta,\vec{y})$.

Once the posterior distribution is calculated, some kind of summary measure will be required to estimate the uncertainty about the parameter $\theta$ (resp. $\vec{\theta}$). This is because the posterior distribution is a high-dimensional object and its direct use is not practical for a problem. The measure that summarizes the posterior distribution can be the posterior mean, mode, median or variance, among others. Its choice will depend on the requirements of the problem. So the posterior distribution has great importance, since it lets the researcher manage the uncertainty about $\theta$ (resp. $\vec{\theta}$) and provides him with information about it (resp. them), taking into account both his prior knowledge and the data collected by sampling on that parameter.

According to [Mate06], it is not difficult to deduce that the posterior inference will coincide with the non-Bayesian one as long as the estimate which the researcher gives to the parameter $\theta$ (resp. $\vec{\theta}$) is the same as the one resulting from the sampling.

Once the data $\vec{y}$ have been observed, a new unknown observable quantity $y$ can be predicted for the same process through the posterior predictive distribution, namely $f(y\mid\vec{y})$:

$$f(y\mid\vec{y}) = \int f(y,\theta\mid\vec{y})\,d\theta = \int f(y\mid\theta,\vec{y})f(\theta\mid\vec{y})\,d\theta = \int f(y\mid\theta)f(\theta\mid\vec{y})\,d\theta \tag{2.4}$$
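The whole updating cycle can also be sketched numerically in R. The following toy example (made-up data; a grid approximation for illustration only, not part of the thesis software) discretizes equation (2.2) for a Normal mean with known variance and then draws from the posterior predictive distribution of equation (2.4):

```r
# Grid approximation of f(theta | y) ~ f(theta) * f(y | theta), equation (2.2),
# for a Normal mean theta with known sigma; all values are illustrative.
set.seed(1)
y     <- c(10.2, 9.8, 10.5, 10.1)          # observed data (made up)
sigma <- 0.5                               # known standard deviation
theta <- seq(8, 12, length.out = 1000)     # grid over the parameter space

prior <- dnorm(theta, mean = 9.5, sd = 1)  # prior f(theta)
lik   <- sapply(theta, function(t) prod(dnorm(y, mean = t, sd = sigma)))
post  <- prior * lik / sum(prior * lik)    # discretized Bayes' Theorem

post.mean <- sum(theta * post)             # one summary measure of f(theta | y)

# Posterior predictive draws for a new observation y, equation (2.4):
theta.draws <- sample(theta, 5000, replace = TRUE, prob = post)
y.new       <- rnorm(5000, mean = theta.draws, sd = sigma)
c(post.mean = post.mean, predictive.mean = mean(y.new))
```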

To sum up, the basic idea is to update the prior distribution $f(\theta)$ through Bayes' theorem by observing the data $\vec{y}$ in order to get a posterior distribution $f(\theta\mid\vec{y})$. Then a summary measure or a prediction for new data can be obtained from $f(\theta\mid\vec{y})$. Table 2.1 reflects what has been said.


Page 28: Bayesian Regression System for Interval-valued data


| Distribution | Expression | Information Required | Result |
| Likelihood | $f(\vec{y}\mid\theta)$ | Data | Distribution $f(\vec{y}\mid\theta)$ |
| Prior | $f(\theta)$ | Researcher's knowledge | Parameter distribution $f(\theta)$ |
| Joint | $f(\vec{y}\mid\theta)f(\theta)$ | Likelihood and prior distributions | Distribution $f(\theta,\vec{y})$ |
| Posterior | $f(\theta)f(\vec{y}\mid\theta)$ | Prior and joint distributions | Distribution $f(\theta\mid\vec{y})$ |
| Predictive | $\int f(y\mid\theta)f(\theta\mid\vec{y})\,d\theta$ | New data and posterior distribution | Distribution $f(y\mid\vec{y})$ |

Table 2.1: Distributions in Bayesian Data Analysis

2.2 Bayesian Analysis for Normal and other distributions

2.2.1 Univariate Normal distribution

The basic model to be discussed concerns an observable variable $y$, normally distributed with mean $\mu$ and unknown variance $\sigma^2$:

$$y\mid\mu,\sigma^2 \sim N(\mu,\sigma^2) \tag{2.5}$$

As can be seen in Appendix A, the likelihood function for a single observation is

$$f(y\mid\mu,\sigma^2) \propto (\sigma^2)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(y-\mu)^2\right) \tag{2.6}$$

This means that the likelihood function is proportional to a Normal distribution, omitting those

terms that are constant.

Now let us consider that we have $n$ independent observations $y_1, y_2, \ldots, y_n$. According to the previous section, the parameters to be estimated, $\vec{\theta}$, are $\mu$ and $\sigma^2$:


Page 29: Bayesian Regression System for Interval-valued data


$$\vec{\theta} = (\theta_1, \theta_2) = (\mu, \sigma^2) \tag{2.7}$$

A full probability model must be set up through a joint probability distribution:

$$f(\vec{\theta}, (y_1, y_2, \ldots, y_n)) = f(\vec{\theta}, \vec{y}) = f(\vec{y}\mid\vec{\theta})\,f(\vec{\theta}) \tag{2.8}$$

The likelihood function for a sample of $n$ iid observations is in this case

$$f(\vec{y}\mid\vec{\theta}) = f(\vec{y}\mid\mu,\sigma^2) \propto (\sigma^2)^{-n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2\right] \tag{2.9}$$

As recommended previously, a conjugate prior will be chosen; in fact, it will be a natural conjugate prior. According to [Gelm04], this likelihood function suggests a conjugate prior distribution of the form

$$f(\vec{\theta}) = f(\mu,\sigma^2) = f(\mu\mid\sigma^2)\,f(\sigma^2) \tag{2.10}$$

where the marginal distribution of $\sigma^2$ is the Scaled Inverse-$\chi^2$ and the conditional distribution of $\mu$ given $\sigma^2$ is Normal (details about these distributions in Appendix A):

$$\mu\mid\sigma^2 \sim N(\mu_0, \sigma^2 V_0) \tag{2.11}$$

$$\sigma^2 \sim \text{Inv-}\chi^2(\nu_0, s_0^2) \tag{2.12}$$

So the joint prior distribution is:

$$f(\vec{\theta}) = f(\mu,\sigma^2) = f(\mu\mid\sigma^2)\,f(\sigma^2) \propto N\text{-Inv-}\chi^2(\mu_0, s_0^2 V_0;\ \nu_0, s_0^2) \tag{2.13}$$

Its four parameters can be identified as the location and scale of µ and the degrees of freedom and

scale of σ2, respectively.

As a natural conjugate prior was employed, the posterior joint distribution will have the same form as the prior. So, conditioning on the data and according to Bayes' Theorem, we have:

$$f(\vec{\theta}\mid\vec{y}) = f(\mu,\sigma^2\mid\vec{y}) \propto f(\vec{y}\mid\mu,\sigma^2)\,f(\mu,\sigma^2) \propto N\text{-Inv-}\chi^2(\mu_1, s_1^2 V_1;\ \nu_1, s_1^2) \tag{2.14}$$

where it can be shown that


Page 30: Bayesian Regression System for Interval-valued data


$$\mu_1 = (V_0^{-1}+n)^{-1}(V_0^{-1}\mu_0 + n\bar{y}) \tag{2.15}$$

$$V_1 = (V_0^{-1}+n)^{-1} \tag{2.16}$$

$$\nu_1 = \nu_0 + n \tag{2.17}$$

$$\nu_1 s_1^2 = \nu_0 s_0^2 + (n-1)s^2 + \frac{V_0^{-1}n}{V_0^{-1}+n}\,(\bar{y}-\mu_0)^2 \tag{2.18}$$

All these formulae show that Bayesian inference combines prior and sample information.

The first term means that the posterior mean $\mu_1$ is a weighted mean of the prior mean $\mu_0$ and the empirical mean, divided by the sum of their respective weights, which are represented by $V_0^{-1}$ and the sample size $n$.

The second term represents the weight that the posterior mean carries, and it can be seen as a compromise between the sample size and the significance given to the prior mean.

The third term indicates that the degrees of freedom of the posterior variance are the sum of the prior degrees of freedom and the sample size. That is, the prior degrees of freedom can be understood as a fictitious sample size on which the expert's prior information is based.

The last term expresses the posterior sum of squared errors as a combination of the prior and empirical sums of squared errors, plus a term that measures the conflict between prior and sample information.

A more detailed explanation of this last step can be found in [Gelm04], [Koop03] or [Cong06].
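These updating formulae translate directly into a few lines of R. The helper below is a sketch with a hypothetical name (it is not a function of the BARESIMDA tool); it returns the posterior hyperparameters of equations (2.15)-(2.18) given the prior ones and a sample:

```r
# Posterior hyperparameters for the Normal model under the N-Inv-chi^2 prior.
# 'V0inv' is the prior weight V_0^{-1}, interpretable as a fictitious sample size.
ninvchisq.update <- function(mu0, V0inv, nu0, s0sq, y) {
  n    <- length(y)
  ybar <- mean(y)
  list(
    mu1  = (V0inv * mu0 + n * ybar) / (V0inv + n),   # equation (2.15)
    V1   = 1 / (V0inv + n),                          # equation (2.16)
    nu1  = nu0 + n,                                  # equation (2.17)
    s1sq = (nu0 * s0sq + (n - 1) * var(y) +          # equation (2.18)
            (V0inv * n / (V0inv + n)) * (ybar - mu0)^2) / (nu0 + n)
  )
}
```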

It is clear that the conditional and marginal posterior distributions are:

$$\mu\mid\sigma^2,\vec{y} \sim N(\mu_1, \sigma^2 V_1) \tag{2.19}$$

$$\sigma^2\mid\vec{y} \sim \text{Inv-}\chi^2(\nu_1, s_1^2) \tag{2.20}$$

If we integrate out $\sigma^2$, the marginal for $\mu$ is a t-distribution (see Appendix A for details):

$$\mu\mid\vec{y} \sim t_{\nu_1}(\mu_1, s_1^2 V_1) \tag{2.21}$$
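Draws from this joint posterior can be simulated directly, which is the idea underlying the posterior simulation methods of Chapter 3. A minimal sketch (the function name is hypothetical; the Scaled Inverse-$\chi^2$ draw uses the standard identity $\nu_1 s_1^2/\chi^2_{\nu_1}$):

```r
# Simulate (sigma^2, mu) from the N-Inv-chi^2 posterior of (2.19)-(2.20):
# draw sigma^2 | y ~ Scaled-Inv-chi^2(nu1, s1sq) as nu1 * s1sq / chisq(nu1),
# then mu | sigma^2, y ~ N(mu1, sigma^2 * V1).
posterior.draws <- function(mu1, V1, nu1, s1sq, ndraw = 5000) {
  sigma2 <- nu1 * s1sq / rchisq(ndraw, df = nu1)
  mu     <- rnorm(ndraw, mean = mu1, sd = sqrt(sigma2 * V1))
  data.frame(mu = mu, sigma2 = sigma2)
}
```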


Page 31: Bayesian Regression System for Interval-valued data


Let us see an application to the Spanish Stock Market. Let us suppose that the monthly close values associated with the Ibex 35 are normally distributed. If we take the values at which the Spanish index closed during the first two weeks of January 2006, it can be shown that the mean was 10893.29 and the standard deviation was 61.66. So the non-Bayesian approach would infer a Normal distribution with the previous mean and standard deviation. Suppose now that we had asked an analyst about the evolution of the Ibex 35 in January: he would have strongly affirmed that it would decrease slightly, that the mean close value at the end of the month would be around 10870 and that, hence, the standard deviation would be higher, around 100. Then, according to the previous formulas, the posterior parameters would be

µ₁ = (100 + 10)⁻¹ (100 × 10870 + 10 × 10893.29) = 10872.12

V₁ = (100 + 10)⁻¹ = 0.0091

ν₁ = 100 + 10 = 110

s₁ = √[(100 × 100² + 9 × 61.66 + (1000/110)(10893.29 − 10870)²) / 110] = 95.60

This means that there is a difference of almost 20 points between the Bayesian and the non-Bayesian estimations of the mean close value of January. Once January had passed, we could compare both results and note that the Bayesian estimation was closer to the actual mean close value and standard deviation: 10871.2 and 112.44. In Figure 2.1, it can be seen how the blue line representing the Bayesian estimation is closer to the cyan line representing the actual mean close value than the red line representing the frequentist estimation.
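To make the update mechanics concrete, the following is a minimal sketch in Python (assuming NumPy) of the computations (2.15)-(2.18), using the prior and data summaries of the Ibex 35 example; the function name is ours.

```python
import numpy as np

def normal_inv_chi2_update(mu0, V0, nu0, s0_sq, n, ybar, s_sq):
    """Posterior hyper-parameters of the N-Inv-chi^2 model, eqs. (2.15)-(2.18)."""
    prec0 = 1.0 / V0                                        # prior precision V0^{-1}
    mu1 = (prec0 * mu0 + n * ybar) / (prec0 + n)            # (2.15)
    V1 = 1.0 / (prec0 + n)                                  # (2.16)
    nu1 = nu0 + n                                           # (2.17)
    nu1_s1_sq = (nu0 * s0_sq + (n - 1) * s_sq
                 + (prec0 * n / (prec0 + n)) * (ybar - mu0) ** 2)  # (2.18)
    return mu1, V1, nu1, nu1_s1_sq / nu1

# Ibex 35 example: prior mean 10870 with V0^{-1} = 100, nu0 = 100, s0 = 100;
# data: n = 10 daily close values with mean 10893.29 and std 61.66.
mu1, V1, nu1, s1_sq = normal_inv_chi2_update(
    mu0=10870.0, V0=0.01, nu0=100, s0_sq=100.0**2,
    n=10, ybar=10893.29, s_sq=61.66**2)
print(mu1, V1, nu1, np.sqrt(s1_sq))  # mu1 = 10872.12, V1 = 0.0091, nu1 = 110
```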

2.2.2 Multivariate Normal distribution

Now, let us consider that we have an observable vector ~y of d components with the multivariate Normal distribution:

~y ∼ N(µ, Σ)   (2.22)

where the first parameter is the mean column vector and the second one is the variance-covariance

matrix.

Extending what was said above to the multivariate case, we have:


[Figure 2.1: Univariate Normal Example. The plot shows the estimated densities under the frequentist approach (red) and the Bayesian approach (blue), together with the real mean close value in January (cyan).]

f(y|µ, Σ) ∝ |Σ|^(−1/2) exp[−(1/2)(y − µ)′Σ⁻¹(y − µ)]   (2.23)

And for n iid observations:

f(y₁, y₂, ..., yₙ|µ, Σ) ∝ |Σ|^(−n/2) exp[−(1/2) Σᵢ₌₁ⁿ (yᵢ − µ)′Σ⁻¹(yᵢ − µ)]   (2.24)

A multivariate generalization of the scaled inverse-χ² is the Inverse Wishart distribution (see details in Appendix A), so the joint prior distribution is

f(~θ) = f(µ, Σ) ∝ N-Inv-Wishart(µ₀, Λ₀/κ₀; ν₀, Λ₀)   (2.25)

due to the fact that

µ|Σ ∼ N(µ₀, Σ/κ₀)   (2.26)

Σ ∼ Inv-Wishart(ν₀, Λ₀⁻¹)   (2.27)


                          Univariate Normal                             Multivariate Normal

Expression                y ∼ N(µ, σ²)                                  y ∼ N(µ, Σ)

Parameters to estimate    µ, σ²                                         µ, Σ

Prior distributions       µ|σ² ∼ N(µ₀, σ²/κ₀)                           µ|Σ ∼ N(µ₀, Σ/κ₀)
                          σ² ∼ Inv-χ²(ν₀, σ₀²)                          Σ ∼ Inv-Wishart(ν₀, Λ₀⁻¹)
                          µ, σ² ∼ N-Inv-χ²(µ₀, σ₀²/κ₀; ν₀, σ₀²)         µ, Σ ∼ N-Inv-Wishart(µ₀, Λ₀/κ₀; ν₀, Λ₀)

Posterior distributions   µ|σ² ∼ N(µ₁, σ²/κ₁)                           µ|Σ ∼ N(µ₁, Σ/κ₁)
                          σ² ∼ Inv-χ²(ν₁, σ₁²)                          Σ ∼ Inv-Wishart(ν₁, Λ₁⁻¹)
                          µ, σ² ∼ N-Inv-χ²(µ₁, σ₁²/κ₁; ν₁, σ₁²)         µ, Σ ∼ N-Inv-Wishart(µ₁, Λ₁/κ₁; ν₁, Λ₁)

Table 2.2: Comparison between Univariate and Multivariate Normal

The posterior results are analogous to those given for the univariate case, but applying these distributions. Interested readers can find more information in [Gelm04] or [Cong06].

A summary is shown in Table 2.2 in order to convey the most important ideas.

2.2.3 Other distributions

Just as has been done with the Normal distribution, a Bayesian analysis for other distributions could be carried out. For instance, the exponential distribution is commonly used in reliability analysis. Since this project will deal with the Normal distribution for the likelihood, the analysis for other distributions will not be explained in detail. Table 2.3 shows the conjugate prior and posterior distributions


for other likelihood distributions. More details can be found in [Cong06], [Gelm04], or [Rossi06].

Likelihood     Parameter   Conjugate Prior   Hyperparameters   Posterior Hyperparameters

Bin(y|n, θ)    θ           Beta              α, β              α + y, β + n − y
P(y|θ)         θ           Gamma             α, β              α + nȳ, β + n
Exp(y|θ)       θ           Gamma             α, β              α + 1, β + y
Geo(y|θ)       θ           Beta              α, β              α + 1, β + y

Table 2.3: Conjugate distributions for other likelihood distributions
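As an illustration of the first row of Table 2.3, the following is a minimal sketch (assuming SciPy; the prior values and data are hypothetical) of a Beta prior updated with Binomial data.

```python
from scipy import stats

# Binomial likelihood with a Beta(alpha, beta) prior (first row of Table 2.3):
# after observing y successes in n trials, the posterior is Beta(alpha + y, beta + n - y).
alpha, beta = 2.0, 2.0           # prior hyperparameters (illustrative values)
n, y = 50, 31                    # hypothetical data: 31 successes in 50 trials

posterior = stats.beta(alpha + y, beta + n - y)
print(posterior.mean())          # posterior mean (alpha + y) / (alpha + beta + n)
print(posterior.interval(0.95))  # central 95% credible interval
```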

2.3 Hierarchical Models

Hierarchical data arise when observations are structured in groups or related to one another. When this occurs, standard techniques either assume that these groups belong to entirely different populations or ignore the aggregate information entirely.

Hierarchical models provide a way of pooling the information from the disparate groups without assuming that they belong to precisely the same population.

Suppose we have collected data about some random variable Y from m different populations with

n observations for each population.

Let yᵢⱼ represent observation j from population i. Now suppose yᵢⱼ ∼ f(~θᵢ), where ~θᵢ is a vector of parameters for population i. Furthermore, ~θᵢ ∼ f(Θ), where Θ may also be a vector. Until this point, we have only rewritten what was said previously.


Now let us extend the model: assume that the parameters Θ that govern the distribution of the θ's are themselves random variables, and assign a prior distribution to these variables as well:

Θ ∼ f(~ψ)   (2.28)

where f(~ψ) is called the hyperprior. The vector parameter ~ψ of the hyperprior may be "known" and represent our prior beliefs about Θ or, in theory, we can also assign a probability distribution to these quantities as well, and proceed to another layer of hierarchy.

According to [Gelm04], the idea of exchangeability will be used to create a joint probability

distribution model for all the parameters ~θ. A formal definition to explain what exchangeability

consists of is:

”The parameters ~θ1, ~θ2, . . . , ~θn are exchangeable in their joint distribution if f(~θ1, ~θ2, . . . , ~θn) is

invariant to permutations in the index 1, 2, . . . , n”.

This means that if no information other than the data is available to distinguish any of the ~θi from

any of the others, and no ordering of the parameters can be made, one must assume symmetry among

the parameters in the prior distribution. So we can treat the parameters for each sub-population as

exchangeable units. This can be formulated by:

f(~θ1, ~θ2, . . . , ~θn|Θ

)= Πl

i=1f(~θi|Θ

)(2.29)

The joint prior distribution is now:

f(~θ₁, ~θ₂, ..., ~θₙ, Θ) = f(~θ₁, ~θ₂, ..., ~θₙ|Θ) f(Θ)   (2.30)

And conditioning on the data, it yields:

f(~θ₁, ~θ₂, ..., ~θₙ, Θ|~y) ∝ f(~θ₁, ~θ₂, ..., ~θₙ, Θ) f(~y|~θ₁, ~θ₂, ..., ~θₙ, Θ)   (2.31)

Perhaps the most important point in practice is that non-hierarchical models are usually inappropriate for hierarchical data, while non-hierarchical data can be modelled with a hierarchical structure by assigning concrete values to the hyperprior parameters.

This kind of model will be used in Bayesian regression models with autocorrelated errors, as will be seen in the following chapters.


For more details about Bayesian hierarchical models, the reader is referred to [Cong06], [Gelm04] and [Rossi06].

2.4 Nonparametric Bayesian

The nonparametric approach overcomes the limitations that have been mentioned throughout this chapter and reduces the restrictions of the parametric approach. This kind of analysis can be performed through the so-called Dirichlet Process, which allows us to express in a simple way the prior distribution of F, or of the family of distributions of F, where F is the distribution function of the variable under study. This process has a parameter, called α, which can be transformed into a probability distribution.

According to [Mate06], a Dirichlet Process for F(t) requires knowing:

• A previous proposal for F(t), F₀(t), which corresponds to the distribution function that reflects the prior knowledge of the engineer and is given by

F₀(t) = α(t)/M   (2.32)

• A measure of the confidence in the previous proposal, denoted by M, whose values can vary between 0 and ∞, depending on whether there is total confidence in the data or in the previous proposal, respectively.

It can be demonstrated that the posterior distribution for F(t), Fₙ(t), after sampling n data points, is given by

Fₙ(t) = pₙF₀(t) + (1 − pₙ)F̂ₙ(t)   (2.33)

where F̂ₙ(t) is the empirical distribution function and pₙ = M/(M + n).
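A minimal sketch of eq. (2.33), assuming NumPy and SciPy; the prior proposal F₀, the confidence M and the data are hypothetical choices for illustration.

```python
import numpy as np
from scipy.stats import norm

def dp_posterior_cdf(t, data, F0, M):
    """Posterior mean of F(t) under a Dirichlet Process prior, eq. (2.33):
    a mixture of the prior proposal F0 and the empirical distribution."""
    data = np.asarray(data)
    p_n = M / (M + len(data))                    # weight of the prior proposal
    F_emp = np.mean(data[:, None] <= t, axis=0)  # empirical CDF at each t
    return p_n * F0(t) + (1.0 - p_n) * F_emp

# Hypothetical example: prior proposal N(0, 1) with confidence M = 5.
sample = norm.rvs(loc=0.3, scale=1.2, size=40, random_state=0)
grid = np.linspace(-3, 3, 7)
print(dp_posterior_cdf(grid, sample, norm.cdf, M=5.0))
```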

More detailed information about the nonparametric approach and how Dirichlet processes are used can be found in [Mull04] or [Gosh03].


With this approach, not only is the parametric limitation concerning the probability model of the variable under study avoided, since no such hypothesis is required, but it also allows us to give a quantified importance to the prior knowledge supplied by the engineer, depending on the confidence in the certainty of that knowledge.


Chapter 3

Posterior Simulation

3.1 Introduction

A practical problem with Bayesian inference is the difficulty of summarizing realistically complex posterior distributions. In most practical problems, posterior densities will not take the form of any well-known and understood density, so summary statistics, such as the posterior mean and variance of parameters of interest, will not be analytically available. It is at this point where the importance of Bayesian computation arises and computational tools are required to gain meaningful inference from the posterior distribution. Its importance is such that the computing revolution of the last 20 years has led to a blossoming of Bayesian methods in many fields such as Econometrics, Ecology or Health.

In this regard, the most important simulation methods are the Markov chain Monte Carlo (MCMC) methods. MCMC methods date from the original work of [Metr53], who were interested in methods for the efficient simulation of the energy levels of atoms in a crystalline structure. The original idea was subsequently generalized by [Hast70], but its true potential was not fully realized within the statistical literature until [Gelf90] demonstrated its application to the estimation of integrals commonly occurring in the context of Bayesian statistical inference.

As [Berg05] points out, the underlying principle is simple: if one wishes to sample randomly from

a specific probability distribution then design a Markov chain whose long-time equilibrium is that

distribution, write a computer program to simulate the Markov chain, run it for a time long enough

to be confident that approximate equilibrium has been attained, then record the state of the Markov


chain as an approximate draw from equilibrium.

The technique has been developed strongly in different fields and with rather different emphases

in the computer science community concerned with the study of random algorithms (where the em-

phasis is on whether the resulting algorithm scales well with increasing size of the problem), in the

spatial statistics community (where one is interested in understanding what kinds of patterns arise

from complex stochastic models), and also in the applied statistics community (where it is applied

largely in Bayesian contexts, enabling researchers to formulate statistical models which would other-

wise be resistant to effective statistical analyses).

The development of the theoretical work also benefits the development of statistical applications.

The MCMC simulation techniques have been applied to develop practical statistical inferences for

almost all problems in (bio) statistics, for example, the problems in longitudinal data analysis, im-

age analysis, genetics, contagious disease epidemics, random spatial pattern, and financial statistical

models such as GARCH and stochastic volatility.

The simplicity of the underlying principle of MCMC is a major reason for its success. However, a substantial complication arises as the underlying target problem becomes more complex; namely, how long should one run the Markov chain so as to ensure that it is close to equilibrium? According to [Gelm04], n = 100 independent samples should be enough for reasonable posterior summaries, but in some cases more samples are needed to ensure greater accuracy.

3.2 Markov chains

The essential theory required in developing Monte Carlo methods based on Markov chains is pre-

sented here. The most fundamental result is that certain Markov chains converge to a unique invariant

distribution, and can be used to estimate expectations with respect to this distribution. But in order to

reach this conclusion, some concepts need to be defined firstly.

A Markov chain is a series of random variables, X₀, ..., Xₙ, also called a stochastic process, in which only the value of Xₙ₋₁ influences the distribution of Xₙ. Formally:

P (Xn = xn|X0 = x0, . . . , Xn−1 = xn−1) = P (Xn = xn|Xn−1 = xn−1) (3.1)


where the Xn−1 have a common range called the state space of the Markov chain.

The common language used to refer to the different situations in which a Markov chain can be found is the following. If Xₙ = i, it is said that the chain is in state i at step n, or that it has the value i at step n. This language confers on the chain a certain dynamic view, which is corroborated by the main tool used to study it: the transition probabilities P(Xₙ₊₁ = j|Xₙ = i), which are represented by the transition matrix P = (Pᵢⱼ) with Pᵢⱼ = P(Xₙ₊₁ = j|Xₙ = i). This shows the probability of moving from state i to state j.

Due to the fact that in most interesting applications Markov chains are homogeneous, the transition matrix can be defined from the initial probability, P₀ = P(X₁ = j|X₀ = i). In this regard, a Markov chain Xₜ is homogeneous if P(Xₙ₊₁ = j|Xₙ = i) = P(X₁ = j|X₀ = i) for all n, i, j.

Furthermore, using the Chapman-Kolmogorov equation, it can be shown that, for a homogeneous Markov chain with transition matrix P, the n-step transition matrix P⁽ⁿ⁾ satisfies P⁽ⁿ⁾ = Pⁿ.

On the other hand, we will see the concepts of invariant or stationary distribution, ergodicity and irreducibility, which are indispensable to reach the main result. It will be assumed that Xₜ is a homogeneous Markov chain.

Then, a vector π is an invariant distribution of the chain Xₜ if it satisfies:

a) πⱼ ≥ 0 with Σⱼ πⱼ = 1.

b) π = πP.

That is, a stationary distribution over the states of a Markov chain is one that persists forever once

it is reached.

The concept of an ergodic state requires making other definitions clear, such as recurrence and aperiodicity:

• The state i is recurrent if the probability of returning to i in finite time is 1, that is, P(Xₙ = i for some n ≥ 1 | X₀ = i) = 1. Otherwise, it is transient. Moreover, i will be positive recurrent if the expected (average) return time is finite, and null recurrent if it is not.


• The period of a state i, denoted by dᵢ, is defined as dᵢ = gcd{n : [Pⁿ]ᵢᵢ > 0}. The state i is aperiodic if dᵢ = 1, or periodic if it is greater.

Then a state is ergodic if it is positive recurrent and aperiodic. The last concept to define is irreducibility. A set of states C ⊆ S, where S is the set of all possible states, is irreducible if for all i, j ∈ C:

• i and j have the same period.

• i is transient if and only if j is transient.

• i is null recurrent if and only if j is null recurrent.

Now, having all these concepts in mind, we can know whether a Markov chain has a stationary distribution with the next lemma:

Lemma 3.2.1. Let Xₜ be a homogeneous and irreducible Markov chain. The chain will have one and only one stationary distribution if, and only if, all the states are positive recurrent. In that case, it will have entries given by πᵢ = µᵢ⁻¹, where µᵢ denotes the expected return time of state i.

The relation with the long-time behaviour is given by this other lemma:

Lemma 3.2.2. Let Xₜ be a homogeneous, irreducible and aperiodic Markov chain. Then

[Pⁿ]ᵢⱼ → 1/µⱼ for all i, j ∈ S as n → ∞   (3.2)
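These results can be checked numerically. The following is a minimal sketch (assuming NumPy; the chain is a hypothetical 3-state example) that finds the invariant distribution of Lemma 3.2.1 as the left eigenvector of P for eigenvalue 1 and illustrates the convergence of Lemma 3.2.2.

```python
import numpy as np

# A small homogeneous, irreducible, aperiodic chain (hypothetical 3-state example).
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# The invariant distribution solves pi = pi P: it is the left eigenvector of P
# for eigenvalue 1, normalized so that its entries sum to one.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi /= pi.sum()
print(pi)

# Lemma 3.2.2 in action: every row of P^n approaches pi as n grows.
print(np.linalg.matrix_power(P, 50))
```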

3.3 Monte Carlo Integration

Monte Carlo integration estimates the expectation E[g(θ)] by obtaining samples θ⁽ᵗ⁾, t = 1, ..., n, from the posterior distribution p(θ|y) and averaging:

E[g(θ)] ≈ (1/n) Σₜ₌₁ⁿ g(θ⁽ᵗ⁾)   (3.3)

where the function g(θ) represents the function of interest to estimate. Note that the samples θ⁽ᵗ⁾, t = 1, ..., n, need not be independent: they may be the successive states of a Markov chain that has p(θ|y) as its stationary distribution.
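A minimal sketch of eq. (3.3), assuming NumPy; the "posterior" is a hypothetical Normal chosen so that the exact answer is known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo integration, eq. (3.3): estimate E[g(theta) | y] by averaging
# g over draws from the posterior. As a check, use a posterior we can sample
# directly -- say theta | y ~ N(2, 1) -- and g(theta) = theta^2,
# whose exact expectation is 2^2 + 1 = 5.
draws = rng.normal(loc=2.0, scale=1.0, size=100_000)
print(np.mean(draws ** 2))  # close to 5
```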


3.4 Gibbs sampler

In many models, it is not easy to draw directly from the posterior distribution p(θ|y). However, if the parameter θ is partitioned into several blocks as θ = (θ₁, ..., θ_p), then the full conditional posterior distributions, p(θ₁|y, θ₂, ..., θ_p), ..., p(θ_p|y, θ₁, ..., θ_{p−1}), may be simple to draw from. For instance, in the Normal linear regression model it is convenient to set p = 2, with θ₁ = β and θ₂ = σ², and the full conditional distributions would be p(β|y, σ²) and p(σ²|y, β), which are very useful in the Normal independent model that will be explained later.

The Gibbs sampler is defined by iterative sampling from each of those p conditional distributions:

1. Set a starting value, θ⁰ = (θ₂⁰, ..., θ_p⁰).

2. Take random draws:
   - θ₁¹ from p(θ₁|y, θ₂⁰, ..., θ_p⁰)
   - θ₂¹ from p(θ₂|y, θ₁¹, θ₃⁰, ..., θ_p⁰)
   - ...
   - θ_p¹ from p(θ_p|y, θ₁¹, ..., θ_{p−1}¹)

3. Repeat step 2 as necessary.

4. Reject those θ affected by the starting value θ⁰ = (θ₂⁰, ..., θ_p⁰), that is, the first draws of the chain, and average the rest of the draws applying Monte Carlo integration.

For instance, in the Normal regression model we would have:

1. Set a starting value, θ⁰ = ((σ²)⁰).

2. Take random draws:
   - β¹ from p(β|y, (σ²)⁰)
   - (σ²)¹ from p(σ²|y, β¹)

3. Repeat step 2 as necessary.

4. Eliminate the draws affected by the starting value and average the rest of the draws applying Monte Carlo integration.


Those values dropped which are affected by the starting point are called the burn-in. Generally,

any set of values which are discarded in a MCMC simulation is called the burn-in. The size of the

burn-in period is the subject of current research in MCMC methods.

As the state of each draw depends on the state of the previous one, the sequence is a Markov chain. More detailed information can be found in [Chen00], [Mart01] or [Rossi06].
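To illustrate the algorithm, the following is a minimal sketch (assuming NumPy) of a Gibbs sampler for the two blocks (µ, σ²) of a Normal sample; the conditionals used are the standard ones under the noninformative prior p(µ, σ²) ∝ σ⁻², and the data are simulated for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(10.0, 2.0, size=50)      # hypothetical data
n, ybar = len(y), y.mean()

def gibbs(n_iter=5_000, burn_in=500):
    """Gibbs sampler for (mu, sigma^2) of a Normal sample under the
    noninformative prior p(mu, sigma^2) proportional to 1/sigma^2."""
    mu, sigma2 = ybar, y.var()          # starting values
    draws = []
    for _ in range(n_iter):
        # mu | sigma^2, y ~ N(ybar, sigma^2 / n)
        mu = rng.normal(ybar, np.sqrt(sigma2 / n))
        # sigma^2 | mu, y ~ Inv-chi^2(n, mean((y - mu)^2)),
        # drawn as nu * s^2 / chi^2_nu
        scale = np.sum((y - mu) ** 2)
        sigma2 = scale / rng.chisquare(n)
        draws.append((mu, sigma2))
    return np.array(draws[burn_in:])    # discard the burn-in

samples = gibbs()
print(samples.mean(axis=0))             # posterior means of mu and sigma^2
```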

3.5 Metropolis-Hastings sampler and its special cases

3.5.1 Metropolis-Hastings sampler

The Metropolis-Hastings method is adequate to simulate models that are not conditionally conjugate. Furthermore, it can be combined with the Gibbs sampler to simulate posterior distributions where some of the conditional posterior distributions are easy to sample from and others are not. Like the algorithms explained above, it is based on formulating a Markov chain, but using a proposal distribution, q(·|θᵗ), which depends on the current state θᵗ, to generate a new proposed sample θ*. This proposal is accepted as the next state with probability given by

α(θᵗ, θ*) = min{1, [p(θ*|y)q(θᵗ|θ*)] / [p(θᵗ|y)q(θ*|θᵗ)]}   (3.4)

If the point θ∗ is not accepted, then the chain does not move and θt+1 = θt. According to

[Mart01], the steps to follow are:

1. Initialize the chain to θ0 and set t=0.

2. Generate a candidate point θ∗ from q(.|θt).

3. Generate U from a uniform (0,1) distribution.

4. If U ≤ α(θt, θ∗) then set θt+1 = θ∗, else set θt+1 = θt.

5. Set t=t+1 and repeat steps 2 through 5.

6. Take the average of the draws g(θ¹), ..., g(θⁿ).

Note that it is not only recommendable, but essential, that the proposal distribution q(·|θᵗ) be easy to sample from.
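The following is a minimal sketch of these steps (assuming NumPy), with a random-walk Normal proposal — which is symmetric, so the q-ratio in (3.4) cancels — and a hypothetical one-dimensional target known up to a constant.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_post(theta):
    """Hypothetical log-posterior known up to a constant: N(3, 0.5^2)."""
    return -0.5 * ((theta - 3.0) / 0.5) ** 2

def metropolis_hastings(n_iter=20_000, step=0.5, theta0=0.0):
    theta, draws = theta0, []
    for _ in range(n_iter):
        proposal = theta + step * rng.normal()   # symmetric proposal q
        # acceptance probability (3.4); the q-ratio cancels here
        if np.log(rng.uniform()) <= log_post(proposal) - log_post(theta):
            theta = proposal                     # accept the candidate
        draws.append(theta)                      # else the chain stays put
    return np.array(draws)

draws = metropolis_hastings()
print(draws[2_000:].mean(), draws[2_000:].std())  # near 3 and 0.5
```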


There are some special cases of this method. The most important ones are briefly explained below. In addition, it can be shown, according to [Gelm04], that the Gibbs sampler is another special case of the Metropolis-Hastings algorithm where the proposal point is always accepted.

3.5.2 Metropolis sampler

This method is a particular case of the Metropolis-Hastings sampler where the proposal distribution

has to be symmetric. That is,

q(θ∗|θt) = q(θt|θ∗) (3.5)

for all θ* and θᵗ. Then, the probability of accepting the new point is

α(θᵗ, θ*) = min{1, p(θ*|y)/p(θᵗ|y)}   (3.6)

The same procedure seen in the Metropolis-Hastings sampler has to be followed.

3.5.3 Random-walk sampler

This special case refers to a proposal distribution of the form

q(θ∗|θt) = q(|θt − θ∗|) (3.7)

And the candidate point is θ∗ = θt + z, where z is called the increment random variable from q.

Then, the probability of accepting the new point is

α(θᵗ, θ*) = min{1, p(θ*|y)/p(θᵗ|y)}   (3.8)

The same procedure seen in the Metropolis-Hastings sampler has to be followed.

3.5.4 Independence sampler

The last variation has a proposal distribution such that

q(θ∗|θt) = q(θ∗) (3.9)

So it does not depend on θt. Then, the probability of accepting the new point is


α(θᵗ, θ*) = min{1, [p(θ*|y)q(θᵗ)] / [p(θᵗ|y)q(θ*)]} = min{1, w(θ*)/w(θᵗ)}   (3.10)

where

w(θ) = p(θ|y)/q(θ)   (3.11)

It is important to remark that, for this method to work well, the proposal distribution q should be very similar to the posterior distribution p(θ|y).

The same procedure seen in the Metropolis-Hastings sampler has to be followed.

3.6 Importance sampling

Importance sampling is a variance reduction technique that can be used in the Monte Carlo method.

The idea behind this method is that certain values of the input random variables in a simulation have

more impact on the parameter being estimated than others. So instead of taking a simple average,

importance sampling takes a weighted average.

Let q(θ) be a density from which it is easy to obtain random draws θ⁽ˢ⁾ for s = 1, ..., S. Then q(θ) is called the importance function, and the importance sampling estimator can be defined:

The function ĝ_S = [Σₛ₌₁ˢ w(θ⁽ˢ⁾)g(θ⁽ˢ⁾)] / [Σₛ₌₁ˢ w(θ⁽ˢ⁾)], where w(θ⁽ˢ⁾) = p(θ = θ⁽ˢ⁾|y) / q(θ = θ⁽ˢ⁾), converges to E[g(θ)|y] as S → ∞.

In fact, w(θ⁽ˢ⁾) can be formulated as w(θ⁽ˢ⁾) = p*(θ|y)/q*(θ), where the new densities are proportional to the old ones.
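A minimal sketch of this estimator (assuming NumPy and SciPy): the "posterior" p and the importance function q are hypothetical choices, with q taken heavier-tailed than p, a common practical choice.

```python
import numpy as np
from scipy.stats import norm, t as student_t

# Importance sampling sketch: estimate E[g(theta) | y] for a hypothetical
# posterior p(theta | y) = N(1, 1) using a heavier-tailed t importance
# function q. Weights w = p/q; estimate = sum(w g) / sum(w).
S = 50_000
theta = student_t.rvs(df=4, loc=1.0, scale=1.5, size=S, random_state=3)
w = norm.pdf(theta, loc=1.0, scale=1.0) / student_t.pdf(theta, df=4, loc=1.0, scale=1.5)

g = theta ** 2                       # function of interest
print(np.sum(w * g) / np.sum(w))     # close to E[theta^2] = 1^2 + 1 = 2
```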

For more information and details about Markov chain Monte Carlo methods and their application,

the reader is referred to [Chen00], [Gilk95], [Berg05] and [Kend05].


Chapter 4

Sensitivity Analysis

4.1 Introduction

There will be many times when the researcher, having selected a model, wants to consider the possibility of choosing another model, or simply to compare the selected one with it. A tool is then necessary to help him compare both models and select one of them. This will also be useful for variable selection in regression models. In this section, Bayesian Model Comparison is briefly discussed, highlighting those methods which will be more useful.

In the Bayesian field, common methods for model comparison are based on the following: separate estimation, comparative estimation and simultaneous estimation.

Comparative estimation is based on distance measures, such as the entropy distance, and the underlying idea is that the more parsimonious model may be preferred between two models whose distance between their posterior or posterior predictive distributions is sufficiently small.

Simultaneous model estimation lets us compare many models at the same time, and the main methods are reversible jump MCMC (RJMCMC) and birth and death MCMC (BDMCMC).

Separate estimation compares two models that are not necessarily nested, and the most used tools are the posterior predictive distributions and the posterior probability of the model. Since the methods of this type are the most widely accepted, we will explain some of them, highlighting the most important ones.


4.2 Bayes Factor

This is probably the dominant method of Bayesian model testing. It is the analogue of likelihood ratio

tests within the frequentist framework, and the basic intuition is that prior and posterior information

are combined in a ratio that provides evidence in favour of one model specification versus another.

Let us suppose we have two models to compare, M₁ and M₂. Let p(M₁) and p(M₂) be the prior probabilities of models M₁ and M₂, respectively, and p(M₁|y) and p(M₂|y) their posterior probabilities. Then the Bayes Factor is:

B(y) = p(y|M₁)/p(y|M₂) = [p(M₁|y)/p(M₁)] / [p(M₂|y)/p(M₂)]   (4.1)

This means that the Bayes Factor chooses the model for which the marginal likelihood of the data, namely p(y|Mᵢ), is maximum. Therefore, the value of the factor gives evidence of the preference between the two models.

According to [Jeff61], the following interpretation is suggested:

Bayes Factor          Interpretation

B(y) < 1/10           Strong evidence for M₂
1/10 < B(y) < 1/3     Moderate evidence for M₂
1/3 < B(y) < 1        Weak evidence for M₂
1 < B(y) < 3          Weak evidence for M₁
3 < B(y) < 10         Moderate evidence for M₁
B(y) > 10             Strong evidence for M₁

Table 4.1: Bayes Factor Interpretation


The marginal likelihood usually involves an integral which can be analytically evaluated only for

some special cases. So, while Bayes Factors are rather intuitive, they are often quite difficult or even

impossible to calculate from a practical point of view. Because of this, there are other alternatives to

this method.

4.3 Alternative Stats to Bayes Factor

Let θ̄ be the mean of the posterior distribution and let us assume that the Bayes estimate for the parameters ~θ is approximately equal to the maximum likelihood estimate. Then the following statistics, some of which are also used in frequentist statistics, could be useful diagnostics:

• The Likelihood Ratio, which will always favour the unrestricted model, and where the ratio is:

Ratio = −2[log(p(θ_Restricted|y)) − log(p(θ_Full|y))]   (4.2)

The ratio is distributed as a χ²_p, where p is the number of parameters, including the intercept.

• The Akaike Information Criterion (AIC), where a ratio between AIC₁ (AIC for M₁) and AIC₂ (AIC for M₂) of less than 1 indicates that M₁ is better. This method does not require the models to be nested, although it favours more complicated models. The statistic is:

AIC = −2log(p(θ|y)) + 2p   (4.3)

where p is the number of parameters, including the intercept. It is usually considered better than the previous one.

• The Bayesian Information Criterion (BIC), which is also known as the Schwarz Criterion (SC), Schwarz Information Criterion (SIC) or Schwarz Bayesian Criterion (SBC). As occurred with the AIC, this method can be used for non-nested models. The BIC is:

BIC = −2log(p(θ|y)) + p log(n)   (4.4)

where p is the number of parameters, including the intercept, and n is the sample size. Given any two estimated models, the model with the lower value of BIC is the one to be preferred. Since this method promotes model parsimony by penalizing models with increased complexity (larger p) for a given sample size n, it may be preferred to the AIC.


• The Deviance Information Criterion (DIC), which is a newer statistic introduced by the developers of the WinBUGS software, who explain it in a detailed way in [Spie03]. The main and most important difference with the previous methods is that this one is not an approximation of the Bayes Factor. It is a hierarchical-modelling generalization of the AIC and the BIC, and it is particularly useful when the posterior distributions have been obtained by simulation. The DIC is:

DIC = −(4/L) Σₗ₌₁ᴸ log(p(y|θ⁽ˡ⁾)) + 2log(p(y|θ̄))   (4.5)

where θ⁽ˡ⁾ is the draw obtained in iteration l of the posterior simulation. This method also penalizes higher-dimensional models, and it may be preferred to the previous ones, mainly in the linear models context.
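As a sketch of how these criteria can be computed from posterior draws (assuming NumPy and SciPy; the data, the Normal likelihood and the draws are hypothetical), following eqs. (4.3)-(4.5) with the likelihood evaluated at the posterior mean:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
y = rng.normal(5.0, 1.0, size=30)             # hypothetical data
p, n = 2, len(y)                              # parameters (mu, sigma), sample size

def log_lik(mu, sigma):
    return norm.logpdf(y, loc=mu, scale=sigma).sum()

# Posterior draws for (mu, sigma) -- produced by any sampler; for the sketch
# we draw them around the sample statistics.
mus = rng.normal(y.mean(), y.std() / np.sqrt(n), size=1_000)
sigmas = np.abs(rng.normal(y.std(), 0.1, size=1_000))

ll_bar = np.mean([log_lik(m, s) for m, s in zip(mus, sigmas)])
ll_at_mean = log_lik(mus.mean(), sigmas.mean())

aic = -2 * ll_at_mean + 2 * p                 # eq. (4.3)
bic = -2 * ll_at_mean + p * np.log(n)         # eq. (4.4)
dic = -4 * ll_bar + 2 * ll_at_mean            # eq. (4.5)
print(aic, bic, dic)
```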

4.4 Highest Posterior Density Intervals

All the techniques mentioned above typically require the elicitation of informative priors. However, there could be Bayesians interested in doing model comparison with a non-informative prior. In such a case, there are other techniques which can be used. Since the most common one in regression analysis is the Highest Posterior Density Interval (HPDI), we will only explain this method and refer the interested reader to the citations below.

Before defining the idea of the HPDI, it is necessary to make the concept of a credible set clear. Let us assume that ω is the region over which the coefficients β are defined. Then, C ⊆ ω is a 100(1−α)% credible set with respect to β if:

p(β ∈ C|y) = 1 − α   (4.6)

Since there are commonly numerous credible intervals, it is usual to choose the one with the smallest area, namely the Highest Posterior Density Interval.

Formally, a 100(1−α)% highest posterior density interval for β is a 100(1−α)% credible interval for β with the property that it has a smaller area than any other 100(1−α)% credible interval for β.


This is the Bayesian analogue of confidence intervals within the frequentist framework, but now the meaning is more in line with common sense.

More information about all these methods and other variants of the Bayes Factor can be found in more detail in [Aitk97], [Berg98], [Chen00], [Cong06] or [Koop03].

4.5 Model Comparison Summary

A model comparison summary can be found in Tables 4.2 and 4.3 where the mark symbols mean:

• * Good

• ** Better

• *** Still better

• **** Probably the best


Method             Formulae                                                    Interpretation                                       Mark

Bayes Factor       B(y) = p(y|M₁)/p(y|M₂)                                      B(y) < 1/10: strong evidence for M₂                  *
                                                                               1/10 < B(y) < 1/3: moderate evidence for M₂
                                                                               1/3 < B(y) < 1: weak evidence for M₂
                                                                               1 < B(y) < 3: weak evidence for M₁
                                                                               3 < B(y) < 10: moderate evidence for M₁
                                                                               B(y) > 10: strong evidence for M₁

Likelihood Ratio   Ratio = −2[log(p(β̂_Restricted|y)) − log(p(β̂_Full|y))]       Ratio > χ²_p: reject the restricted model            *
                                                                               Ratio < χ²_p: do not reject the restricted model

AIC                AIC = −2[log(p(β̂|y)) + 2p]                                  AIC₁/AIC₂ < 1: M₁ is better than M₂                  **
                                                                               AIC₁/AIC₂ > 1: M₂ is better than M₁

Table 4.2: Sensitivity Summary I


Method   Formulae                                                   Interpretation                                     Mark

BIC      BIC = −2log(p(β̂|y)) + p log(n)                             BIC₁/BIC₂ < 1: M₁ is better than M₂                ***
                                                                    BIC₁/BIC₂ > 1: M₂ is better than M₁

DIC      DIC = −(4/L) Σₗ₌₁ᴸ log(p(y|β⁽ˡ⁾)) + 2log(p(y|β̄))           DIC₁/DIC₂ < 1: M₁ is better than M₂                ****
                                                                    DIC₁/DIC₂ > 1: M₂ is better than M₁

HPDI     p(β ∈ C|y) = 1 − α with the smallest area                  There is a probability of 100(1−α)% of β           ****
                                                                    being in the region C

Table 4.3: Sensitivity Summary II


Chapter 5

Regression Analysis

5.1 Introduction

Regression analysis is a statistical tool for the investigation of relationships between variables: it models the relationship between one or more random variables y, called the response variables, and one or more independent variables x, called the predictors. That is, it allows us to examine the conditional distribution of y given x, denoted by p(y|β, x), when the n observations (xᵢ, yᵢ) are exchangeable.

Applications of regression analysis exist in almost every field. In economics, the dependent variable might be the Ibex 35 index and the independent variables could be the Dow Jones and FTSE 100 indexes.

In political science, the dependent variable might be a state’s level of welfare spending and the inde-

pendent variables measures of public opinion and institutional variables that would cause the state to

have higher or lower levels of welfare spending. In sociology, the dependent variable might be a mea-

sure of the social status of various occupations and the independent variables characteristics of the

occupations (pay, qualifications, etc.). In psychology, the dependent variable might be individual’s

racial tolerance as measured on a standard scale and with indicators of social background as indepen-

dent variables. In education, the dependent variable might be a student’s score on an achievement test

and the independent variables characteristics of the student’s family, teachers, or school.

Before explaining Bayesian regression, the classical regression model will be reviewed, focusing on those parts useful for the former.


5.2 Classical Regression Model

The simplest version of this model is the Normal linear model, where the variable y given X has a Normal distribution whose mean is a linear function of X:

E(yi|β,X) = β0 + β1xi1 + · · ·+ βpxip for all i = 1, . . . , n. (5.1)

Even though the mean of y is a linear function of X, the observed data do not fit it exactly, and this is due to a random error, namely ε, so the appropriate form to reach a probabilistic linear model is

yi = β0 + β1xi1 + · · ·+ βpxip + εi for all i = 1, . . . , n. (5.2)

where εᵢ is the random error term, which has a Normal distribution with mean 0 and variance σ². Due to the fact that the random variable yᵢ is the result of adding a constant (the mean) and a random variable with a Normal distribution, yᵢ follows a Normal distribution:

yᵢ ∼ N(β₀ + β₁xᵢ₁ + ··· + β_p x_ip, σ²) for all i = 1, ..., n   (5.3)

When the variance of y given X and β is assumed to be constant over all observations, the model is called the ordinary linear regression model.

In a matrix notation, the Normal linear model can be denoted by

Y = Xβ + ε (5.4)

and

Y ∼ N(Xβ, σ²I)   (5.5)

where:

Y = (y₁, y₂, ..., yₙ)′ is the vector of observations; X is the n × (p+1) design matrix whose i-th row is (1, xᵢ₁, ..., x_ip); β = (β₀, β₁, ..., β_p)′ is the vector of coefficients; ε = (ε₁, ε₂, ..., εₙ)′ is the vector of errors; and I is the identity matrix.

It can be shown that the ordinary least squares estimate of β, namely β̂, is

β̂ = (X′X)⁻¹X′Y = (β̂₀, β̂₁, ..., β̂_p)′   (5.6)

where X′X is the (p+1) × (p+1) matrix of cross-products of the columns of X — its first row is (n, Σᵢxᵢ₁, ..., Σᵢx_ip) and its generic entry is the sum of products of the corresponding pair of columns — and X′Y = (Σᵢyᵢ, Σᵢxᵢ₁yᵢ, ..., Σᵢx_ip yᵢ)′, with all sums running over i = 1, ..., n.

As well, it can be shown that

E(β̂) = β   (5.7)

Furthermore, the variances of β̂ are proportional to the elements of the matrix (X′X)⁻¹, denoted by C, which multiplied by the constant σ² gives the covariance matrix. The elements of the diagonal of that matrix are the variances:

Var(β̂ⱼ) = σ²Cⱼⱼ for all j = 0, 1, ..., p   (5.8)

where C = (X′X)⁻¹.

Likewise, the classical estimation of σ² is given in terms of the sum of squared errors, SSE = Σᵢ₌₁ⁿ(yᵢ − ŷᵢ)², and is given by the mean squared error:

σ̂² = MSE = SSE/(n − p) = (Y − Xβ̂)′(Y − Xβ̂)/(n − p) = (Y′Y − β̂′X′Y)/(n − p)   (5.9)

where n is the number of observations and p corresponds to the number of parameters β.

Regarding the individual regression coefficients βⱼ, it will sometimes be interesting to carry out hypothesis tests about them in order to evaluate the potential value of each regressor variable of the model. The statistic to use in these cases is

T₀ = β̂ⱼ / √(σ̂²Cⱼⱼ)   (5.10)


where Cⱼⱼ is the diagonal element of the matrix (X′X)⁻¹ corresponding to β̂ⱼ. So the null hypothesis will be rejected if |T₀| > t_{n−p, α/2}.

Finally, once the model has been estimated and validated, one of its more important applications consists of making new predictions about the response variable Y when a new explanatory observation X* is available. In this case, a point estimate would be

Ŷ* = X*′β̂   (5.11)

and a confidence interval for this future observation will be

Ŷ* ± t_{n−p, α/2} √(σ̂²(1 + X*′(X′X)⁻¹X*))   (5.12)

where

X*′ = [x₁* x₂* ... x_p*]   (5.13)

These results can be found in a more detailed way in [Mont02], [Zamo01] or [Mate95].

To better understand all that has been said above, let us see a practical application in the stock markets. Let us suppose we are interested in investigating the relationship between the Ibex 35 index and the Dow Jones, FTSE 100 and DAX indexes of the previous day. For such a purpose, we have the points (taken as the mean of the daily maximum and minimum points) from January to October 2006, that is, during the first ten months of 2006.

The model to adjust is:

IBEX35t = β1DowJonest−1 + β2FTSE100t−1 + β3DAXt−1 + εt

where

εₜ ∼ N(0, σ²)

The estimates β̂ are calculated according to what was said before, resulting in:

(β̂₁, β̂₂, β̂₃)′ = (1.0147, −2.0085, 2.1082)′


The estimate for the variance σ2 is:

σ2 = 332.182

So the calculated model is:

IBEX35ₜ = 1.0147 × DowJonesₜ₋₁ − 2.0085 × FTSE100ₜ₋₁ + 2.1082 × DAXₜ₋₁ + εₜ

where

εₜ ∼ N(0, 332.182)

This indicates that when the Dow Jones or the DAX goes up, the Ibex 35 will increase the next day too. However, when the FTSE 100 rises, the Ibex 35 will decrease the next day.

If we use this model to predict the value which the Ibex 35 will have on November 1st, when the previous day's Dow Jones, FTSE 100 and DAX values are known, we have:

IBEX35ₜ = 1.0147 × 12067 − 2.0085 × 6155 + 2.1082 × 6287 = 13137
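A minimal sketch of this computation (assuming NumPy): the daily index data are not reproduced here, so the fitting part uses synthetic stand-in data, while the final prediction uses the coefficients and index values reported above.

```python
import numpy as np

rng = np.random.default_rng(5)

# OLS estimate (5.6), beta_hat = (X'X)^{-1} X'Y, for the no-intercept model
# IBEX35_t = b1*DowJones_{t-1} + b2*FTSE100_{t-1} + b3*DAX_{t-1} + e_t.
# Synthetic stand-in data (the real 2006 daily values are not reproduced here).
X = rng.normal([12000.0, 6100.0, 6200.0], 100.0, size=(200, 3))
Y = X @ np.array([1.0147, -2.0085, 2.1082]) + rng.normal(0.0, 332.0, size=200)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # recovers coefficients close to those reported in the text

# Point prediction for November 1st using the reported estimates:
x_new = np.array([12067.0, 6155.0, 6287.0])          # DJ, FTSE 100, DAX
print(x_new @ np.array([1.0147, -2.0085, 2.1082]))   # approx. 13137
```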

Finally, a comparison between the multiple and the simple Normal linear regression models is shown in Table 5.1, indicating the different parameters to use in each case. The goal of this comparison is to make clear that simple Normal regression is a particular case of multiple Normal regression where there is only one regressor variable or predictor.

5.3 The Bayesian Approach

The main difference between the classical and Bayesian approaches to regression analysis is that the latter treats the parameters as random variables which have a distribution. The aim of the Bayesian approach is to make inferences through the posterior distribution, based on a prior distribution for the parameters β and σ² of the Normal linear model, and to provide a predictive distribution for the model's predictions.

As it was said in the preceding section, and according to [Rossi06], the Normal linear regression

model is given by:


             Multiple Normal Linear Regression                Simple Normal Linear Regression

Function     yᵢ = β₀ + β₁xᵢ₁ + ··· + β_p x_ip + εᵢ            y = β₀ + β₁x + ε

Mean         µᵢ = β₀ + β₁xᵢ₁ + ··· + β_p x_ip                 µ = β₀ + β₁x

Variance     σ²                                               σ²

Model        Y ∼ N(~µ, σ²I)                                   Y ∼ N(µ, σ²)

β̂            β̂ = (X′X)⁻¹X′Y

E[β̂]         β                                                β

Var(β̂)       Var(β̂ⱼ) = σ²Cⱼⱼ                                  Var(β̂₀) = σ²[1/n + x̄²/Σᵢ(xᵢ−x̄)²]
                                                              Var(β̂₁) = σ²/Σᵢ(xᵢ−x̄)²

σ̂²           σ̂² = (Y′Y − β̂′X′Y)/(n−p)                         σ̂² = Σᵢ(yᵢ−ŷᵢ)²/(n−2)

Prediction   Ŷ_f ± t_{n−p,α/2}√(σ̂²(1 + X_f′(X′X)⁻¹X_f))       Ŷ_f ± t_{n−2,α/2}√(σ̂²(1 + 1/n + (x_f−x̄)²/Σᵢ(xᵢ−x̄)²))

Limitation   Only applicable to data in the same range        Only applicable to data in the same range
             as the sampled data                              as the sampled data

Table 5.1: Multiple and Simple Regression Comparison

Y = Xβ + ε (5.14)

where


ε ∼ N(0, σ²I)   (5.15)

So

Y|X, β, σ² ∼ N(Xβ, σ²I)   (5.16)

For simplicity of notation, we will not explicitly include X in our conditioning set for the regression model.

Using the definition of the multivariate Normal density, the likelihood function is obtained:

p(Y|β, σ²) = (2π)^(−n/2) (σ²)^(−n/2) exp[−(1/(2σ²)) (Y − Xβ)′(Y − Xβ)]   (5.17)

It will be convenient to write

(Y − Xβ)′(Y − Xβ)   (5.18)

in terms of the ordinary least squares estimators

v = n − p   (5.19)

β̂ = (X′X)⁻¹X′Y   (5.20)

s² = (Y − Xβ̂)′(Y − Xβ̂)/(n − p)   (5.21)

So

(Y − Xβ)′(Y − Xβ) = vs² + (β − β̂)′X′X(β − β̂)   (5.22)

Then

p(Y|β, σ²) = (2π)^(−n/2) (σ²)^(−p/2) exp[−(1/(2σ²)) (β − β̂)′(X′X)(β − β̂)] × (σ²)^(−v/2) exp[−vs²/(2σ²)]   (5.23)

As was said before, n corresponds to the number of observations and p refers to the number of parameters β. This new form of expressing the likelihood function is more useful for finding a natural conjugate prior distribution, which will have the same form as the likelihood.


The prior distribution for β and σ², denoted by p(β, σ²), can be written in a more convenient way by applying the definition of the joint distribution:

p(β, σ2) = p(β|σ2)p(σ2) (5.24)

Note that β and σ² are supposed to be dependent a priori, which will rarely occur in practice. Some authors prefer to work with the error precision, say 1/σ², instead of the variance σ².

All this is very similar to what was explained in the Bayesian analysis of the Normal distribution. The term in the first parenthesis of the likelihood function suggests the form of a Normal distribution for the parameter β given σ². So

p(β|σ²) ∝ (σ²)^(−p/2) exp[−(1/(2σ²)) (β − β₀)′V₀⁻¹(β − β₀)]   (5.25)

and, hence,

β|σ² ∼ N(β₀, σ²V₀)   (5.26)

According to [Rossi06], the term in the second parenthesis of the likelihood function suggests the form of an inverse gamma distribution for the parameter σ² (see Appendix A). So

p(σ²) ∝ (σ²)^(−(v₀/2+1)) exp[−v₀s₀²/(2σ²)]   (5.27)

and, hence,

σ² ∼ Inv-G(v₀/2, v₀s₀²/2)   (5.28)

Note that there is an extra term (σ²)⁻¹ here which is not suggested by the form of the likelihood explained above. This term can be rationalized by viewing the conjugate prior as arising from the posterior of a sample of size v₀ with sufficient statistics s₀² and β₀, formed with the noninformative prior p(β, σ²) ∝ σ⁻², which will be briefly explained later.

So the natural conjugate prior distribution of the parameters β and σ² is:

p(β, σ²) ∝ (σ²)^(−((p+v₀)/2+1)) exp[−(1/(2σ²)) (v₀s₀² + (β − β₀)′V₀⁻¹(β − β₀))]   (5.29)

and, hence,

β, σ² ∼ N-Inv-χ²(β₀, V₀s₀²; v₀, s₀²)   (5.30)

where the prior hyper-parameters β₀, V₀, v₀ and s₀² reflect the knowledge that the researcher has about the problem and her or his confidence in it. Furthermore, the parameter β₀ measures the marginal effect of the explanatory variables on the dependent variable. As well, V₀ indicates the uncertainty about the prior information, and it plays the same role as (X′X)⁻¹ does in the classical approach; v₀ represents a fictitious data set, so it plays a similar role to n; and s₀² is an imaginary s² for those fictitious data. In terms of the distribution, β₀ and V₀σ² represent the location and scale of β, respectively, and v₀ and s₀² the degrees of freedom and scale of σ², respectively.

Since a conjugate prior distribution has been used, the posterior distribution will have the same form. That is, the posterior distribution will be a Normal-scaled inverse-χ² with posterior hyper-parameters β₁, V₁, v₁ and s₁². According to [Rossi06] and [Koop03], it can be shown that

β, σ²|y ∼ N-Inv-χ²(β₁, V₁s₁²; v₁, s₁²)   (5.31)

The relation between the prior and the posterior hyper-parameters, according to [Koop03], is:

V₁ = (V₀⁻¹ + X′X)⁻¹   (5.32)

β₁ = V₁(V₀⁻¹β₀ + X′Xβ̂)   (5.33)

v₁ = v₀ + n   (5.34)

v₁s₁² = v₀s₀² + vs² + (β̂ − β₀)′[V₀ + (X′X)⁻¹]⁻¹(β̂ − β₀)   (5.35)
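The following is a minimal sketch (assuming NumPy) of the update (5.32)-(5.35); the function and the weakly informative prior values in the usage example are our own illustrative choices.

```python
import numpy as np

def conjugate_posterior(X, Y, beta0, V0, v0, s0_sq):
    """Posterior hyper-parameters (5.32)-(5.35) of the natural conjugate
    Normal linear regression model."""
    n, p = X.shape
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ Y)                 # OLS estimate (5.20)
    v = n - p
    s_sq = (Y - X @ beta_hat) @ (Y - X @ beta_hat) / v       # (5.21)

    V0_inv = np.linalg.inv(V0)
    V1 = np.linalg.inv(V0_inv + XtX)                         # (5.32)
    beta1 = V1 @ (V0_inv @ beta0 + XtX @ beta_hat)           # (5.33)
    v1 = v0 + n                                              # (5.34)
    d = beta_hat - beta0
    s1_sq = (v0 * s0_sq + v * s_sq
             + d @ np.linalg.inv(V0 + np.linalg.inv(XtX)) @ d) / v1   # (5.35)
    return beta1, V1, v1, s1_sq

# Hypothetical use with simulated data and a weakly informative prior:
rng = np.random.default_rng(6)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, size=100)
beta1, V1, v1, s1_sq = conjugate_posterior(
    X, Y, beta0=np.zeros(3), V0=np.eye(3) * 100.0, v0=1, s0_sq=1.0)
print(beta1, v1, s1_sq)
```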

As was mentioned in the Bayesian Data Analysis chapter, a measure is needed to summarize the posterior distribution, and this is usually the posterior mean, namely E(β|y). According to what was said in previous chapters, the marginal for β will be a multivariate t-distribution (see Appendix A):

β|y ∼ t_{v₁}(β₁, s₁²V₁)   (5.36)

where

E(β|y) = β₁ = V₁(V₀⁻¹β₀ + X′Xβ̂)   (5.37)

and


Var(β|y) = (v₁s₁²/(v₁ − 2)) V₁   (5.38)

So the posterior mean is a weighted average of the ordinary least squares estimate, β̂, and the prior mean, β₀, where the weights are proportional to the observed data, X′X, and to the importance given to the prior, V₀⁻¹, respectively. This should make clear that as the prior variance for β is decreased, greater posterior weight is placed on prior beliefs relative to the data, so the posterior mean will be closer to the prior mean.

The elements of the diagonal of the matrix (v₁s₁²/(v₁ − 2))V₁ are the variances of β₀, β₁, ..., β_p:

Var(βⱼ|y) = (v₁s₁²/(v₁ − 2)) V₁ⱼⱼ for all j = 0, 1, ..., p   (5.39)

Likewise, the marginal posterior for σ² is:

σ²|y ∼ Inv-χ²(v₁, s₁²)   (5.40)

and, hence,

E(σ²|y) = v₁s₁²/(v₁ − 2)   (5.41)

Var(σ²|y) = 2v₁²s₁⁴ / ((v₁ − 2)²(v₁ − 4))   (5.42)

So, as we increase the number of fictitious data v₀, v₁ tends towards v₀ and, hence, the posterior estimate of σ² gets closer to s₀².

Tables 5.2 and 5.3 show how the different posterior parameters of interest vary depending on the prior parameters V₀ (considering V₀ as cI_k) and v₀ and the sample size n.

Table 5.2 means that if the size of the sample increases towards infinity, then the prior information that the researcher gives has very little or almost no importance, as also occurs if the precision of the prior distribution for β decreases (that is, V₀ increases) towards 0. The difference between both cases is that in the former the variance of β is lower than in the latter.

The number of fictitious data does not seem to affect the posterior mean, but it does affect the posterior variance, increasing it (resp. decreasing it) as the fictitious data increase (resp. decrease).


Action          E[β|y]                     Var[β|y]

n   Increase    Closer to OLS estimates    Closer to 0
    Decrease    Closer to β₀               Further from 0

V₀  Increase    Closer to OLS estimates    Further from 0
    Decrease    Closer to β₀               Closer to 0

ν₀  Increase    Not affected               Increases
    Decrease    Not affected               Decreases

Table 5.2: Sensitivity analysis of parameter β

Table 5.3 refers to the parameter σ², and it means that if the fictitious data increase, then the information given by the researcher will have much more weight in the posterior mean of σ² than the real data have, and the variance will be lower too. The other way round occurs when the number of real data increases: then the data information will have the most important weight and the prior information will hardly have any value. Another interesting result is that as the precision of the prior distribution for β decreases (that is, V₀ increases), the posterior mean of σ² will approximate vs², the degrees of freedom times the ordinary least squares estimate of the variance.

Moving on to a different issue, the fact that the natural conjugate prior implies that prior information enters in the same manner as data information helps with prior elicitation. When several priors can be applied to the same problem, two strategies can be adopted to surmount the possible criticisms. First, a prior sensitivity analysis can be carried out to demonstrate that results are the same with the different priors chosen. But, if results are sensitive to the choice of prior, the Bayesian approach allows for the scientifically honest reporting of such a state of affairs. There has been work done on extreme bounds analysis for quantities such as the posterior mean of a parameter. [Poir95] provides a detailed


Action          E[σ²|y]                    Var[σ²|y]

n   Increase    Closer to OLS estimates    Closer to 0
    Decrease    Closer to s₀²              Further from 0

V₀  Increase    Closer to vs²              Closer to 0
    Decrease    Closer to OLS estimates    Further from 0

ν₀  Increase    Closer to s₀²              Closer to 0
    Decrease    Closer to OLS estimates    Further from 0

Table 5.3: Sensitivity analysis of parameter σ²

discussion about this issue. A second strategy is to use a non-informative prior to let the data speak loudly and be predominant over the prior information. For example, let us set v₀ = 0 and V₀⁻¹ = 0. Then

β, σ²|y ∼ N-Inv-χ²(β₁, V₁s₁²; v₁, s₁²)   (5.43)

where

V₁ = (X′X)⁻¹   (5.44)

β₁ = β̂   (5.45)

v₁ = n   (5.46)

v₁s₁² = vs²   (5.47)

With this non-informative prior, all of these formulas involve only data information and are equal to the ordinary least squares results. Bayesians often write this prior as:


p(β, σ²) ∝ σ⁻²   (5.48)

Finally, one of the goals of the Bayesian approach is to provide a predictive model to predict an unobserved data point generated from the same model as the data set with n observations (errors N(0, σ²) and the same β). This is denoted by:

Y* = X*β + ε*   (5.49)

where Y* is not observed and ε* is independent of ε.

Bayesian prediction is based on calculating

p(y*|y) = ∫∫ p(y*|y, β, σ²) p(β, σ²|y) dβ dσ²   (5.50)

The key to getting the prediction is to find out the form of p(y*|y, β, σ²), since the posterior p(β, σ²|y) has already been calculated, and to test whether p(y*|y) is easy to integrate or, on the contrary, a posterior simulator has to be employed.

Since ε* is independent of ε, Y* is independent of Y, and p(y*|y, β, σ²) can be written as p(y*|β, σ²), which is a multivariate Normal, as was seen before:

p(y*|β, σ²) = (2π)^(−T/2) (σ²)^(−T/2) exp[−(1/(2σ²)) (y* − X*β)′(y* − X*β)]   (5.51)

Multiplying this by the posterior obtained previously and integrating yields a multivariate t:

y*|y ∼ t_{v₁}(X*β₁, s₁²(I_T + X*V₁X*′))   (5.52)

where T is the number of observed X*.

It is easy to see that:

E(y*|y) = X*β₁,  Var(y*|y) = s₁²(I_T + X*V₁X*′)   (5.53)

A brief summary that compares the classical and the Bayesian approaches is displayed to note the

coincidences and differences between them.


Classical Regression                      Bayesian Regression

β̂ = (X′X)⁻¹X′Y                            β₁ = V₁(V₀⁻¹β₀ + X′Xβ̂)

σ̂² = (Y′Y − β̂′X′Y)/(n−p)                  s₁² = [ν₀s₀² + νs² + (β̂−β₀)′[V₀+(X′X)⁻¹]⁻¹(β̂−β₀)] / ν₁

E[β̂] = β                                  E[β|y] = β₁

Var(β̂ⱼ) = σ²Cⱼⱼ                           Var(βⱼ|y) = (ν₁s₁²/(ν₁−2)) V₁ⱼⱼ

Y*|y ∼ t_{n−p}(X*β̂, σ̂²I_T)                Y*|y ∼ t_{ν₁}(X*β₁, s₁²(I_T + X*V₁X*′))

Table 5.4: Classical and Bayesian regression comparison

A very interesting and more exhaustive comparison between these two approaches can be read in

the article written by [Urba92], where he explains the advantages and disadvantages of using each of

them.

5.4 Normal Linear Regression Model subject to inequality constraints

In this section, let us suppose we want to impose inequality constraints on the coefficients of the Normal linear regression model, such as β ∈ A, where A is the region of all valid values of the coefficients. This is quite simple in Bayesian regression, since they are imposed through the prior distribution:

p(β, σ²) ∝ N-Inv-χ²(β₀, V₀s₀²; v₀, s₀²) 1(β ∈ A)   (5.54)

where β₀, V₀, v₀ and s₀² are prior hyper-parameters to be chosen and 1(β ∈ A) is the indicator function, which equals 1 if β ∈ A and 0 otherwise.

Likewise, the posterior distribution for β is now:


p(β|y) ∝ t_{v₁}(β₁, s₁²V₁) 1(β ∈ A)   (5.55)

where β₁, V₁, v₁ and s₁² were defined previously.

So the difference when introducing inequality constraints is that we must now add the indicator function. This may seem very easy, but for a general choice of A neither analytical posterior results nor Gibbs sampling work. The most suitable method is importance sampling, which has already been explained. In this case, according to [Koop03], the importance function is:

q(β) = t_{v₁}(β₁, s₁²V₁)   (5.56)

The strategy consists of getting draws y*⁽ˢ⁾ from p(y*|β⁽ˢ⁾, σ²⁽ˢ⁾) using the draws β⁽ˢ⁾ and σ²⁽ˢ⁾ which were obtained from the posterior distribution. Then, using these draws y*⁽ˢ⁾ in the importance sampling, the mean and the variance can be calculated.

Another simpler way consists of ignoring the constraints until the end of the simulation, and then discarding those draws which violate the restrictions. According to [Gelm04], this works reasonably well if the constraints do not eliminate a large portion of the draws.

5.5 Normal Linear Regression Model with Independent Parameters

Now, suppose that the parameters β and σ² are independent, so

p(β, σ²) = p(β)p(σ²)   (5.57)

With the same likelihood function as that used in the previous section, this assumption implies that β follows a multivariate Normal distribution with mean β₀, as occurred with β and σ² dependent, but with variance V₀, and σ² has exactly the same scaled inverse-χ² distribution used previously. That is:

β ∼ N(β₀, V₀),  σ² ∼ Inv-χ²(v₀, s₀²)   (5.58)

The joint prior distribution is


p(β, σ²) ∝ {exp[−(1/2)(β − β₀)′V₀⁻¹(β − β₀)]} {(σ²)^(−(v₀/2+1)) exp[−v₀s₀²/(2σ²)]}   (5.59)

β, σ² ∼ N-Inv-χ²(β₀, V₀, v₀, s₀²)   (5.60)

As the joint posterior distribution is proportional to the prior times the likelihood:

p(β, σ²|Y) ∝ exp[−(1/2)((Y − Xβ)′(Y − Xβ)/σ² + (β − β₀)′V₀⁻¹(β − β₀))] × (σ²)^(−((n+v₀)/2+1)) exp[−v₀s₀²/(2σ²)]   (5.61)

Since this function does not take the form of any well-known density, it is interesting to find the conditional distributions for β, p(β|Y, σ²), and for σ², p(σ²|Y, β), because with them any information from p(β, σ²|Y) can be obtained through posterior simulation with the Gibbs sampler already explained in previous chapters.

According to [Koop03], it can be shown that those conditional distributions are:

p(β|Y, σ²) ∝ exp[−(1/2)(β − β₁)′V₁⁻¹(β − β₁)]   (5.62)

p(σ²|Y, β) ∝ (σ²)^(−((n+v₀)/2+1)) exp[−(1/(2σ²))((Y − Xβ)′(Y − Xβ) + v₀s₀²)]   (5.63)

And this all yields:

β|y, σ² ∼ N(β₁, V₁)   (5.64)

σ²|y, β ∼ Inv-χ²(v₁, s₁²)   (5.65)

where

V₁ = (V₀⁻¹ + (1/σ²)X′X)⁻¹   (5.66)

β₁ = V₁(V₀⁻¹β₀ + (1/σ²)X′Y)   (5.67)

v₁ = n + v₀   (5.68)

s₁² = ((Y − Xβ)′(Y − Xβ) + v₀s₀²)/v₁   (5.69)
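A minimal sketch of this Gibbs sampler (assuming NumPy), alternating draws from (5.64) and (5.65) with the hyper-parameters (5.66)-(5.69); the data and prior values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

def gibbs_regression(X, Y, beta0, V0, v0, s0_sq, n_iter=5_000, burn_in=500):
    """Gibbs sampler for the Normal linear model with independent priors,
    alternating the conditionals (5.64)-(5.65) with (5.66)-(5.69)."""
    n, p = X.shape
    XtX, XtY = X.T @ X, X.T @ Y
    V0_inv = np.linalg.inv(V0)
    sigma2 = Y.var()                                   # starting value
    draws_beta, draws_sigma2 = [], []
    for _ in range(n_iter):
        V1 = np.linalg.inv(V0_inv + XtX / sigma2)      # (5.66)
        beta1 = V1 @ (V0_inv @ beta0 + XtY / sigma2)   # (5.67)
        beta = rng.multivariate_normal(beta1, V1)      # draw from (5.64)
        resid = Y - X @ beta
        v1 = n + v0                                    # (5.68)
        s1_sq = (resid @ resid + v0 * s0_sq) / v1      # (5.69)
        sigma2 = v1 * s1_sq / rng.chisquare(v1)        # draw from (5.65)
        draws_beta.append(beta)
        draws_sigma2.append(sigma2)
    return (np.array(draws_beta[burn_in:]),
            np.array(draws_sigma2[burn_in:]))

# Hypothetical use with simulated data:
X = np.column_stack([np.ones(80), rng.normal(size=(80, 1))])
Y = X @ np.array([2.0, 1.5]) + rng.normal(0, 0.7, size=80)
b_draws, s2_draws = gibbs_regression(
    X, Y, beta0=np.zeros(2), V0=np.eye(2) * 100.0, v0=1, s0_sq=1.0)
print(b_draws.mean(axis=0), s2_draws.mean())
```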


The fact that the posterior distribution has an unknown form also affects the prediction for y*, p(y*|y). As has already been said for the posterior predictive in the Bayesian Approach section, the interest is in p(y*|y, β, σ²). Since y and y* are independent of one another,

p(y*|y, β, σ²) = p(y*|β, σ²)   (5.70)

And hence

p(y*|β, σ²) = (2π)^(−T/2) (σ²)^(−T/2) exp[−(1/(2σ²)) (y* − X*β)′(y* − X*β)]   (5.71)

As the analytical solution of this integral is not trivial, the importance of the Gibbs sampler arises again and, combining it with Monte Carlo integration, any posterior and predictive inference can be done. The strategy consists of getting draws y*⁽ˢ⁾ from p(y*|β⁽ˢ⁾, σ²⁽ˢ⁾) using the draws β⁽ˢ⁾, σ²⁽ˢ⁾ which were obtained from the posterior distribution. Then, using these draws y*⁽ˢ⁾ in the Monte Carlo integration, the mean and the variance can be calculated.

5.6 Normal Linear Regression Model with Heteroscedasticity and Correlation

Until now the variances have been supposed to be equal and without correlation, but this is not very realistic. In this section we are going to relax that assumption and consider the next model:

Y = Xβ + ε   (5.72)

where

ε ∼ N(0, Σ)   (5.73)

That is, we are considering heteroscedasticity and correlation. According to [Koop03], since Σ is a positive definite matrix, a matrix P can be found that verifies PΣP′ = I, and it can be shown that

Y* = X*β + ε*   (5.74)

where

ε* ∼ N(0, σ²I)   (5.75)


and

Y ∗ = PY (5.76)

X∗ = PX (5.77)

ε∗ = Pε (5.78)

Then, the likelihood function to consider now is:

p(Y|β, σ², Σ) = (2π)^(−n/2) (σ²)^(−p/2) exp[−(1/(2σ²)) (β − β̂_Σ)′X′Σ⁻¹X(β − β̂_Σ)] × (σ²)^(−v/2) exp[−v s²(Σ)/(2σ²)]   (5.79)

where:

v = n − p   (5.80)

β̂_Σ = (X*′X*)⁻¹X*′Y*   (5.81)

s²(Σ) = (Y* − X*β̂_Σ)′(Y* − X*β̂_Σ)/v   (5.82)

which is very similar to that used with equal variances.

Using the prior distributions described in the previous section, we have:

p(β, σ², Σ) = p(β)p(σ²)p(Σ)   (5.83)

where β is normally distributed with prior parameters β₀, V₀ and σ² is a scaled inverse chi-square with parameters v₀ and s₀².

Hence, knowing that the posterior distribution is proportional to the prior times the likelihood:

p(β, σ², Σ|Y) ∝ p(Σ) × {exp[−(1/2)((Y* − X*β)′(Y* − X*β)/σ² + (β − β0)′V0⁻¹(β − β0))]} × {(σ²)^(−((n+v0)/2+1)) exp[−v0s0²/(2σ²)]}   (5.84)


This suggests a Normal distribution for the posterior conditional of β and a scaled inverse chi-square for the posterior conditional of σ², as occurred before. Therefore:

β|Y, σ², Σ ∼ N(β1, V1)   (5.85)

σ²|Y, β, Σ ∼ Inv-χ²(v1, s1²)   (5.86)

where

V1 = (V0⁻¹ + X′Σ⁻¹X/σ²)⁻¹   (5.87)

β1 = V1(V0⁻¹β0 + X′Σ⁻¹XβΣ/σ²)   (5.88)

v1 = n + v0   (5.89)

s1² = [(Y − Xβ)′Σ⁻¹(Y − Xβ) + v0s0²]/v1   (5.90)

According to [Koop03], the posterior conditional for Σ yields:

p(Σ|Y, β, σ²) ∝ p(Σ)|Σ|^(−1/2) exp[−(1/(2σ²))(Y − Xβ)′Σ⁻¹(Y − Xβ)]   (5.91)

So we have come to the point where the form that Σ takes is crucial.

5.6.1 Heteroscedasticity

Let us suppose we suspect that there is no correlation among the errors but their variances differ. Hence, we will have n variances ωi for the n errors εi.

It could be that the researcher has an idea of the form of Σ and assumes that

ωi = h(xi, α) = (1 + α1xi1 + · · · + αpxip)²   (5.92)

That is, the variances are related to some or all of the independent variables. The researcher should choose a prior for α, and then Bayesian inference can be carried out through a Metropolis-Hastings algorithm such as the random walk variant.
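As an illustration, a generic random walk Metropolis-Hastings step can be sketched in R as follows; log_post is an assumed user-supplied function returning the log posterior of α, and all names are illustrative:

    # Hedged sketch of a random walk Metropolis-Hastings sampler for alpha
    # in (5.92); 'step' controls the size of the random walk proposal.
    rw_mh <- function(log_post, alpha0, step = 0.1, S = 5000) {
      p <- length(alpha0)
      out <- matrix(NA, S, p)
      cur <- alpha0; lp <- log_post(cur)
      for (s in 1:S) {
        prop <- cur + step * rnorm(p)            # random walk proposal
        lp_prop <- log_post(prop)
        if (log(runif(1)) < lp_prop - lp) {      # accept/reject
          cur <- prop; lp <- lp_prop
        }
        out[s, ] <- cur
      }
      out
    }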

If the researcher knows that the error variances are different but has no idea of their form, then a prior for Σ has to be chosen. According to [Koop03]:


p(Σ) = ∏_{i=1}^{n} p(ωi)   (5.93)

where

ωi ∼ Inv-χ²(vw, 1)   (5.94)

But now a hyper-prior distribution should be specified for vw, so that

p(Σ, vw) = p(Σ|vw)p(vw)   (5.95)

That is, we are using a hierarchical prior to treat the heteroscedasticity. According to [Gelm04] a

Metropolis-Hastings algorithm can be used to draw posterior simulations.

5.6.2 Correlation

Now let us assume that there is some correlation among the errors through a time or space relationship, such that the error in one period depends on the errors of previous periods. A regression with such errors is called autoregressive, and it can be treated as a time series. For example, if we relate the Ibex 35 values of one day to those of the previous days, the correlation between the error on a Friday and those of the preceding days is of the same kind as the correlation between the errors on Thursdays, Wednesdays or Tuesdays and those of their preceding days.

That is:

εt = ρ1εt−1 + ρ2εt−2 + · · ·+ ρpεt−p + ut (5.96)

where

ut ∼ N(0, σ²)   (5.97)

We will assume stationarity, which means, loosely speaking, that the probability distribution does not vary through time. Some time series do not seem to be stationary, but their differences do. The main difference to take into account is the first one: the first difference of εt, Δεt, indicates the variation in ε between periods t and t − 1.

According to [Koop03], the irregular component ut can be formulated in the following way:

ρ(L)εt = ut (5.98)


where L is called the lag operator, with the property that Lεt = εt−1, and ρ(L) = 1 − ρ1L − · · · − ρpL^p.

So, if we have the regression model:

Yt = Xt′β + εt   (5.99)

Then, it is possible to find a model such as

Yt* = Xt*′β + ut,  ut ∼ N(0, σ²)   (5.100)

where

Yt* = ρ(L)Yt   (5.101)

Xt* = ρ(L)Xt   (5.102)
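As a small illustration, the ρ(L) transformation can be computed in R as follows (a sketch under the assumption that z is a numeric vector or matrix and rho = c(ρ1, . . . , ρp); names are illustrative):

    # Apply the lag polynomial rho(L) to a series or design matrix,
    # returning the transformed rows for t = p+1, ..., T  (5.101)-(5.102).
    quasi_difference <- function(z, rho) {
      z <- as.matrix(z)
      p <- length(rho); T <- nrow(z)
      out <- z[(p + 1):T, , drop = FALSE]
      for (j in 1:p)                       # subtract rho_j * z_{t-j}
        out <- out - rho[j] * z[(p + 1 - j):(T - j), , drop = FALSE]
      out
    }
    # Ystar <- quasi_difference(Y, rho);  Xstar <- quasi_difference(X, rho)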

Therefore, using an independent Normal scaled inverse chi-square prior for β and σ², this yields:

β|Y, σ², ρ ∼ N(β1, V1)   (5.103)

σ²|Y, β, ρ ∼ Inv-χ²(v1, s1²)   (5.104)

where

V1 = (V0⁻¹ + X*′X*/σ²)⁻¹   (5.105)

β1 = V1(V0⁻¹β0 + X*′Y*/σ²)   (5.106)

v1 = v0 + T − p   (5.107)

s1² = [(Y* − X*β)′(Y* − X*β) + v0s0²]/v1   (5.108)

And now, as occurred with heteroscedasticity, a prior should be selected for ρ. Let us choose a multivariate Normal subject to the constraint ρ ∈ φ, where φ is the stationary region. Then,

ρ ∼ N(ρ0, Vρ0)1(ρ ∈ φ)   (5.109)

ρ|Y, β, σ² ∼ N(ρ1, Vρ1)1(ρ ∈ φ)   (5.110)


where ρ0 and Vρ0 are the prior parameters, which the researcher should establish, and ρ1 and Vρ1 are the posterior parameters, related as follows:

Vρ1 = (Vρ0⁻¹ + E′E/σ²)⁻¹   (5.111)

ρ1 = Vρ1(Vρ0⁻¹ρ0 + E′ε/σ²)   (5.112)

where E is the matrix whose columns contain the lagged errors from t − 1 to t − p and ε is the vector of errors. According to [Koop03], a Gibbs sampler can be used to draw posterior simulations.
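A hedged sketch of the corresponding ρ-step inside such a Gibbs sampler, with the stationarity constraint imposed by rejection, might read as follows (eps denotes the current error vector and E its matrix of lags; names are illustrative):

    # Draw rho | Y, beta, sigma^2 from (5.110)-(5.112), rejecting draws
    # outside the stationary region (roots of rho(L) outside the unit circle).
    draw_rho <- function(eps, E, sig2, rho0, Vrho0) {
      Vrho0inv <- solve(Vrho0)
      Vrho1 <- solve(Vrho0inv + crossprod(E) / sig2)
      rho1  <- Vrho1 %*% (Vrho0inv %*% rho0 + crossprod(E, eps) / sig2)
      repeat {
        rho <- as.vector(rho1 + t(chol(Vrho1)) %*% rnorm(length(rho0)))
        if (all(Mod(polyroot(c(1, -rho))) > 1)) return(rho)
      }
    }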

5.7 Models Summary

Since the main models to be used in the subsequent application are the homoscedastic, non-autocorrelated ones, their main ideas are summarized in Tables 5.5, 5.6, 5.7 and 5.8.


The joint prior distributions considered are the following; in all cases σ² ∼ Inv-χ²(v0, s0²).

Case 1 (natural conjugate prior): p(β, σ²) = p(β|σ²)p(σ²), with β|σ² ∼ N(β0, σ²V0); jointly, β, σ² ∼ N-Inv-χ²(β0, V0s0²; v0, s0²).

Case 2 (natural conjugate prior with constraint): p(β, σ²) = p(β|σ²)p(σ²), with β|σ² ∼ N(β0, σ²V0)1(β ∈ A); jointly, β, σ² ∼ N-Inv-χ²(β0, V0s0²; v0, s0²)1(β ∈ A).

Case 3 (independent prior): p(β, σ²) = p(β)p(σ²), with β ∼ N(β0, V0); jointly, β, σ² ∼ N-Inv-χ²(β0, V0; v0, s0²).

Case 4 (independent prior with constraint): p(β, σ²) = p(β)p(σ²), with β ∼ N(β0, V0)1(β ∈ A); jointly, β, σ² ∼ N-Inv-χ²(β0, V0; v0, s0²)1(β ∈ A).

Table 5.5: Main Prior Distributions Summary


Case 1: p(β, σ²|y) = p(y|β, σ²)p(β|σ²)p(σ²), so that β, σ² ∼ N-Inv-χ²(β1, V1s1²; v1, s1²). Key: obtain the marginal distributions, draw directly from them and summarize.

Case 2: p(β, σ²|y) = p(y|β, σ²)p(β|σ²)p(σ²), so that β, σ² ∼ N-Inv-χ²(β1, V1s1²; v1, s1²)1(β ∈ A). Key: obtain the marginal distributions, draw directly from them, discard invalid draws and summarize.

Case 3: p(β, σ²|y) = p(y|β, σ²)p(β)p(σ²) ∝ {exp[−(1/2)((Y − Xβ)′(Y − Xβ)/σ² + (β − β0)′V0⁻¹(β − β0))]} × {(σ²)^(−((n+v0)/2+1)) exp[−(v0s0² + vs²)/(2σ²)]}. Key: obtain the conditional distributions, draw with the Gibbs sampler and summarize.

Case 4: the same expression as Case 3 times 1(β ∈ A). Key: obtain the conditional distributions, draw with the Gibbs sampler, discard invalid draws and summarize.

Table 5.6: Main Posterior Distributions Summary


Case 1: p(β, σ²|y) = p(y|β, σ²)p(β|σ²)p(σ²). Prior parameters: β0, V0, v0, s0². Posterior parameters: β1, V1, v1, s1², related by

β1 = V1(V0⁻¹β0 + X′Xβ̂),  V1 = (V0⁻¹ + X′X)⁻¹,  v1 = v0 + n,
s1² = [v0s0² + vs² + (β̂ − β0)′[V0 + (X′X)⁻¹]⁻¹(β̂ − β0)]/v1

Case 2: p(β, σ²|y) = p(y|β, σ²)p(β)p(σ²). Prior parameters: β0, V0, v0, s0². Posterior parameters: β1, V1, v1, s1², related by

β1 = V1(V0⁻¹β0 + X′Y/σ²),  V1 = (V0⁻¹ + X′X/σ²)⁻¹,  v1 = v0 + n,
s1² = [v0s0² + (Y − Xβ)′(Y − Xβ)]/v1

(Here β̂ denotes the ordinary least squares estimator of β.)

Table 5.7: Prior and Posterior Parameters Summary


In all four cases p(y*|y, β, σ²) is the Normal density N(β, σ²) given above, and the key is the same: obtain draws y* from p(y*|y, β, σ²) using the previous draws from the posterior simulation, and use Monte Carlo integration to get predictive inferences.

Case 1 (no constraint): p(y*|y) = ∫∫ p(y*|y, β, σ²)p(β|σ², y)p(σ²|y) dβ dσ².

Case 2 (with constraint): p(y*|y) = ∫∫ p(y*|y, β, σ²)p(β|σ², y)p(σ²|y) dβ dσ².

Case 3 (no constraint): p(y*|y) = ∫∫ p(y*|y, β, σ²)p(β|y)p(σ²|y) dβ dσ².

Case 4 (with constraint): p(y*|y) = ∫∫ p(y*|y, β, σ²)p(β|y)p(σ²|y) dβ dσ².

Table 5.8: Main Posterior Predictive Distributions Summary


Chapter 6

Symbolic Data

6.1 What is symbolic data analysis?

Nowadays there are more and more data available to be analyzed and studied. Technological advances let us gather huge quantities of information about a specific variable, but part of that information is lost because standard statistical methods do not have the flexibility to manage such quantities of information. For example, let us assume we are studying the evolution of the stock prices of an enterprise. At the end of each month we would have the different values the stock has taken daily, but it seems reasonable to think that the researcher would take only the daily close prices, or the daily mean prices, and would not use all the gathered information.

Symbolic data analysis (SDA) deals with this problem and lets us analyse vast amounts of information efficiently, in order to extract the required knowledge and represent it better. Continuing with the same example, symbolic data will let the engineer manage the daily maximum and minimum prices of a month, or a histogram of the monthly prices, and work with them. In this way, SDA complements other widely used statistical tools, such as candlesticks. More information about candlesticks and other interesting tools can be found in [Lee 06] and [Irpi05]. For instance, Figure 6.1 illustrates an interval time series for the daily maximum and minimum Ibex 35 values in January 2006.

So the possibilities with symbolic data are evident. For instance, consider an application to warrants. A warrant is a right, without obligation, to buy (a call warrant) or to sell (a put warrant) something at an agreed price (the strike). With a predicted stock price range, one could choose the most suitable put or call warrant and obtain greater benefits.


Figure 6.1: Interval time series

Behind the aggregation method used by SDA lies the notion of a symbolic object. This is a mathematical model of a concept (see [Dida95]) which, basically, lets us select some individuals from a group. Going further into SDA, and according to [Bill06a], three main kinds of symbolic data can be considered: multi-valued, interval-valued and modal-valued.

As far as the first kind is concerned, a multi-valued symbolic random variable Y is one that takes one or more values from the list of values in its domain Y. The complete list of possible values in Y is finite, and the values may be well-defined categorical or quantitative values.

For example, consider all the companies which have formed the Ibex 35 index since its beginning. We could then define a variable Z = blue chips in the Ibex 35 with 15 observations wu, one per year. Thus, during the first year, 1992 (wu = w1), Telefonica, Repsol, Endesa, SCH and BBVA were considered the blue chips, whereas in 2007 (wu = w15) Santander, Telefonica, BBVA, Endesa and Repsol YPF are considered the blue chips.

Likewise, an interval-valued symbolic random variable Y is one that takes values in an interval.


wu   Year   Z = Blue chips in Ibex 35
w1   1992   {Telefonica, Repsol, Endesa, SCH, BBVA}
...  ...    ...
w15  2007   {Telefonica, Repsol YPF, Endesa, BBVA, Santander}

Table 6.1: Multivalued Data Example

The interval can be closed or open at either end. This type is very important in SDA; furthermore, it captures both the tendency of centralization and the dispersion of a dataset. Let us recall the example of the daily stock prices of a company during a month: this information can be recorded as the daily maximum and minimum values during the month. As this is one of the most interesting types of symbolic data for our purpose, we will take it up again below.

Finally, let a random variable Y take possible values {ηk : k = 1, 2, . . . } over a domain Y. A modal-valued outcome is one formed by a value ηk together with an associated measure πk. The latter is usually a weight, probability or relative frequency, but it can also be a capacity, necessity, possibility, credibility or a related entity.

A modal multi-valued variable can now be defined as one whose observed outcome takes values in a subset of the domain, each with its respective measure. For example, we could define a variable Z = importance of the companies in the Ibex 35 index. Thus, for instance, the most important company in 1992 was Telefonica, whereas in 2007 Santander is the company with the highest weight in the index.

Another example: let us suppose we define a variable Y = maximum daily stock price for enterprises in the Spanish Continuous Stock Market. For the enterprise Endesa we could have:


wu   Year   Z = Importance of an Ibex 35 company

w1   1992   {Telefonica, 13.7; Repsol, 9.7; Endesa, 9.2; SCH, 8.0; BBVA, 7.2; Iberdrola, 6.9; Santander, 5.9; Banco Popular, 3.8; Banesto, 3.6; Banco Exterior, 3.0; Cepsa, 2.5; Tabacalera, 2.4; Acesa, 2.1; Union FENOSA, 2.0; Gas Natural, 1.9; Sevillana de Electric, 1.8; Fuerzas E. Cataluña, 1.7; Bankinter, 1.6; Dragados, 1.4; Aguas de Barcelona, 1.3; Mapfre, 1.3; Asland, 1.2; FCC, 1.1; Portland Valderribas, 1.0; Hidrocantábrico, 0.8; Vallehermoso, 0.8; Metrovacesa, 0.8; Acerinox, 0.7; Viscofán, 0.6; Cubiertas y MZOV, 0.5; Sarrio, 0.4; Uralita, 0.4; Huarte, 0.3; Urbis, 0.3; Agromán, 0.2}

...  ...    ...

w15  2007   {Telefonica, 16.0; Repsol YPF, 5.9; Endesa, 7.5; BBVA, 13.0; Iberdrola, 5.6; Santander, 17.2; Banco Popular, 3.4; Banesto, 0.5; Union FENOSA, 1.8; Gas Natural, 1.5; Bankinter, 0.9; Cor. Mapfre, 0.7; FCC, 1.2; Sacyr Vallehermoso, 1.0; Metrovacesa, 0.5; Acerinox, 1.0; Inditex, 3.0; ACS Const, 2.9; B. Sabadell, 2.1; Altadis, 2.0; Abertis A, 2.0; G. Ferrovial, 1.6; Acciona, 1.4; Gamesa, 1.0; Enagás, 0.8; REE, 0.8; Cintra, 0.7; Agbar, 0.7; Telecinco, 0.6; Iberia, 0.5; Indra A, 0.5; Fadesa, 0.5; Sogecable, 0.4; Antena 3 TV, 0.4; NH Hoteles, 0.4}

Table 6.2: Modal-multivalued Example

Y (Endesa) = {38.7, 0.125; 38.75, 0.125; 38.8, 0.250; 38.85, 0.250; 38.9, 0.125; 39, 0.125}

This means that we assign a probability of 0.125 to the event that the Endesa maximum daily price is 38.7, a probability of 0.125 to the event that it is 38.75, a probability of 0.25 to the event that it is 38.8, and so on.

Another very interesting variant of this type is the modal interval-valued variable: instead of taking a single value with a probability, the variable can take any value in an interval with a probability. Continuing with the previous example:

Y (Endesa) = {[38.7, 38.75), 0.125; [38.75, 38.85), 0.125; [38.8, 38.9), 0.25}

For more information and other types of data, the reader is referred to [Bill06a], [Huiw06] and [Arro06].

6.2 Interval-valued variables

As already mentioned, summarizing a dataset is one of the three possible sources from which interval data may result. According to [Huiw06], the other two sources are the imprecision of measurement and expert knowledge involving uncertainty.

Now suppose E is a set of m symbolic objects with observations Y(u), u = 1, . . . , m. Let us suppose we are interested in a particular random variable Yj ≡ Z, and that the realization of Z for the observation wu is the interval Z(wu) = [au, bu]. Then, according to [Bill06a], the empirical density function of Z is

f(ξ) = (1/m) Σ_{u∈E} Iu(ξ)/‖Z(u)‖,  ξ ∈ R   (6.1)

where Iu(ξ) indicates whether or not ξ lies in the interval Z(u), and ‖Z(u)‖ is the length of that interval.
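For instance, a one-line R version of this empirical density could be written as follows (assuming a and b are the vectors of lower and upper interval endpoints; names are illustrative):

    # Empirical density (6.1) of interval-valued data at a point xi.
    f_hat <- function(xi, a, b) mean((a <= xi & xi <= b) / (b - a))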

Likewise, it can be shown that the symbolic empirical mean is given by

Z̄ = (1/(2m)) Σ_{u∈E} (bu + au)   (6.2)

and the symbolic empirical variance is given by

S² = (1/(3m)) Σ_{u∈E} (bu² + bu·au + au²) − (1/(4m²)) [Σ_{u∈E} (bu + au)]²   (6.3)


These formulas are consistent with the hypothesis of uniformity within the intervals. While the symbolic mean can be understood intuitively as a centre of gravity, the symbolic variance is not so easy to interpret. In fact, it might seem more natural to formulate the variance as

S² = (1/(4m)) Σ_{u∈E} (bu + au)² − (1/(4m²)) [Σ_{u∈E} (bu + au)]²   (6.4)

that is, the variance of the midpoints. But this last formulation does not take into account the internal variation of the intervals, while the former does; hence the former is higher.

For example, let us consider the maximum and minimum points for the Ibex 35 during December

2006.

Then, according to the above, the mean point in that month (m = 19 trading days) was

Z̄ = (1/38) Σ_{u∈E} (high_u + low_u) = 14116.

And the empirical symbolic variance is

S² = (1/(3m)) Σ_{u∈E} (bu² + bu·au + au²) − (1/(4m²)) [Σ_{u∈E} (bu + au)]² = 28006.

If we had calculated the variance taking only the midpoints the result would have been:

S² = (1/(4m)) Σ_{u∈E} (bu + au)² − (1/(4m²)) [Σ_{u∈E} (bu + au)]² = 26023

which is lower than that obtained previously because it does not take into consideration the internal

variation of the intervals.
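These three quantities are straightforward to compute; a sketch in R, with a and b the vectors of interval minima and maxima (names illustrative):

    # Symbolic mean (6.2), symbolic variance (6.3) and midpoint variance (6.4).
    symbolic_mean <- function(a, b) sum(a + b) / (2 * length(a))
    symbolic_var  <- function(a, b) {
      m <- length(a)
      sum(b^2 + a * b + a^2) / (3 * m) - sum(a + b)^2 / (4 * m^2)
    }
    midpoint_var  <- function(a, b) {   # ignores the internal variation
      m <- length(a)
      sum((a + b)^2) / (4 * m) - sum(a + b)^2 / (4 * m^2)
    }

Applied to the December 2006 Ibex 35 highs and lows, symbolic_var would reproduce the 28006 figure and midpoint_var the lower 26023 figure.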

Although it may seem that interval-valued data bring nothing but advantages, according to [Huiw06] there are two major limitations when applying multivariate analysis to an interval dataset. The first is that the computing work is hard, and the second is that the hyperrectangle may enlarge the range of the original dataset and reduce the accuracy of the analysis.

The methodology of interval data applied to multivariate analysis involves transforming the symbolic data matrix into a numerical matrix, that is, reducing p-dimensional observations to s-dimensional components (where usually s « p). This is called Principal Component Analysis. There are two main methods that carry this out: the Vertices Method and the Centres Method. The former builds, from each hyperrectangle in p-dimensional space, a matrix with 2^p rows and p columns, where each row contains the coordinates of one vertex of the hyperrectangle in R^p. The latter works with the average value of every variable for each category of data. A more extended review of these two methods can be found in [Bill06a]. [Huiw06] point out some limitations of these methods and propose a new type of symbolic data: factor interval data. Since symbolic data is a wide field, the reader is referred to all the above citations.

6.3 Classical regression analysis with Interval-valued data

Regarding classical multiple regression, there are three current approaches to be considered, though one of them is just a regression fit. Let us begin with the most intuitive and finish with the most conceptual.

Since we now have intervals instead of single values, it is natural to take midpoints and proceed as in classical multiple regression, that is, to use the result to make new predictions from a new interval by applying it to each extreme of the interval. Moreover, [DeCa05] remark on the need to establish the constraint βi ≥ 0, to ensure that the lower extreme of the predicted interval stays below the upper extreme, and suggest the algorithm presented in [Laws74] to handle this constraint. We suggest the alternative of drawing enough samples from the posterior distribution of β, discarding the negative ones and averaging.

Let us recall the same example shown in the classical multiple regression, but now taking the maximum and minimum values that the Ibex 35, Dow Jones, FTSE 100 and DAX took in the first ten months of 2006. We would take the midpoints of those intervals and obtain the same result that we got in the classical multiple regression:

IBEX35t = 1.0102 DowJonest−1 − 2.0144 FTSE100t−1 + 2.1229 DAXt−1 + εt

where

εt ∼ N(0, 332.71²)


We could use this model to predict a new observation for November 1st, applying it to each extreme of the intervals:

max (IBEX35t) = 1.0102× 12161− 2.0144× 6149.9 + 2.1229× 6289.7 = 13242.41

min (IBEX35t) = 1.0102× 11986.84− 2.0144× 6110.9 + 2.1229× 6237.55 = 13040.7

So the prediction would be: [13040.7, 13242.41].
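A minimal R sketch of this midpoint approach, with hypothetical vectors holding the interval endpoints of each index:

    # Fit on midpoints (no intercept, as in the model above), then apply the
    # coefficients to each extreme of the new intervals. With a negative
    # coefficient this need not keep the minimum below the maximum, hence
    # the beta_i >= 0 constraint discussed above.
    mid <- function(lo, hi) (lo + hi) / 2
    fit <- lm(mid(ibex_lo, ibex_hi) ~ mid(dj_lo, dj_hi) +
              mid(ftse_lo, ftse_hi) + mid(dax_lo, dax_hi) - 1)
    b <- coef(fit)
    pred_min <- sum(b * c(dj_lo_new, ftse_lo_new, dax_lo_new))
    pred_max <- sum(b * c(dj_hi_new, ftse_hi_new, dax_hi_new))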

A disadvantage of this approach is that it does not take into account the interval length.

To solve that problem, [DeCa05] and [DeCa04] suggest fitting another regression for the interval range. They refer to this new approach as the constrained centre and range method (CCRM). In that case the constraint is applied to the interval range regression instead of to the centres regression. We will employ the radii instead of the ranges. So, continuing with the previous example, we would use the radii of the different indexes to build the following model:

RadiusIBEX35t = 0.35 RadiusDowJonest−1 + 0.484 RadiusFTSE100t−1 + 0.272 RadiusDAXt−1 + εt

where

εt ∼ N(0, 26.31²)

With this new approach, the prediction can be calculated from the midpoint and the radius of the interval:

MidpointIBEX35t = 1.0102 × 12073.65 − 2.0144 × 6130.4 + 2.1229 × 6262.125 = 13141.3

RadiusIBEX35t = 0.35 × 86.81 + 0.484 × 19.5 + 0.272 × 24.575 = 46.53

Now the prediction would be: [13094.75, 13187.81].

Finally, the last approach is to use the symbolic mean, the symbolic variance and the symbolic covariance to build the regression; that is, a symbolic regression is used instead of the classical one. This new approach requires another way of estimating.

Recall the classical univariate multiple regression model:

Y = β0 + X1β1 + · · · + Xpβp + ε   (6.5)

where

ε ∼ N(0, σ²)   (6.6)

Calculating the mean values, we have

Ȳ = β0 + X̄1β1 + · · · + X̄pβp + ε̄   (6.7)

from which it can easily be deduced that

β0 ≠ 0 ⇒ ε̄ = 0   (6.8)

This means that the mean error is zero whenever there is a constant term in the model, a very important point for what follows.

Then we can obtain an equivalent model as

Y − Ȳ = (X1 − X̄1)β1 + · · · + (Xp − X̄p)βp + ε   (6.9)

where Y − Ȳ is the new dependent variable and X − X̄ is the new matrix of independent variables. β can be estimated as

β̂ = SXX⁻¹ SXY   (6.10)

where

where

SXX = [ var(X1)       cov(X1, X2)   ...  cov(X1, Xp)
        cov(X1, X2)   var(X2)       ...  cov(X2, Xp)
        ...           ...           ...  ...
        cov(X1, Xp)   cov(X2, Xp)   ...  var(Xp) ]

and

SXY = [ cov(X1, Y), cov(X2, Y), ..., cov(Xp, Y) ]′

where the independent term is not taken into account (so there is no column of ones in the matrix X).

The independent term β0 is estimated as

β̂0 = Ȳ − Σ_{j=1}^{p} β̂j X̄j   (6.11)
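A compact sketch of this moment-based estimation in R (Sxx, Sxy, xbar and ybar are assumed to be precomputed, for instance from the symbolic statistics of the previous section; names are illustrative):

    # Moment-based regression fit (6.10)-(6.11).
    fit_by_moments <- function(Sxx, Sxy, xbar, ybar) {
      beta  <- solve(Sxx, Sxy)            # beta = Sxx^{-1} Sxy   (6.10)
      beta0 <- ybar - sum(beta * xbar)    # intercept             (6.11)
      c(beta0 = beta0, beta)
    }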

In this way, the symbolic variance, the symbolic covariance and the symbolic mean for interval-valued variables can be used to estimate β. However, this approach has the limitation that, to be able to employ the symbolic statistics and this estimation method, it is necessary to include the independent term in the regression model. In fact, the most important point is that this last approach, suggested by [Bill06a], is just a regression fit, since no residual term ε is defined for symbolic data.

6.4 Bayesian regression analysis with Interval-valued data

Once we know how interval-valued data can be employed in classical regression, let us see how they can be included in the Bayesian approach. For this purpose we will employ the CCRM proposed by [DeCa05]. According to what has been said above and in the Bayesian regression chapter, there is nothing new to be done: the problem reduces to two Bayesian regressions, one for the centres and another for the radii, with a constraint applied to the coefficients. As we saw in Bayesian regression, the constraint is much easier to incorporate into the Bayesian approach than into the classical one.

So, by introducing the Bayesian approach into regression with symbolic data, the engineer will be able to incorporate more information into the problem than with Bayesian regression on traditional data. This is because two regressions are now being made, and the expert can state whether the centres will increase or decrease, and likewise for the radii. In this sense, an opinion like

'I think that the Dow Jones will have less importance over the Ibex 35 and the DAX will have more relevance than they have had until now, and there will be more volatility.'

would mean, for instance, that the prior mean for the Dow Jones midpoint coefficient will be lower than that indicated by the data. On the contrary, the prior mean for the DAX midpoint coefficient would be greater, as would the prior means for the radii.


Chapter 7

Results

To show the usefulness of the Bayesian Centre and Radius approach proposed in this project, this chapter considers experiments fitting a linear regression model to real symbolic interval-valued data sets from the Spanish Continuous Stock Market.

7.1 Spanish Continuous Stock Market data sets

We have considered two situations in the Spanish Continuous Stock Market. On the one hand, we have used the monthly minimum and maximum prices of BBVA and BSCH from January 2000 to June 2007, in order to show how the classical regression approach applied to interval-valued data can be improved through the Bayesian Centre and Radius approach when the variables are directly related. This will also let us see other advantages of the proposed approach over classical regression with single values.

On the other hand, we have taken the daily minimum and maximum prices of two other Spanish Continuous Stock Market companies, Dogi and Zardoya, from January 2006 to December 2006, in order to show that the Bayesian Centre and Radius approach is better than the other approaches even when the variables are not related, that is, when they are uncorrelated.

7.2 Direct Relation between Variables

In this case 66 of the total 89 months are assigned to the training set, and the other 23 months to the testing set.


Let us begin with the classical regression approach applied to the midpoints of the monthly minimum and maximum prices that BBVA and BSCH took in the training period. These data yield the following model:

BSCHMidpoint = 1.3008 + 0.6229 × BBVAMidpoint + ε

where

ε ∼ N(0, 0.5237²)

Figures 7.1 and 7.2 show that this model fits well enough for both training and testing sets.

Figure 7.1: Classical Regression with single values in the training set (plot omitted; x axis: midpoints of BBVA prices)

If we calculate the different error measures (Mean Error, Mean Absolute Error, Mean Square Error and Root Mean Square Error) for each set, we obtain Table 7.1. This suggests a good model, but we are only using the midpoints to fit new data when we have much more available data; therefore we are wasting information we have gathered. This can be seen graphically in Figure 7.3, which suggests that the model is not as good as believed at first, since there is too much available information for such a simple result, and one could expect more from those data.

Thus, another approach, known as the Centre Method, can be considered: the obtained model is applied to each maximum and minimum price to get a predicted maximum and minimum price. This provides the results displayed in Figures 7.4 and 7.5.

According to [Bill00], the total deviation is given by:

εCentreMethod2000 = εlower + εupper   (7.1)


Figure 7.2: Classical Regression with single values in the testing set (plot omitted; x axis: midpoints of BBVA prices)

Set ME MAE MSE RMSE

Training 0 0.4208 0.2660 0.5157

Testing 0.2321 0.3831 0.2446 0.4946

Table 7.1: Error Measures for Classical Regression with single values

The resulting error measures can be seen in Table 7.2. Now we have a fitted interval for each observed interval, so this approach seems to take advantage of the extracted data.

Now let us see the resulting error measures according to the Centre Method proposed by [Bill02], where the sum of squared errors is given by

SSECentreMethod2002 = Σ_{i=1}^{n} (ε²lower + ε²upper)   (7.2)

and, thus, the mean absolute error is given by:


Figure 7.3: Classical Regression with interval-valued data (plot omitted; x axis: minimum and maximum BBVA prices)

Figure 7.4: Centre Method (2000) in the training set (plot omitted; x axis: minimum and maximum BBVA prices)

MAECentreMethod2002 = Σ_{i=1}^{n} (|εlower| + |εupper|) / n   (7.3)

Table 7.3 shows that this new definition of the error does not improve much on the previous one.

However, let us compare these approaches with the Centre and Radius Method. In this case we have the following model:

BSCHMidpoint = 1.3008 + 0.6299 × BBVAMidpoint + εMidpoint


Figure 7.5: Centre Method (2000) in the testing set (plot omitted; x axis: minimum and maximum BBVA prices)

Set ME MAE MSE RMSE

Training 0 0.8416 1.0638 1.0314

Testing 0.4643 0.7663 0.9784 0.9891

Table 7.2: Error Measure for Centre Method (2000)

where

εMidpoint ∼ N(0, 0.5237²)

and

BSCHRadius = 0.106 + 0.6188 × BBVARadius + εRadius

where

εRadius ∼ N(0, 0.1458²)


Set ME MAE MSE RMSE

Training 0 0.8917 0.5922 0.7695

Testing 0.4643 0.7717 0.5125 0.7159

Table 7.3: Error Measure for Centre Method (2002)

According to [DeCa07], the sum of squares of deviations is given by

SSECentreRadiusMethod = Σ_{i=1}^{n} (ε²Midpoint + ε²Radius)   (7.4)

Therefore, the mean absolute error is given by

MAE = Σ_{i=1}^{n} (|εMidpoint| + |εRadius|) / n   (7.5)

Figure 7.6: Centre and Radius Method in the training set (plot omitted; x axis: minimum and maximum BBVA prices)

The results shown in Figures 7.6 and 7.7 and in Table 7.4 clearly show that the error measures are lower with the Centre and Radius Method than with the Centre Method, and thus the former is better than the latter.


Figure 7.7: Centre and Radius Method in the testing set (plot omitted; x axis: minimum and maximum BBVA prices)

Set ME MAE MSE RMSE

Training 0 0.5233 0.2866 0.5353

Testing 0.1837 0.4712 0.2558 0.5058

Table 7.4: Error Measures for Centre and Radius Method


Now let us take into consideration an expert's knowledge about the Spanish Continuous Stock Market and see the results of the Bayesian Centre and Radius Method. Obviously, the Bayesian methodology is mainly useful in the testing set, since that is where the unobserved data are. Bearing in mind the previous Centre and Radius model, an expert could think that BSCH would improve slightly with respect to BBVA and assign the prior distribution seen in (5.36) with the following prior parameters for the midpoints:


β0 = (1.3008, 0.64)′,  V0 = 10⁻⁹,  s0² = 0.5237²,  v0 = 10⁷

Then the final midpoint model would be:

BSCHMidpoint = 1.3008 + 0.64 × BBVAMidpoint + εMidpoint

Let us assume the expert considers that the volatility will not vary, and assigns vague prior parameters for the radius distribution:

β0 = (0.106, 0.6188)′,  V0 = 10⁶,  s0² = 0.1458²,  v0 = 4

Then the final radius model would be:

BSCHRadius = 0.106 + 0.6188 × BBVARadius + εRadius

The results for the testing set are shown in Figure 7.8 and in Table 7.5. They show that the proposed Bayesian Centre and Radius Method improves on all the previous approaches, since it lets us handle more information than classical regression and obtains better results than the Centre and the Centre and Radius methods.
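A hedged sketch of how this method can be assembled from two Bayesian regressions, reusing the illustrative gibbs_regression function sketched in Chapter 5 (all names and the prior-list layout are assumptions, not the BARESIMDA implementation):

    # Bayesian Centre and Radius method: one Gibbs regression for centres,
    # one for radii; negative radius-coefficient draws are discarded so the
    # predicted lower extreme stays below the upper extreme.
    bayes_centre_radius <- function(c_y, c_X, r_y, r_X, c_Xnew, r_Xnew,
                                    prior_c, prior_r, S = 10000) {
      dc <- gibbs_regression(c_y, c_X, c_Xnew, prior_c$beta0, prior_c$V0,
                             prior_c$v0, prior_c$s0sq, S)
      dr <- gibbs_regression(r_y, r_X, r_Xnew, prior_r$beta0, prior_r$V0,
                             prior_r$v0, prior_r$s0sq, S)
      p <- ncol(r_X)
      keep <- apply(dr[, 1:p, drop = FALSE] >= 0, 1, all)
      centre <- mean(dc[, ncol(dc)])      # predictive mean of the centre
      radius <- mean(dr[keep, ncol(dr)])  # predictive mean of the radius
      c(lower = centre - radius, upper = centre + radius)
    }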

7.3 Uncorrelated Variables

In this other case, 170 of the total 255 days are assigned to the training set and the remaining days to the testing set.


Figure 7.8: Bayesian Centre and Radius Method in the testing set (plot omitted; x axis: minimum and maximum BBVA prices)

Set ME MAE MSE RMSE

Testing 0.0126 0.4409 0.1997 0.4469

Table 7.5: Error Measures in Bayesian Centre and Radius Method

The classical regression with the midpoints of the price ranges yields the following model:

DogiMidpoint = 5.6570 − 0.0806 × ZardoyaMidpoint + ε

where

ε ∼ N(0, 0.2882²)

Figures 7.9 and 7.10 show that this model does not fit well for either the training or the testing set.

If we calculate the different error measures (Mean Error, Mean Absolute Error, Mean Square Error and Root Mean Square Error) for each set, we obtain Table 7.6.


Figure 7.9: Classical Regression with single values in the training set (plot omitted; x axis: midpoints of Zardoya prices)

Figure 7.10: Classical Regression with single values in the testing set (plot omitted; x axis: midpoints of Zardoya prices)

The Centre Method can be applied to get predicted maximum and minimum prices. This method yields the following model:

DogiMidpoint = 5.6570 + 0.0792 × ZardoyaMidpoint + ε

where


Set ME MAE MSE RMSE

Training 0 0.4231 0.2268 0.4763

Testing -0.3518 0.3651 0.1642 0.4052

Table 7.6: Error Measures for Classical Regression with single values

ε ∼ N(0, 7.2137²)

Note that the slope has changed since, according to [DeCa04], it cannot be negative to ensure that

the fitted maximum is greater than the fitted minimum.

This provides the results shown in Figures 7.11 and 7.12.

Figure 7.11: Centre Method (2000) in the training set (plot omitted; axes: minimum and maximum Zardoya prices vs. minimum and maximum Dogi prices)

Table 7.7 shows the resulting error measures.


Figure 7.12: Centre Method (2000) in the testing set (plot omitted; x axis: minimum and maximum Zardoya prices)

Set ME MAE MSE RMSE

Training -7.1288 7.1288 51.8315 7.1994

Testing -8.0653 8.0653 65.1544 8.0718

Table 7.7: Error Measure for Centre Method (2000)

It is very clear that this model is not accurate. This example shows the main weak point of this approach: the positivity constraint imposed on the coefficients, which makes an inverse relationship between variables impossible. This is reflected in the very high error measures.

Now let us see the resulting error measures according to the Centre Method proposed by [Bill02], shown in Table 7.8: this new definition of the error improves on the previous one.

However, let us compare these approaches with the Centre and Radius Method. In this case we have the following model:

DogiMidpoint = 5.6570 − 0.086 × ZardoyaMidpoint + ε


Set ME MAE MSE RMSE

Training -7.1288 7.1288 25.9183 5.0910

Testing -8.0653 8.0653 32.5825 5.7081

Table 7.8: Error Measure for Centre Method (2002)

where

ε ∼ N(0, 0.2882²)

and

DogiRadius = 0.0283 + 0.08 × ZardoyaRadius + ε

where

ε ∼ N(0, 0.0259²)

The results can be seen in Figures 7.13 and 7.14 and Table 7.9.

Set ME MAE MSE RMSE

Training 0 0.4385 0.2273 0.4768

Testing -0.3426 0.3882 0.1655 0.4068

Table 7.9: Error Measures for Centre and Radius Method

As with the directly related variables, the error measures are lower with the Centre and Radius Method than with the Centre Method, and thus the former is better than the latter even


Figure 7.13: Centre and Radius Method in the training set (plot omitted; x axis: minimum and maximum Zardoya prices)

Figure 7.14: Centre and Radius Method in the testing set (plot omitted; x axis: minimum and maximum Zardoya prices)

when there is no clear relationship.

Now let us see what happens when the Bayesian methodology is introduced. Bearing in mind the previous Centre and Radius model, an expert could think that the situation will change drastically, and assign the following prior parameters to the prior distribution explained in (5.36) for the midpoints:


β0 = (3.1, 0.02)′,  V0 = 10⁻⁸,  s0² = 0.2882²,  v0 = 10⁶

So the final Midpoint model would be:

DogiMidpoint = 3.1 + 0.02× ZardoyaMidpoint + εMidpoint

And the following prior parameters to the prior distribution for the radii:

β0 = (0.0283, 0.08)′,  V0 = 10⁶,  s0² = 0.0259²,  v0 = 4

So the final Radius model would be:

DogiRadius = 0.0283 + 0.08× ZardoyaRadius + ε

The results for the testing set are shown in Figure 7.15 and Table 7.10.

Set ME MAE MSE RMSE

Testing 0.1031 0.2008 0.0443 0.2104

Table 7.10: Error Measures in Bayesian Centre and Radius Method


Figure 7.15: Bayesian Centre and Radius Method in the testing set (plot omitted; x axis: minimum and maximum Zardoya prices)

The Bayesian Centre and Radius Method again performs better than the rest of the approaches, even in unfavourable conditions. Therefore, we can conclude that the Bayesian Centre and Radius method has the same advantages as the Centre and Radius method described by [DeCa07], plus the great advantages of the Bayesian methodology. All this yields smaller errors in new predictions. An important future development would be to build a Bayesian symbolic regression model with uniformly distributed errors.


Chapter 8

A Guide to Statistical Software Today

8.1 Introduction

Statistical software begins to blend in one direction with relational database software such as Oracle

or Sybase (software we do not discuss here) and with mathematical software such as MATLAB in

the other direction. Mathematical software exhibits not only statistical capabilities flowing from code

for matrix manipulation, but also optimization and symbolic manipulation useful for statistical pur-

poses. This chapter is an assessment of the state of the art of the statistical software arena as of 2007.

It attempts to touch upon a few commercial packages, a few general public license packages, a few analysis packages with statistical add-ons, and a few general purpose languages with statistical libraries.

We begin with the most important commercial packages, such as SAS, Minitab, BMDP, SPSS and S-PLUS, followed by some of the public license statistical and Bayesian software, such as R and BUGS, and then some general purpose mathematical software and general programming languages with statistical libraries.

Finally, the role of the developed application in the current statistical scene is presented, remarking on its main advantages and disadvantages.


8.2 Commercial Packages

8.2.1 The SAS System for Statistical Analysis

SAS began as a statistical analysis system in the late 1960’s growing out of a project in the Depart-

ment of Experimental Statistics at North Carolina State University. The SAS Institute was founded in

1976. Since that time, the SAS System has expanded to become an evolving system for complete data

management and analysis. This means that SAS is really much more than a simple software system.

As an example of its great potential, it is worth mentioning that it is used by 90 percent of the companies on the Fortune 500 list. This expansion is probably due to the fact that SAS management has aligned itself with recent "statistical-like" advances within the computer science community, such as data mining. This clever integration of mathematical/statistical methodologies,

database technology, and business applications has helped propel SAS to the top of the commercial

statistical software arena.

The architecture for the SAS approach is called the SAS Intelligence Platform, which is really a

closely integrated set of hardware/software components that allow users to fully utilize the business

intelligence (BI) that can be extracted from their client base. Among the products making up the SAS

System are products for: management of large data bases; statistical analysis of time series; statistical

analysis of most classical statistical problems, including multivariate analysis, linear models (as well

as generalized linear models), and clustering; data visualization and plotting. Being more precise, the

SAS Intelligence Platform consists of the following components:

• The SAS Enterprise ETL Servers

• The SAS Intelligence Storage

• The SAS Enterprise BI Server

• The SAS Analytic Technologies

One of the strengths of SAS is the fact that the package which contains those capabilities that one

normally associates with a data analysis package is constantly being upgraded with each release in

order to reflect the latest and greatest algorithmic developments in the statistical field.

The SAS System is available on PC and UNIX based platforms, as well as on mainframe computers, so it covers almost all the main options except Macintosh. As one could guess from what has been said above, this system is aimed mainly at industrial, scientific and statistical users with very high needs and knowledge, who do not mind spending time on the learning process required by such a complex system.

Some useful URL’s are:

• http://www.sas.com/ which is the main URL for SAS

• http://is.rice.edu/ radam/prog.html which contains some user-developed tips on using SAS

Other statistical systems which are of the same general vintage as SAS are MINITAB, BMDP and

SPSS. All of these systems began as mainframe systems, but have evolved to smaller scale systems

as computing has evolved.

8.2.2 Minitab

Minitab Inc. was formed more than 20 years ago around its flagship product, MINITAB statistical

software. MINITAB Statistical Software provides tools to analyze data across a variety of disciplines,

and is targeted for users at every level: scientists, business and industrial users, faculty, and students.

In relation to the operating system, MINITAB is available on the most widely-used computer

platforms, including Windows, DOS, Macintosh, OpenVMS, and Unix.

In contrast to SAS, MINITAB is quite easy to learn and use. There is no lengthy learning process and little need for unwieldy manuals. This may be the main reason why MINITAB is used extensively in the educational community.

For more details about this software visit the URL http://www.minitab.com/.

8.2.3 BMDP

BMDP has its roots as a bio-medical analysis package from the late 1960s. In many ways it has remained true to its origins, as evidenced by its long list of clients, which includes such biomedical giants as Bristol-Myers Squibb, Merck and Glaxo Wellcome. There are three main distributions:


BMDP New System Personal Edition, the BMDP Classic for PCs - Release 7, and the BMDP New

System Professional Edition. While the BMDP New System has an easy-to-use interface that makes data analysis possible with simple point-and-click and fill-in-the-blank interactions, the Professional Edition combines the full suite of BMDP Classic for PCs Release 7 statistics with the powerful data management and front-end data exploration features of the BMDP New System Personal Edition.

A reference URL for BMDP is http://www.ppgsoft.com/bmdp00.html.

8.2.4 SPSS

SPSS is a multinational software company founded in the late 1960s that provides statistical product

and service solutions for survey research, marketing and sales analysis, quality improvement, scien-

tific research, government reporting and education.

SPSS starts with the SPSS Base which includes most popular statistics, complete graphics, broad

data management and reporting capabilities. The SPSS products are a modular system and includes

SPSS Professional Statistics, SPSS Advanced Statistics, SPSS Tables, SPSS Trends, SPSS Cate-

gories, SPSS CHAID, SPSS LISREL 7, SPSS Developer’s Kit, SPSS Exact Tests, Teleform, and

MapInfo. Although this software was originally designed for mainframe use, SPSS has adapted to

market demand and it has releases for Windows, MAC and UNIX.

A reference URL for SPSS is http://www.spss.com/.

8.2.5 S-PLUS

While there are many different packages for performing statistical analysis, one that offers some of

the greatest flexibility with regard to the implementation of user-defined functions and the customization of one's environment is S-PLUS, one of the two implementations of the S language (R is the other, reviewed later).

S is an exceptionally well-developed tool for statistical research and analysis. S is especially

strong for statistical graphics, the output of data analysis through which both raw data and results are

displayed for both analysts and clients. S was originally developed at AT&T Bell Labs (recently split

into AT&T Laboratories and Lucent Bell Labs) by a team of researchers including Richard A. Becker,


John M. Chambers, Allan Wilks, William S. Cleveland and Trevor Hastie. The original description

of the S language, which was written by Becker, Chambers, and Wilks in 1988, received the 1998 Software System Award from the Association for Computing Machinery (ACM). The aim of the language, as expressed by John Chambers, is "to turn ideas into software, quickly and faithfully".

A good introduction to the application of S to statistical analysis problems is contained in [Cham92]

and [Cham83]. More recent work that focus on the statistical capabilities of the S-PLUS system can

be found in [Vena02].

S-PLUS is manufactured and supported by the Statistical Sciences Corporation, now a division of

MathSoft. It runs on both PC and UNIX based platforms. In addition the company offers easy links

for the user to call S-PLUS from within C/FORTRAN or for the user to call C/FORTRAN compiled

functions within the S-PLUS environment. Statistical Sciences has made great efforts to keep the

software current with regard to the needs of the statistical community. They have released dedicated

modules which are targeted at specific application areas.

The S-PLUS home page can be reached at http://www.mathsoft.com/. This site contains an inter-

esting comparison between SAS and S-PLUS.

8.2.6 Others

Other statistically oriented packages enjoying good reputations are SYSTAT, DataDesk, JMP and

StatGraphics. SYSTAT originated as a PC-based package developed by Leland Wilkinson and is now

owned by SPSS. The current version is 6.0 and is a Microsoft Windows oriented product. By contrast, DataDesk is a Macintosh-based product authored by Paul Velleman from Cornell University.

The current release is version 5.0.1, a GUI-based product which contains many innovative graphical and statistical data analysis features. More information about DataDesk can be found at http://www.lightlink.com/datadesk/. JMP is another SAS product that is highly visualization oriented; it is a stand-alone product for PC and Macintosh platforms. Information on JMP

can be found at http://www.sas.com/. StatGraphics is an education-oriented statistical package used mainly in universities, offering a user-friendly interface. A good reference showing how to use StatGraphics can be found in [Mate95].


8.3 Public License Packages

8.3.1 R

R is an Open Source implementation of the well-known S language which originated at the University of Auckland, New Zealand, in the early 1990s. It works on multiple computing platforms, like Unix systems or Windows, but its most important characteristic is that a software system under the Open Source paradigm benefits from having "many pairs of eyes" examine the software to help ensure its quality. An example of the rapid development of this software is that in 1997, only two years after the public release in June 1995, the leading team had to select a core group of around 10 members, who were responsible for changes to the source code.

R is, for the most part, a command-line language organized into various packages. Basic packages are installed by default, and the user can download and install a great variety of additional packages. There are also several major projects that are R spin-offs, such as Bioconductor, an R package for gene expression analysis, or Omega, another project focused on providing a seamless interface between R and a number of other languages (PERL, PYTHON, MATLAB). Two packages have to be mentioned because of their importance for this project: JRI and bayesm. The first deals with the problem of communicating Java with R; it lets us create a graphical user interface using Swing in Java while making all the statistical calculations with R. The second, developed by [Rossi06], contains the main functions used in Bayesian analysis. It is precisely in Bayesian data analysis where R can be better than the other statistical packages.
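As a small illustration of the kind of call involved, a Bayesian univariate regression can be run with bayesm roughly as follows (the interface is quoted from memory; consult the package documentation, since details may differ):

    library(bayesm)
    set.seed(1)
    X <- cbind(1, rnorm(100))                     # toy design matrix
    y <- as.vector(X %*% c(1, 2) + rnorm(100))    # toy response
    out <- runireg(Data = list(y = y, X = X), Mcmc = list(R = 2000))
    colMeans(out$betadraw)                        # posterior means of beta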

More information about R can be found at http://www.r-project.org/.

8.3.2 BUGS

The BUGS (Bayesian inference Using Gibbs Sampling) project is concerned with flexible software

for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo methods.

The project began in 1989 in the MRC Biostatistics Unit and led initially to the ”Classic” BUGS

program, and then on to the WinBUGS software developed jointly with the Imperial College School of Medicine at St. Mary's, London. Development now also includes the OpenBUGS project at the University of Helsinki, Finland.


The main advantage of this software is, as with R, the flexibility it offers the researcher to model whatever he needs, but it is slightly more complex to learn than R. For this reason, Phil Woodward developed BugsXLA, an Excel add-in that not only allows the user to specify a model as one would in a package such as SAS or S-PLUS, but also aids the specification of priors and the control of the MCMC run itself.

More information can be found in http://www.mrc-bsu.cam.ac.uk/bugs/.

8.4 Analysis Packages with Statistical Libraries

8.4.1 Matlab

MATLAB is an interactive computing environment that can be used for scientific and statistical data

analysis and visualization. The basic data object in MATLAB is the matrix. The user can per-

form numerical analysis, signal processing, image processing and statistics on matrices, thus freeing

the user from programming considerations inherent in other programming languages such as C and

FORTRAN. There are versions of MATLAB for Unix platforms, PCs running Microsoft Windows, and Macintosh. Because the functions are platform independent, MATLAB provides the user with maximum reusability of their work.

MATLAB comes with many functions for basic data analysis and graphics. Most of these are

written as M-file functions, which are basically text files that the user can read and adapt for other

uses. The user also has the ability to create their own M-file functions and script files, thus making

MATLAB a programming language. The recent addition of the MATLAB C-Compiler and C-Math

Library allow the user to write executable code from their MATLAB library of functions, yielding

faster execution times and stand-alone applications.

For researchers who need more specific functionality, MATLAB offers several modules or tool-

boxes. These typically focus on areas that might not be of interest to the general scientific community.

Basically, the toolboxes are a collection of M-file functions that implement algorithms and functions

common to an area of interest.


One of the most useful capabilities of MATLAB is the tools available for visualizing data. MAT-

LAB supports standard two and three dimensional scatter plots along with surface plots. In addition it

provides the user with a graphics property editor. As it occurs with R, there is a considerable amount

of contributed MATLAB code available on the internet. One notably useful source of code is avail-

able via the home page for MATLAB at http://www.mathworks.com/, where more information about

this software can be found.

8.4.2 Mathematica

Mathematica is an algebra computational system developed originally by Stephen Wolfram and sold

by his company, Wolfram Research. It has numerical and graphical features and powerful symbolic

processing capabilities but is comparatively complex to learn. Information on Mathematica is avail-

able at URL http://www.wolfram.com/.

8.4.3 Others

Other mathematical software worth noting is MAPLE, with powerful symbolic processing capabili-

ties, and MATHCAD, a package which combines numerical, symbolic, and graphical features. More

information about these software can be found at their official web sites, which are :

• http://www.maplesoft.com/

• http://www.mathsoft.com/

8.5 Some General Languages with Statistical Libraries

8.5.1 Java

It is difficult to assess the state of the art with regard to Java statistical libraries, in that there may be many custom user-developed packages that we are unaware of. Given this caveat, there are three main

packages to mention.

The first one is StatCrunch, which provides the user with the capability to perform interactive exploratory

data analysis, logistic regression, nonparametric procedures, regression and regression diagnostics,


and others. The reader is referred to a review that appeared in [West04].

Another source of Java-based statistics functions is the Apache Software Foundation Jakarta

math project. The math project seeks to provide common mathematical functionality to the Java user

community.

The final source for Java-based statistical analysis is the Visual Numerics JMSL package. It provides the user with an integrated set of statistical, visualization, data mining, neural network, and numerical packages. The reader is referred to http://www.vni.com/products/imsl/jmsl/jmsl.html for additional discussion of JMSL.

8.5.2 C++

C++ is another object-oriented programming language, like Java, with various statistical libraries. Two libraries are worth mentioning: Goose, and Probability and Statistics.

The first one is dedicated to statistical computation, and provides support for t-tests, F-tests, Kruskal-Wallis tests, Spearman tests and others, together with an implementation of simple linear regression models. More information is available at http://www.gnu.org/software/goose/goose.html.

The second one is aimed at Microsoft Windows developers and consists of five packages: statistics, discrete probability, standard probability distributions, hypothesis testing, and correlation and regression. A strength of these modules is their support for various interfaces, including C# and C++ .NET. The reader is referred to the URL http://www.webcabcomponents.com/dotNET/dotnet/pss/.

8.6 Developed Software Tool: BARESIMDA

The software tool developed throughout this project, as has been said above, is based on Java and R, both of them public-license software. It has not been developed with the intention of creating a complete statistical package which can be an alternative to any of the above software. Evidently, it is very difficult to incorporate all the facilities that those programs have, much more so in a one-year period with only one developer. In fact, BARESIMDA focuses only on regression analysis


procedures with different approaches and data. In that sense, the developed tool gathers classical and Bayesian regression and lets the user analyze Normal regression models in a very simple way through a very intuitive graphical user interface. This is an important feature for the Bayesian approach, which has a complex theoretical basis that many users may not be familiar with.

Another advantage, maybe the most important one, over the rest of the statistical packages is that BARESIMDA incorporates regression analysis with interval data in both the classical and the Bayesian approaches. Not only does it display the analytical results, but it also lets us see graphically the goodness of fit and the centre and radius tendencies.

With this first version of BARESIMDA, we have wanted to start the way towards public-license software which can take advantage of the Java graphical user interface with Swing and of the statistical libraries in R.


Chapter 9

Software Requirements Specification

This chapter gives a complete description of the functions to be performed by the BARESIMDA software, so it will assist potential users in determining whether the specified software meets their needs, or how the software must be modified in order to meet them.

This also reduces the development effort since the preparation of the Software Requirements

Specification (SRS) forces the developer to consider rigorously all of the requirements before de-

sign begins and reduces later redesign, recoding, and retesting. Careful review of the requirements

in the SRS can reveal omissions, misunderstandings, and inconsistencies early in the development

cycle when these problems are easy to correct. Likewise it provides a basis for estimating costs and

schedules and a baseline for verification and validation.

9.1 Purpose

The aim of this system is to provide a tool to build different types of regression analyses and to check the advantages and disadvantages of each approach that has been developed.

9.2 Intended Audience

The software is intended to be handled by different types of users, such as:

• Inexperienced people who have minimal knowledge about what regression is and what it consists of.


• Students and people with a medium degree of knowledge about regression and minimal information about the Bayesian paradigm.

• Graduates and experienced people who have a deep knowledge of regression and Bayesian analysis and want to learn about symbolic regression.

9.3 Functionality

The software must provide the functionality described in the following points.

9.3.1 Classical Regression with crisp data

This refers to the analytic and graphical analysis of simple and multiple classical Normal regression models with crisp data. More precisely, the software has to provide the following facilities:

• Regression analysis summary with estimated parameters

• ANOVA table.

• Normality test.

• Heteroscedasticity test.

• Autocorrelated errors test.

• Prediction of new data.

• Complementary graphics to see the fitted model.

9.3.2 Classical Regression with interval-valued data

As was done with crisp data, regression analysis must be carried out with symbolic data, specifically with interval-valued data. All the functions described previously must be implemented for the centre and radius regressions. In addition, the software will display graphically the adequacy of the fitted model to the original interval-valued data.


9.3.3 Bayesian Regression with crisp data

The user must be capable of creating two different Bayesian models: Normal and Independent Normal. Since the main characteristic of the Bayesian paradigm is the possibility of introducing subjective information, the application will provide a very intuitive dialog to retrieve the user's beliefs about the different parameters. The software will display the estimated parameters, provide a Normality test for residuals, and offer input fields to make new predictions.

9.3.4 Bayesian Regression with interval-valued data

As with classical regression, it must be possible to carry out Bayesian regression with interval-valued data, so the user will be able to incorporate prior information about the centres and the radii. The analysis options are the same as those for crisp data, with additional graphics to see the adequacy of the fitted interval-valued data with respect to the observed data.

9.3.5 Data Manipulation

The user will be able to type in new data by hand or to load an existing Excel file into the application. In the same way, the user will be able to save to an Excel file both the source data and the following resulting data:

• Residuals

• Normalized residuals

• Studentized residuals

• Fitted values

• Predicted values

9.3.6 Portability

The application must be executable on the main platforms, such as Windows, Linux and Unix.

9.3.7 Maintainability

Likewise, the tool must be well structured so as to be easily maintainable, since changes and extensions in the future are quite probable.


9.4 External Interfaces

9.4.1 User Interfaces

The application to be developed will have a Multiple Document Interface (MDI) with a high degree of usability. The former means that its windows will reside under a single parent window, as Figure 9.1 shows.

Figure 9.1: BARESIMDA MDI

This will avoid filling up the operating system task management interface, as the windows are hierarchically organized, and it will let the user hide/show/minimize/maximize them as a whole.

The second characteristic means that the user will not have to think too much about what the

application does or how it does it.

There will be an option to configure the application's look so it can be adapted to the user's preferences. The user will be able to set the windows look as:

• Unix

• Windows


• Windows Classic

• Java

In the same way, the user will be able to indicate whether she or he is an experienced or an inexperienced user, which will help her or him specify prior information in Bayesian regression.

9.4.2 Software Interfaces

BARESIMDA will connect to a statistics package which will be responsible for making all computations and returning the results to BARESIMDA. All these operations must be transparent to the end user, through an interface that lets both programs interact. This makes the application more usable.

Regarding input and output data, an interface will be necessary to read from and write to an Excel file.


Chapter 10

Software Architecture Study

10.1 Hardware/ Software Architecture

The application will be programmed in Java and built, executed and tested with SDK version 1.4.2 or later. Specifically, the graphical user interface will be developed using Swing. This is one of the most powerful tools for developing user-friendly mechanisms for interacting with an application, giving it a distinctive "look" and "feel". Its libraries are part of the Java Foundation Classes (JFC), Java's libraries for cross-platform GUI development. For more information on JFC visit http://java.sun.com/products/jfc/. This will let us develop the main interface on one particular system and then execute it on any platform, allowing users of different operating systems to keep the look and feel of their own platform.

The software chosen to carry out the statistical processing is R, since it is distributed under a public license, like Java; it allows the developer a high degree of flexibility to design and program the models that he wants to build; and it is enjoying a great expansion among statisticians and scientists.

The way BARESIMDA communicates with R is through the Java-to-R interface, called JRI. This is a .jar library which can be obtained from http://rosuda.org/JRI/, and it allows running R inside Java applications as a single thread. Basically, it loads the R dynamic library into Java and provides a Java API to R functionality. JRI uses native code, but it supports all platforms where Sun's Java (or a compatible implementation) is available, including Windows, Mac OS X, Sun and Linux. More information about this interface can be found in the reference cited above.
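To illustrate the mechanism, a minimal sketch follows (class and variable names are illustrative, not taken from the BARESIMDA sources): it starts the R engine inside the JVM, evaluates an R expression and brings the result back as a Java array.

    import org.rosuda.JRI.REXP;
    import org.rosuda.JRI.Rengine;

    public class JriSketch {
        public static void main(String[] args) {
            // Start the single R thread inside the JVM; R and the JRI native
            // library must already be installed (see the appendices).
            Rengine engine = new Rengine(new String[] {"--vanilla"}, false, null);
            if (!engine.waitForR()) {
                System.err.println("Cannot load the R engine");
                return;
            }
            // Evaluate an R expression and retrieve the result in Java.
            REXP coef = engine.eval("coef(lm(dist ~ speed, data = cars))");
            double[] beta = coef.asDoubleArray();
            System.out.println("Intercept = " + beta[0] + ", slope = " + beta[1]);
            engine.end();
        }
    }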


Figure 10.1: Interface between BARESIMDA and R

As indicated in the previous chapter, BARESIMDA is required to read and write Excel files. For such a purpose, the POI project consists of various parts that fit together to deliver data in a Microsoft file format to the Java application. Specifically, and according to our requirements, HSSF is the POI project's pure Java implementation of the Excel file format. It provides a way to create, modify, read and write XLS spreadsheets. Being more precise, it offers:

• Low-level structures for those with special needs.

• An event-model API for efficient read-only access.

• A full user model API for creating, reading and modifying XLS files.

Visit http://jakarta.apache.org/poi/hssf/index.html for more information.
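As an illustration, a minimal sketch of the read and write paths through the HSSF usermodel follows. The file names are placeholders, and the short-based cell indices follow the Jakarta POI API of the time; exact signatures vary slightly across POI versions.

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import org.apache.poi.hssf.usermodel.HSSFRow;
    import org.apache.poi.hssf.usermodel.HSSFSheet;
    import org.apache.poi.hssf.usermodel.HSSFWorkbook;

    public class XlsSketch {
        public static void main(String[] args) throws Exception {
            // Read the numeric value stored in cell A1 of the first sheet.
            HSSFWorkbook source = new HSSFWorkbook(new FileInputStream("data.xls"));
            HSSFSheet sheet = source.getSheetAt(0);
            double value = sheet.getRow((short) 0).getCell((short) 0).getNumericCellValue();

            // Create a new workbook and write the value back to cell A1.
            HSSFWorkbook target = new HSSFWorkbook();
            HSSFRow row = target.createSheet("results").createRow((short) 0);
            row.createCell((short) 0).setCellValue(value);
            FileOutputStream out = new FileOutputStream("results.xls");
            target.write(out);
            out.close();
        }
    }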

10.2 Logical Architecture

The application will be structured in three levels or layers, where each of them will have a well-defined

responsibility:


Figure 10.2: Interface between BARESIMDA and Excel

• gui: it will be responsible for showing the graphical user interface and for getting the input parameters and requests and passing them to the classes which will process them.

• action: it will contain the main procedures that process the information and elaborate the regression models and analyses. The results will be given back to the calling process. It will be responsible for calling the dao classes too.

• dao: it will be responsible for accessing permanent data, that is, for loading and saving information.

Figure 10.3 shows the relation among these packages.

Figure 10.3: Logical Architecture
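As an illustration of this structure, the following minimal sketch (all class and method names are hypothetical) shows the intended call direction, gui → action → dao:

    // Hypothetical names; the sketch only illustrates the call direction.
    class RegressionDao {                         // dao layer
        double[][] load(String... variables) {
            // ... read the requested columns from the saved Excel data ...
            return new double[0][0];
        }
    }

    class RegressionAction {                      // action layer
        private final RegressionDao dao = new RegressionDao();

        String simpleRegression(String dependent, String independent) {
            double[][] data = dao.load(dependent, independent);
            // ... fit the model (e.g., through the R interface) and build the report ...
            return "Summary for " + dependent + " ~ " + independent
                    + " (" + data.length + " rows)";
        }
    }

    // gui layer: collects the user's choices and delegates to the action layer,
    // e.g.  String report = new RegressionAction().simpleRegression("y", "x");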


Chapter 11

Project Budget

Project costs for this system have been divided into two types, which will be described in the following sections:

• Engineering costs.

• Investment and Materials Costs.

There is also a section summarizing the entire expected budget for the project. There is no commercial cost, since the tool is intended to be public-license software for free distribution.

11.1 Engineering Costs

A computer engineer working in the field this project focuses on is expected to earn around 2500 €/month. There is an additional cost of 30% for Social Security contributions.

The programmer works 8 hours/day, a mean of 22 days/month. This makes a mean of 176 hours/month. Thus, the cost per hour is 18.46 €/h.

The estimated time required for the development of the project is divided into the work packages described at the beginning of this document:

• Bayesian Data Analysis: 168 hours.

• Regression Models: 160 hours.


• Symbolic Data: 64 hours.

• Requirements Specification: 40 hours.

• Architecture Study: 56 hours.

• Design: 80 hours.

• Programming: 416 hours.

• Testing: 40 hours.

The estimated time required for the project is 1024 hours (5 months and 18 days). Thus, the estimated engineering cost is 18903.04 €.
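Both figures follow directly from the stated rates: 2500 €/month × 1.30 / (8 h/day × 22 days/month) = 3250/176 ≈ 18.46 €/h, and 18.46 €/h × 1024 h = 18903.04 €.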

11.2 Investment and Elements Costs

The elements used for the development of this project have been computer and software equipment.

These costs can be seen in Table 11.1.

Element                                                  Price
Pentium D925 at 3 GHz                                    630 €
Other expenses (Internet connection, office materials)   60 €
Total                                                    690 €

Table 11.1: Estimated material costs

The amortization period for this type of equipment is considered complete after 10000 working hours. Moreover, the usage rate is considered to be about 85% of the engineering work hours, thus obtaining the results shown in Table 11.2.


Concept                                   Total
Hours of use of the material              870.4 hours
Resource cost per hour                    0.19 €/h
Total amortization cost of materials      165.38 €

Table 11.2: Amortization Costs

Thus, the sum of the engineering and material costs is 19068.42 €. It can be assumed that the investment made is about 5% of the engineering cost, so the investment cost amounts to 945.15 €.

Therefore, the total cost of the project, which is the sum of the engineering, materials and investment costs, is estimated to be 20013.57 €.

11.2.1 Summarized Budget

The overall expected budget can be observed in Table 11.3.


Cost             Total
Engineering      18903.04 €
Material         165.38 €
Investment       945.15 €
Total            20013.57 €

Table 11.3: Summarized Budget


Chapter 12

Conclusions

12.1 Bayesian Regression applied to Symbolic Data

Dealing with a current research topic such as symbolic data requires a high level of English, since it is the universal language of science. On the other hand, a project like this that depends on ongoing research progresses with more difficulty, since it does not deal with an established subject.

Good research work requires rigorous documentation and a complete bibliography. There must be enough well-cited references to let the reader find more information about the points of interest to her or him.

Bayesian methodology is called to be a fundamental element in business processes oriented towards predicting and forecasting new situations and quantities. Although I have really enjoyed this project, I suspect that, with a more complete previous background in Bayesian data analysis, I could have saved some of the initial time spent learning concepts that later turn out to be obvious. This would have let me extend the project to other fields, such as regression with hierarchical models or nonparametric Bayesian regression, where the authentic Bayesian potential resides. However, this feeling is probably due to the fact that the more one knows about a subject, the more one likes it and the more one wants to learn about it, so the problem would never end. In this regard, the project has met and exceeded my initial personal expectations, arousing a great interest in the research field and teaching me to value this hard but exciting arena.


If I could change anything about the project planning, I would have tried to condense the study stage in order to spend more time applying the software tool to more real problems and situations. Nevertheless, this would be difficult to carry out, since the project is developed within an academic year in which other activities also take place.

12.2 BARESIMDA Software Tool

Fortunately, public-license software is growing enormously. This gives everybody more options to choose from.

In that sense, R is a great tool for programming new models, but it requires, on the one hand, very good statistical knowledge, since the requirements of people with a low-to-medium level of Statistics are already satisfied by current statistical software. On the other hand, it requires a medium programming level to be able to carry out one's ideas. Moreover, the way in which R handles data turns out to be tedious for someone used to working with a matrix representation.

Interconnecting different interfaces or applications is usually a difficult task, especially when there is very little documentation on how to establish the connection on both sides. This problem is very important, and it is not usually taken into consideration when integrating different environments.

Concerning Java, the possibilities and facilities that this programming language offers are really incredible. They make the programming task much easier.

12.3 Future Developments

As can be deduced from what has been said above and in previous chapters, the project could have many different extensions. The most important ones are:

• Bayesian regression with hierarchical models for interval-valued data.

• Bayesian time series for interval-valued data.

• Bayesian linear regression for histogram-valued data.

• Nonparametric Bayesian regression for interval-valued data.


• Bayesian vector autoregression for interval-valued data.

• Bayesian regression for functional data.

• Bayesian symbolic regression with uniformly distributed errors.

Likewise, the software tool can be improved by adding some conventional statistical functions in order to obtain public-license statistical software with a user-friendly graphical interface.

12.4 Summary

On the one hand, we have built a new Bayesian regression model for interval-valued data which fits better than other existing approaches, provided that the prior information is accurate. And, as has been shown, it works well both for directly related variables and for uncorrelated variables. This is an important advance in the symbolic data field, since to the best of our knowledge there was no previous Bayesian approach for this kind of data.

On the other hand, a new software tool that lets the user carry out Bayesian symbolic regression has been developed. Again, to the best of our knowledge, there is no other package with the same user-friendly interface and the same facilities. Furthermore, it offers the possibility of carrying out both standard and Bayesian regression, with either classical or symbolic data.

As a result of this project, the author and the director are working together on a paper about the past, present and future of regression, which is intended to be submitted to ANALES. In the same way, another possible article about Bayesian symbolic regression is in mind for a more prominent journal, such as Computational Statistics and Data Analysis (CSDA).


Appendix A

Probability Distributions

A number of probability distributions, together with their density or probability mass functions, means and variances, have been used or mentioned previously. For ease of reference, their definitions are gathered in this appendix, together with a short discussion of their key properties. More information about these distributions in a Bayesian context can be found in [Gelm04] or [Mate93].

A.1 Discrete Distributions

A.1.1 Binomial

The Binomial distribution is perhaps the most commonly encountered discrete distribution in Statis-

tics, and it is used in quality control by attributes and sampling techniques with replacement. Consider

a sequence of n independent trials, each of which can result in one of just two possible outcomes,

namely success and failure. Further assume that the probability of success, p, is the same for each

trial. Let Y denote the number of successes observed in the n trials. Then Y has a Binomial distri-

bution with parameters n and p. Properly, a discrete random variable, Y, has a Binomial distribution with parameters n and p, denoted Y ∼ Bin(n, p), if its probability mass function is given by:

f(y \mid n, p) = \binom{n}{y}\, p^{y} (1-p)^{n-y} \qquad (A.1)

where n > 0, y = 0, 1, …, n and 0 ≤ p ≤ 1.

Likewise, mean and variance are defined:


E(Y) = np \qquad (A.2)

Var(Y) = np(1-p) \qquad (A.3)

A.1.2 Geometric

The Geometric distribution is related in a certain way to the previous one. Consider the same situation as before, with a sequence of independent trials and a constant success probability p in each trial. In this case the number of trials varies until the first success is obtained; that is, the distribution models the number of trials until the first success, and it is common in reliability analysis. Formally, a discrete random variable, Y, has a Geometric distribution with parameter p, denoted Y ∼ Geo(p), if its probability mass function is given by:

f(y \mid p) = (1-p)^{y-1} p \qquad (A.4)

where 0 < p ≤ 1 and y = 1, 2, …

In the same way, mean and variance are denoted by:

E(Y) = \frac{1}{p} \qquad (A.5)

Var(Y) = \frac{1-p}{p^2} \qquad (A.6)

A.1.3 Poisson

The Poisson distribution is commonly used to represent count data, such as the number of shares sold in a fixed time period. It is also usual to see it in reliability analysis. Strictly, a discrete random variable, Y, has a Poisson distribution with parameter λ, denoted Y ∼ P(λ), if its probability mass function is given by:

f(y \mid \lambda) = \frac{\exp(-\lambda)\,\lambda^{y}}{y!} \qquad (A.7)

where λ ≥ 0 and y = 0, 1, 2, …


In the same way, mean and variance are denoted by:

E(Y) = λ \qquad (A.8)

Var(Y) = λ \qquad (A.9)

A.2 Continuous Distributions

A.2.1 Uniform

The uniform distribution is used to represent a variable that is known to lie in an interval and is equally likely to be found anywhere in the interval. Its main characteristic is that if a variable, X, has a continuous probability distribution F(x), then the variable Y = F(X) is uniform in the interval [0, 1]. Properly, a continuous random variable, Y, has a Uniform distribution over the interval [a, b], denoted Y ∼ U(a, b), if its probability density function is given by:

f(y \mid a, b) = \begin{cases} \frac{1}{b-a} & a \le y \le b \\ 0 & \text{otherwise} \end{cases} \qquad (A.10)

where −∞ < a < b < ∞.

Mean and variance are specified alike by:

E(Y) = \frac{a+b}{2} \qquad (A.11)

Var(Y) = \frac{(b-a)^2}{12} \qquad (A.12)
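The main characteristic mentioned above can be checked in one line, assuming F is continuous and strictly increasing:

P(F(X) \le u) = P(X \le F^{-1}(u)) = F(F^{-1}(u)) = u, \qquad u \in [0, 1]

so Y = F(X) ∼ U(0, 1).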

A.2.2 Univariate Normal

The Normal distribution, also called Gaussian distribution, is ubiquitous in statistical work. It is a

family of distributions of the same general form, differing in their location and scale parameters: the

mean and standard deviation, respectively. The standard normal distribution is the normal distribution

with a mean of zero and a variance of one. Formally, a continuous random variable, Y , has a Normal

distribution with mean µ and variance σ², denoted Y ∼ N(µ, σ²), if its probability density function is given by:

f(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right) \qquad (A.13)


where σ² > 0, −∞ < µ < ∞ and y ∈ ℝ.

Likewise, mean and variance are formulated by:

E(Y) = µ \qquad (A.14)

Var(Y) = σ² \qquad (A.15)

A.2.3 Exponential

This distribution is used to model the time, t, between independent events that happen at a constant rate, λ. Therefore, this is the distribution of waiting times for the next event in a Poisson process, and it is a special case of the Gamma distribution with α = 1. Formally, a continuous random variable, Y, has an Exponential distribution with parameter λ, denoted Y ∼ Exp(λ), if its probability density function is given by:

f(y \mid \lambda) = \lambda \exp(-\lambda y) \qquad (A.16)

where λ > 0 and y ≥ 0.

Similarly, mean and variance are identified by:

E(Y) = \frac{1}{\lambda} \qquad (A.17)

Var(Y) = \frac{1}{\lambda^2} \qquad (A.18)

A.2.4 Gamma

A Gamma distribution is a general type of statistical distribution that is related to the Beta distribution

and arises naturally in processes for which the waiting times between Poisson distributed events are

relevant.

In a Bayesian context, the Gamma distribution is the conjugate prior distribution for the inverse of the normal variance and for the mean parameter of the Poisson distribution.


In a formal way, a continuous random variable, Y, has a Gamma distribution with shape and scale parameters α and β, respectively, denoted Y ∼ Gamma(α, β), if its probability density function is given by:

f(y \mid \alpha, \beta) = \frac{y^{\alpha-1} \exp(-y/\beta)}{\beta^{\alpha}\,\Gamma(\alpha)} \qquad (A.19)

where α > 0, β > 0 and y > 0.

Similarly, mean and variance are identified by:

E(Y) = αβ \qquad (A.20)

Var(Y) = αβ² \qquad (A.21)
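As a short check of the conjugacy claim for the Poisson mean (a standard result, written in the scale parameterization used above): if y_1, …, y_n are independent P(λ) observations and λ ∼ Gamma(α, β), the posterior density is proportional to λ^{α + Σy_i − 1} exp(−λ(n + 1/β)), so that

\lambda \mid y \sim Gamma\left(\alpha + \sum_{i=1}^{n} y_i,\ \frac{\beta}{n\beta + 1}\right)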

A.2.5 Inverse-Gamma

If Y⁻¹ has a Gamma distribution with parameters α and β, then Y has the Inverse-Gamma distribution. In a Bayesian context, this distribution is the conjugate prior distribution for the normal variance.

Formally, a continuous random variable, Y, has an Inverse-Gamma distribution with shape and scale parameters α and β, respectively, denoted Y ∼ Inv-Gamma(α, β), if its probability density function is given by:

f(y \mid \alpha, \beta) = \frac{\beta^{\alpha}\, y^{-\alpha-1} \exp(-\beta/y)}{\Gamma(\alpha)} \qquad (A.22)

where α > 0, β > 0 and y > 0.

Similarly, mean and variance are identified by:

E(Y) = \frac{\beta}{\alpha - 1}, \quad \alpha > 1 \qquad (A.23)

Var(Y) = \frac{\beta^2}{(\alpha-1)^2(\alpha-2)}, \quad \alpha > 2 \qquad (A.24)
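As a short check of the conjugacy claim for the normal variance (a standard result): if y_1, …, y_n are independent N(µ, σ²) observations with µ known and σ² ∼ Inv-Gamma(α, β), then

\sigma^2 \mid y \sim Inv\text{-}Gamma\left(\alpha + \frac{n}{2},\ \beta + \frac{1}{2}\sum_{i=1}^{n}(y_i - \mu)^2\right)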

A.2.6 Chi-square

It is an essential distribution in statistical inference and in goodness-of-fit tests. The χ² distribution with v degrees of freedom is a special case of the Gamma distribution with, in the scale parameterization used above, shape parameter α = v/2 and scale parameter β = 2.


Since it is a special case, we need not define again the density function and mean and variance as they

can be deduced easily from the Gamma distribution.

A.2.7 Inverse Chi-square and Scaled Inverse Chi-square

As the χ² distribution is a special case of the Gamma distribution, the inverse χ² distribution is a special case of the Inverse-Gamma distribution, with shape parameter α = v/2 and scale parameter β = 1/2. So, for the density function, mean and variance, see the Inverse-Gamma distribution. We also

define the scaled inverse χ² distribution, which is useful for variance parameters in normal models. A continuous random variable, Y, has a scaled inverse χ² distribution with v degrees of freedom and scale s, denoted Y ∼ Scaled-Inv-χ²(v, s²), if its probability density function is given by:

f(y \mid v, s) = \frac{(v/2)^{v/2}}{\Gamma(v/2)}\, s^{v}\, y^{-(v/2+1)} \exp\left(-\frac{v s^2}{2y}\right) \qquad (A.25)

The mean and variance are defined by:

E(Y) = \frac{v}{v-2}\, s^2, \quad v > 2 \qquad (A.26)

Var(Y) = \frac{2v^2}{(v-2)^2(v-4)}\, s^4, \quad v > 4 \qquad (A.27)

Note that this is the same as Inv-Gamma(α = v/2, β = v s²/2).

A.2.8 Univariate Student-t

The Student’s t-distribution is a probability distribution that arises in the problem of estimating the

mean of a normally distributed population when the sample size is small. In regression analysis, it is

used to represent the posterior predictive distribution in Normal regression. As an anecdote, it is worth mentioning that this distribution was published by William Gosset in 1908, but he was not allowed to bring it out under his own name, so the paper was written under the pseudonym Student. Strictly, a continuous random variable, Y, has a Student's t-distribution with v degrees of freedom, denoted Y ∼ t(v), if its probability density function is given by:

f(y \mid v) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{y^2}{v}\right)^{-\frac{v+1}{2}} \qquad (A.28)


where v > 0 and y ∈ ℝ.

In the same way, mean and variance are identified by:

E(Y) = 0, \quad v > 1 \qquad (A.29)

Var(Y) = \frac{v}{v-2}, \quad v > 2 \qquad (A.30)

A.2.9 Beta

In probability theory and statistics, the Beta distribution is a family of continuous distributions defined on the interval [0, 1], differing in the values of their two positive shape parameters, α and β. In a Bayesian context, the Beta is the conjugate prior distribution for the binomial probability. A continuous random variable, Y, has a Beta distribution with parameters α and β, denoted Y ∼ Beta(α, β), if its probability density function is given by:

f(y \mid \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, y^{\alpha-1} (1-y)^{\beta-1} \qquad (A.31)

where α > 0 and β > 0.

Mean and variance are identified by:

E(Y) = \frac{\alpha}{\alpha+\beta} \qquad (A.32)

Var(Y) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \qquad (A.33)
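As a short check of the conjugacy claim (a standard result): if y ∼ Bin(n, p) and p ∼ Beta(α, β), the posterior density is proportional to p^{α+y−1}(1−p)^{β+n−y−1}, so that

p \mid y \sim Beta(\alpha + y,\ \beta + n - y)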

A.2.10 Multivariate Normal

The multivariate normal distribution extends the univariate Normal distribution model to fit vector observations. A p-dimensional vector of continuous random variables, Y = (Y_1, Y_2, …, Y_p), is said to have a multivariate Normal distribution with mean vector µ and variance-covariance matrix Σ if its probability density function is given by:

f(y \mid \vec{\mu}, \Sigma) = (2\pi)^{-p/2}\, |\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(y - \vec{\mu})' \Sigma^{-1} (y - \vec{\mu})\right] \qquad (A.34)


Likewise, mean and variance are formulated by:

E(Y) = \vec{\mu} \qquad (A.35)

Var(Y) = \Sigma \qquad (A.36)

A.2.11 Multivariate Student-t

It is a multivariate generalization of the Student's t-distribution. Rigorously, a continuous random vector, Y, has a multivariate Student's t-distribution with v degrees of freedom, location µ = (µ_1, …, µ_d) and symmetric, positive definite d × d scale matrix Σ, denoted Y ∼ t(v, µ, Σ), if its probability density function is given by:

f(y \mid v, \vec{\mu}, \Sigma) = \frac{\Gamma\left(\frac{v+d}{2}\right)}{\Gamma\left(\frac{v}{2}\right) v^{d/2} \pi^{d/2}}\, |\Sigma|^{-1/2} \left(1 + \frac{1}{v}(y - \vec{\mu})' \Sigma^{-1} (y - \vec{\mu})\right)^{-\frac{v+d}{2}} \qquad (A.37)

In the same way, mean and variance are defined by:

E(Y) = \vec{\mu}, \quad v > 1 \qquad (A.38)

Var(Y) = \frac{v}{v-2}\,\Sigma, \quad v > 2 \qquad (A.39)

A.2.12 Wishart

The Wishart is the conjugate prior distribution for the inverse covariance matrix in a multivariate Normal distribution. It is a multivariate generalization of the Gamma distribution. The integral is finite if the degrees of freedom parameter, v, is greater than or equal to the dimension, k.

Formally, a continuous random matrix, W, has a Wishart distribution with v degrees of freedom and symmetric, positive definite k × k scale matrix S, denoted W ∼ Wishart_v(S), if its probability density function is given by (W positive definite):

f(W \mid v, S) = \left(2^{vk/2}\, \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\left(\frac{v+1-i}{2}\right)\right)^{-1} |S|^{-v/2}\, |W|^{(v-k-1)/2} \exp\left[-\tfrac{1}{2}\,\mathrm{tr}(S^{-1}W)\right] \qquad (A.40)

Similarly, the mean is defined by:

E(W) = vS \qquad (A.41)


A.2.13 Inverse-Wishart

If W⁻¹ ∼ Wishart_v(S), then W has the Inverse-Wishart distribution. This is the conjugate prior distribution for the multivariate Normal covariance matrix. Formally, a continuous random matrix, W, has an Inverse-Wishart distribution with v degrees of freedom and symmetric, positive definite k × k scale matrix S, denoted W ∼ Inv-Wishart_v(S⁻¹), if its probability density function is given by (W positive definite):

f(W \mid v, S) = \left(2^{vk/2}\, \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\left(\frac{v+1-i}{2}\right)\right)^{-1} |S|^{v/2}\, |W|^{-(v+k+1)/2} \exp\left[-\tfrac{1}{2}\,\mathrm{tr}(S W^{-1})\right] \qquad (A.42)

Similarly, the mean is defined by:

E(W) = (v - k - 1)^{-1} S \qquad (A.43)


Appendix B

Installation Guide

B.1 From source folder

The source folder contains the following files and folders:

• BARESIMDA.jar: the executable application file. Java Runtime Environment 1.4.2 or later, R 2.4.1 or later and the libraries provided in the folder must be installed.

• R Libraries: the libraries to be moved into the R software library %R_HOME%\library.

• Java Library: it contains the file to be moved into %JAVA_HOME%\lib\ext.

%R_HOME% and %JAVA_HOME% refer to the paths in which R and Java are installed, respectively. For instance, in Windows, if you have installed them into the root directory C: you should have C:\R\R-2.4.1\library and C:\Java\lib\ext.

B.2 From installer

An installer will be provided to make the installation process much easier. No previously installed program is necessary, since the installer will install the Java Runtime Environment and R. As a result of executing this installer, a new folder and a shortcut icon will be created.


Appendix C

User’s Guide

C.1 Data Entry

C.1.1 Loading an excel file

1. Select the file menu item in the menu bar.

Figure C.1: Load Data Menu

2. Put the mouse over the Load element and click on it.

3. A dialog box shown in Figure C.2 will be displayed. Click on the Search button to select the

Excel file to load and indicate the sheet number in the field with that label. If the first row in

the data sheet is the header with the variable names, then click OK to load data. Otherwise,

deselect the variable names option and click OK.

4. Then, data will be displayed in the Data window as in an Excel sheet (see Figure

C.3)

C.1.2 Defining a new variable

1. Ensure that Data window is the active window.


Figure C.2: Select File Dialog

Figure C.3: Display Loaded Data

2. Define the new variable by clicking on the New Variable button (see Figure C.4).

3. You will be required to type in the name of the new variable. Type it in and click OK (see

Figure C.5).

4. A new column will be added to the spreadsheet with the new variable as header (see Figure

C.6).

5. If you want to define several new variables, repeat from step 2 as necessary


Figure C.4: Define New Variable

Figure C.5: Enter New Variable Name

Figure C.6: Display New Variable


C.1.3 Editing an existing variable

1. Ensure that Data window is the active window.

2. Click on the Edit Variable button (see Figure C.7).

Figure C.7: Edit Variable

3. A dialog will be displayed. Select the variable to edit and go on (see Figure C.8).

Figure C.8: Select Variable to Be Edited

4. A new dialog will be shown and you will be required to type in the new name of the variable.

Type it in and the variable will be stored with the new name (see Figure C.9).


Figure C.9: Enter New Name

C.1.4 Deleting an existing variable

1. Ensure that Data window is the active window.

2. Click on the Delete Variable button and a dialog will be displayed.

3. Select the variable to delete and go on. A confirmation dialog will be shown. Confirm that it is the variable to be deleted, and the variable and its data will be removed from the application

(see Figure C.10).

Figure C.10: Confirmation

C.1.5 Typing in a new data row

1. Ensure that Data window is the active window.

2. Click on the New Row button. If any variable has been defined previously, a row will be added to the spreadsheet with as many columns as there are defined variables (see Figure C.11).

3. Double-click on the cell to edit and enter the new value. When you finish, press Enter (see

Figure C.12).

4. Repeat steps 2 and 3 as necessary.


Figure C.11: New Row data

Figure C.12: Type Data

C.1.6 Deleting an existing data row

1. Ensure that Data window is the active window.

2. Select the data row or rows to be deleted. Then click on the Delete Row button. A confirmation

dialog will be displayed.

3. Confirm and all data in those rows will be removed.


C.1.7 Modifying an existing data

1. Ensure that Data window is the active window.

2. Select the data cell to be modified and double-click on it. You will be able to edit the cell

value. When you finish, press Enter.

C.2 Configuration

C.2.1 Setting the look& feel

1. Select the Look&Feel item in the Configuration element of the menu bar (see Figure C.13).

Figure C.13: Look And Feel Menu

2. Select the Look&Feel style you want. The available options are: Metal (Java style), CDE/Motif (Unix/Linux style), Windows and Windows Classic (see Figure C.14).

Figure C.14: Look And Feel Styles

3. When you have selected your option (for instance CDE/ Motif), the application appearance will

be modified (see Figure C.15).

C.2.2 Selecting the type of user

1. Select the Type Of User item in the Configuration element of the menu bar (see Figure C.16).

2. A dialog will be displayed. Select the type of user you are and accept (see Figure C.17). This

will be useful to define prior information in Bayesian regression.


Figure C.15: New Look And Feel

Figure C.16: Type Of User Menu

Figure C.17: Select Type Of User


C.3 Non Symbolic Regression

C.3.1 Simple Classical Regression

1. Select the Classical menu in the Non-Symbolic Regression element of the menu bar. Then,

select Simple Regression (see Figure C.18).

Figure C.18: Non-Symbolic Classical Regression Menu

2. You will be required to select the independent and dependent variables from the defined vari-

ables. Select them and go on (see Figure C.19).

Figure C.19: Select Non-Symbolic Variables in Simple Regression

3. A brief report will be displayed in the Classical Simple Regression window, indicating that for more details one should see the Analysis Options in the ToolBar (see Figure C.20).

4. From this point, you can:

(a) Change dependent and independent variables in the Variables Options, by selecting them

again as it was done before.


Figure C.20: Brief Report

(b) Select tests and analysis in the Analysis Options, by clicking on the wanted analysis

options. The analysis options available are shown in Figure C.21.

Figure C.21: Analysis Options in Non-Symbolic Classical Simple Regression

To make new predictions, you will have to select the predict option and introduce the new

observed value and press OK (see Figure C.22).

(c) Select graphics in the Graphics Options, by clicking on the wanted graphics options. The graphics options available are shown in Figure C.23.

(d) Save some results in the Save Options, by clicking on the wanted save options and selecting the file where they are going to be saved. The save options available are shown in Figure C.24.


Figure C.22: New Prediction in Non-Symbolic Classical Simple Regression

Figure C.23: Graphics options in Non-Symbolic Classical Simple Regression

C.3.2 Multiple Classical Regression

1. Select the Classical menu in the Non-Symbolic Regression element of the menu bar. Then,

select Multiple Regression (see Figure C.25).

2. You will be required to select the dependent and independent variables from the defined vari-

ables. Select them and go on (see Figure C.26).


Figure C.24: Save options in Non-Symbolic Classical Simple Regression

Figure C.25: Non-Symbolic Classical Multiple Regression Menu

Figure C.26: Select Variables in Non-Symbolic Classical Multiple Regression

3. From this point a new Multiple Classical regression window is created, and the procedure is

similar to that described in Simple Classical Regression. Therefore the user is referenced to

that section to see how to select variable, analysis, graphics and save options.

(a) Available Analysis Options can be seen in Figure C.27.

There are two new analysis options: backward and forward selection. These will let you


Figure C.27: Analysis options in Non-Symbolic Classical Multiple Regression

identify those independent variables that really influence the dependent variable.

(b) Available Graphics Options are shown in Figure C.28.

Figure C.28: Graphics options in Non-Symbolic Classical Multiple Regression

(c) Available Save Options can be seen in Figure C.29.

4. You will be able to select whether there is an intercept in the model or not by clicking on the Model option (see Figure C.30).


Figure C.29: Save options in Non-Symbolic Classical Multiple Regression

Figure C.30: Intercept in Non-Symbolic Classical Multiple Regression

C.3.3 Simple Bayesian Regression

1. Select the Bayesian menu in the Non-Symbolic Regression element of the menu bar. Then,

select Simple Regression (see Figure C.31).

Figure C.31: Non-Symbolic Bayesian Simple Regression Menu

2. You will be required to select the dependent and independent variables from the defined vari-

ables as it was done in Simple Classical Regression. Select them and go on (see Figure C.32).

3. A new Bayesian Simple Regression window will be created. The estimated mean and standard deviation of the parameters will be displayed, as well as the 95% highest posterior density interval and the standard numerical error.

4. You will be able to select variable, analysis, graphics and save options as it was done in Simple


Figure C.32: Select Variables in Non-Symbolic Bayesian Simple Regression

Classical Regression, although for Bayesian regression, these options are more limited. How-

ever, the procedure is the same.

(a) Available Analysis Options are shown in Figure C.33.

Figure C.33: Analysis Options in Non-Symbolic Bayesian Simple Regression

(b) Available Graphics Options can be seen in Figure C.34.

(c) Available Save Options are shown in Figure C.35.

5. In Bayesian regression, new options are available in the ToolBar:

(a) Specifying Prior Information, by clicking on the Prior Information item in the ToolBar. A

new input dialog will be displayed, where you will be able to specify prior information.

If you have selected Experienced User in the Type Of User option in the Configuration

menu, you will see a dialog like that shown in Figure C.38.


Figure C.34: Graphics Options in Non-Symbolic Bayesian Simple Regression

Figure C.35: Save Options in Non-Symbolic Bayesian Simple Regression

Figure C.36: Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regression


Otherwise, you will see the dialog shown in Figure C.37.

Figure C.37: Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression

(b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar.

Figure C.38: Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression

C.3.4 Multiple Bayesian Regression

1. Select the Bayesian menu in the Non-Symbolic Regression element of the menu bar. Then,

select Multiple Regression (see Figure C.39).

Figure C.39: Non-Symbolic Bayesian Multiple Regression menu

2. You will be required to select the dependent and independent variables from the defined vari-

ables as it was done in Multiple Classical Regression. Select them and go on.

3. A new Bayesian Multiple Regression window will be created. From this point the procedure is the same

as in Bayesian Simple Regression.

(a) Analysis Options are shown in Figure C.40.


Figure C.40: Analysis Options in Non-Symbolic Bayesian Multiple Regression

(b) Graphics options can be seen in Figure C.41.

Figure C.41: Graphics Options in Non-Symbolic Bayesian Multiple Regression

(c) Save Options are shown in Figure C.42.

Figure C.42: Save Options in Non-Symbolic Bayesian Multiple Regression

(d) Model Options are those shown in Figure C.43.

C.4 Symbolic Regression

C.4.1 Simple Classical Regression

1. Select the Classical menu in the Symbolic Regression element of the menu bar. Then, select

Simple Regression (see Figure C.44).


Figure C.43: Model Options in Non-Symbolic Bayesian Multiple Regression

Figure C.44: Symbolic Classical Simple Regression Menu

2. You will be required to select the minimum and maximum dependent and minimum and max-

imum independent variables from the defined variables. Select them and go on (see Figure

C.45).

Figure C.45: Select Variables in Symbolic Classical Simple Regression

3. A brief report will be displayed for the midpoint and radius analyses. This is very similar to the case for Non-Symbolic Regression, but now you will have one analysis for the midpoints and another for the radii. In this case, there are more graphics options.

(a) Analysis Options are shown in Figure C.46.

(b) Graphics Options can be seen in Figure C.47.


Figure C.46: Analysis Options in Symbolic Classical Simple Regression

Figure C.47: Graphics Options in Symbolic Classical Simple Regression

(c) Save Options are the same as in Non-Symbolic Regression.


C.4.2 Multiple Classical Regression

1. Select the Classical menu in the Symbolic Regression element of the menu bar. Then, select

Multiple Regression (see Figure C.48).

Figure C.48: Symbolic Classical Multiple Regression Menu

2. You will be required to select the minimum and maximum dependent and minimum and max-

imum independent variables from the defined variables (see Figure C.49). Ensure that the first maximum independent variable selected is the one corresponding to the minimum independent variable chosen.

Figure C.49: Select Variables in Symbolic Classical Multiple Regression

3. A brief report will be displayed for the midpoint and radius analyses. This is very similar to the case for Non-Symbolic Regression, but now you will have one analysis for the midpoints and another for the radii. In this case, there are more graphics options.

(a) Analysis Options are shown in Figure C.50.

(b) Graphics Options can be seen in Figure C.51.

(c) Save Options are the same as in Non-Symbolic Regression.


Figure C.50: Analysis Options in Symbolic Classical Multiple Regression

Figure C.51: Graphics Options in Symbolic Classical Multiple Regression

C.4.3 Simple Bayesian Regression

1. Select the Bayesian menu in the Symbolic Regression element of the menu bar. Then, select

Simple Regression (see Figure C.52).

2. You will be required to select the minimum and maximum dependent and minimum and maxi-

mum independent variables from the defined variables (see Figure C.53).


Figure C.52: Symbolic Bayesian Simple Regression

Figure C.53: Select Variables in Symbolic Bayesian Simple Regression

3. A new Bayesian Simple Regression window will be created. The estimated mean and standard deviation of the midpoint and radius parameters will be displayed, as well as the 95% highest posterior density interval and the standard numerical error.

4. You will be able to select variable, analysis, graphics and save options as it was done in Non-

Symbolic Regression.

(a) Available Analysis Options are shown in Figure C.54.

Figure C.54: Analysis Options in Symbolic Bayesian Simple Regression

(b) Available Graphics Options can be seen in Figure C.55.


Figure C.55: Graphics Options in Symbolic Bayesian Simple Regression

(c) Save Options are the same as in Non-Symbolic Regression.

5. As in Non-Symbolic Regression, new options are available in Bayesian analysis in the ToolBar:

(a) Specifying Midpoints and Radii Prior Information, by clicking on the Midpoints or Radii

Prior Information item in the ToolBar. A new input dialog will be displayed, where you

will be able to specify prior information.

(b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar

(see Figure C.56).

C.4.4 Multiple Bayesian Regression

1. Select the Bayesian menu in the Symbolic Regression element of the menu bar. Then, select

Multiple Regression (see Figure C.57).

2. You will be required to select the minimum and maximum dependent and minimum and max-

imum independent variables from the defined variables (see Figure C.58). Ensure that the first


Figure C.56: Model Options in Symbolic Bayesian Simple Regression

Figure C.57: Symbolic Bayesian Multiple Regression Menu

maximum independent variable selected is the one corresponding to the minimum independent variable chosen.

Figure C.58: Select Variables in Symbolic Bayesian Multiple Regression

3. A new Bayesian Multiple Regression window will be created. The estimated mean and standard deviation of the midpoint and radius parameters will be displayed, as well as the 95% highest posterior density interval and the standard numerical error.

4. You will be able to select variable, analysis, graphics and save options as it was done in Non-

Symbolic Regression.

(a) Analysis Options are the same as in Non-Symbolic Regression.


(b) Graphics Options are shown in Figure C.59.

Figure C.59: Graphics Options in Symbolic Bayesian Multiple Regression

(c) Save Options are the same as in Non-Symbolic Regression.

5. As in Non-Symbolic Regression, new options are available in Bayesian analysis in the ToolBar:

(a) Specifying Midpoints and Radii Prior Information, by clicking on the Midpoints or Radii

Prior Information item in the ToolBar. A new input dialog will be displayed, where you

will be able to specify prior information.

(b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar.


Appendix D

Obtaining and Installing R

The way to obtain R is to download it from one of the CRAN (Comprehensive R Archive Network)

sites. The main site is http://cran.r-project.org. It has a number of mirror sites worldwide, which may

be closer to you and give faster download times.

Installation details tend to vary over time, so you should read the accompanying documents and

any other information offered on CRAN.

D.1 Binary distributions

The version for recent variants of Microsoft Windows comes as a single SetupR.exe file, on which you simply double-click with the mouse and then follow the on-screen instructions. When the process is completed, you will have an entry under Programs on the Start menu for invoking R, as well as a desktop icon.

For Linux distributions that use the RPM package format (RedHat, Mandrake, LinuxPPC and SuSE)

and also for Alpha Unix (OSF/Tru64), .rpm files of R and the recommended add-on packages can be

installed using the rpm command. Packages for the Debian APT package manager are also available.

For the Macintosh platforms there are two different binary distributions: the ”Carbon” R and the

”Darwin” R. The first version is intended to run natively on MacOS System versions from 8.6 to OS X, and the second one as a usual Unix command under OS X. The Darwin R also requires an X window manager like XDarwin to use the X11 graphics device.


Carbon R comes in a single .sit archive file that you simply decompress by dragging the file onto StuffIt Expander, and you then move the resulting folder rmxyz into your favourite applications folder. The Darwin version is a .tgz archive, which can be installed, after decompression, with some (fairly trivial) manual adjustments.

Darwin R can also be installed using ”fink”. Fink installs all dynamic libraries that might be

needed, and it can update R to newer versions when available.

D.2 Installation from source

Installation from source code is possible on all supported platforms, although nontrivial on Macintosh

and Windows, mainly because the build environment is not part of the system. On Unix-like systems

(Macintosh OS X included), the process can be as simple as unpacking the sources and writing

./configure

make

make install

and then you would unpack the recommended package bundle, change to its directory and enter

R CMD INSTALL *.tar.gz

The above works on widely used platforms, provided that the relevant compilers and support li-

braries are installed. If your system is more esoteric or you want to use special compilers or libraries,

then you may need to dig deeper.

For Windows and Carbon Macintosh, the directories src/gnuwin32 and src/macintosh have IN-

STALL file with detailed information about the procedure to follow.


D.3 Package installation

To install R packages such as bayesm under Unix/Linux or Windows, you can connect to the Internet,

start R, and enter

install.packages("bayesm", .libPaths()[1])

The Windows version provides a convenient menu interface for the operation.

If your R machine is not connected to the Internet, you can also download the package as a file and install that. For Windows and the Carbon version of Macintosh, you need to get

the binary package (.zip or .sit extension). For Windows, installation from a local .zip file is possible

via a menu entry. For Macintosh users, the procedure is described in the Macintosh FAQ. For Unix

and Linux, you can issue the following at the shell prompt (the -l option allows you to give a private

library):

R CMD INSTALL bayesm

On Unix and Linux systems you will need superuser permissions to install. Otherwise you can

set up a private library directory and install into that. Use the R_LIBS environment variable to use

your private library subsequently. A similar issue arises if R is installed on a read-only file system in

a Windows environment. Further details can be found in the help page for library.

Information and further Internet resources for R can be obtained from CRAN and the R homepage

at http://www.r-project.org. Notice in particular the mailing lists, the user-contributed documents, and

the FAQs.


Appendix E

Obtaining and installing Java RuntimeEnvironment

The way to obtain the Java Runtime Environment (JRE) is to download it from Sun Microsystems' official site. The main site is http://java.sun.com, from where you can select the version to be downloaded.

The link to download the current version, which is the J2SE v1.4.2_14 JRE, is http://java.sun.com/j2se/1.4.2/download.html.

E.1 Microsoft Windows

You must have administrative permissions in order to install the Java 2 Runtime Environment on Microsoft Windows 2000 and XP. The download page provides the following two choices of installation. Continue based on your choice.

1. Windows Installation - After clicking the "Download" link for the JRE, a dialog box pops up. Choose the open option to start a small program which then prompts you for more information about what you want to install.

2. Windows Offline Installation - After clicking the JRE "Download" link for the "Windows Offline Installation", a dialog box pops up. Choose the save option to save the downloaded file without installing it. Run this file by double-clicking on the installer's icon. Then follow the instructions that the installer provides. When done with the installation, you can delete the downloaded file to recover disk space.


E.2 Linux

Java 2 Runtime Environment 1.4.2 is available in two installation formats:

1. Self-extracting Binary File - This file can be used to install the Java 2 Runtime Environment in a location chosen by the user. It can be installed by anyone (not only root users), and it can easily be installed in any location. Unless you are the root user, it cannot displace the system version of the Java platform supplied by Linux. To use this file, see Installation of Self-Extracting Binary below.

2. RPM Packages - An rpm.bin file which contains RPM packages, installed with the rpm utility. It requires root access to install, and it installs by default in a location that replaces the system version of the Java platform supplied by Linux. To use this bundle, see Installation of RPM File below.

Choose the install format that is most suitable to your needs.

E.2.1 Installation of Self-Extracting Binary

Use these instructions if you want to use the self-extracting binary file to install the Java 2 Runtime

Environment. If you want to install RPM packages instead, see Installation of RPM File.

1. Download and check the download file size to ensure that you have downloaded the full, uncorrupted software bundle. You can download to any directory you choose; it does not have to be the directory where you want to install the Java 2 Runtime Environment. Before you download the file, notice its byte size provided on the download page on the web site. Once the download has completed, compare that file size to the size of the downloaded file to make sure they are equal.

2. Make sure that execute permissions are set on the self-extracting binary. Run this command:

chmod +x j2re-1_4_2_14-linux-i586.bin

3. Change directory to the location where you would like the files to be installed. The next step

installs the Java 2 Runtime Environment into the current directory.

4. Run the self-extracting binary. Execute the downloaded file, prepended by the path to it. For example, if the file is in the current directory, prepend it with "./" (necessary if "." is not in the PATH environment variable):

./j2re-1_4_2_14-linux-i586.bin

The binary code license is displayed, and you are prompted to agree to its terms. The Java 2 Runtime Environment files are installed in a directory called j2re1.4.2_14 in the current directory.
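Putting the steps above together, a typical session might look as follows; the paths are only illustrative and assume the file was saved to ~/Downloads and the JRE is to live under ~/java:

mkdir -p ~/java
cd ~/java
chmod +x ~/Downloads/j2re-1_4_2_14-linux-i586.bin
~/Downloads/j2re-1_4_2_14-linux-i586.bin

After accepting the license, the files end up in ~/java/j2re1.4.2_14.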

E.2.2 Installation of RPM File

Use these instructions if you want to install the Java 2 Runtime Environment in the form of RPM packages. If you want to use the self-extracting binary file instead, see Installation of Self-Extracting Binary.

1. Download and check the file size. You can download to any directory you choose. Before you

download the file, notice its byte size provided on the download page on the web site. Once the

download has completed, compare that file size to the size of the downloaded file to make sure

they are equal.

2. Extract the contents of the downloaded file. Change directory to where the downloaded file is

located and run these commands to first set the executable permissions and then run the binary

to extract the RPM file:

chmod a+x j2re-1_4_2_14-linux-i586-rpm.bin
./j2re-1_4_2_14-linux-i586-rpm.bin

Note that the initial "./" is required if you do not have "." in your PATH environment variable.

The script displays a binary license agreement, which you are asked to agree to before installation can proceed. Once you have agreed to the license, the install script creates the file j2re-1_4_2_14-linux-i586.rpm in the current directory.

3. Become root by running the su command and entering the superuser password.


4. Run the rpm command to install the packages that comprise the Java 2 Runtime Environment:

rpm -iv j2re-1_4_2_14-linux-i586.rpm

5. Delete the .bin and .rpm files if you want to save disk space.

6. Exit the root shell.
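The whole RPM procedure, condensed into a single session (assuming the downloaded file is in the current directory):

chmod a+x j2re-1_4_2_14-linux-i586-rpm.bin
./j2re-1_4_2_14-linux-i586-rpm.bin
su
rpm -iv j2re-1_4_2_14-linux-i586.rpm
exit

A quick check that the installation succeeded is to run java -version, which, provided the java binary is on your PATH, should report version 1.4.2_14.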

E.3 UNIX

1. Check the download file size. You can download to any directory you choose; it does not have

to be the directory where you want to install the J2RE. Before you download the file, notice its

byte size provided on the download page on the web site. Once the download has completed,

compare that file size to the size of the downloaded file to make sure they are equal.

2. Make sure that execute permissions are set on the self-extracting binary:

On SPARC processors: chmod +x j2re-1_4_2_14-solaris-sparc.sh
On x86 processors: chmod +x j2re-1_4_2_14-solaris-i586.sh

3. Change directory to the location where you would like the files to be installed. The next step

installs the J2RE into the current directory.

4. Run the self-extracting binary. Execute the downloaded file, prepending the path to it. For example, if the downloaded file is in the current directory, prepend it with "./":

On SPARC processors: ./j2re-1_4_2_14-solaris-sparc.sh
On x86 processors: ./j2re-1_4_2_14-solaris-i586.sh

The binary code license is displayed, and you are prompted to agree to its terms. The J2RE files are installed in a directory called j2re1.4.2_14 in the current directory.
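As with the Linux version, a quick sanity check is to run the installed binary directly and ask for its version; the path below assumes the default installation directory created in step 4:

./j2re1.4.2_14/bin/java -version

This should print a version string containing 1.4.2_14.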


More information about the installation process on different operating systems can be found on the Sun Microsystems official site mentioned above.


Bibliography

[Aitk97] Aitkin, M., The calibration of P-values, posterior Bayes factors and the AIC from the posterior distribution of the likelihood, Statistics and Computing 7 (4), 253-261, 1997.

[Arro06] Arroyo, J. and Maté, C., Introducing interval time series: accuracy measures, COMPSTAT, Rome, 2006.

[Berg05] Berg, B.A., Introduction to Markov Chain Monte Carlo Simulations and their Statistical Analysis, National University of Singapore 7, 2005.

[Berg98] Berger, J. and Pericchi, L., Accurate and stable Bayesian model selection: the median intrinsic Bayes factor, The Indian Journal of Statistics 60 (1), 1-18, 1998.

[Bill00] Billard, L. and Diday, E., Regression Analysis for Interval-Valued Data, Data Analysis, Classification and Related Methods: Proceedings of the Seventh Conference of the International Federation of Classification Societies, Namur, Belgium, 2000.

[Bill02] Billard, L. and Diday, E., From the Statistics of Data to the Statistics of Knowledge: Symbolic Data Analysis, Journal of the American Statistical Association 98 (462), 470-487, 2002.

[Bill06a] Billard, L. and Diday, E., Symbolic Data Analysis: Conceptual Statistics and Data Mining, Wiley, England, 2006.

[Bill06b] Billard, L. and Diday, E., Symbolic Data Analysis: what is it?, COMPSTAT, Rome, 2006.

[Cham83] Chambers, J.M., Cleveland, W.S., Kleiner, B. and Tukey, P.A., Graphical Methods for Data Analysis, Wadsworth, 1983.

[Cham92] Chambers, J.M. and Hastie, T.J., Statistical Models in S, Chapman & Hall/CRC, 1992.

[Chen00] Chen, M., Shao, Q. and Ibrahim, J.G., Monte Carlo Methods in Bayesian Computation, Springer, New York, 2000.


[Chen03] Cheng, R. and Sahu, S., A fast distance based approach for determining the number of components in mixtures, Canadian Journal of Statistics 31, 3-22, 2003.

[Cong06] Congdon, P., Bayesian Statistical Modelling, Wiley, England, 2006.

[Dalg02] Dalgaard, P., Introductory Statistics with R, Springer, New York, 2002.

[DeCa04] De Carvalho, F.A.T., Freire, E.S. and Lima Neto, E.A., A New Method to Fit a Linear Regression Model for Interval-Valued Data, KI 2004: Advances in Artificial Intelligence: 27th Annual German Conference on AI, 295-306, Springer, Ulm, Germany, 2004.

[DeCa05] De Carvalho, F.A.T., Freire, E.S. and Lima Neto, E.A., Applying Constrained Linear Regression Models to Predict Interval-Valued Data, KI 2005: Advances in Artificial Intelligence 3698, 92-106, Springer, Koblenz, Germany, 2005.

[DeCa07] De Carvalho, F.A.T. and Lima Neto, E.A., Centre and Range method for fitting a linear regression model to symbolic interval data, Computational Statistics and Data Analysis, 2007.

[Dida95] Diday, E., Probabilist, Possibilist and Belief Objects for Knowledge Analysis, Annals of Operations Research 55, 227-276, 1995.

[Gelf90] Gelfand, A.E. and Smith, A.F.M., Sampling-based approaches to calculating marginal densities, Journal of the American Statistical Association 85, 398-409, 1990.

[Gelm04] Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B., Bayesian Data Analysis, Chapman & Hall/CRC, Boca Raton, Florida, 2004.

[Gilk95] Gilks, W.R., Best, N. and Tan, K.K.C., Adaptive rejection Metropolis sampling within Gibbs sampling, Applied Statistics 44, 455-472, 1995.

[Gosh03] Ghosh, J.K. and Ramamoorthi, R.V., Bayesian Nonparametrics, Springer, New York, 2003.

[Hast70] Hastings, W.K., Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57, 97-109, 1970.

[Huiw06] Huiwen, W., Mok, H.M.K. and Dapeng, L., Factor interval data analysis and its application, COMPSTAT, Rome, 2006.

[Irpi05] Irpino, A., "Spaghetti" PCA analysis: An extension of principal component analysis to time dependent interval data, Pattern Recognition Letters, 2005.


[Jeff61] Jeffreys, H., Theory of Probability, Oxford University Press, 1961.

[Kend05] Kendall, W.S., Liang, F. and Wang, J-S., Markov chain Monte Carlo: Innovations and Applications, National University of Singapore 7, 2005.

[Koop03] Koop, G., Bayesian Econometrics, Wiley, England, 2003.

[Laws74] Lawson, C.L. and Hanson, R.J., Solving Least Squares Problems, Prentice-Hall, New York, 1974.

[Lee 06] Lee, C-H.L., Liu, A. and Chen, W-S., Pattern Discovery of Fuzzy Time Series for Financial Prediction, IEEE Transactions on Knowledge and Data Engineering 18 (5), 2006.

[Mart01] Martinez, W.L. and Martinez, A.R., Computational Statistics Handbook with MATLAB, Chapman & Hall/CRC, Boca Raton, Florida, 2001.

[Mate93] Maté, C. and Sarabia, A., Problemas de Probabilidad y Estadística, CLAGSA, Madrid, 1993.

[Mate95] Maté, C., Curso General sobre StatGraphics II, Universidad Pontificia Comillas, Madrid, 1995.

[Mate06] Maté, C., Análisis Bayesiano de Datos, Asociación Española para la Calidad, Madrid, 2006.

[Metr53] Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E., Equation of state calculations by fast computing machines, Journal of Chemical Physics 21, 1087-1092, 1953.

[Mont02] Montgomery, D.C. and Runger, G.C., Probabilidad y Estadística Aplicadas a la Ingeniería, Wiley, 2002.

[Mull04] Müller, P. and Quintana, F.A., Nonparametric Bayesian Data Analysis, Statistical Science 19, 95-110, 2004.

[Poir95] Poirier, D., Intermediate Statistics and Econometrics: A Comparative Approach, The MIT Press, Cambridge, 1995.

[Rossi06] Rossi, P.E., Allenby, G. and McCulloch, R., Bayesian Statistics and Marketing, Wiley, New York, 2006.


[Rupp04] Rupp, A.A., Dey, D.K. and Zumbo, B.D., To Bayes or Not to Bayes, From Whether to When: Applications of Bayesian Methodology to Modeling, Structural Equation Modeling: A Multidisciplinary Journal 11 (3), 424-451, 2004.

[Spie03] Spiegelhalter, D., Thomas, A., Best, N., Gilks, W. and Lunn, D., BUGS: Bayesian inference using Gibbs sampling, 2003.

[Urba92] Urbach, P., Regression Analysis: Classical and Bayesian, The British Journal for the Philosophy of Science 43 (3), 311-342, 1992.

[Vena02] Venables, W.N. and Ripley, B.D., Modern Applied Statistics with S, Springer, New York, 2002.

[West04] West, R.W., Wu, T. and Heydt, D., An introduction to StatCrunch 3.0, Journal of Statistical Software 9 (6), 2004.

[Zamo01] Zamora, M.M. and Estavillo, J., Modelo de regresión normal clásico, 2001.
