Overview

G. Jogesh Babu

Overview of Astrostatistics

A brief description of modern astronomy & astrophysics

Many statistical concepts have their roots in astronomy (starting with Hipparchus in the 2nd c. BC)

Relevance of statistics in astronomy today

State of astrostatistics today

Methodological challenges for astrostatistics in the 2000s

Descriptive Statistics

Introduction to the R programming language, an integrated suite of software facilities for data manipulation, calculation, and graphical display.

Descriptive statistics help extract the basic features of data and provide summaries of the sample and the measures.

Commonly used techniques, such as graphical description, tabular description, and summary statistics, are illustrated through R.
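The lectures illustrate these techniques in R. As a rough, language-neutral sketch of the summary-statistics part, the following uses only Python's standard library. The magnitudes are made-up values, and the helper name `summarize` is hypothetical.

```python
import statistics

# Hypothetical sample: apparent magnitudes of ten stars (made-up values,
# with one deliberately faint outlier at 15.3).
mags = [12.1, 11.8, 12.4, 13.0, 11.9, 12.2, 12.6, 15.3, 12.0, 12.3]

def summarize(data):
    """Return a dictionary of common summary statistics."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (n - 1)
        "min": min(data),
        "max": max(data),
        "iqr": q3 - q1,  # interquartile range
    }

summary = summarize(mags)
for name, value in summary.items():
    print(f"{name:>6}: {value:.3f}")
```

The single outlier pulls the mean above the median, which is the kind of feature a tabular summary makes visible before any modeling.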

Exploratory Data Analysis

An approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to:
– maximize insight into a data set
– uncover underlying structure
– extract important variables
– detect outliers and anomalies
– formulate hypotheses worth testing
– develop parsimonious models
– provide a basis for further data collection through surveys or experiments

Probability theory

Conditional probability & Bayes' theorem (Bayesian analysis)

Expectation, variance, standard deviation (units free estimates)

Density of a continuous random variable (as opposed to density defined in physics)

Normal (Gaussian) distribution, Chi-square distribution (not Chi-square statistic)

Probability inequalities and the central limit theorem (CLT)
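As a small worked example of conditional probability and Bayes' theorem, the sketch below computes P(H | data) for a hypothesis H and its complement. The function name `bayes_posterior` and all the numbers are illustrative assumptions, not taken from the lecture.

```python
def bayes_posterior(prior, p_data_given_h, p_data_given_not_h):
    """P(H | data) from Bayes' theorem for a hypothesis H and its complement."""
    # Total probability of the data, summed over H and not-H.
    evidence = p_data_given_h * prior + p_data_given_not_h * (1.0 - prior)
    return p_data_given_h * prior / evidence

# Hypothetical numbers: 1% of candidates are true sources; a detector
# flags 95% of true sources and 5% of non-sources.
post = bayes_posterior(prior=0.01, p_data_given_h=0.95, p_data_given_not_h=0.05)
print(f"P(true source | flagged) = {post:.3f}")
```

With these made-up inputs the posterior is only about 0.16: a rare hypothesis stays fairly improbable even after a positive detection, because false positives from the large non-source population dominate.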

Correlation & Regression

Correlation coefficient

Underlying principles of linear and multiple linear regression

Least squares estimation

Ridge regression

Principal components
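A minimal sketch of the first and third items, the correlation coefficient and least squares estimation for a single predictor, using only the standard library. The data are made up and the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def least_squares(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals in y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up data lying close to the line y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = least_squares(x, y)
r = pearson_r(x, y)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, r = {r:.4f}")
```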

Linear regression issues in astronomy

Compares different regression lines used in astronomy

Illustrates them with the Faber-Jackson relation.
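The slide does not list which regression lines are compared. As an assumption, the sketch below contrasts two common choices: ordinary least squares of Y on X, and the inverse regression of X on Y expressed as a slope in the same (x, y) plane. The data are made up to resemble a log-log relation and are not real Faber-Jackson measurements.

```python
def regression_slopes(x, y):
    """Return (slope of Y on X, inverse-regression slope, r squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope_y_on_x = sxy / sxx      # minimizes vertical residuals
    slope_x_on_y = syy / sxy      # minimizes horizontal residuals, as dy/dx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope_y_on_x, slope_x_on_y, r_squared

# Made-up values: x plays the role of log velocity dispersion,
# y the role of log luminosity.
log_sigma = [2.0, 2.1, 2.2, 2.3, 2.4]
log_lum = [9.9, 10.5, 10.6, 11.3, 11.5]

b_yx, b_xy, r2 = regression_slopes(log_sigma, log_lum)
print(f"OLS(Y|X) slope = {b_yx:.3f}")
print(f"OLS(X|Y) slope = {b_xy:.3f}")
print(f"ratio = {b_yx / b_xy:.4f}, r^2 = {r2:.4f}")
```

The ratio of the two slopes equals r squared, so the lines agree only when the points are perfectly correlated. With scatter, the fitted exponent depends on which variable is treated as the response.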

Statistical Inference

While descriptive statistics provides tools to describe what the data show, statistical inference helps in reaching conclusions that extend beyond the immediate data. It helps in judging whether an observed difference between groups is dependable or might have happened by chance in a study.

Topics to be covered include:
– Point estimation
– Confidence intervals for unknown parameters
– Principles of hypothesis testing
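To make the first two topics concrete, here is a sketch of a point estimate of a mean with a large-sample confidence interval. It assumes the normal approximation is adequate; for small samples a Student t quantile would give a wider interval. The data and the function name are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(data, level=0.95):
    """Large-sample (normal approximation) confidence interval for the mean."""
    n = len(data)
    mean = statistics.mean(data)                  # point estimate
    se = statistics.stdev(data) / math.sqrt(n)    # estimated standard error
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * se, mean + z * se

# Made-up repeated measurements of the same quantity.
measurements = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3, 9.7, 10.1]
low, high = mean_confidence_interval(measurements)
print(f"mean = {statistics.mean(measurements):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```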

Maximum Likelihood Estimation

Likelihood differs from probability:
– Probability refers to the occurrence of future events
– Likelihood refers to past events with known outcomes

MLE is used for fitting a mathematical model to data.

Modeling real-world data by maximum likelihood estimation offers a way of tuning the free parameters of the model to provide a good fit.
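A minimal sketch of this tuning, assuming a Poisson model for photon counts with a single free parameter, the rate. The counts are made up. For this model the maximum likelihood estimate has a closed form (the sample mean), and a crude grid search over the log-likelihood is included only as a check.

```python
import math

def poisson_log_likelihood(lam, counts):
    """Log-likelihood of rate lam for independent Poisson counts."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

# Made-up photon counts in eight equal exposures.
counts = [3, 5, 4, 6, 2, 4, 5, 3]

# Closed-form MLE for the Poisson rate: the sample mean.
lam_hat = sum(counts) / len(counts)

# Numerical check: maximize the log-likelihood over a grid of rates.
grid = [i / 10 for i in range(1, 101)]  # 0.1, 0.2, ..., 10.0
lam_grid = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))

print(f"closed-form MLE = {lam_hat:.2f}, grid maximum = {lam_grid:.2f}")
```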

MLE Contd.

Thomas Hettmansperger's lecture includes:
– Maximum likelihood method for linear regression, an alternative to the least squares method
– Cramér-Rao inequality, which sets a lower bound on the error (variance) of an unbiased estimator of a parameter. It helps in finding the `best' estimator.

Analysis of data from two or more different populations involves mixture models.
– The likelihood calculations are difficult, so an iterative device called the EM algorithm will be introduced.

Computations are illustrated in the lab.
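As an illustration of the EM idea (not the lab's actual code, which is not shown here), the sketch below fits a two-component one-dimensional Gaussian mixture. The simulated data, the crude initialization, and the fixed iteration count are all assumptions made for brevity.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(data, n_iter=100):
    """EM for a mixture w*N(mu1, s1^2) + (1-w)*N(mu2, s2^2)."""
    n = len(data)
    # Crude initialization: means at the extremes, common broad width.
    mu1, mu2 = min(data), max(data)
    s1 = s2 = (max(data) - min(data)) / 4 or 1.0
    w = 0.5
    for _ in range(n_iter):
        # E-step: probability that each point belongs to component 1.
        resp = []
        for x in data:
            a = w * normal_pdf(x, mu1, s1)
            b = (1 - w) * normal_pdf(x, mu2, s2)
            resp.append(a / (a + b))
        # M-step: weighted means, standard deviations, and mixing weight.
        n1 = sum(resp)
        n2 = n - n1
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / n2
        s1 = max(math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1), 1e-6)
        s2 = max(math.sqrt(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2), 1e-6)
        w = n1 / n
    return w, (mu1, s1), (mu2, s2)

# Simulated data from two populations: N(0, 1) and N(5, 1), 200 points each.
rng = random.Random(1)
data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]
weight, comp1, comp2 = em_two_gaussians(data)
print(f"weight = {weight:.2f}, component 1 = {comp1}, component 2 = {comp2}")
```

Each iteration alternates between assigning soft memberships (E-step) and re-estimating parameters from them (M-step), which avoids maximizing the mixture likelihood directly.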

Nonparametric Statistics

These statistical procedures make few or no assumptions about the form of the probability distribution of the population.

The model structure is not specified a priori but is instead determined from data.

As non-parametric methods make fewer assumptions, their applicability is much wider.

Procedures described include:
– Sign test
– Mann-Whitney two-sample test
– Kruskal-Wallis test for comparing several samples
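The sign test is the simplest of these to write out. The sketch below tests whether a sample's median equals a hypothesized value, using the exact Binomial(n, 1/2) distribution of the number of observations above it. The data and the function name are hypothetical.

```python
import math

def sign_test(data, median0):
    """Two-sided exact sign test of H0: population median == median0."""
    pos = sum(1 for x in data if x > median0)
    neg = sum(1 for x in data if x < median0)
    n = pos + neg                 # observations equal to median0 are dropped
    k = min(pos, neg)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)

# Made-up paired differences (for example, after minus before).
diffs = [0.4, 1.1, -0.3, 0.9, 1.6, 0.7, 0.2, 1.3, 0.5, 0.8]
pos, neg, p_value = sign_test(diffs, 0.0)
print(f"{pos} positive, {neg} negative, p = {p_value:.4f}")
```

Only the signs of the differences enter the test, so no distributional shape is assumed beyond the median.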

Bayesian Inference

As evidence accumulates, the degree of belief in a hypothesis ought to change

Bayesian inference takes prior knowledge into account

The quality of a Bayesian analysis depends on how well one can convert the prior information into a mathematical prior probability

Tom Loredo describes methods for parameter estimation, model assessment, etc.

Illustrates with examples from astronomy
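A small sketch of belief changing as evidence accumulates, assuming the simplest conjugate setting: a Beta prior on an unknown fraction, updated with binomial data. The counts are made up, and this is an illustration only, not one of the lecture's examples.

```python
def update_beta(a, b, successes, failures):
    """Posterior Beta(a', b') after observing binomial data under a Beta(a, b) prior."""
    return a + successes, b + failures

def beta_mean(a, b):
    return a / (a + b)

# Uniform prior Beta(1, 1) on the fraction of objects showing some feature.
a, b = 1, 1
print(f"prior mean        = {beta_mean(a, b):.3f}")

# First hypothetical batch: 3 of 5 objects show the feature.
a, b = update_beta(a, b, successes=3, failures=2)
print(f"after 5 objects   = {beta_mean(a, b):.3f}")

# Second hypothetical batch: 4 of 5 objects show the feature.
a, b = update_beta(a, b, successes=4, failures=1)
print(f"after 10 objects  = {beta_mean(a, b):.3f}")
```

Updating batch by batch gives the same posterior as a single update with all ten objects, so the order in which the evidence arrives does not matter in this model.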

Multivariate analysis

Analysis of data on two or more attributes (variables) that may depend on each other:
– Principal components analysis, to reduce the number of variables
– Canonical correlation
– Tests of hypotheses
– Confidence regions
– Multivariate regression
– Discriminant analysis (supervised learning)

Computational aspects are covered in the lab
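As a sketch of the first item, principal components for just two variables can be written in closed form from the 2x2 sample covariance matrix. Real analyses would use a linear algebra library; the data and the function name here are assumptions.

```python
import math

def pca_2d(x, y):
    """Eigenvalues (descending) and first principal axis of the 2x2 sample covariance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    trace, det = sxx + syy, sxx * syy - sxy ** 2
    root = math.sqrt(max(trace ** 2 - 4 * det, 0.0))
    lam1, lam2 = (trace + root) / 2, (trace - root) / 2
    # Eigenvector belonging to lam1, from the first row of (C - lam1*I) v = 0.
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (lam1, lam2), (vx / norm, vy / norm)

# Made-up, strongly correlated pair of variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
(lam1, lam2), axis = pca_2d(x, y)
explained = lam1 / (lam1 + lam2)
print(f"first component explains {explained:.1%} of the variance, axis = {axis}")
```

When one component carries nearly all the variance, the two variables can be replaced by a single combined variable with little loss, which is the sense in which the method reduces the number of variables.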

Bootstrap

How to get the most out of repeated use of the data.

The bootstrap is similar to the Monte Carlo method, but the `simulation' is carried out from the data itself.

It is a very general, mostly non-parametric procedure that is widely applicable.

Applications to regression, cases where the procedure fails, and cases where it outperforms traditional procedures will also be discussed.
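A minimal sketch of the non-parametric bootstrap: resample the data with replacement many times, recompute the statistic each time, and use the spread of the replicates as a standard error. The data, the number of replicates, and the function name are assumptions.

```python
import random
import statistics

def bootstrap_se(data, stat, n_boot=2000, seed=0):
    """Bootstrap standard error of stat(data) from resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in range(n)]  # simulate from the data itself
        replicates.append(stat(resample))
    return statistics.stdev(replicates)

# Made-up sample; the median has no simple standard-error formula,
# which is where resampling is convenient.
sample = [12.1, 11.8, 12.4, 13.0, 11.9, 12.2, 12.6, 15.3, 12.0, 12.3]
se_median = bootstrap_se(sample, statistics.median)
print(f"median = {statistics.median(sample):.2f}, bootstrap SE = {se_median:.3f}")
```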

Goodness of Fit

Curve (model) fitting or goodness of fit using the bootstrap procedure.

Procedures like the Kolmogorov-Smirnov test do not work in the multidimensional case, or when the parameters of the curve are estimated from the data. The bootstrap comes to the rescue.

Some of these procedures are illustrated using R in a lab session on hypothesis testing and bootstrapping.
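One way to sketch the estimated-parameter case is a parametric bootstrap: fit the model, simulate from the fitted model, refit on each simulated sample, and compare Kolmogorov-Smirnov distances. The exponential model, the simulated data, and the function names below are assumptions chosen for brevity; the lecture's own procedure may differ.

```python
import math
import random

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def bootstrap_ks_pvalue(data, n_boot=500, seed=0):
    """Parametric-bootstrap p-value for an exponential fit with estimated rate."""
    rng = random.Random(seed)
    n = len(data)
    rate = n / sum(data)  # MLE of the exponential rate
    d_obs = ks_statistic(data, lambda x: 1 - math.exp(-rate * x))
    exceed = 0
    for _ in range(n_boot):
        sim = [rng.expovariate(rate) for _ in range(n)]
        rate_b = n / sum(sim)  # re-estimate on every simulated sample
        d_b = ks_statistic(sim, lambda x: 1 - math.exp(-rate_b * x))
        if d_b >= d_obs:
            exceed += 1
    return d_obs, (exceed + 1) / (n_boot + 1)

# Simulated waiting times that really are exponential (rate 0.5).
rng = random.Random(42)
waits = [rng.expovariate(0.5) for _ in range(50)]
d_obs, p_value = bootstrap_ks_pvalue(waits)
print(f"KS distance = {d_obs:.3f}, bootstrap p-value = {p_value:.3f}")
```

Re-estimating the rate inside the loop is the important step: it reproduces the extra closeness between data and fitted curve that makes the standard Kolmogorov-Smirnov tables invalid.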

Model selection, evaluation, and likelihood ratio tests

The model selection procedures covered include:

Chi-square test

Rao's score test

Likelihood ratio test

Cross validation
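As one concrete instance, here is a sketch of a likelihood ratio test for nested Poisson models: a fixed rate `lam0` against a freely fitted rate. It assumes the usual large-sample chi-square approximation with one degree of freedom; the counts and function names are made up.

```python
import math

def poisson_log_likelihood(lam, counts):
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def likelihood_ratio_test(counts, lam0):
    """Test H0: rate == lam0 against a free rate; returns (statistic, p-value)."""
    lam_hat = sum(counts) / len(counts)  # MLE under the larger model
    stat = 2.0 * (poisson_log_likelihood(lam_hat, counts)
                  - poisson_log_likelihood(lam0, counts))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # Upper tail of chi-square with 1 degree of freedom: P(Z^2 > stat).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Made-up counts with sample mean 4, tested against a hypothesized rate of 2.
counts = [3, 5, 4, 6, 2, 4, 5, 3]
stat, p_value = likelihood_ratio_test(counts, lam0=2.0)
print(f"LR statistic = {stat:.3f}, p-value = {p_value:.5f}")
```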

Time Series & Stochastic Processes

Time domain procedures

State space models

Kernel smoothing

Poisson processes

Spectral methods for inference

A brief discussion of Kalman filter

Illustrations with examples from astronomy
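A brief sketch of the Kalman filter for the simplest state space model, a local level (random walk observed with noise). The variances, the diffuse starting values, and the simulated series are assumptions, not values from the lecture.

```python
import random

def kalman_local_level(obs, process_var, obs_var, init_mean=0.0, init_var=1e6):
    """Filtered state means for x_t = x_{t-1} + w_t, y_t = x_t + v_t."""
    mean, var = init_mean, init_var
    filtered = []
    for y in obs:
        # Predict: the level carries over, uncertainty grows by the process noise.
        var_pred = var + process_var
        # Update: blend prediction and observation according to the Kalman gain.
        gain = var_pred / (var_pred + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var_pred
        filtered.append(mean)
    return filtered

# Simulated noisy measurements of a constant level of 5.0.
rng = random.Random(3)
noisy = [5.0 + rng.gauss(0, 1) for _ in range(50)]
smooth = kalman_local_level(noisy, process_var=0.01, obs_var=1.0)
print(f"last observation = {noisy[-1]:.2f}, filtered level = {smooth[-1]:.2f}")
```

A small process variance relative to the observation variance gives a small gain, so the filtered level responds slowly and averages over many observations.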

Markov Chain Monte Carlo (MCMC)

MCMC methods are a collection of techniques that use pseudo-random (computer simulated) values to estimate solutions to mathematical problems

MCMC for Bayesian inference

Illustration of MCMC for the evaluation of expectations with respect to a distribution

MCMC for estimation of maxima or minima of functions

MCMC procedures are successfully used in the search for extra-solar planets
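To illustrate evaluating an expectation with respect to a distribution, the sketch below uses a random-walk Metropolis sampler, one member of the MCMC family. The standard normal target, step size, chain length, and burn-in are arbitrary choices made for the example.

```python
import math
import random

def metropolis(log_density, x0, n_samples, step=2.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target density."""
    rng = random.Random(seed)
    x, lp = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0, step)
        lp_prop = log_density(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        samples.append(x)
    return samples

# Target: standard normal, known only up to a constant (log density -x^2/2).
chain = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
kept = chain[1000:]  # discard an initial burn-in segment

mean_est = sum(kept) / len(kept)
second_moment = sum(x * x for x in kept) / len(kept)
print(f"E[X] ~ {mean_est:.3f}, E[X^2] ~ {second_moment:.3f}")
```

For this target the exact values are E[X] = 0 and E[X^2] = 1, so the chain averages can be checked directly. Only ratios of the target density are needed, which is why the normalizing constant can be left out.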

Spatial Statistics

Spatial point processes

Intensity function

Homogeneous and inhomogeneous Poisson processes

Estimation of Ripley's K function (useful for point pattern analysis).
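A naive sketch of estimating Ripley's K from a point pattern, assuming a rectangular window of known area and no edge correction (practical estimators do correct for edges, so this version is biased low near the boundary). The simulated points and the function name are assumptions.

```python
import math
import random

def ripley_k(points, r, area):
    """Naive estimate of K(r): area * (ordered pairs within r) / (n * (n - 1))."""
    n = len(points)
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= r
    )
    return area * pairs / (n * (n - 1))

# Simulated homogeneous pattern: 300 uniform points in the unit square.
rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(300)]

r = 0.05
k_hat = ripley_k(pts, r, area=1.0)
print(f"K({r}) estimate = {k_hat:.5f}, pi*r^2 = {math.pi * r * r:.5f}")
```

For a homogeneous Poisson process K(r) equals pi times r squared. Estimates well above that value suggest clustering at scale r, and estimates well below suggest regularity.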

Cluster Analysis

Data mining techniques

Classifying data into clusters:
– k-means
– Model clustering
– Single linkage (friends of friends)
– Complete linkage clustering algorithm
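A short sketch of single linkage in its friends-of-friends form: any two points closer than a linking length are friends, and a cluster is a set of points connected through chains of friends. The points, the linking length, and the function name are made up; the quadratic pair loop is for clarity, not speed.

```python
import math

def friends_of_friends(points, linking_length):
    """Return a cluster label for each point (single linkage cut at linking_length)."""
    parent = list(range(len(points)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Link every pair of points that are within the linking length.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= linking_length:
                parent[find(i)] = find(j)

    # Relabel the roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(points))]

# Made-up positions: a chain of three points and a separate close pair.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.0)]
labels = friends_of_friends(points, linking_length=0.6)
print(labels)
```

The first and third points are farther apart than the linking length but still share a cluster through the point between them. This chaining is characteristic of single linkage and is what complete linkage avoids.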