A(n) (extremely) brief/crude introduction to the minimum description length principle

jdu

2006-04

Outline

• Conceptual/non-technical introduction

• Probabilities and Codelengths

• Crude MDL

• Refined MDL

• Other topics

Introduction

• Example: data compression
  – Description methods

Source: Grünwald et al. (2005), Advances in Minimum Description Length: Theory and Applications.

Introduction

• Example: regression
  – Model selection and overfitting
  – Complexity of the model vs. goodness of fit

Introduction

• Models vs. Hypotheses

Introduction

• Crude 2-part version of MDL

Probabilities and Codelengths

• Let X be a finite or countable set
  – A code C(x) for X
    • 1-to-1 mapping from X to $\bigcup_{n>0} \{0,1\}^n$
    • $L_C(x)$: number of bits needed to encode x using C
  – P: probability distribution defined on X
    • P(x): the probability of x
    • A sequence of (usually iid) observations $x_1, x_2, \ldots, x_n$: $x^n$

Probabilities and Codelengths

• Prefix codes: as examples of uniquely decodable codes
  – no code word is a prefix of any other

[Figure: example prefix-code table; surviving entries: c → 1011, d → 1010]

Source: http://www.cs.princeton.edu/courses/archive/spring04/cos126/

Probabilities and Codelengths

• Expected codelength of a code C: $E_P(L_C(X)) = \sum_{x} P(x)\, L_C(x)$
  – Lower bound: the entropy $H(X) = -\sum_{x} P(x) \log_2 P(x)$
• Optimal code
  – if it has minimum expected codelength over all uniquely decodable codes
  – How to design one given P?
    • Huffman coding

Probabilities and Codelengths

• Huffman coding

Source: http://star.itc.it/caprile/teaching/algebra-superiore-2001/
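The Huffman construction named on this slide can be sketched as follows. This is a minimal illustration rather than the slide's own figure: the function name `huffman_code` and the example distribution are assumptions made here. It repeatedly merges the two least probable subtrees and prepends one bit to every codeword in each of them.

```python
import heapq
import math
from itertools import count


def huffman_code(probs):
    """Return {symbol: codeword} for a dict of symbol probabilities."""
    if len(probs) == 1:
        # A single symbol still needs one bit to be written down.
        return {next(iter(probs)): "0"}
    tiebreak = count()  # unique integers so the heap never compares dicts
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)  # least probable subtree
        p1, _, code1 = heapq.heappop(heap)  # second least probable subtree
        merged = {s: "0" + w for s, w in code0.items()}
        merged.update({s: "1" + w for s, w in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def expected_length(probs, code):
    """Expected codelength sum_x P(x) * L_C(x), in bits."""
    return sum(p * len(code[s]) for s, p in probs.items())


def entropy(probs):
    """Entropy -sum_x P(x) log2 P(x), in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


# Hypothetical example distribution (not from the slides).
example = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
example_code = huffman_code(example)
```

For a dyadic distribution like this one, `expected_length(example, example_code)` coincides with `entropy(example)`; in general the expected length of a Huffman code is at least the entropy and less than the entropy plus one bit.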

Probabilities and Codelengths

• How to design a code for {1, 2, …, M}?
  – Assuming a uniform distribution: 1/M for each number
  – ~log M bits

Probabilities and Codelengths

• How to design a code for all the positive integers?
  – For each k
    • Describe ⌈log k⌉ with that many 0s
    • Followed by a 1
    • Then encode k using the uniform code for {1, …, 2^⌈log k⌉}
    • In total, ~2 log k + 1 bits
  – Can be refined…
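A small sketch of such a code for the positive integers, assuming the length prefix is K = ⌈log2 k⌉ written as K zeros, then a 1, then k as a K-bit index into {1, …, 2^K}. The function names are hypothetical.

```python
def encode_positive_int(k):
    """Encode k >= 1 as K zeros, a one, then k - 1 written in K bits."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    K = (k - 1).bit_length()  # equals ceil(log2(k)) for k >= 1
    body = format(k - 1, "b").zfill(K) if K else ""
    return "0" * K + "1" + body


def decode_positive_int(bits):
    """Invert encode_positive_int on a string that starts with a codeword."""
    K = bits.index("1")             # number of leading zeros
    body = bits[K + 1:2 * K + 1]    # the K bits that follow the marker
    return (int(body, 2) if K else 0) + 1
```

Every codeword has length 2K + 1, and because the decoder knows where the codeword ends after reading the leading zeros, no codeword is a prefix of another. For instance, 1 is encoded as `1`, 2 as `011`, and 5 as `0001100`.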

Probabilities and Codelengths

• Let P be a probability distribution over X, then there exists a code C for X such that: $L_C(x) = \lceil -\log P(x) \rceil$
• Let C be a uniquely decodable code over X, then there exists a probability distribution P such that: $L_C(x^n) = -\log P(x^n)$
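Both directions of this correspondence can be checked numerically. The sketch below uses hypothetical helper names and an example distribution chosen here. Lengths ⌈-log2 P(x)⌉ satisfy the Kraft inequality, so a prefix code with those lengths exists; conversely, codelengths L(x) define the (possibly defective, i.e. summing to less than 1) distribution 2^(-L(x)).

```python
import math


def shannon_lengths(probs):
    """Codelengths ceil(-log2 P(x)) for every x with P(x) > 0."""
    return {x: math.ceil(-math.log2(p)) for x, p in probs.items() if p > 0}


def kraft_sum(lengths):
    """Sum of 2^(-L(x)); a prefix code with these lengths exists iff it is <= 1."""
    return sum(2.0 ** -length for length in lengths.values())


def implied_distribution(lengths):
    """Distribution P(x) = 2^(-L(x)) associated with a set of codelengths."""
    return {x: 2.0 ** -length for x, length in lengths.items()}


# Hypothetical example distribution (not from the slides).
example = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
lengths = shannon_lengths(example)
```

Here `lengths` comes out as 2, 2, 3 and 4 bits, and `kraft_sum(lengths)` is 0.6875, so the implied distribution is defective: rounding up to whole bits wastes some probability mass.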

Probabilities and Codelengths

• Codelength revisited

Crude MDL

• Preliminary: k-th order Markov chain on X={0,1}
  – A sequence: $X_1, X_2, \ldots, X_N$

– Special case: 0-th order: Bernoulli model (biased coin)

• Maximum Likelihood estimator

Crude MDL

• Preliminary: k-th order Markov chain on X={0,1}
  – Special case: first order Markov chain B(1)

• MLE

Crude MDL

• Preliminary: k-th order Markov chain on X={0,1}
  – $2^k$ parameters
  – Log likelihood function: …
  – MLE: …
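As a rough illustration of the MLE by counting: for each length-k context, the estimate of theta[1|context] is the fraction of times that context was followed by a 1. The function names below are hypothetical, and conditioning on the first k symbols (rather than modelling them) is an assumption of this sketch.

```python
import math
from collections import Counter


def markov_mle(bits, k):
    """Return {context: theta_hat[1|context]} for every length-k context seen."""
    seen, ones = Counter(), Counter()
    for i in range(k, len(bits)):
        ctx = tuple(bits[i - k:i])
        seen[ctx] += 1
        ones[ctx] += bits[i]
    return {ctx: ones[ctx] / seen[ctx] for ctx in seen}


def log_likelihood(bits, k, theta):
    """log2 P(x_{k+1..n} | x_{1..k}, theta) for a k-th order binary chain."""
    ll = 0.0
    for i in range(k, len(bits)):
        p_one = theta[tuple(bits[i - k:i])]
        ll += math.log2(p_one if bits[i] == 1 else 1.0 - p_one)
    return ll
```

With k = 0 the only context is the empty tuple, and the estimate reduces to the Bernoulli MLE, the fraction of ones in the sequence.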

Crude MDL

• Question: Given data D=$x^n$, find the Markov chain that best explains D.
  – We do not want to restrict ourselves to chains of fixed order
    • How to avoid overfitting?
    • Obviously, an (n-1)-th order Markov model would always fit the data the best

Crude MDL

• two-part MDL revisited

Crude MDL

• Description length of data given hypothesis

Crude MDL

• Description length of hypothesis
  – The code should not change with the sample size n.
  – Different codes will lead to preferences of different hypotheses
  – How to design a code that
    • Leads to good inferences with small, practically relevant sample sizes?

Crude MDL

• An "intuitive" and "reasonable" code for k-th order Markov chain
  – First describe k using $2\log k + 1$ bits
  – Then describe the $d = 2^k$ parameters
    • Assume n is given in advance
      – For each theta in the MLE {theta[1|000…000], …, theta[1|111…111]}, the best precision we can achieve by counting is 1/(n+1)
      – Describe each theta with $\log(n+1)$ bits
      – $L(H) = 2\log k + 1 + d\log(n+1)$
      – $L(H) + L(D|H) = 2\log k + 1 + d\log(n+1) - \log P(D|k, \theta)$
      – For a given k, only the MLE theta needs to be considered
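The two-part codelength above can be evaluated directly. The sketch below is only an illustration with hypothetical function names. It assumes k ≥ 1 (so that 2 log k + 1 is defined), uses base-2 logarithms, and conditions on the first k symbols when computing -log P(D|k, theta).

```python
import math
from collections import Counter


def two_part_codelength(bits, k):
    """L(H) + L(D|H) in bits for a k-th order binary Markov chain, k >= 1."""
    n = len(bits)
    d = 2 ** k  # number of parameters theta[1|context]
    seen, ones = Counter(), Counter()
    for i in range(k, n):
        ctx = tuple(bits[i - k:i])
        seen[ctx] += 1
        ones[ctx] += bits[i]
    # -log2 P(D | k, theta_hat), with theta_hat the MLE obtained by counting.
    neg_log_lik = 0.0
    for ctx, total in seen.items():
        for cnt in (ones[ctx], total - ones[ctx]):
            if cnt:
                neg_log_lik -= cnt * math.log2(cnt / total)
    l_hypothesis = 2 * math.log2(k) + 1 + d * math.log2(n + 1)
    return l_hypothesis + neg_log_lik


def select_order(bits, max_k):
    """Return the order in 1..max_k with the smallest two-part codelength."""
    return min(range(1, max_k + 1), key=lambda k: two_part_codelength(bits, k))
```

For a strictly alternating sequence such as `[0, 1] * 100`, every order k ≥ 1 fits the data perfectly, so the data part is zero for all of them and the $d\log(n+1)$ parameter cost makes `select_order` pick k = 1.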

Crude MDL

• Good news
  – We have found a principled manner to encode data D using H
• Bad news
  – We have not found clear guidelines to design codes for H

Outline

• Probabilities and Codelengths

• Crude MDL

• Refined MDL

• Other issues

Refined MDL

• Universal codes and universal distributions
  – maximum likelihood code depends on the data
    • How to describe the data in an unambiguous manner?
  – Design a code such that for every possible observation, its codelength corresponds to its ML? Impossible

Refined MDL

• Worst-case regret

• Optimal universal model
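For reference, the standard definitions behind these two bullets can be written as below. The notation is assumed here rather than taken from the slides: $\bar P$ is a candidate distribution on sequences of length n, and $\hat\theta(x^n)$ is the ML estimator within the model.

```latex
\begin{align*}
\mathcal{R}(\bar P, x^n) &= -\log \bar P(x^n) - \bigl[-\log P(x^n \mid \hat\theta(x^n))\bigr]
  && \text{(regret on } x^n\text{)} \\
\mathcal{R}_{\max}(\bar P) &= \max_{x^n} \mathcal{R}(\bar P, x^n)
  && \text{(worst-case regret)} \\
\bar P^{*} &= \arg\min_{\bar P} \mathcal{R}_{\max}(\bar P)
  && \text{(optimal universal model)}
\end{align*}
```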

Refined MDL

• Normalized maximum likelihood (NML)

• Minimizing -logNML

Refined MDL

• Complexity of a model

– The more sequences that can be fit well by an element of M, the larger M’s complexity

– Would it lead to a "right" balance between complexity and fit?
  • Hopefully…
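For the Bernoulli model, NML and the model complexity can be computed exactly for small n. The sketch below uses hypothetical function names and assumes the standard definitions: the NML probability of a sequence is its maximized likelihood divided by the sum of maximized likelihoods over all sequences of the same length, and the complexity is the base-2 logarithm of that sum.

```python
import math


def bernoulli_max_lik(m, n):
    """P(x^n | theta_hat) for a length-n binary sequence containing m ones."""
    p = m / n
    return (p ** m) * ((1.0 - p) ** (n - m))  # Python evaluates 0.0 ** 0 as 1.0


def bernoulli_complexity(n):
    """log2 of the sum of maximized likelihoods over all 2^n sequences."""
    total = sum(math.comb(n, m) * bernoulli_max_lik(m, n) for m in range(n + 1))
    return math.log2(total)


def nml_codelength(bits):
    """-log2 NML(x^n) = -log2 P(x^n | theta_hat) + complexity, in bits."""
    n, m = len(bits), sum(bits)
    return -math.log2(bernoulli_max_lik(m, n)) + bernoulli_complexity(n)
```

The sum groups the 2^n sequences by their number of ones, since the maximized likelihood depends only on that count. For n = 1 both possible sequences are fit perfectly, so the complexity is exactly 1 bit.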

Refined MDL

• General refined MDL
