
Positive and Negative Randomness

Paul Vitanyi CWI, University of Amsterdam

Joint work with Kolya Vereshchagin

Non-Probabilistic Statistics

Classic Statistics--Recalled

Probabilistic Sufficient Statistic

Kolmogorov complexity

K(x) = length of the shortest description of x.
K(x|y) = length of the shortest description of x given y.

A string x is random if K(x) ≥ |x|.

K(x) − K(x|y) is the information y knows about x.

Theorem (Mutual Information). K(x) − K(x|y) = K(y) − K(y|x), up to an additive logarithmic term.
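K is uncomputable, but it can be approximated from above by any real compressor. A minimal Python sketch, assuming zlib as a computable stand-in for K; the helper names K_upper, K_cond_upper, and mutual_info_est are mine, not the talk's:

    import zlib

    def K_upper(x: bytes) -> int:
        # Upper bound on K(x): bit length of x under zlib, a computable stand-in.
        return 8 * len(zlib.compress(x, 9))

    def K_cond_upper(x: bytes, y: bytes) -> int:
        # Crude upper bound on K(x|y): the extra compressed bits x costs after y.
        return max(0, 8 * (len(zlib.compress(y + x, 9)) - len(zlib.compress(y, 9))))

    def mutual_info_est(x: bytes, y: bytes) -> int:
        # Estimate of K(x) - K(x|y), the information y knows about x.
        return K_upper(x) - K_cond_upper(x, y)

    x = b"abracadabra" * 50
    print(mutual_info_est(x, x[:-3] + b"zzz"))  # large: y is nearly all of x
    print(mutual_info_est(x, bytes(len(x))))    # near zero: zero bytes say little about x

Note these are only upper-bound estimates: a compressor never certifies randomness, it only exhibits regularity it happens to find.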

Randomness Deficiency

For a finite set S with x ∈ S, the randomness deficiency of x in S is δ(x|S) = log |S| − K(x|S); x is typical for S if δ(x|S) is small.

Algorithmic Sufficient Statistic where model is a set

Algorithmic sufficient statistic where model is a total computable function

Data is a binary string x; the model is a total computable function p; the prefix complexity K(p) is the size of the smallest TM computing p; the data-to-model code length is l_x(p) = min_d { |d| : p(d) = x }.

x is typical for p if δ(x|p) = l_x(p) − K(x|p) is small. p is a sufficient statistic for x if K(p) + l_x(p) = K(x) + O(1) and p(d) = x for the d that achieves l_x(p). Theorem: If p is a sufficient statistic for x, then x is typical for p.

p is a minimal sufficient statistic for x if K(p) is minimal among sufficient statistics; this minimal K(p) is the sophistication of x.
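A worked toy example of the two-part code behind a sufficient statistic, in the finite-set reading of the model: for an n-bit string x with k ones, the set S_k of all n-bit strings with exactly k ones costs roughly log n bits to describe, plus log C(n,k) bits to point at x inside it. This Python sketch and its names are my illustration, not the talk's construction:

    from math import comb, log2
    import random

    def two_part_cost(x: str) -> float:
        # Two-part code for x under the model S_k = {len(x)-bit strings with k ones}:
        # ~log2(n) bits to state k (a stand-in for K(S_k)), plus log2 C(n, k) bits
        # for the index of x inside S_k (the data-to-model code l_x).
        n, k = len(x), x.count("1")
        model_cost = log2(n) if n > 1 else 1.0
        data_to_model = log2(comb(n, k))
        return model_cost + data_to_model

    bits = ["1"] * 200 + ["0"] * 800
    random.shuffle(bits)
    typical = "".join(bits)          # typical for S_200: small deficiency
    skewed = "1" * 200 + "0" * 800   # same n and k, hence the same two-part cost,
    print(two_part_cost(typical))    # but skewed is NOT typical for S_200: its
    print(two_part_cost(skewed))     # deficiency log|S_200| - K(skewed|S_200) is large

For the shuffled string the two-part cost is close to K(x), so S_200 is (close to) a sufficient statistic; for the skewed string the same model overprices the data, which is exactly what the deficiency detects.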

Graph Structure Function

[Figure: the structure function h_x(α) plotted against model cost α, with log |S| on the vertical axis; the diagonal h_x(α) = K(x) − α is a lower bound.]

Minimum Description Length estimator, Relations between estimators

Structure function: h_x(α) = min_S { log |S| : x ∈ S and K(S) ≤ α }.

MDL estimator: λ_x(α) = min_S { log |S| + K(S) : x ∈ S and K(S) ≤ α }.

Best-fit estimator: β_x(α) = min_S { δ(x|S) : x ∈ S and K(S) ≤ α }.
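The relations among the three estimators, in LaTeX; this is my paraphrase of the results in Vereshchagin and Vitanyi (2004, cited in the bibliography), with all equalities holding only up to additive O(log |x|) terms:

    \begin{align*}
    \lambda_x(\alpha) &= \min_{\alpha' \le \alpha}\bigl(\alpha' + h_x(\alpha')\bigr),\\
    \beta_x(\alpha)   &= \lambda_x(\alpha) - K(x),\\
    h_x(\alpha)       &\ge K(x) - \alpha \quad \text{(the sufficiency line bounding the graph from below)}.
    \end{align*}

The second line is the sense in which MDL implicitly optimizes goodness-of-fit: minimizing the two-part code length also minimizes the randomness deficiency of the chosen model.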

Individual characteristics: more detail, especially for meaningful (nonrandom) data.

We flip the graph so that log |·| is on the x-axis and K(·) is on the y-axis. This is essentially the rate-distortion graph for list (set) distortion.

Primogeniture of ML/MDL estimators

• ML/MDL estimators can be approximated from above (sketched below).
• The best-fit estimator cannot be approximated, either from above or from below, to any precision.
• But the approximable ML/MDL estimators yield the best-fitting models, even though we do not know the achieved goodness-of-fit: ML/MDL estimators implicitly optimize goodness-of-fit.
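What "approximable from above" means in practice, as a minimal Python sketch: enumerate candidate models from a computable family and keep the best two-part cost seen so far; the running value only decreases. The toy family (sets of n-bit strings beginning with m zeros) and all names are my illustration, not the talk's construction:

    from math import log2

    def mdl_upper_bounds(x: str, alpha: float):
        # Yields a nonincreasing sequence of upper bounds on the MDL cost
        # lambda_x(alpha) over the toy family S_m = {len(x)-bit strings
        # beginning with m zeros}; stating m costs ~log2(m) bits (proxy for K(S_m)).
        n = len(x)
        best = float("inf")
        for m in range(0, n + 1):
            if "1" in x[:m]:
                break                        # x is no longer a member of S_m
            model_cost = log2(m) if m > 0 else 0.0
            if model_cost > alpha:
                break                        # model budget exceeded
            best = min(best, model_cost + (n - m))  # K(S_m) + log2 |S_m|
            yield best

    x = "0" * 48 + "1011010011010101"        # 64 bits: regular head, irregular tail
    bounds = list(mdl_upper_bounds(x, alpha=8.0))
    print(bounds[0], bounds[-1])             # 64.0 down to log2(48) + 16 ≈ 21.6

Each yielded value is a certified upper bound, but at no point do we learn how far the current bound still is from the true optimum; that gap is exactly the uncomputable part.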

Positive and Negative Randomness, and Probabilistic Models

Precision of following a given function h(α)

[Figure: a prescribed function h(α) and a realized structure function h_x(α) staying within distance d of it; horizontal axis: model cost α; vertical axis: data-to-model cost log |S|.]

Logarithmic precision is sharp.

Lemma. Most strings of length n have structure functions close to the diagonal h_x(α) = n − α. Those are the strings of high complexity K(x) > n.

For strings of low complexity, say K(x) < n/2, the number of candidate functions is much greater than the number of strings: there are fewer than 2^(n/2) strings of complexity below n/2, but far more shapes of functions even at logarithmic precision. Hence there cannot be a string for every such function. But we show that there is a string for every approximate shape of function.

All degrees of negative randomness

Theorem: For every length n there are strings x whose minimal sufficient statistic has any prescribed complexity between 0 and n (up to a logarithmic term).

Proof. All shapes of the structure function are possible, as long as the shape starts from n − k, decreases monotonically, and is 0 at k, for some k ≤ n (up to the precision given on the previous slide).
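In LaTeX, a hedged restatement of the shape theorem behind this proof sketch; the exact conditions and constants are in Vereshchagin and Vitanyi (2004), and this paraphrase adds explicitly the constraint that the shape stay above the sufficiency line:

    \textbf{Theorem (shapes, informal).} Let $k \le n$ and let
    $h\colon\{0,\dots,k\}\to\mathbb{N}$ be nonincreasing with $h(0) \le n$,
    $h(k) = 0$, and $h(\alpha) + \alpha \ge k$ for all $\alpha$. Then there is
    a string $x$ of length $n$ with $K(x) = k + O(\log n)$ such that
    $|h_x(\alpha) - h(\alpha)| = O(\log n)$ for all $\alpha \le k$.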

Are there natural examples of negative randomness?

Question: Are there natural examples of strings with large negative randomness? Kolmogorov did not think they exist, but we know they are abundant.

Maybe the information distance between strings x and y yields large negative randomness.

Information Distance:

• Information Distance (Li, Vitanyi, 96; Bennett, Gacs, Li, Vitanyi, Zurek, 98)

D(x,y) = min { |p|: p(x)=y & p(y)=x}

A binary program for a Universal Computer (Lisp, Java, C, Universal Turing Machine).

Theorem (i) D(x,y) = max {K(x|y),K(y|x)}

K(x|y) is the Kolmogorov complexity of x given y, defined as the length of the shortest binary program that outputs x on input y.

(ii) D(x,y) ≤ D’(x,y) for any computable distance D’ satisfying the density condition ∑_y 2^(−D’(x,y)) ≤ 1 for every x.

(iii) D(x,y) is a metric.

Not between random strings

• The information distance between random strings x and y of length n does not yield negative randomness.

• If x, y satisfy K(x|y), K(y|x) > n, then p = x XOR y, where XOR means bitwise exclusive-or, serves as a program to translate x to y and y to x. But if x and y are positively random, it appears that p is so too.
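The XOR trick as runnable Python, a sketch only: the real claim concerns program length under a universal machine, which this snippet does not measure. It just shows that p = x XOR y is a single string that, with one fixed operation, maps x to y and y to x:

    import secrets

    def xor(a: bytes, b: bytes) -> bytes:
        # Bitwise exclusive-or of two equal-length byte strings.
        return bytes(u ^ v for u, v in zip(a, b))

    n = 32
    x = secrets.token_bytes(n)   # stand-ins for independent random strings
    y = secrets.token_bytes(n)
    p = xor(x, y)                # one string that translates both ways
    assert xor(x, p) == y and xor(y, p) == x
    # The slide's point: p looks just as random as x and y do, so this
    # route does not manufacture negative randomness.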


Selected Bibliography

N.K. Vereshchagin, P.M.B. Vitanyi, A theory of lossy compression of individual data, http://arxiv.org/abs/cs.IT/0411014, submitted.
P.D. Grunwald, P.M.B. Vitanyi, Shannon information and Kolmogorov complexity, IEEE Trans. Inform. Theory, submitted.
N.K. Vereshchagin, P.M.B. Vitanyi, Kolmogorov's structure functions and model selection, IEEE Trans. Inform. Theory, 50:12(2004), 3265-3290.
P. Gacs, J. Tromp, P.M.B. Vitanyi, Algorithmic statistics, IEEE Trans. Inform. Theory, 47:6(2001), 2443-2463.
Q. Gao, M. Li, P.M.B. Vitanyi, Applying MDL to learning best model granularity, Artificial Intelligence, 121:1-2(2000), 1-29.
P.M.B. Vitanyi, M. Li, Minimum description length induction, Bayesianism, and Kolmogorov complexity, IEEE Trans. Inform. Theory, 46:2(2000), 446-464.