Positive and Negative Randomness
Paul Vitanyi CWI, University of Amsterdam
Joint work with Kolya Vereshchagin
Non-Probabilistic Statistics
Classic Statistics--Recalled
Probabilistic Sufficient Statistic
Kolmogorov complexity
K(x) = length of the shortest description of x.
K(x|y) = length of the shortest description of x given y.
A string x is random if K(x) ≥ |x|.
K(x) - K(x|y) is the information y knows about x.
Theorem (Mutual Information). K(x) - K(x|y) = K(y) - K(y|x), up to a logarithmic additive term.
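K(x) itself is uncomputable, but a real compressor gives a computable upper-bound proxy, which is enough to see the incompressibility of random strings in practice. A minimal sketch using zlib (the use of zlib as a stand-in for K is an illustration, not part of the theory above):

```python
import os
import zlib

def compressed_size(s: bytes) -> int:
    """Length of the zlib-compressed form of s, in bytes: a crude,
    computable upper-bound proxy for the (uncomputable) K(s)."""
    return len(zlib.compress(s, 9))

regular = b"ab" * 500        # highly structured: has a short description
random_ = os.urandom(1000)   # incompressible with overwhelming probability

print(compressed_size(regular))  # far below 1000 bytes
print(compressed_size(random_))  # close to (or slightly above) 1000 bytes
```

The structured string compresses to a tiny fraction of its length, while the random string does not compress at all, mirroring the definition K(x) ≥ |x| for random x.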
Randomness Deficiency
Algorithmic Sufficient Statistic where model is a set
Algorithmic sufficient statistic where model is a total computable function
Data is a binary string x. Model is a total computable function p. Prefix complexity K(p) is the size of the smallest TM computing p. Data-to-model code length l_x(p) = min_d {|d| : p(d) = x}.
x is typical for p if δ(x|p) = l_x(p) - K(x|p) is small.
p is a sufficient statistic for x if K(p) + l_x(p) = K(x) + O(1) and p(d) = x for the d that achieves l_x(p).
Theorem: If p is a sufficient statistic for x, then x is typical for p.
p is a minimal sufficient statistic (sophistication) for x if K(p) is minimal among sufficient statistics.
Graph Structure Function
[Figure: the structure function h_x(α), plotted with model cost α on the x-axis and log |S| on the y-axis; lower bound line h_x(α) = K(x) - α.]
Minimum Description Length estimator, Relations between estimators
Structure function: h_x(α) = min_S {log |S| : x in S and K(S) ≤ α}.
MDL estimator: λ_x(α) = min_S {log |S| + K(S) : x in S and K(S) ≤ α}.
Best-fit estimator: β_x(α) = min_S {δ(x|S) : x in S and K(S) ≤ α}.
Individual characteristics: more detail, especially for meaningful (nonrandom) data
We flip the graph so that log |S| is on the x-axis and K(.) is on the y-axis. This is essentially the rate-distortion graph for list (set) distortion.
Primogeniture of ML/MDL estimators
• ML/MDL estimators can be approximated from above;
• the best-fit estimator cannot be approximated, either from above or from below, to any precision;
• but the approximable ML/MDL estimators yield the best-fitting models, even though we don't know the quantity of goodness-of-fit: ML/MDL estimators implicitly optimize goodness-of-fit.
Positive and Negative Randomness,
and Probabilistic Models
Precision of following a given function h(α)
[Figure: a given function h(α) and the structure function h_x(α) tracking it within distance d, plotted against model cost α on the x-axis and data-to-model cost log |S| on the y-axis.]
Logarithmic precision is sharp
Lemma. Most strings of length n have structure functions close to the diagonal n - α. Those are the strings of high complexity K(x) > n.
For strings of low complexity, say K(x) < n/2, the number of appropriate functions is much greater than the number of strings. Hence there cannot be a string for every such function. But we show that there is a string for every approximate shape of function.
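The counting step can be made explicit (a sketch; the shape-counting bound is a rough estimate not stated on the slide):

```latex
% At most one string per program, so
\#\{x : K(x) < n/2\} \;\le\; \#\{\text{programs of length} < n/2\}
  \;=\; \sum_{i < n/2} 2^{i} \;<\; 2^{n/2}.
% A candidate shape is a nonincreasing function on n+1 points with
% values in \{0,\dots,n\}, i.e. a multiset of size n+1 from n+1 values:
\#\{\text{nonincreasing } h : \{0,\dots,n\} \to \{0,\dots,n\}\}
  \;=\; \binom{2n+1}{n+1} \;\ge\; 2^{n} \quad (n \ge 1).
```

So there are exponentially more exact shapes than strings of complexity below n/2, which is why only approximate shapes can all be realized.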
All degrees of negative randomness
Theorem: For every length n there are strings x with minimal sufficient statistic of every complexity between 0 and n (up to a log term).
Proof. All shapes of the structure function are possible, as long as it starts from n - k, decreases monotonically, and is 0 at k, for some k ≤ n (up to the precision in the previous slide).
Are there natural examples of negative randomness?
Question: Are there natural examples of strings with large negative randomness? Kolmogorov didn't think they exist, but we know they are abundant.
Maybe the information distance between strings x and y yields large negative randomness.
Information Distance:
• Information Distance (Li, Vitanyi, 96; Bennett, Gacs, Li, Vitanyi, Zurek, 98)
D(x,y) = min {|p| : p(x)=y and p(y)=x},
where p is a binary program for a universal computer (Lisp, Java, C, universal Turing machine).
Theorem. (i) D(x,y) = max {K(x|y), K(y|x)}, where K(x|y), the Kolmogorov complexity of x given y, is the length of the shortest binary program that outputs x on input y.
(ii) D(x,y) ≤ D'(x,y) for any computable distance D' satisfying ∑_y 2^(-D'(x,y)) ≤ 1 for every x.
(iii) D(x,y) is a metric.
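In practice the information distance is approximated by compression. The normalized compression distance of Li et al. replaces K by a real compressor; here zlib stands in for K, which is an approximation, not the theoretical quantity above:

```python
import zlib

def C(s: bytes) -> int:
    """Compressed size in bytes: computable stand-in for K(s)."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, a practical proxy for the
    (uncomputable) normalized information distance based on
    max{K(x|y), K(y|x)}."""
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = a.replace(b"fox", b"cat")   # a slight variant of a
c = bytes(range(256)) * 4       # unrelated data

print(ncd(a, b))  # small: compressing a+b costs little beyond a alone
print(ncd(a, c))  # larger: a tells the compressor nothing about c
```

Related strings score near 0 and unrelated ones near 1, which is what makes this proxy usable for clustering and classification.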
Not between random strings
• The information distance between random strings x and y of length n doesn't work.
• If x, y satisfy K(x|y), K(y|x) > n, then p = x XOR y, where XOR means bitwise exclusive-or, serves as a program to translate x to y and y to x. But if x and y are positively random, it appears that p is too.
Selected Bibliography
- N.K. Vereshchagin, P.M.B. Vitanyi, A theory of lossy compression of individual data, http://arxiv.org/abs/cs.IT/0411014, submitted.
- P.D. Grunwald, P.M.B. Vitanyi, Shannon information and Kolmogorov complexity, IEEE Trans. Information Theory, submitted.
- N.K. Vereshchagin, P.M.B. Vitanyi, Kolmogorov's structure functions and model selection, IEEE Trans. Inform. Theory, 50:12(2004), 3265-3290.
- P. Gacs, J. Tromp, P. Vitanyi, Algorithmic statistics, IEEE Trans. Inform. Theory, 47:6(2001), 2443-2463.
- Q. Gao, M. Li, P.M.B. Vitanyi, Applying MDL to learning best model granularity, Artificial Intelligence, 121:1-2(2000), 1-29.
- P.M.B. Vitanyi, M. Li, Minimum description length induction, Bayesianism, and Kolmogorov complexity, IEEE Trans. Inform. Theory, IT-46:2(2000), 446-464.