Download - Compressive Sensing - EURASIP

Richard Baraniuk

Rice University

CompressiveSensing

Acknowledgements

• For assistance preparing this presentation

• Rice DSP group– Petros Boufounos, Volkan Cevher– Mark Davenport, Marco Duarte, Chinmay Hegde,

Jason Laska, Shri Sarvotham, …

• Mike Wakin, University of Michigan– geometry of CS, embeddings

• Justin Romberg, Georgia Tech– optimization algorithms

Agenda

• Introduction to Compressive Sensing (CS) – motivation– basic concepts

• CS Theoretical Foundation– geometry of sparse and compressible signals– coded acquisition– restricted isometry property (RIP)– signal recovery

• CS Applications

• Related concepts and work

Pressure is on Digital Sensors

• Success of digital data acquisition is placing increasing pressure on signal/image processing hardware and software to support

higher resolution / denser sampling» ADCs, cameras, imaging systems, microarrays, …

xlarge numbers of sensors

» image data bases, camera arrays, distributed wireless sensor networks, …

xincreasing numbers of modalities

» acoustic, RF, visual, IR, UV, x-ray, gamma ray, …

Pressure is on Digital Sensors

• Success of digital data acquisition is placing increasing pressure on signal/image processing hardware and software to support

higher resolution / denser sampling» ADCs, cameras, imaging systems, microarrays, …

xlarge numbers of sensors

» image data bases, camera arrays, distributed wireless sensor networks, …

xincreasing numbers of modalities

» acoustic, RF, visual, IR, UV

deluge of datadeluge of data» how to acquire, store,

fuse, process efficiently?

Digital Data Acquisition

• Foundation: Shannon sampling theorem“if you sample densely enough(at the Nyquist rate), you can perfectly reconstruct the original analog data”

time space

Sensing by Sampling

• Long-established paradigm for digital data acquisition– uniformly sample data at Nyquist rate (2x Fourier bandwidth)

sample toomuchdata!

Sensing by Sampling

• Long-established paradigm for digital data acquisition– uniformly sample data at Nyquist rate (2x Fourier bandwidth)

– compress data

compress transmit/store

receive decompress

sample

JPEGJPEG2000

…

Sparsity / Compressibility

pixels largewaveletcoefficients

(blue = 0)

widebandsignalsamples

largeGabor (TF)coefficients

time

freq

uen

cy

Sample / Compress


receive decompress

sample

sparse /compressiblewavelettransform

• Long-established paradigm for digital data acquisition– uniformly sample data at Nyquist rate – compress data

What’s Wrong with this Picture?

• Why go to all the work to acquire N samples only to discard all but K pieces of data?


receive decompress

sample


What’s Wrong with this Picture?linear processinglinear signal model(bandlimited subspace)


receive decompress

sample


nonlinear processingnonlinear signal model(union of subspaces)

Compressive Sensing

• Directly acquire “compressed” data

• Replace samples by more general “measurements”

compressive sensing transmit/store

receive reconstruct

Compressive Sensing

Theory IGeometrical Perspective

• Signal is -sparse in basis/dictionary– WLOG assume sparse in space domain

Sampling

sparsesignal

nonzeroentries

Compressive Sampling

• When data is sparse/compressible, can directly acquire a condensed representation with no/little information loss through linear dimensionality reduction

measurements sparsesignal

nonzeroentries

How Can It Work?

• Projection not full rank…

… and so loses information in general

• Ex: Infinitely many ’s map to the same

How Can It Work?



• But we are only interested in sparse vectors

• is effectively MxK

columns

How Can It Work?



• But we are only interested in sparse vectors

• Design so that each of its MxK submatricesare full rank

columns

How Can It Work?

• Goal: Design so that its Mx2K submatrices are full rank

– difference between two K-sparse vectors is 2K sparse in general

– preserve information in K-sparse signals

– Restricted Isometry Property (RIP) of order 2K

columns

Unfortunately…

• Goal: Design so that its Mx2K submatrices are full rank(Restricted Isometry Property – RIP)

• Unfortunately, a combinatorial, NP-complete design problem

columns

Insight from the 80’s [Kashin, Gluskin]

• Draw at random– iid Gaussian– iid Bernoulli

…

• Then has the RIP with high probability

– Mx2K submatrices are full rank– stable embedding for sparse signals– extends to compressible signals in balls

columns

Compressive Data Acquisition

• Measurements = random linear combinationsof the entries of

• WHP does not distort structure of sparse signals – no information loss

measurements sparsesignal

nonzeroentries

CS Signal Recovery

• Goal: Recover signal from measurements

• Problem: Randomprojection not full rank(ill-posed inverse problem)

• Solution: Exploit the sparse/compressiblegeometry of acquired signal

• Sparse signal: only K out of Ncoordinates nonzero

– model: union of K-dimensional subspacesaligned w/ coordinate axes

Concise Signal Structure

sorted index


– model: union of K-dimensional subspaces

• Compressible signal: sorted coordinates decay rapidly to zero

sorted index

power-lawdecay



– model: union of K-dimensional subspaces

• Compressible signal: sorted coordinates decay rapidly to zero

– model: ball:

sorted index

power-lawdecay


CS Signal Recovery• Random projection

not full rank

• Recovery problem:givenfind

• Null space

• So search in null space for the “best”according to some criterion– ex: least squares

(N-M)-dim hyperplaneat random angle

• Recovery: given(ill-posed inverse problem) find (sparse)

• fast, wrong

CS Signal Recovery

pseudoinverse

Why Doesn’t Work

least squares,minimum solutionis almost never sparse

null space of

translated to

(random angle)

for signals sparse in the space/time domain

• Reconstruction/decoding: given(ill-posed inverse problem) find

• fast, wrong

•

CS Signal Recovery

number ofnonzeroentries

“find sparsestin translated nullspace”

• Reconstruction/decoding: given(ill-posed inverse problem) find

• fast, wrong

• correct:only M=2Kmeasurements required to stably reconstruct K-sparse signal

slow: NP-completealgorithm

CS Signal Recovery

number ofnonzeroentries

• Recovery: given(ill-posed inverse problem) find (sparse)

• fast, wrong

• correct, slow

• correct, efficientmild oversampling[Candes, Romberg, Tao; Donoho]

number of measurements required

CS Signal Recovery

linear program

Why Works

minimum solution= sparsest solution (with high probability) if

for signals sparse in the space/time domain

Universality

• Random measurements can be used for signals sparse in any basis

Universality


Universality


sparsecoefficient

vector

nonzeroentries

Compressive Sensing

• Directly acquire “compressed” data

• Replace N samples by M random projections

transmit/store

receive linear pgm

…

random measurements

Compressive Sensing

Theory IIStable Embedding

Johnson-Lindenstrauss Lemma

• JL Lemma: random projection stably embeds a cloud of Q points whp provided

• Proved via concentration inequality

• Same techniques link JLL to RIP [B, Davenport, DeVore, Wakin, Constructive Approximation, 2008]

Q points

Connecting JL to RIPConsider effect of random JL � on each K-plane

– construct covering of points Q on unit sphere– JL: isometry for each point with high probability– union bound � isometry for all points q in Q– extend to isometry for all x in K-plane

K-plane

Connecting JL to RIPConsider effect of random JL � on each K-plane

– construct covering of points Q on unit sphere– JL: isometry for each point with high probability– union bound � isometry for all points q in Q– extend to isometry for all x in K-plane– union bound � isometry for all K-planes

K-planes

• Gaussian

• Bernoulli/Rademacher [Achlioptas]

• “Database-friendly” [Achlioptas]

• Random Orthoprojection to RM [Gupta, Dasgupta]

Favorable JL Distributions

RIP as a “Stable” Embedding

• RIP of order 2K implies: for all K-sparse x1 and x2,

K-planes

• Theorem: Supposing � is drawn from a JL-favorabledistribution,* then with probability at least 1-e-C*M, � meets the RIP with

* Gaussian/Bernoulli/database-friendly/orthoprojector

• Bonus: universality (repeat argument for any ��)

• See also Mendelson et al. concerning subgaussian ensembles

Connecting JL to RIP[with R. DeVore, M. Davenport, R. Baraniuk]

Compressive Sensing

Recovery Algorithms

CS Recovery Algorithms

• Convex optimization: [with or without noise]– Linear programming– BPDN– GPSR– FPC– Bregman iteration– Dantzig selector, …

• Iterative greedy algorithm– Matching Pursuit (MP)– Orthogonal Matching Pursuit (OMP)– StOMP– ROMP– CoSaMP, …

Compressive Sensing

Theory Summary

CS Hallmarks

• CS changes the rules of the data acquisition game– exploits a priori signal sparsity information

• Stable– acquisition/recovery process is numerically stable

• Universal– same random projections / hardware can be used for

any compressible signal class (generic)

• Asymmetrical (most processing at decoder)– conventional: smart encoder, dumb decoder– CS: dumb encoder, smart decoder

• Random projections weakly encrypted

CS Hallmarks

• Democratic– each measurement carries the same amount of information– robust to measurement loss and quantization simple

encoding

• Ex: wireless streaming application with data loss

– conventional: complicated (unequal) error protection of compressed data� DCT/wavelet low frequency coefficients

– CS: merely stream additional measurements and reconstruct using those that arrive safely (fountain-like)

Compressive SensingIn Action

Art

Gerhard Richter4096 Farben / 4096 Colours

1974254 cm X 254 cmLaquer on CanvasCatalogue Raisonné: 359

Museum Collection:Staatliche KunstsammlungenDresden (on loan)

Sales history: 11 May 2004Christie's New York Post-War and Contemporary Art (Evening Sale), Lot 34US$3,703,500

Gerhard RichterCologne Cathedral

Stained Glass


Cameras

“Single-Pixel” CS Camera

randompattern onDMD array

DMD DMD

single photon detector

imagereconstruction

orprocessing

w/ Kevin Kelly

scene

“Single-Pixel” CS Camera


DMD DMD


imagereconstruction

orprocessing

scene

• Flip mirror array M times to acquire M measurements• Sparsity-based (linear programming) recovery

…

Single Pixel Camera

Object LED (light source)

DMD+ALP Board

Lens 1Lens 2Photodiode

circuit

First Image Acquisition

target 65536 pixels

1300 measurements (2%)

11000 measurements (16%)

Second Image Acquisition

500 random measurements

4096 pixels

Single-Pixel Camera


DMD DMD


imagereconstruction

orprocessing

CS Low-Light Imaging with PMT

true color low-light imaging

256 x 256 image with 10:1 compression[Nature Photonics, April 2007]

Hyperspectral Imaging

spectrometer

blue

red

near IR

Video Acquisition• Measure via stream of random projections

– shutterless camera

• Reconstruct using sparse model for video structure– 3-D wavelets (localized in space-time)

space

time

3D complex wavelet

original 64x64x643-D wavelet thresholding

2000 coeffs, MSE = 3.5

frame-by-frame 2-D CS recon20000 coeffs, MSE = 18.4

joint 3-D CS recon20000 coeffs, MSE = 3.2

Miniature DMD-based Cameras

• TI DLP “picoprojector” destined for cell phones

Georgia Tech Analog Imager

• Bottleneck in imager arrays is data readout

• Instead of quantizing pixel values, take CS inner products in analog

• Potential for tremendous (factor of 10000) power savings

[Hassler, Anderson, Romberg, et al]

Other Imaging Apps

• Compressive camera arrays– sampling rate scales logarithmically

in both number of pixels and number of cameras

• Compressive radar and sonar– simplified receiver– exploring radar/sonar networks

• Compressive DNA microarrays– smaller, more agile arrays for

bio-sensing– exploit sparsity in presentation

of organisms to array [O. Milenkovic et al]


A/D Converters

Analog-to-Digital Conversion

• Nyquist rate limits reach of today’s ADCs

• “Moore’s Law” for ADCs:– technology Figure of Merit incorporating sampling rate

and dynamic range doubles every 6-8 years

• DARPA Analog-to-Information (A2I) program– wideband signals have

high Nyquist rate but are often sparse/compressible

– develop new ADC technologies to exploit

– new tradeoffs amongNyquist rate, sampling rate,dynamic range, …

frequency hopperspectrogram

time

freq

uen

cy

Analog-to-Information Conversion

• Sample near signal’s (low) “information rate”rather than its (high) Nyquist rate

• Practical hardware:randomized demodulator(CDMA receiver)

A2Isampling rate

number oftones /window

Nyquistbandwidth

Example: Frequency Hopper

20x sub-Nyquistsampling

spectrogram sparsogram

Nyquist rate sampling

Analog-to-Information Conversion

analogsignal

digitalmeasurements

informationstatisticsA2I DSP

• Analog-to-information (A2I) converter takes analog input signal and creates discrete (digital) measurements

• Extension of analog-to-digital converter that samples at the signal’s “information rate” rather than its Nyquist rate

Streaming Measurements

measurements

streaming requires special

• Streaming applications: cannot fit entire signal into a processing buffer at one time

Random Demodulator

Random Demodulator

50 100 150 200 250 300 350 400 450

�0.5

0

0.5

Random Demodulator

Parallel Architecture

p1(t) mT

y1[m]

p2(t) mT

y2[m]

pZ(t) mT

yZ[m]

x(t)

[Candes, Romberg, Wakin, et al]

Random Demodulator

• Theorem [Tropp et al 2007]

If the sampling rate satisfies

then locally Fourier K-sparse signals can be recovered exactly with probability

Empirical Results

Example: Frequency Hopper

• Random demodulator AIC at 8x sub-Nyquist

spectrogram sparsogram


Data Processing

Information Scalability

• Many applications involve signal inferenceand not reconstruction

detection < classification < estimation < reconstruction

fairly computationallyintense

Information Scalability

• Many applications involve signal inferenceand not reconstruction

detection < classification < estimation < reconstruction

• Good news: CS supports efficient learning, inference, processing directly on compressive measurements

• Random projections ~ sufficient statisticsfor signals with concise geometrical structure

Matched Filter• Detection/classification with K unknown

articulation parameters– Ex: position and pose of a vehicle in an image– Ex: time delay of a radar signal return

• Matched filter: joint parameter estimation and detection/classification– compute sufficient statistic for each potential target and

articulation– compare “best” statistics to detect/classify

Matched Filter Geometry

• Detection/classification with K unknown articulation parameters

• Images are points in

• Classify by finding closesttarget template to datafor each class (AWG noise)

– distance or inner product

data

target templatesfrom

generative modelor

training data (points)

Matched Filter Geometry

• Detection/classification with K unknown articulation parameters

• Images are points in

• Classify by finding closesttarget template to data

• As template articulationparameter changes, points map out a K-dimnonlinear manifold

• Matched filter classification = closest manifold search

articulation parameter space

data

CS for Manifolds

• Theorem:

random measurements stably embed manifoldwhp[B, Wakin, FOCM ’08]related work:[Indyk and Naor, Agarwal et al., Dasgupta and Freund]

• Stable embedding

• Proved via concentration inequality arguments(JLL/CS relation)

CS for Manifolds

• Theorem:

random measurements stably embed manifoldwhp

• Enables parameter estimation and MFdetection/classificationdirectly on compressivemeasurements– K very small in many

applications (# articulations)

Example: Matched Filter

• Detection/classification with K=3 unknown articulation parameters1. horizontal translation2. vertical translation3. rotation

Smashed Filter

• Detection/classification with K=3 unknown articulation parameters (manifold structure)

• Dimensionally reduced matched filter directly on compressive measurements

Smashed Filter

• Random shift and rotation (K=3 dim. manifold)• Noise added to measurements• Goal: identify most likely position for each image class

identify most likely class using nearest-neighbor test

number of measurements Mnumber of measurements M

avg.

shift

estim

ate

erro

r

clas

sifica

tion r

ate

(%)

more noise

more noise

Compressive Sensing(Theoretical Aside)

CS for Manifolds

• Random projections preserve information– Compressive Sensing (sparse signal embeddings)– Johnson-Lindenstrauss lemma (point cloud embeddings)

• What about manifolds?

K-planes

Random Projections

K-dimensional,smooth, compact

• M � 2K+1 linear measurements- necessary for injectivity (in general)- sufficient for injectivity (w.p. 1) when � Gaussian

• But not enough for efficient, robust recovery

Manifold Embeddings[Whitney (1936); Sauer et al. (1991)]

Recall: K-planes Embedding

K-planes

• M � 2K linear measurements- necessary for injectivity- sufficient for injectivity (w.p. 1) when � Gaussian

• But not enough for efficient, robust recovery

Let � be a random MxN orthoprojector with

Theorem:

Let F ½ RN be a compact K-dimensional manifold with– condition number 1/� (curvature, self-avoiding)

– volume V

Then with probability at least 1-�, the following

statement holds: For every pair x1,x2 2 F,

Stable Manifold Embedding[with R. Baraniuk]

Let � be a random MxN orthoprojector with

Theorem:

Let F ½ RN be a compact K-dimensional manifold with– condition number 1/� (curvature, self-avoiding)

– volume V

Then with probability at least 1-�, the following

statement holds: For every pair x1,x2 2 F,

Stable Manifold Embedding[with R. Baraniuk]

Stable Manifold EmbeddingSketch of proof:

– construct a sampling of points� on manifold at fine resolution� from local tangent spaces

– apply JL to these points

– extend to entire manifold

Implications:

Nonadaptive (even random) linear projections can efficiently capture & preserve structure of manifold

See also: Indyk and Naor, Agarwal et al., Dasgupta and Freund

Manifold Learning

• Given training points in , learn the mapping to the underlying K-dimensional articulation manifold

• ISOMAP, LLE, HLLE, …

• Ex: images of rotating teapot

articulation space= circle

Up/Down Left/Right Manifold

[Tenenbaum, de Silva, Langford]

K=2

Compressive Manifold Learning

• ISOMAP algorithm based on geodesic distances between points

• Random measurements preserve these distances

• Theorem: If , then theISOMAP residual variance in the projected domain is bounded by the additive error factor

full data (N=4096) M = 100 M = 50 M = 25

translatingdisk manifold

(K=2)

[Hegde et al ’08]


Networked Sensing

• Sensor networks:intra-sensor and inter-sensor correlation

• Can we exploit these to jointly compress?

• Popular approach: collaboration– inter-sensor communication overhead

• Ongoing challenge in information theory– Slepian-Wolf coding, …

• One solution: Compressive Sensing

Multi-Signal CS

• “Measure separately, reconstruct jointly”• Ingredients

– models for joint sparsity– algorithms for joint reconstruction– theoretical results for measurement savings

Distributed CS (DCS)

…

• “Measure separately, reconstruct jointly”• Ingredients

– models for joint sparsity– algorithms for joint reconstruction– theoretical results for measurement savings

• Zero collaboration, trivially scalable, robust• Low complexity, universal encoding

Distributed CS (DCS)

…

Real Data Example• Light Sensing in Intel Berkeley Lab• 49 sensors, N =1024 samples each, � = wavelets

K=100

M=400

M=400

Multisensor Fusion• Network of J sensors viewing articulating object

• Complexity/bandwidth for fusion and inference

– naïve w/ no compression– individual CS (sensor by sensor)

Multisensor Fusion• Network of J sensors viewing articulating object

• Collection of images lies on K-dim manifold in

• Complexity/bandwidth for fusion and inference

– naïve w/ no compression– individual CS (sensor by sensor)

– joint CS (fuse all sensors)

Multi-Camera Vision Apps

• Background subtraction from multi-camera compressive video – innovations (due to object motion) sparse in space domain– estimate innovations from random projections of video– only few measurements required per camera

thanks to overlapping field of view

Multi-Camera Vision Apps• 3-D shape inference from compressive msmnts

– estimate innovations from random projections of video– fuse into 3D shape by exploiting object sparsity in 3D volume

Compressive Sensing

Related Work

Related Work

• Streaming algorithms

• Multiband sampling

• Finite rate of innovation (FROI) sampling

• General sparse approximation

Compressive Sensing

The End

CS – Summary

• Compressive sensing– integrates sensing, compression, processing– exploits signal sparsity/compressibility information– enables new sensing modalities, architectures, systems

• Why CS works: stable embedding for signals with concise geometric structure

sparse signals | compressible signals | manifolds

• Information scalability for compressive inference– compressive measurements ~ sufficient statistics– many fewer measurements may be required to

detect/classify/estimate than to reconstruct

dsp.rice.edu/cs