
Random Projection, Generative Lattices, and Redundancy to Combat Scalability Bounds in Distributed Computing

Lee A. Carraher

School of Electronic and Computing Systems, University of Cincinnati

March 3, 2014

Introduction

- From Cincinnati
- B.S. Computer Engineering (UC 2008)
- M.S. Computer Science (UC 2012)
  - Advisor: Prof. Fred Annexstein
- Ph.D. Computer Engineering (ongoing)
  - Advisor: Prof. Fred Annexstein

Interests

Some research interests:
- Machine Learning (bioinformatics, filtering)
- Inverse Problems (min/max problems)
- Parallel Computing (CUDA, MPI)
- Distributed Computing (MapReduce, Spark)
- Big Data

Motivations

"What are the important problems of your field?" - Richard Hamming

Data Storage is Cheap

- Store everything, because it's cheap (NSA...)

Big Problems in Computing

Two problems:
1. We are storing more data than we can effectively process (n → ∞)
2. Stagnated clock speeds
   - materials problem
   - energy problem
   - fundamental cooling problem (Landauer's Principle)

Ways To Attack Computing Problems

Faster Processors

More Processors

Better Algorithms


Parallel Computing

Simple: add more processors!

Basic issues:
- Communication bottlenecks
- Algorithmic bottlenecks

Scaled Speedup Example

[Plot: Total Speedup as a function of Parallel Speedup (parallel speedup 0-200, total speedup 5-30); legend: Parallelism, Our Theoretical Speedup (93.4%), 90% Serial Code, 95% Serial Code, 97% Serial Code, Theoretical Speedup.]

- The 80/20 rule (Pareto) isn't even on here!
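To make those curves concrete, here is a minimal Amdahl's-law sketch (my own; it assumes the legend percentages denote the parallelizable fraction p):

```python
# A minimal Amdahl's-law sketch: total speedup when a fraction p of the
# work is parallelizable and that portion is sped up by a factor s.
def amdahl(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

for p in (0.934, 0.90, 0.95, 0.97):
    # Even a 200x parallel speedup is capped by the serial fraction.
    print(f"p = {p:.3f}: speedup at s=200 is {amdahl(p, 200):5.2f}x, "
          f"limit is {1.0 / (1.0 - p):5.2f}x")
```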

Ways To Attack Computing Problems

Faster Processors

More Processors

Better Algorithms


Has Moore’s Law Stalled?

- Clock speed is dead.

Ways To Attack Computing Problems

Faster Processors

More Processors

Better Algorithms


Better Algorithms?

- This attack plan is not well defined
- Some algorithms are optimal

Ways To Attack Computing Problems

Faster Processors

More Processors

Better Algorithms

- What about the overlaps?

RPHASH

Random Projection Hashing goals:
- Scalability
- Minimize communication complexity

Tradeoffs:
- Redundancy
- Accuracy

Related Work

Database clustering methods:
- DBSCAN
- CLIQUE
- CLARANS
- PROCLUS

Background

- Big Data
- Curse of Dimensionality (COD)
- Locality Sensitive Hash Functions
- Space Partitioning
- Lattices
  - A Decoding Example
  - Leech Lattice
  - Leech Decoder
- Functional Programming
- MR/Hadoop
- Random Projection

Working Definition of “Big Data”

Sales and commercial hype aside,

Definition (Big Data)
A set of data processing problems in which the required data is too large to reside in main memory. Thrashing between main memory and disk (even solid state) makes an otherwise-scalable algorithm unscalable.

Examples:
- Health metrics
- DNA sequences
- Website/click metrics

COD

Curse of Dimensionality
The COD is sometimes cited as the cause of the distance function losing its usefulness in high dimensions. It arises from the ratio of the volume of a hypersphere to the volume of the metric-space partition (a hypercube) it is embedded in:

lim_{d→∞} Vol(S_d) / Vol(C_d) = π^(d/2) / (d · 2^(d−1) · Γ(d/2)) → 0

Given a single distribution, the minimum and maximum distances become indiscernible; equivalently, the overwhelming majority of the space lies outside of the sphere.
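A small numeric check of that limit (my own sketch, not from the slides), computed in log space to avoid overflow:

```python
# Volume fraction of the cube [-1, 1]^d occupied by the inscribed unit ball:
# Vol(S_d)/Vol(C_d) = pi^(d/2) / (d * 2^(d-1) * Gamma(d/2))
import math

def sphere_cube_ratio(d: int) -> float:
    return math.exp((d / 2) * math.log(math.pi)
                    - math.log(d)
                    - (d - 1) * math.log(2.0)
                    - math.lgamma(d / 2))

for d in (2, 8, 24, 100):
    print(d, sphere_cube_ratio(d))
# d=2 gives pi/4 ~ 0.785; by d=24 the ratio is already ~1e-10.
```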

LSH Hash Families

Definition (Locality Sensitive Hash Family)
A family H = {h : S → U} is (r_1, r_2, p_1, p_2)-sensitive if for any u, v ∈ S:

1. if d(u, v) ≤ r_1 then Pr_H[h(u) = h(v)] ≥ p_1
2. if d(u, v) > r_2 then Pr_H[h(u) = h(v)] ≤ p_2

For this family, ρ = log p_1 / log p_2.

An Example Hash Family

Figure: Random projection of R^2 → R^1; the projected line is cut into contiguous buckets of width w along a random direction r.
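The pictured family is the classic projection-and-quantize scheme; a minimal sketch (my own code, illustrative parameters) looks like this:

```python
# Project onto a random direction r, add a random offset b, and cut the
# projected line into buckets of width w.
import numpy as np

rng = np.random.default_rng(0)

def make_hash(dim: int, w: float):
    r = rng.normal(size=dim)      # random projection direction
    b = rng.uniform(0.0, w)       # random shift within one bucket
    def h(v: np.ndarray) -> int:
        return int(np.floor((v @ r + b) / w))
    return h

h = make_hash(dim=2, w=1.0)
u, v = np.array([0.10, 0.20]), np.array([0.15, 0.25])
print(h(u), h(v))                 # nearby points usually share a bucket
```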

Voronoi Tiling

Voronoi partitioning is optimal in 2D.

Figure: Voronoi partitioning of R^2


Voronoi

- Voronoi diagrams make for very efficient hash functions in 2D because, by definition, a point within a Voronoi region is nearest to the region's representative point.
- Voronoi regions provide an optimal solution to nearest-neighbor partitioning in 2D space!
- However, for arbitrary dimension d, Voronoi diagrams require Θ(n^⌈d/2⌉) space, and no optimal point-location algorithm is known.

Lattices

Instead we will consider lattices, which provide regular space partitioning, scale to arbitrarily high-dimensional spaces, and have sub-linear nearest-center search algorithms associated with them.

Definition (Lattice in R^n)
Let v_1, ..., v_n be n linearly independent vectors, where v_i = (v_{i,1}, v_{i,2}, ..., v_{i,n}). The lattice Λ with basis {v_1, ..., v_n} is the set of all integer combinations of the basis vectors:

Λ = {z_1 v_1 + z_2 v_2 + ... + z_n v_n | z_i ∈ Z, 1 ≤ i ≤ n}

Examples in 2D

Figure: Square (left) and hexagonal (right) lattices in R^2

Constant Time Decoding

- Certain lattices allow us to find the nearest representative point in constant time.
- For example, the square lattice above: the nearest lattice point can be found by simply rounding each coordinate of a real-valued point to its nearest integer (a sketch of this rounding decoder follows).
- With the exception of a few exceptional lattices, more complex lattices have more complex searches (exponential as d increases).
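A minimal sketch of that rounding decoder (my own illustration):

```python
# Constant-time decoder for the square lattice Z^n: one rounding per
# coordinate.
import numpy as np

def decode_zn(x: np.ndarray) -> np.ndarray:
    """Nearest point of Z^n to x."""
    return np.rint(x)

print(decode_zn(np.array([0.1, 0.8, 1.3, -0.6])))  # [ 0.  1.  1. -1.]
```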

Exceptional Higher Dimensional Lattices

The previous lattices work well in R^2, but our data spaces are in general of dimension ≫ 2.

- Fortunately, there are some higher-dimensional lattices with efficient nearest-center search algorithms.
- E8, or Gosset's lattice, is one such lattice in R^8.
- It is also the densest lattice packing in R^8.

Example of decoding E8

E8 can be formed by gluing together two D8 integer lattices, one shifted by the all-halves vector (1/2, ..., 1/2). This gluing of less dense lattices, shifted by a "glue vector," is a common theme in constructing dense lattices.

- Decoding D8 is simple.
- E8 = D8 ∪ (D8 + 1/2)
- Both cosets of D8 can be decoded in parallel.
- D8's decoding algorithm consists of rounding all values to their nearest integers such that they sum to an even number.

Example of decoding E8

Define f(x) to round each component of x to its nearest integer, and g(x) to do the same except that the component furthest from an integer is rounded in the wrong direction. Let

x = ⟨0.1, 0.1, 0.8, 1.3, 2.2, −0.6, −0.7, 0.9⟩

then

f(x) = ⟨0, 0, 1, 1, 2, −1, −1, 1⟩, sum = 3

and

g(x) = ⟨0, 0, 1, 1, 2, 0, −1, 1⟩, sum = 4

- Since the sum of g(x) is even, g(x) is the nearest lattice point in D8.

Example of decoding E8, cont.

- Include the coset D8 + 1/2.
- We can do this by subtracting 1/2 from all the values of x and decoding in D8 again:

f(x − 1/2) = ⟨0, 0, 0, 1, 2, −1, −1, 0⟩, sum = 1

g(x − 1/2) = ⟨−1, 0, 0, 1, 2, −1, −1, 0⟩, sum = 0

Now we find the coset representative that is closest to x using a simple distance metric:

‖x − g(x)‖² = 0.65

‖x − (g(x − 1/2) + 1/2)‖² = 0.95

So in this case the nearest E8 point is the first coset's representative:

⟨0, 0, 1, 1, 2, 0, −1, 1⟩
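Putting the pieces together, here is a runnable sketch of the full E8 decoder (my own code following the slides' f/g construction, not the author's implementation):

```python
import numpy as np

def f(x):
    """Round every component to its nearest integer."""
    return np.rint(x)

def g(x):
    """Like f, but round the component furthest from an integer the
    wrong way, flipping the parity of the coordinate sum."""
    y = np.rint(x)
    i = np.argmax(np.abs(x - y))      # furthest-from-integer index
    y[i] += np.sign(x[i] - y[i])      # round it the wrong way
    return y

def decode_d8(x):
    """Nearest D8 point: the rounding (f or g) with an even sum."""
    y = f(x)
    return y if y.sum() % 2 == 0 else g(x)

def decode_e8(x):
    """E8 = D8 ∪ (D8 + 1/2): decode both cosets, keep the closer."""
    a = decode_d8(x)
    b = decode_d8(x - 0.5) + 0.5
    return a if np.sum((x - a) ** 2) <= np.sum((x - b) ** 2) else b

x = np.array([0.1, 0.1, 0.8, 1.3, 2.2, -0.6, -0.7, 0.9])
print(decode_e8(x))   # ⟨0, 0, 1, 1, 2, 0, -1, 1⟩, as in the example
```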

Leech Lattice

By gluing sets of E8 together in a way originally conceived via Curtis's MOG, we can get an even higher-dimensional dense lattice: the Leech lattice.

Leech Lattice

Here we state some attributes of the Leech lattice, as well as a comparison to other lattices by way of Eb/N0 and the computational cost of decoding.

Some important attributes:
- Densest regular lattice packing in R^24
- Lattice construction can be based on 2 cosets of G24 (the binary Golay code)
- Sphere packing density: π^12 / 12! ≈ 0.00192957
- Kissing number K_min = 196560

Leech Lattice

[Plot: probability of bit error, log10(Pb), versus Eb/N0 (dB), over 3,072,000 samples: Leech (hex, −0.2 dB) with QAM16, E8 with 2PSK, unencoded QAM64, and unencoded QAM16.]

Figure: Performance of some coded and unencoded data transmission schemes

Leech Lattice Decoding

Some information about the decoding algorithm:
- The decoding of the Leech lattice is based closely on the decoding of the Golay code.
- In general, advances in either Leech decoding or binary Golay decoding imply an advance in the other.
- The decoding method used in this implementation is based on Amrani and Be'ery's '96 publication for decoding the Leech lattice; it consists of around 519 floating point operations and suffers a gain loss of only 0.2 dB.
- In general, decoding complexity scales exponentially with dimension.

Next is an outline of the decoding process.

Leech Decoder

[Block diagram: 24 received reals feed two parallel QAM decoders (A points and B points); block-confidence values for the even and odd representatives are minimized over the hexacode H6, checked for H-parity and K-parity (even and odd branches), and the surviving candidate codewords are compared.]

Figure: Leech lattice decoder

Functional Programming

Figure: Scargill, by Tim (timble.me.uk/blog/author/tim)

Parallel and functional programming:
- Instead of moving the mountain to the people, move the people to the mountain.
- Here the mountains are the data and the people are the functions.

Map Reduce Program Design

Figure: Map Reduce (courtesy: map-reduce.wikispaces.asu.edu)

Hadoop

Hadoop is an open source implementation of the map reduce framework created and maintained by the Apache Software Foundation.

Benefits:
- Open source
- Popular and maintained
- Free
- Implemented on and compatible with Amazon EC2
- Takes care of the networking and fault tolerance drudgery of parallel system programming

Hadoop Parallel System Design

Figure: Hadoop system design (ibm.com/developerworks)

EC2 Cloud

Cloud and distributed services:
- Scales to the needs of the data processing problem
- Very low cost processing model
- Always up-to-date HW resources
- Zero HW maintenance and overhead costs

Mahout

Mahout is an open source library of machine learning algorithms made for Hadoop.

Clustering algorithms:
- Canopy Clustering
- K-Means
- Mean Shift
- LDA
- MinHash

Random Projection

Figure: Random projection: R^3 → R^2

Random Projection

Theorem (JL Lemma)

(1 − ε)‖u − u′‖² ≤ ‖f(u) − f(u′)‖² ≤ (1 + ε)‖u − u′‖²

where:
ε - a distortion quantity
u, u′ ∈ U - two independent vectors
f - a random projection mapping R^d → R^l

l ∝ Θ(log(n) / (ε² log(1/ε)))
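As a sketch (my own, with illustrative dimensions), a scaled Gaussian random projection realizing the mapping f:

```python
# f(u) = R u, with R an l x d Gaussian matrix scaled by 1/sqrt(l),
# approximately preserves squared distances per the lemma.
import numpy as np

rng = np.random.default_rng(1)
d, l = 10_000, 256
R = rng.normal(size=(l, d)) / np.sqrt(l)

u, v = rng.normal(size=d), rng.normal(size=d)
orig = np.sum((u - v) ** 2)
proj = np.sum((R @ u - R @ v) ** 2)
print(proj / orig)   # close to 1, per the lemma
```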

FJLT

Fast Johnson-Lindenstrauss Transform
- New and cool!
- By the Heisenberg uncertainty principle of harmonic analysis, a signal and its spectrum cannot both be concentrated.
- Precondition the projection with a DFT (some matrices have very fast DFTs).
- ≈ Θ(d log(d) + ε⁻³ log²(n)) vs. Θ(d ε⁻² log(n))
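The FJLT preconditions with random signs D and a fast Hadamard transform H before a sparse random projection P. The sketch below is my own simplification: it replaces P with plain coordinate subsampling (closer to the subsampled randomized Hadamard transform), which still shows the preconditioning idea.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(len(x))        # orthonormal: norm-preserving

rng = np.random.default_rng(2)
d, l = 1024, 64
signs = rng.choice([-1.0, 1.0], size=d)        # D: random sign flips
coords = rng.choice(d, size=l, replace=False)  # simplified sparse P

u = rng.normal(size=d)
proj = np.sqrt(d / l) * fwht(signs * u)[coords]
print(np.sum(proj ** 2) / np.sum(u ** 2))      # ~1: norm preserved
```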

Motivations of RP Hash

- Parallel structure of a scalable parallel algorithm (Log Reduce)
- Low communication overhead (hashes)
- Non-parallel iterative structure (per-core redundancy)
- Approximation is usually good enough

General Idea of RPHash

Figure: Multi-probe random projection intersection probabilities

- generative space quantization
- random projection
- sequential multi-probe stochastic process

Occultation Problem

The occultation problem concerns the probability that two or more independent distributions overlap in projected space.

- It is based on the distributions' variance and the angle of the projective plane.
- Applicable bounds come from Urruty '07, where d is the number of probes:

lim_{d→∞} 1 − (2(r_1 + r_2) / (π‖d − c‖))^d

- In RPHash, d is the dimensionality (24).
- RPHash projections are orthogonal.

Sequential algorithm

1. Generate random projection matrix P
2. Maintain DB_count, a table of hash ID counts
3. Maintain DB_cent, an array of centroids corresponding to vectors
4. For all x ∈ X:
   4.1 index = LatticeDec(x Pᵀ)
   4.2 DB_count[index]++
   4.3 DB_cent[index] += x
5. sort [DB_count, DB_cent]
6. return DB_cent[0 : k]
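A compact runnable sketch of this loop (my own illustration; integer rounding stands in for the 24-dimensional Leech decoder, and the scaling is illustrative):

```python
from collections import defaultdict
import numpy as np

def rphash_sequential(X, k, d=24):
    n, m = X.shape
    rng = np.random.default_rng(3)
    P = rng.normal(size=(m, d)) / np.sqrt(m)    # projection matrix P
    counts = defaultdict(int)                   # DB_count
    sums = defaultdict(lambda: np.zeros(m))     # DB_cent
    for x in X:
        xt = np.sqrt(m / d) * (P.T @ x)         # project to R^d
        bucket = tuple(np.rint(xt))             # stand-in LatticeDec
        counts[bucket] += 1
        sums[bucket] += x
    top = sorted(counts, key=counts.get, reverse=True)[:k]
    return [sums[b] / counts[b] for b in top]   # densest k centroids
```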

Comparison with standard k-means


Sequential Algorithm Time Results

[Plots: elapsed time (s) on Gaussian data. Versus number of vectors (5,000-20,000): RPHash mean y = −0.101 + 0.0002x; K-Means mean y = −0.050 + 8.347e−5x. Versus number of dimensions (4,000-13,000): RPHash mean y = 0.3730 + 0.0006x; K-Means mean y = 0.0963 + 0.0001x.]

Sequential Algorithm Accuracy Results

[Plots: precision-recall performance on 50:50 Gaussian data. Versus number of clusters (0-9): RPHash mean y = 0.8803 − 0.0155x; K-Means mean y = 0.9963 − 0.0189x. Versus dimensions (0-16,000): RPHash mean y = 0.7942 + 3.885e−7x; K-Means mean y = 0.8681 − 1.040e−6x.]

Scalability Goal

- It would be nice if our algorithm scaled with the number of processing nodes.
- Let's look at an algorithm that scales well and try to apply it to our problem.
- The canonical Hadoop "Hello World" simulacrum, "Word Count," is a good place to start.

Hadoop Word Count

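The word-count source shown on this slide did not survive transcription; a minimal Hadoop Streaming equivalent in Python (a sketch, not the slide's original code) would be:

```python
#!/usr/bin/env python
# mapper.py - emit "word<TAB>1" for every word on stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python
# reducer.py - sum counts per word (Hadoop sorts input by key first).
import sys
from itertools import groupby

pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
for word, group in groupby(pairs, key=lambda kv: kv[0]):
    print(f"{word}\t{sum(int(c) for _, c in group)}")
```

These would be launched via the hadoop-streaming jar with -input/-output paths and -mapper mapper.py -reducer reducer.py (the jar's location varies by install).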

Hadoop Word Count Scalability


Basic Outline

- Each processing node computes hashes and counts.
- Share the top k buckets among all computers.
- Aggregate all centroid averages.

A trick to minimize communication and storage requirements: two phases.
- Phase 1: Only store counts and communicate IDs.
- Phase 2: Only accept hash collisions with Phase 1's top IDs for all clusters.

Parallel Algorithm Phase 1

begin
    X = {x_1, ..., x_n}, x_k ∈ R^m - data vectors
    D - set of available compute nodes
    H - a d-dimensional LSH function
    X̃ ⊆ X - vectors per compute node
    P(m→d) - Gaussian projection matrix
    Cs = {∅} - set of bucket collision counts
    foreach x_k ∈ X̃ do
        x̃_k ← √(m/d) Pᵀ x_k
        t = H(x̃_k)
        Cs[t] = Cs[t] + 1
    end
    sort({Cs, Cs.index})
    return {Cs, Cs.index}[0 : k log(n)]
end

Parallel Algorithm Phase 2

begin
    X = {x_1, ..., x_n}, x_k ∈ R^m - data vectors
    D - set of available compute nodes
    {Cs, Cs.index} - set of k log(n) cluster IDs and counts
    H - a d-dimensional LSH function
    X̃ ⊆ X - vectors per compute node
    p(m→d) ∈ P - Gaussian projection matrices
    C = {∅} - set of centroids
    foreach x_k ∈ X̃ do
        foreach p(m→d) ∈ P do
            x̃_k ← √(m/d) pᵀ x_k
            t = H(x̃_k)
            if t ∈ Cs.index then
                C[t] = C[t] + x_k
            end
        end
    end
    return C
end
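A toy single-node driver for the two phases (my own sketch; the slides' Phase 2 iterates over several projection matrices, simplified here to one, and in RPHash each foreach would run as a map task with results merged in a reduce):

```python
import heapq
import numpy as np

def phase1(Xt, P, H, k, n):
    m, d = P.shape
    Cs = {}
    for x in Xt:                                  # count bucket collisions
        t = H(np.sqrt(m / d) * (P.T @ x))
        Cs[t] = Cs.get(t, 0) + 1
    return set(heapq.nlargest(int(k * np.log(n)), Cs, key=Cs.get))

def phase2(Xt, P, H, top_ids):
    m, d = P.shape
    sums, cnts = {}, {}
    for x in Xt:                                  # accumulate top IDs only
        t = H(np.sqrt(m / d) * (P.T @ x))
        if t in top_ids:
            sums[t] = sums.get(t, 0) + x
            cnts[t] = cnts.get(t, 0) + 1
    return {t: sums[t] / cnts[t] for t in sums}

# Example H: quantize the projection (a lattice decoder in RPHash).
H = lambda v: tuple(np.rint(v))
```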

What to Use this For?

[Heatmap: hierarchical clustering of absolute log2 expression levels across GSM-numbered primary tumor samples, over probes for genes including S100A8, MMP1, CXCL9/10/11, TOP2A, RRM2, CCNB1/2, BIRC5, AURKA, FOS, and EGR1; color key from −3 to 3.]

Figure: Gene expression levels in primary breast cancer tumor samples

Data Security in Bioinformatics

- New attacks on anonymized data present a risk to patients' privacy.
- Very few cloud services guarantee secure processing.
- Highly distributed systems add even more attack vectors.
- Attacks prompted a Presidential commission on WGS privacy.

Data Security: A Freebie!

- Random projection offers some protections.
- The only full vectors transmitted are cluster centroids, which by definition are an aggregate of many vectors.
- Showing dissimilarity should be somewhat straightforward:
  - v = √(n/k) Rᵀu,  v′ = √(k/n) vᵀ R̂⁻¹
  - similarity = ‖v − v′‖₂

Questions?