Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R....
Transcript of Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R....
![Page 1: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/1.jpg)
Computational Methods for pLarge-Scale Data Analysis
Al d GAlexander GrayGeorgia Institute of Technology
C ll f C tiCollege of Computing
FASTlab: Fundamental Algorithmic and Statistical Tools
![Page 2: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/2.jpg)
Is science in 2008different from science in 1908?different from science in 1908?
Instruments
[Science Szalay & J Gray 2001][Science, Szalay & J. Gray, 2001]
![Page 3: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/3.jpg)
Is science in 2008different from science in 1908?different from science in 1908?
InstrumentsData: CMB Maps
1.0E+06
1.0E+07
1.0E+06
1.0E+07
1 0E 03
1.0E+04
1.0E+05
1.0E+03
1.0E+04
1.0E+05
1990 COBE 1,000
1.0E+02
1.0E+03
1985 1990 1995 2000 2005 2010
1.0E+021985 1990 1995 2000 2005 2010
[Science Szalay & J Gray 2001] 1990 COBE 1,0002000 Boomerang 10,0002002 CBI 50,0002003 WMAP 1 Million2008 Planck 10 Million
Data: Local Redshift Surveys1986 CfA 3,5001996 LCRS 23,000
Data: Angular Surveys1970 Lick 1M
[Science, Szalay & J. Gray, 2001]
2008 Planck 10 Million,2003 2dF 250,0002005 SDSS 800,000
1990 APM 2M2005 SDSS 200M2008 LSST 2B
![Page 4: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/4.jpg)
Sloan Digital Sky Survey (SDSS)(SDSS)
![Page 5: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/5.jpg)
1 billion objectsj144 dimensions
(~250M galaxies in 5 colors, ~1M 2000-D spectra)~1M 2000-D spectra)
Size matters! Now possible:• low noise: subtle patterns• global properties and patterns• rare objects and patterns• more info: 3d, deeper/earlier, bands• in parallel: more accurate simulations• 2008: LSST time varying phenomena• 2008: LSST – time-varying phenomena
![Page 6: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/6.jpg)
Happening everywhere!Molecular biologymicroarray chips nuclear mag. resonance Drug discovery
Earth sciencessatellite topography microprocessor Physical simulation
Neurosciencefunctional MRIInternet
fiber optics
![Page 7: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/7.jpg)
1.How did galaxies evolve?2.What was the early universe like?Astrophysicist 2.What was the early universe like?3.Does dark energy exist?4. Is our model (GR+inflation) right?
p y
R. Nichol, Inst. Cosmol. GravitationA. Connolly, U. Pitt PhysicsC. Miller, NOAOR. Brunner, NCSAG. Kulkarni, Inst. Cosmol. GravitationD. Wake, Inst. Cosmol. Gravitation
Machine learning/
R. Scranton, U. Pitt PhysicsM. Balogh, U. Waterloo PhysicsI. Szapudi, U. Hawaii Inst. AstronomyG Ri h d Machine learning/
statistics guyG. Richards, Princeton PhysicsA. Szalay, Johns Hopkins Physics
![Page 8: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/8.jpg)
1.How did galaxies evolve?2.What was the early universe like?Astrophysicist 2.What was the early universe like?3.Does dark energy exist?4. Is our model (GR+inflation) right?
p y
O(N2)• Kernel density estimator O(Nn)
O(N2)O(N2)
R. Nichol, Inst. Cosmol. Grav.A. Connolly, U. Pitt PhysicsC. Miller, NOAO
• n-point spatial statistics• Nonparametric Bayes classifier• Support vector machine O(N )
O(N2)O(N3)
O(cDT(N))
R. Brunner, NCSAG. Kulkarni, Inst. Cosmol. Grav.D. Wake, Inst. Cosmol. Grav.
Support vector machine• Nearest-neighbor statistics• Gaussian process regression
Hierarchical clustering
Machine learning/
O(cDT(N))R. Scranton, U. Pitt PhysicsM. Balogh, U. Waterloo PhysicsI. Szapudi, U. Hawaii Inst. Astro.G Ri h d
• Hierarchical clustering
Machine learning/statistics guy
G. Richards, Princeton PhysicsA. Szalay, Johns Hopkins Physics
![Page 9: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/9.jpg)
1.How did galaxies evolve?2.What was the early universe like?Astrophysicist 2.What was the early universe like?3.Does dark energy exist?4. Is our model (GR+inflation) right?
p y
• Kernel density estimator O(N2)R. Nichol, Inst. Cosmol. Grav.A. Connolly, U. Pitt PhysicsC. Miller, NOAO
• n-point spatial statistics• Nonparametric Bayes classifier• Support vector machine
O(Nn)O(N2)
O(N2)R. Brunner, NCSAG. Kulkarni, Inst. Cosmol. Grav.D. Wake, Inst. Cosmol. Grav.
Support vector machine• Nearest-neighbor statistics• Gaussian process regression
Hierarchical clustering
O(N )O(N2)
O(N3)O(N3)R. Scranton, U. Pitt Physics
M. Balogh, U. Waterloo PhysicsI. Szapudi, U. Hawaii Inst. Astro.G Ri h d
• Hierarchical clustering
Machine learning/
O(N3)
G. Richards, Princeton PhysicsA. Szalay, Johns Hopkins Physics
Machine learning/statistics guy
![Page 10: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/10.jpg)
1.How did galaxies evolve?2.What was the early universe like?Astrophysicist 2.What was the early universe like?3.Does dark energy exist?4. Is our model (GR+inflation) right?
p y
• Kernel density estimator O(N2)R. Nichol, Inst. Cosmol. Grav.A. Connolly, U. Pitt PhysicsC. Miller, NOAO
• n-point spatial statistics• Nonparametric Bayes classifier• Support vector machine
O(Nn)O(N2)
O(N2)R. Brunner, NCSAG. Kulkarni, Inst. Cosmol. Grav.D. Wake, Inst. Cosmol. Grav.
Support vector machine• Nearest-neighbor statistics• Gaussian process regression
Hierarchical clustering
O(N )O(N2)
O(N3)O(N3)R. Scranton, U. Pitt Physics
M. Balogh, U. Waterloo PhysicsI. Szapudi, U. Hawaii Inst. Astro.G Ri h d
• Hierarchical clustering
B t I h 1 illi i t Machine learning/
O(N3)
G. Richards, Princeton PhysicsA. Szalay, Johns Hopkins Physics
But I have 1 million points Machine learning/statistics guy
![Page 11: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/11.jpg)
Statistics/learning challengesStatistical (modeling, validation):
Statistics/learning challenges
• Best performance with fewest assumptions
Computational:Computational:• Large N (#data), D (#features)
D
N
![Page 12: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/12.jpg)
Statistics/learning challengesStatistical (modeling, validation):
Statistics/learning challenges
• Best performance with fewest assumptions
Computational:Computational:• Large N (#data), D (#features), M (#models)
D
N
M
![Page 13: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/13.jpg)
Statistics/learning challengesStatistical (modeling, validation):
Statistics/learning challenges
• Best performance with fewest assumptions
Computational:Computational:• Large N (#data), D (#features), M (#models)
D
N Reduce? Simplify? Poor modeling
M
![Page 14: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/14.jpg)
Statistics/learning challengesStatistical (modeling, validation):
Statistics/learning challenges
• Best performance with fewest assumptions
Computational:Computational:• Large N (#data), D (#features), M (#models)
D
N Reduce? Simplify? Poor modeling
Avoid hard problems? Poor fundingM
![Page 15: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/15.jpg)
My motivating datasetsMy motivating datasets• 1993-1999: POSS-II• 1999-2008: SDSS
Coming Pan STARRS LSST• Coming: Pan-STARRS, LSST• Also:
– Millennium simulation dataLarge Hadron Collider data– Large Hadron Collider data
– network traffic (email) data– Inbio ecology data
![Page 16: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/16.jpg)
What I like to think about…What I like to think about…• The statistical problems and
methods needed for answering scientific questionsscientific questions
• The computational problems and methods involved in scalingand methods involved in scaling all of them up to big datasets
• MLPACK: software for large-scale machine learning (later inscale machine learning (later in 2008)
![Page 17: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/17.jpg)
OUTLINE1. What are some of the
statistical problems and methods to consider?
2. What are some of the2. What are some of the computational problems and methods to consider?and methods to consider?
3. What might the softwarewhich implements all thiswhich implements all this look like?
![Page 18: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/18.jpg)
OUTLINE1. What are some of the
statistical problems and methods to consider?
2. What are some of the2. What are some of the computational problems and methods to consider?and methods to consider?
3. What might the softwarewhich implements all thiswhich implements all this look like?
![Page 19: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/19.jpg)
10 data analysis problems, and l bl l ’d lik f hscalable tools we’d like for them
1. Querying (e.g. characterizing a1. Querying (e.g. characterizing a region of space, defining a trigger):nearest-neighbor, spherical range-g p gsearch, orthogonal range-search
2. Density estimation (e.g. comparing y ( g p ggalaxy types): kernel density estimation, mixture of Gaussians
3. Regression (e.g. optical redshifts):linear regression, kernel regression, G i iGaussian process regression
![Page 20: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/20.jpg)
10 data analysis problems, and l bl l ’d lik f hscalable tools we’d like for them
4. Classification (e.g. quasar detection, star-( g q ,galaxy separation): nearest-neighbor classifier, nonparametric Bayes classifier, support vector machinemachine
5. Dimension reduction (e.g. galaxy characterization): principal componentcharacterization): principal component analysis, kernel PCA, maximum variance unfolding
6. Outlier detection (e.g. new object types, data cleaning): by robust L2 estimation, by density estimation by dimension reductionestimation, by dimension reduction
![Page 21: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/21.jpg)
10 data analysis problems, and l bl l ’d lik f hscalable tools we’d like for them
7. Clustering (e.g. automatic Hubble g ( gsequence): k-means, hierarchical clustering (“friends-of-friends”), by dimension reductiondimension reduction
8. Time series analysis (e.g. asteroid tracking variable objects): Kalman filtertracking, variable objects): Kalman filter, hidden Markov model, trajectory tracking
9. 2-sample testing (e.g. cosmological p g ( g gvalidation): n-point correlation
10. Cross-match (e.g. multiple databases):bi tit t hibipartite matching
![Page 22: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/22.jpg)
OUTLINE1. What are some of the
statistical problems and methods to consider?
2. What are some of the2. What are some of the computational problems and methods to consider?and methods to consider?
3. What might the softwarewhich implements all thiswhich implements all this look like?
![Page 23: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/23.jpg)
Core computational problemsCore computational problems
What are the basic mathematicalWhat are the basic mathematical operations, or bottleneck subroutines, can we focus on developing fast algorithms for?developing fast algorithms for?
![Page 24: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/24.jpg)
Core computational problemsCore computational problems
• Aggregations• Aggregations• Generalized N-body problemsy p• Graphical model inference• Linear algebra• Optimization• Optimization
![Page 25: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/25.jpg)
Core computational problemsp pAggregations, GNPs, graphical models, linear algebra, optimization
• Querying: nearest-neighbor, sph range-search, ortho range-searchD i i i k l d i i i i f• Density estimation: kernel density estimation, mixture of Gaussians
• Regression: linear regression, kernel regression, Gaussian process regressiong
• Classification: nearest-neighbor classifier, nonparametric Bayes classifier, support vector machine
• Dimension reduction: principal component analysis, kernel PCA, maximum variance unfoldingmaximum variance unfolding
• Outlier detection: by robust L2 estimation, by density estimation, by dimension reduction
• Clustering: k-means, hierarchical clustering (“friends-of-friends”), by g , g ( ), ydimension reduction
• Time series analysis: Kalman filter, hidden Markov model, trajectory tracking
• 2 sample testing: n point correlation• 2-sample testing: n-point correlation• Cross-match: bipartite matching
![Page 26: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/26.jpg)
AggregationsAggregations
• How it appears: nearest-neighborHow it appears: nearest neighbor, sph range-search, ortho range-search
• Common methods: locality sensitive• Common methods: locality sensitive hashing, kd-trees, metric trees, disk-based treesbased trees
• Mathematical challenges: high di i bl idimensions, provable runtime
• Mathematical topics: computational p pgeometry, randomized algorithms
![Page 27: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/27.jpg)
AggregationsAggregations• Interesting method: Cover-trees [Beygelzimer g [ yg
et al 2004]– Provable runtime
C i t tl d f i hi h– Consistently good performance, even in higher dimensions
• Interesting method: Learning trees [Cayton etInteresting method: Learning trees [Cayton et al 2007]– Learning data-optimal data structures– Improves performance over kd-trees
• Interesting method: MapReduce [Google]Brute force– Brute-force
– But makes HPC automatic for a certain problem form
![Page 28: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/28.jpg)
Generalized N-body ProblemsGeneralized N body Problems• How it appears: kernel density estimation, pp y
mixture of Gaussians, kernel regression, Gaussian process regression, nearest-neighbor classifier, nonparametric Bayes classifier, , p y ,support vector machine, kernel PCA, hierarchical clustering, trajectory tracking, n-point correlationpoint correlation
• Common methods: FFT, Fast Gauss Transform, Well-Separated Pair DecompositionM h i l h ll hi h di i• Mathematical challenges: high dimensions, strong error guarantee
• Mathematical topics: approximation theoryMathematical topics: approximation theory, computational physics
![Page 29: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/29.jpg)
Generalized N-body ProblemsGeneralized N body Problems• Interesting method: Generalized FastInteresting method: Generalized Fast
Multipole Method, aka multi-tree methods [Gray et al. 2000-2008][ y ]– Fastest practical algorithms for most of the
problems to which it has been applied– Hard relative error bounds– Automatic parallelization (THOR: Tree-based
Higher Order Reduce)Higher-Order Reduce)– Big astrophysics results (dark energy
evidence Science 2003 cosmic magnificationevidence Science 2003, cosmic magnification verification Nature 2005, 1M quasars 2008)
![Page 30: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/30.jpg)
Graphical model inferenceGraphical model inference• How it appears: hidden MarkovHow it appears: hidden Markov
models, bipartite matching • Common methods: beliefCommon methods: belief
propagation, expectation propagation• Mathematical challenges: largeMathematical challenges: large
cliques, upper and lower bounds, graphs with loopsg p p
• Mathematical topics: variational methods, statistical physics, turbo , p y ,codes
![Page 31: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/31.jpg)
Graphical model inferenceGraphical model inference
• Interesting method: SurveyInteresting method: Survey propagation [Mezard et al 2002]
Good results for combinatorial– Good results for combinatorial optimizationBased on statistical physics ideas– Based on statistical physics ideas
• Interesting method: Expectation propagation [Minka 2001]propagation [Minka 2001]– Variational method based on moment-
t hi idmatching idea
![Page 32: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/32.jpg)
Linear algebraLinear algebra
• How it appears: linear regression• How it appears: linear regression, Gaussian process regression, PCA, k l PCA K l filtkernel PCA, Kalman filter
• Common methods: QR, Krylovy• Mathematical challenges: numerical
stability sparsity preservationstability, sparsity preservation• Mathematical topics: linear algebra
![Page 33: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/33.jpg)
Linear algebraLinear algebra• Interesting method: Monte Carlo SVDInteresting method: Monte Carlo SVD
[Frieze, Drineas, et al. 1998-2008]– Sample either columns or rows, from squared p , q
length distribution– For rank-k matrix approx; must know k
• Interesting method: QUIC-SVD [Holmes, Gray, Isbell 2008]– Sample using cosine trees and stratification– Automatically sets rank based on desired
error
![Page 34: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/34.jpg)
OptimizationOptimization• How it appears: support vectorHow it appears: support vector
machine, maximum variance unfolding, robust L2 estimation g 2
• Common methods: interior point, Newton’s method
• Mathematical challenges: large number of variables / constraints
• Mathematical topics: optimization theory, linear algebra, convex y, g ,analysis
![Page 35: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/35.jpg)
OptimizationOptimization
• Interesting method: Sequential• Interesting method: Sequential minimization optimization (SMO) [Platt 1999][Platt 1999]–Much more efficient than interior-
point, for SVM QPs• Interesting method: StochasticInteresting method: Stochastic
quasi-Newton [Schraudolf 2007]Does not require scan of entire data–Does not require scan of entire data
![Page 36: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/36.jpg)
Interaction between i i d istatistics and computation
• Explicitly trade off between• Explicitly trade off between statistical accuracy and runtime
• Monte Carlo: a statistical idea for computational purposescomputational purposes
• Active learning, aka design of g gexperiments: choose the important pointsimportant points
![Page 37: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/37.jpg)
OUTLINE1. What are some of the
statistical problems and methods to consider?
2. What are some of the2. What are some of the computational problems and methods to consider?and methods to consider?
3. What might the softwarewhich implements all thiswhich implements all this look like?
![Page 38: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/38.jpg)
Keep in mind the machineKeep in mind the machine
• Memory hierarchy: cache RAM• Memory hierarchy: cache, RAM, out-of-core
• Dataset bigger than one machine: parallel/distributedparallel/distributed
• Everything is becoming multicorey g g
![Page 39: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/39.jpg)
Keep in mind the overall systemKeep in mind the overall system
• Databases can be more useful• Databases can be more useful than ASCII files
• Workflows can be more useful than brittle perl scriptsthan brittle perl scripts
• Visual analytics connects yvisualization/HCI with data analysisanalysis
![Page 40: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/40.jpg)
Keep in mind the software l icomplexity
• Automatic code generation (e g• Automatic code generation (e.g. MapReduce)
• Automatic tuning (e.g. OSKI)A t ti l ith d i ti• Automatic algorithm derivation(e.g. AutoBayes, SPIRAL)( g y )
![Page 41: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/41.jpg)
Our upcoming productsOur upcoming products
• MLPACK: “the LAPACK of• MLPACK: the LAPACK of machine learning” – Dec. 2008
• THOR: “the MapReduce of Generalized N body Problems”Generalized N-body Problems –Apr. 2009
• Algorithmica: Automatic derivation of the above – 2010
![Page 42: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/42.jpg)
The endThe end
Always looking for collaborators, y g ,challenging applications, and
generous funding!generous funding!
Alexander Gray@ t h [email protected]
![Page 43: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/43.jpg)
Goal of this talk:M k b h d f !Make our best methods fast!
k l d it ti t• kernel density estimator • n-point statistics
nonparametric Bayes classifier• nonparametric Bayes classifier• support vector machine • nearest neighbor statistics• nearest neighbor statistics• Gaussian process regression• Bayesian inference• Bayesian inference• …
![Page 44: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/44.jpg)
OUTLINE“What’s the distribution?” 1. warm-up: generalized histogram
2. n-point statistics2. n point statistics3. kernel density estimator4. general strategy: multi-tree
1. nonparametric Bayes classifier2 support vector machine2. support vector machine 3. nearest neighbor statistics4. Gaussian process regression5 Bayesian inference5. Bayesian inference
5. science!
![Page 45: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/45.jpg)
OUTLINEComparing: “Same distribution?” 1. warm-up: generalized histogram
2. n-point statistics2. n point statistics3. kernel density estimator4. general strategy: multi-tree
1. nonparametric Bayes classifier2 support vector machine2. support vector machine 3. nearest neighbor statistics4. Gaussian process regression5 Bayesian inference5. Bayesian inference
5. science!
![Page 46: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/46.jpg)
OUTLINEThese are all “Generalized N body problems”1. warm-up: generalized histogram
2. n-point statistics“Generalized N-body problems”
[Gray thesis, 2003]2. n point statistics 3. kernel density estimator4. general strategy: multi-tree
1. nonparametric Bayes classifier2 support vector machine2. support vector machine 3. nearest neighbor statistics4. Gaussian process regression5 Bayesian inference5. Bayesian inference
5. science!
![Page 47: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/47.jpg)
OUTLINEScience #1 Breakthrough of 2003 1. warm-up: generalized histogram
2. n-point statistics2. n point statistics 3. kernel density estimator4. general strategy: multi-tree
1. nonparametric Bayes classifier2 support vector machine2. support vector machine 3. nearest neighbor statistics4. Gaussian process regression5 Bayesian inference5. Bayesian inference
5. science!
![Page 48: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/48.jpg)
OUTLINESpecial case of 2 and 3 1. warm-up: generalized histogram
2. n-point statistics2. n point statistics 3. kernel density estimator4. general strategy: multi-tree
1. nonparametric Bayes classifier2 support vector machine2. support vector machine 3. nearest neighbor statistics4. Gaussian process regression5 Bayesian inference5. Bayesian inference
5. science!
![Page 49: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/49.jpg)
Histogram (1-D)Histogram (1 D)
![Page 50: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/50.jpg)
Generalized histogram (1-D)Generalized histogram (1 D)( )∑ <−∝ j hxqIqf )(ˆ ( )∑
jjqqf )(
b d idth hquery point q
bandwidth h
![Page 51: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/51.jpg)
Generalized histogramGeneralized histogram
( )∑ <∝ hxqIqf )(ˆ ( )∑ <−∝j
j hxqIqf )(j
bandwidth h
query point q
kernel function
![Page 52: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/52.jpg)
How can we compute this efficiently?
kd-trees:
p y
kd-trees:most widely-used space-
partitioning tree[Bentley 1975], [Friedman, Bentley &
Finkel 1977] [Moore & Lee 1995]Finkel 1977],[Moore & Lee 1995]
![Page 53: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/53.jpg)
A kd-tree: level 1
![Page 54: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/54.jpg)
A kd-tree: level 2
![Page 55: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/55.jpg)
A kd-tree: level 3
![Page 56: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/56.jpg)
A kd-tree: level 4
![Page 57: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/57.jpg)
A kd-tree: level 5
![Page 58: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/58.jpg)
A kd-tree: level 6
![Page 59: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/59.jpg)
Range-count recursive algorithm
![Page 60: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/60.jpg)
Range-count recursive algorithm
![Page 61: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/61.jpg)
Range-count recursive algorithm
![Page 62: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/62.jpg)
Range-count recursive algorithm
![Page 63: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/63.jpg)
Range-count recursive algorithm
Pruned!(inclusion)(inclusion)
![Page 64: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/64.jpg)
Range-count recursive algorithm
![Page 65: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/65.jpg)
Range-count recursive algorithm
![Page 66: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/66.jpg)
Range-count recursive algorithm
![Page 67: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/67.jpg)
Range-count recursive algorithm
![Page 68: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/68.jpg)
Range-count recursive algorithm
![Page 69: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/69.jpg)
Range-count recursive algorithm
![Page 70: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/70.jpg)
Range-count recursive algorithm
![Page 71: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/71.jpg)
Range-count recursive algorithm
Pruned!(exclusion)(exclusion)
![Page 72: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/72.jpg)
Range-count recursive algorithm
![Page 73: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/73.jpg)
Range-count recursive algorithm
![Page 74: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/74.jpg)
Range-count recursive algorithm
fastestti lpractical
algorithm[Bentley 1975]
our algorithmsalgorithmscan use any tree
![Page 75: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/75.jpg)
OUTLINE1. warm-up: generalized histogram
2. n-point statistics2. n point statistics3. kernel density estimator4. general strategy: multi-tree
1. nonparametric Bayes classifier2 support vector machine2. support vector machine 3. nearest neighbor statistics4. Gaussian process regression5 Bayesian inference5. Bayesian inference
5. science!
![Page 76: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/76.jpg)
Characterization of an entire distribution?
2-point correlation“How many pairs have distance < r ?”
∑∑ <N N
I )(
How many pairs have distance < r ?
∑∑≠
<−i ij
ji rxxI )(≠i ij
r2-point correlationfunction
![Page 77: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/77.jpg)
The n-point correlation functionsThe n point correlation functions• Spatial inferences: filaments, clusters, voids,
homogeneity isotropy 2 sample testinghomogeneity, isotropy, 2-sample testing, …
• Foundation for theory of point processes [Daley Vere-Jones 1972] unifies spatial statistics [Ripley[Daley,Vere Jones 1972], unifies spatial statistics [Ripley 1976]
• Used heavily in biostatistics, cosmology, particle physics, statistical physics
2pcf definition:)](1[21
2 rdVdVdP ξλ +=2pcf definition:
3 f d fi iti)],,()()()(1[ 132312132312321
3 rrrrrrdVdVdVdP ζξξξλ ++++⋅=3pcf definition:
![Page 78: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/78.jpg)
rStandard model: n>0 terms should be zero!
r3
r1
r2
should be zero!
3 point correlation
3
3-point correlation“How many triples have y ppairwise distances < r ?”
)()()( rIrIrIN N N
<<<∑∑ ∑ δδδ )()()( 321 rIrIrI kii ij ijk
jkij <<<∑∑ ∑≠ ≠≠
δδδ
![Page 79: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/79.jpg)
How can we count n-tuples efficiently?p y
“How many triples have pairwise distances < r ?”pairwise distances < r ?”
![Page 80: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/80.jpg)
Use n trees!Use n trees![Gray & Moore, NIPS 2000]
![Page 81: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/81.jpg)
“How many valid triangles a-b-c(where )CcBbAa ∈∈∈ ,,(where )
could there be? CcBbAa ∈∈∈ ,,
r
count{A,B,C} =count{A,B,C}
?A
B
C
![Page 82: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/82.jpg)
“How many valid triangles a-b-c(where )CcBbAa ∈∈∈ ,,(where )
could there be? CcBbAa ∈∈∈ ,,
r
count{A,B,C} =count{A,B,C}
count{A,B,C.left}+
count{A,B,C.right}A
B
C
![Page 83: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/83.jpg)
“How many valid triangles a-b-c(where )CcBbAa ∈∈∈ ,,(where )
could there be? CcBbAa ∈∈∈ ,,
r
count{A,B,C} =count{A,B,C}
count{A,B,C.left}A
B
+count{A,B,C.right}
C
![Page 84: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/84.jpg)
“How many valid triangles a-b-c(where )CcBbAa ∈∈∈ ,,(where )
could there be? CcBbAa ∈∈∈ ,,
rA
B
count{A,B,C} =C
count{A,B,C}
?
![Page 85: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/85.jpg)
“How many valid triangles a-b-c(where )CcBbAa ∈∈∈ ,,(where )
could there be? CcBbAa ∈∈∈ ,,
rA
B
count{A,B,C} =C
count{A,B,C}
0!
Exclusion
![Page 86: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/86.jpg)
“How many valid triangles a-b-c(where )CcBbAa ∈∈∈ ,,(where )
could there be? CcBbAa ∈∈∈ ,,
A Br
count{A,B,C} =C
{ , , }
?
![Page 87: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/87.jpg)
“How many valid triangles a-b-c(where )CcBbAa ∈∈∈ ,,(where )
could there be? CcBbAa ∈∈∈ ,,
A B
Inclusion r
count{A,B,C} =I l iC
{ , , }
|A| x |B| x |C|Inclusion
Inclusion
![Page 88: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/88.jpg)
Key idea( bi i l i i bl )(combinatorial proximity problems):
f t lfor n-tuples: n tree recursionn-tree recursion
![Page 89: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/89.jpg)
Exclusion and inclusionlti l dii i lt lon multiple radii simultaneously
Find the largest radius which gives exclusion: binary search
![Page 90: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/90.jpg)
Exclusion and inclusionlti l dii i lt lon multiple radii simultaneously
r*
Find the largest radius which gives exclusion: binary search
![Page 91: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/91.jpg)
Exclusion and inclusionlti l dii i lt lon multiple radii simultaneously
Recurse on the remaining radii
![Page 92: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/92.jpg)
Key idea( bi i l i i bl )(combinatorial proximity problems):
lti di imulti-radius recursion(two layers of recursion)(two layers of recursion)
![Page 93: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/93.jpg)
n-point correlations:blproblem status
• 50 year old problem [Peebles 1956]• 50-year-old problem [Peebles, 1956]• main proposals:
– FFT [Peebles and Groth 76] (approximate)• must interpolate to equi-spaced grid points• n=2: O(WD log WD), n=3: O(WD (WD log WD))• Case 1: no error bounds• Fourier ringing at edges• Fourier ringing at edges
– counts-in-cells (grid) [Szapudi 97] O(Wn) (approximate)(approximate)
• Case 1: no error bounds
![Page 94: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/94.jpg)
3-point runtime4500
5000Scaling behavior with data size, by tuple order
2−point3−point4−point
3 po t u t e
(biggest previous:
3500
4000
4500 4−point(biggest previous:20K) n=2: O(N)
n=3: O(Nlog3)
2500
3000P
U ti
me
(sec
onds
)
VIRGO simulation data
n=3: O(Nlog3)
n=4: O(N2)
1000
1500
2000CP
Usimulation data,N = 75,000,000
0 1 2 3 4 5 6 7 8 9 100
500
1000
Number of data
naïve: 5x109 sec.(~150 years)
55 x 105Number of datamulti-tree: 55 sec.
(exact)
![Page 95: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/95.jpg)
But…But…Depends on rD-1.p
Slow for large radii.
VIRGO simulation data,N = 75 000 000N = 75,000,000
naïve: ~150 yearsymulti-tree:
large h: 24 hrs
Let’s develop a method for large radii.
![Page 96: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/96.jpg)
c = p Tc p TEASIER?
known.hard.EASIER?
ppzp )ˆ1(ˆˆ 2/−±S
zp 2/± ε
no dependence on N! but it does depend on p
![Page 97: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/97.jpg)
c = p Tc p T
ppzp )ˆ1(ˆˆ 2/−±S
zp 2/± ε
no dependence on N! but it does depend on p
![Page 98: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/98.jpg)
c = p Tc p T
ppzp )ˆ1(ˆˆ 2/−±S
zp 2/± ε
no dependence on N! but it does depend on p
![Page 99: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/99.jpg)
c = p Tc p T
ppzp )ˆ1(ˆˆ 2/−±S
zp 2/± ε
no dependence on N! but it does depend on p
![Page 100: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/100.jpg)
c = p Tc p T
ppzp )ˆ1(ˆˆ 2/−±S
zp 2/± ε
no dependence on N! but it does depend on p
![Page 101: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/101.jpg)
c = p Tc p T
This is junk:don’t bother
![Page 102: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/102.jpg)
c = p Tc p T
This ispromisingp g
![Page 103: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/103.jpg)
Basic idea:
1. Remove some junk(R t l ith f hil )(Run exact algorithm for a while)
make p largerp g
2 Sample from the rest2. Sample from the rest
![Page 104: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/104.jpg)
Get disjoint sets from the recursion tree
all possiblen-tuples⎞⎛N n tuples
⎟⎠
⎞⎜⎝
⎛nN
… … … [prune]
![Page 105: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/105.jpg)
Now do stratified sampling
T TT T1T + + = 3T2T T
T T T+ + = 1
1 p̂TT
22 p̂
TT
33 p̂
TT p̂
+ + =21
21 σ̂⎟⎠⎞
⎜⎝⎛TT 2
2
22 σ̂⎟⎠⎞
⎜⎝⎛
TT 2
3
23 σ̂⎟⎠⎞
⎜⎝⎛TT 2σ̂
⎠⎝T ⎠⎝ T ⎠⎝ T
![Page 106: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/106.jpg)
Speedup ResultsSpeedup esu ts
VIRGO simulation dataVIRGO simulation dataN = 75,000,000
naïve: ~150 yearsmulti-tree:
large h: 24 hrs
multi-tree monte carlo:99% confidence:96 sec96 sec
![Page 107: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/107.jpg)
Key idea( bi i l i i bl )(combinatorial proximity problems):
Multi-tree Monte Carlo
![Page 108: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/108.jpg)
n-point correlation wrapupn point correlation wrapup• Properties:Properties:
– fastest practical exact algorithm for general D– polychromatic, general np y g– extends to: weighted, projected, general constraints– conjecture: O(NlogN)+O(Nlogn) under some conditions– Monte Carlo: complements exact algorithm, error bounds
• Insights: natural generalization of range-counting to t ln-tuples
• Has been used in practice [Scranton et al. 03, Kayo et al. 03, Nichol et al 04]Nichol et al. 04]
• See [Gray & Moore NIPS 00], [Moore et al 00], [Gray & Moore 04].
![Page 109: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/109.jpg)
OUTLINE1. warm-up: generalized histogram
2. n-point statistics2. n point statistics 3. kernel density estimator4. general strategy: multi-tree
1. nonparametric Bayes classifier2 support vector machine2. support vector machine 3. nearest neighbor statistics4. Gaussian process regression5 Bayesian inference5. Bayesian inference
5. science!
![Page 110: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/110.jpg)
Kernel density estimationKernel density estimation
∑N1ˆ ∑≠
−=∀qr
rqhqq xxKN
xfx )(1)(,≠qrN
![Page 111: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/111.jpg)
Kernel density estimationKernel density estimation
ˆ ∞→→ Nxfxf )()(ˆ
• Guaranteed to converge to the true underlying density (consistency)• Nonparametric distribution need only meet some• Nonparametric – distribution need only meet some weak smoothness conditions • Achieves optimal rateAchieves optimal rate • These are true given the optimal bandwidth• Most mathematically studied and widely used general (nonparametric) density estimator
![Page 112: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/112.jpg)
How to use a tree…1 How to approximate?1. How to approximate?
2. When to approximate?[Barnes and Hut, Science, 1987]
q sR
q rRμ
∑ ≈i
RRi qKNxqK ),(),( μ
if θrs >
![Page 113: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/113.jpg)
How to use a tree…3. How to know potential error?Let’s maintain bounds on the true kernel sum
∑≡Φ xqKq )()( ∑≡Φi
ixqKq ),()(R
lolololo KNqKNqq −+Φ←Φ )()()( δlolo NKq ←Φ )(At the beginning:
hiR
hiqRR
hihi
RqRR
KNqKNqq
KNqKNqq
−+Φ←Φ
−+Φ←Φ
),()()(
),()()(
δ
δhihi NKq
NKq←Φ
←Φ
)()(
![Page 114: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/114.jpg)
How to use a tree…4 How to do ‘all’ problem?4. How to do all problem?
∑ −=∀N
rqhqq xxKN
xfx )(1)(ˆ,
Single-tree:
∑≠qr
qqq N
Dual-tree (symmetric): [Gray & Moore 2000]
![Page 115: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/115.jpg)
How to use a tree…4. How to do ‘all’ problem?
rRrQ
sRQ
∑ ≈∈∀i
RRi qKNxqKQq ),(),(, μ
ifθ
),max( RQ rrs >
Generalizes Barnes-Hut to dual-tree
![Page 116: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/116.jpg)
Key idea(k l i bl )(kernel summation problems):
T t k l ti t i fTreat kernel summation as an extension of the basic proximity problems:
dual-tree + simple approximation
+ boundsbou ds
![Page 117: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/117.jpg)
BUT:BUT:θWe have a tweak parameter:
Case 1 – alg. gives no error boundsC 2 l i b d b t t bCase 2 – alg. gives error bounds, but must be rerun Case 3 – alg. automatically achieves error tolerance
So far we have case 2;So far we have case 2; let’s try for case 3
Let’s try to make an automatic stopping rule
![Page 118: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/118.jpg)
Finite-difference function approximation.
))(()()( axafafxf −′+≈Taylor expansion:
⎞⎛
))(()()( axafafxf +≈
Gregory-Newton finite form:
)()()(21)()(
1
1i
ii
iii xx
xxxfxfxfxf −⎟
⎠
⎞⎜⎝
⎛−−+≈ +
2 1 ii xx ⎠⎝ +
)()()(1)()( lolohi
lo KKKK δδδδδδ ⎟⎞
⎜⎛ −+≈ )(
2)()( lohiKK δδ
δδδδ −⎟
⎠⎜⎝ −
+≈
![Page 119: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/119.jpg)
Finite-difference function approximation.
[ ])()(1 hilo KKK δδ +
assumes monotonic decreasing kernel
( ) [ ])()(2
hiQR
loQR
RN
qrq KKNKKerrR
δδδ −≤−=∑
[ ])()(21
QRQR KKK δδ +=
2 QQr
qq ∑ε
φε
φ≤∀⇒≤∀
)(:
)(:,
q
qR
q
qR
xerr
qNN
xerr
Rq
approximate {Q,R} if
)()()( 2 QKK loNhilo Φ≤− εδδ
![Page 120: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/120.jpg)
Key idea(k l i bl )(kernel summation problems):
Automatic error control
![Page 121: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/121.jpg)
Kernel density estimation:blproblem status
• 50 year old problem [Rosenblatt 1953]• 50-year-old problem [Rosenblatt 1953]• main proposals:
– FFT [Silverman 1982, 1-D], [Fan & Marron 1994, multi-D]: designed for signal processingFGT [Greengard & Strain 1991] ‘Improved Fast Gauss– FGT [Greengard & Strain 1991], Improved Fast Gauss Transform’ [Yang & Duraiswami 2003]
( )xqK 1physical simulation
( ) aj
jxq
xqK−
=−D = 1,2,3
![Page 122: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/122.jpg)
Fast Multipole Method[Greengard & Rokhlin 1987]
RQ RQ
O( D) d id b d tO(pD) and grid-based: notintended for high dimensions
FGT: no tree; IFGT O(Dp) and clusters [Yang & Duraiswami 03]
dual-tree: like high-D FMM
![Page 123: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/123.jpg)
colors (N=50k, D=2)co o s ( 50 , )
50% ( )
10% 1% 0%(rel. error)
Exhaustive 329.7 329.7 329.7 sec. 329.7
FFT 0.1 2.9 > 660 -
IFGT 1.7 > 660 > 660 -
Dualtree (Gaussian)
12.2 (65.1*)
18.7 (89.8*)
24.8 (117.2*)
-
Dualtree 6 2 (6 7*) 6 5 (6 7*) 6 7 (6 7*) 58 2Dualtree (Epanech.)
6.2 (6.7*) 6.5 (6.7*) 6.7 (6.7*) 58.2 [111.0]
![Page 124: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/124.jpg)
sj2 (N=50k, D=2)sj ( 50 , )
50%( )
10% 1% 0%(rel. error)
Exhaustive 301.7 301.7 301.7 sec. 301.7
FFT 3.1 > 600 > 600 -
IFGT 12.2 > 600 > 600 -
Dualtree (Gaussian)
2.7 (3.1*) 3.4 (4.8*) 3.8 (5.5*) -
Dualtree 0 8 (0 8*) 0 8 (0 8*) 0 8 (0 8*) 6 5Dualtree (Epanech.)
0.8 (0.8*) 0.8 (0.8*) 0.8 (0.8*) 6.5 [109.2]
![Page 125: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/125.jpg)
bio5 (N=100k, D=5)b o5 ( 00 , 5)
50%( )
10% 1% 0%(rel. error)
Exhaustive 1966.3 1966.3 1966.3 sec
1966.3sec.
FFT > RAM > RAM > RAM -
IFGT > 4000 > 4000 > 4000 -
Dualtree (Gaussian)
72.2 (98.8*)
79.6(111.8*)
87.5(128.7*)
-
Dualtree 27 0 28 4 28 4 408 9Dualtree (Epanech.)
27.0 (28.2*)
28.4 (28.4*)
28.4 (28.4*)
408.9 [1074.9]
![Page 126: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/126.jpg)
corel (N=38k, D=32)co e ( 38 , 3 )
50%( )
10% 1% 0%(rel. error)
Exhaustive 710.2 710.2 710.2 sec. 710.2
FFT > RAM > RAM > RAM -
IFGT > 1400 > 1400 > 1400 -
Dualtree (Gaussian)
155.9(159.7*)
159.9(163*)
162.2(167.6*)
-
Dualtree 10 0 10 1 10 1 261 6Dualtree (Epanech.)
10.0 (10.0*)
10.1 (10.1*)
10.1 (10.1*)
261.6 [558.7]
![Page 127: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/127.jpg)
covtype (N=150k, D=38)co type ( 50 , 38)
50%( )
10% 1% 0%(rel. error)
Exhaustive 13157.1 13157.1 13157.1 sec
13157.1sec.
FFT > RAM > RAM > RAM -
IFGT > 26000 > 26000 > 26000 -
Dualtree (Gaussian)
139.9 (143.6*)
140.4(145.7*)
142.7(148.6*)
-
Dualtree 54 3 56 3 56 4 1572 0Dualtree (Epanech.)
54.3 (54.3*)
56.3 (56.3*)
56.4 (56.4*)
1572.0 [11486.0]
![Page 128: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/128.jpg)
Speedup Results: Large dataset
dual-N naïve tree
12.5K 7 .1225K 31 .3150K 123 .46
100K 494 1.0100K 494 1.0200K 1976* 2400K 7904* 5
One order-of-magnitude speedupi l 2M i
400K 7904 5800K 31616* 101 6M 35 hrs 23 over single-tree at ~2M points1.6M 35 hrs 23
5500x
![Page 129: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/129.jpg)
Kernel density estimation wrapupKernel density estimation wrapup
• Properties:Properties:– fastest practical algorithm for general D– all kernels, weighted, variable-kernelall kernels, weighted, variable kernel– hard bounds, automatic error control– simple, easy to programp , y p g– conjecture: O(NlogN)+O(N)
• Insights: like FMM with adaptive geometry g p g y+ automatic error control
• Has been used in practice [Balogh et al. 02, p [ g ,Miller et al. 03]
• See [Gray & Moore NIPS 00], [Gray & Moore 03]
![Page 130: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/130.jpg)
OUTLINE1. warm-up: generalized histogram
2. n-point statistics2. n point statistics 3. kernel density estimator4. general strategy: multi-tree
1. nonparametric Bayes classifier2 support vector machine2. support vector machine 3. nearest neighbor statistics4. Gaussian process regression5 Bayesian inference5. Bayesian inference
5. science!
![Page 131: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/131.jpg)
( )I δU
( )∑i
qirIq δ:Some important( )qir
i
Iiq δU:q δminarg:
Some important computational problems
qiiq δminarg:
qiiq δminarg:∀ q
( )∑∑∑i j k
kijkijrstI δδδ ,,
( )∑∀i
qirKq δ:i j k
i
( )∑∀i
qiriKwq δ:⎫⎧ ( ) ( )⎭⎬⎫
⎩⎨⎧
∀ ∑∑j
qjri
qir KKq δδ ,max:
![Page 132: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/132.jpg)
( )I δU
( )∑i
qirIq δ: (radial) range count
( )qiri
Iiq δU:q δminarg:
(radial) range search
nearest-neighborqiiq δminarg:
qiiq δminarg:∀
nearest neighbor
• different operators same algq
( )∑∑∑i j k
kijkijrstI δδδ ,,different operators, same alg.
• fastest practical algorithms
( )∑∀i
qirKq δ:i j k
i
( )∑∀i
qiriKwq δ:⎫⎧ ( ) ( )⎭⎬⎫
⎩⎨⎧
∀ ∑∑j
qjri
qir KKq δδ ,max:
![Page 133: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/133.jpg)
( )I δU
( )∑i
qirIq δ:
( )qiri
Iiq δU:q δminarg: qiiq δminarg:
qiiq δminarg:∀ all-nearest-neighborsq
( )∑∑∑i j k
kijkijrstI δδδ ,, • common, e.g. in LLE
( )∑∀i
qirKq δ:i j k
i
( )∑∀i
qiriKwq δ:⎫⎧ ( ) ( )⎭⎬⎫
⎩⎨⎧
∀ ∑∑j
qjri
qir KKq δδ ,max:
![Page 134: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/134.jpg)
All-nearest-neighborsAll nearest neighbors
• natural generalization of nn alg.
f t t ti l l ith• fastest practical algorithm
![Page 135: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/135.jpg)
( )I δU
( )∑i
qirIq δ:
( )qiri
Iiq δU:q δminarg:
• extension to n-tuples
qiiq δminarg:
qiiq δminarg:∀• fastest practical algorithm
q
( )∑∑∑i j k
kijkijrstI δδδ ,, n-point correlations
( )∑∀i
qirKq δ:i j k
i
( )∑∀i
qiriKwq δ:⎫⎧ ( ) ( )⎭⎬⎫
⎩⎨⎧
∀ ∑∑j
qjri
qir KKq δδ ,max:
![Page 136: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/136.jpg)
( )I δU
( )∑i
qirIq δ:
( )qiri
Iiq δU:q δminarg: qiiq δminarg:
qiiq δminarg:∀ • continuous kernel functionq
( )∑∑∑i j k
kijkijrstI δδδ ,, • fastest practical algorithm
( )∑∀i
qirKq δ:i j k
kernel density estimationi
( )∑∀i
qiriKwq δ:⎫⎧ ( ) ( )⎭⎬⎫
⎩⎨⎧
∀ ∑∑j
qjri
qir KKq δδ ,max:
![Page 137: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/137.jpg)
( )I δU
( )∑i
qirIq δ:
( )qiri
Iiq δU:q δminarg: qiiq δminarg:
qiiq δminarg:∀ q
( )∑∑∑i j k
kijkijrstI δδδ ,, • arbitrary scalars
f t t ti l l ith( )∑∀
iqirKq δ:
i j k • fastest practical algorithm
i
( )∑∀i
qiriKwq δ:⎫⎧
Nadaraya-Watson regression
( ) ( )⎭⎬⎫
⎩⎨⎧
∀ ∑∑j
qjri
qir KKq δδ ,max:
![Page 138: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/138.jpg)
Bayesian inferenceBayesian inference
∫∫= dxxfxg
I)()(
∫ dxxfI
)(Adaptive importance sampling
∫dxxqxqxfI )()()(
∫=Adaptive importance sampling [ ] dxxqxf∫ − 2)()( ?
xq )(
Sample from q()
[ ]If )()( 2
[ ]( )[ ]2ˆˆ)ˆ( qqq IEIEIV −=
[ ] dxxq
xIqxfq ∫
−)(
)()(min()
!
Re-estimate q() from samples New computational capabilitiesinspire new methods
![Page 139: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/139.jpg)
( )I δU
( )∑i
qirIq δ:
( )qiri
Iiq δU:q δminarg: KwyK →−1
qiiq δminarg:
qiiq δminarg:∀ • N x N matrix inverse q
( )∑∑∑i j k
kijkijrstI δδδ ,,kernel matrix-vector multiply
• problems sometimes hidden
( )∑∀i
qirKq δ:i j k problems sometimes hidden
• awaiting further testingi
( )∑∀i
qiriKwq δ:⎫⎧
Gaussian process regression
( ) ( )⎭⎬⎫
⎩⎨⎧
∀ ∑∑j
qjri
qir KKq δδ ,max:
![Page 140: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/140.jpg)
( )I δU
( )∑i
qirIq δ:
( )qiri
Iiq δU:q δminarg: qiiq δminarg:
qiiq δminarg:∀ q
( )∑∑∑i j k
kijkijrstI δδδ ,, • decision problem exact alg.using priority queues
( )∑∀i
qirKq δ:i j k g p y q
• fastest practical algorithmi
( )∑∀i
qiriKwq δ:⎫⎧ t i( ) ( )⎭⎬⎫
⎩⎨⎧
∀ ∑∑j
qjri
qir KKq δδ ,max:nonparametric
Bayes classifier
![Page 141: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/141.jpg)
( )I δU
( )∑i
qirIq δ:
( )qiri
Iiq δU:q δminarg: )(sgn: qjiKq δα∑∀
qiiq δminarg:
qiiq δminarg:∀
)(g qji
iq ∑q
( )∑∑∑i j k
kijkijrstI δδδ ,, • only 2-3x speedup over naive
( )∑∀i
qirKq δ:i j k
• failure for this problem
i
( )∑∀i
qiriKwq δ:⎫⎧ t t( ) ( )⎭⎬⎫
⎩⎨⎧
∀ ∑∑j
qjri
qir KKq δδ ,max:support vector
machine
![Page 142: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/142.jpg)
These were examples of…
Generalized N-body problemsGeneralized N-body problems[Gray thesis 2003]
}i{ δ mean shiftAll-NN:2-point: }),(,,{
},min,arg,{wIrΣΣ⋅∀
δδ mean shift
local poly. regressionCoulombic simulationp
3-point:KDE: }}{)({
}),(,,,{})({
KwIR
r
Σ∀ΣΣΣ
δδ
Coulombic simulationSPH fluid dynamicskernel PCA
KDE: }}{;),(,,{ rKr ⋅Σ∀ δGaussian process regression
t i B l if
Isomapprojection pursuit
i i i tnonparametric Bayes classif.radial basis functionsparticle filters
minimum spanning treek-meansHausdorff distanceparticle filters
nonparam. belief propagation…
Hausdorff distance mixture of Gaussians…
![Page 143: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/143.jpg)
These were examples of…
Multi tree methodsMulti-tree methods[Gray thesis 2003]
general dimensiondata structure-agnostic
quite generalsimple, recursive
general tuple orderpolychromatic
error boundsautomatic error control
multiple kernelssubset-decomposable Unifies/extends: FMM,
B H t A l’p
operatorssymmetric monotonic
Barnes-Hut, Appel’s algorithm, WSPD, nearest-neighbor search,
kernel functionsmetric space
spatial join, graphics collision detection
![Page 144: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/144.jpg)
OUTLINE1. warm-up: generalized histogram
2. n-point statistics2. n point statistics 3. kernel density estimator4. general strategy: multi-tree
1. nonparametric Bayes classifier2 support vector machine2. support vector machine 3. nearest neighbor statistics4. Gaussian process regression5 Bayesian inference5. Bayesian inference
5. science!
![Page 145: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/145.jpg)
Science: Map of the quasars, i.e. mass?
z=5
NBC on 500,000 training data, 800,000 test dataz 5
z=0 1z=0.1
Largest quasar catalog to date,deepest mass map of universe.
[Richards, Nichol, Gray, et al., ApJ 2004]
Coming: 1,000,000 quasars
![Page 146: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/146.jpg)
Science: Does the model fit the data?
Same?
3-point on 130 000 galaxies
Same?
3-point on 130,000 galaxies,1.3M randomOngoing: 3-point on VIRGO
Most comprehensive third-orderstatistics on universe to date
Ongoing: 3 point on VIRGO
statistics on universe to date.[Nichol et al., ApJL 2005 in prep.]
![Page 147: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/147.jpg)
Science: Does dark energy exist?
Do we seethe ISWthe ISWEffect?
2 i t 2 000 000 l i d WMAP i l
Physical evidence of
2-point on 2,000,000 galaxies and WMAP pixels
ydark energy.
[Scranton et al., PRL 2005 submitted]
![Page 148: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/148.jpg)
ScienceScience #1 Breakthrough of 2003
Bob Nichol on D id L tt hDavid Letterman showJuly 2003
![Page 149: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/149.jpg)
ScienceScience #1 Breakthrough of 2003
![Page 150: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/150.jpg)
SummarySummary
• Fastest practical algorithms: n point• Fastest practical algorithms: n-point, KDE, all-NN, NBC, more coming…
• Major science results: directly due to faster algorithms; much more coming…faster algorithms; much more coming…
• General principles: generalized N-body bl lti t th dproblems multi-tree methods
![Page 151: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/151.jpg)
ENDEND
![Page 152: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/152.jpg)
Machine learningin generalin general
data++
model/task+
objective function
learning algorithm
scalable learning algorithm
code
![Page 153: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/153.jpg)
Future steps… 0
2]
data+
non-vector objects!e.g. proteins, spatio-temporal, relations
et a
l. N
IPS
+model/task
+learning deduction, action!
e.g. reinforcement learning, ILP![G
ray
e
objective function generalize maximum likelihood!
mat
e!
learning algorithm generalize EM!
auto
scalable learning algorithm new N-body & Monte Carlo methods!
code
![Page 154: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/154.jpg)
Machine learningin generalin general
data++
model/task+
objective function
learning algorithm
scalable learning algorithm
code
![Page 155: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/155.jpg)
AutoBayes (Prolog system)[Buntine 95], [Gray, Fischer, Schumann, Buntine NIPS 02][Buntine 95], [Gray, Fischer, Schumann, Buntine NIPS 02]
data+
face-data.txt++
model/task+
+mixture of Gaussians / clustering
+objective function maximum likelihood
learning algorithmEM-mog { ....}
scalable learning algorithm nbody-EM-mog { ....}
code nbody_EM_mog.c
![Page 156: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/156.jpg)
Future steps…data
+
non-vector objects!e.g. proteins, spatio-temporal, relations
+model/task
+learning deduction, action!
e.g. ILP, reinforcement learning
objective function generalize maximum likelihood!
learning algorithm generalize EM!
scalable learning algorithm new N-body & Monte Carlo methods!
code deductive code optimization!
![Page 157: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/157.jpg)
SummarySummary• Fastest practical algorithms: n-point KDE all-NNFastest practical algorithms: n point, KDE, all NN,
NBC, more coming…
• Major science results: directly due to faster jalgorithms; NVO, parallel; much more coming…
• General principles: generalized N-body problems l i h dmulti-tree methods
• Next: computational principles: formalize/extend framework; distribution– computational principles: formalize/extend framework; distribution-sensitive analysis
– statistical principles: robust learning theory, active learning
M d• My dream: automated application of principles automatic data analysis (AI) [Gray, Fischer, Schumann, Buntine 02]
![Page 158: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/158.jpg)
Speedup Results: DimensionalitySpeedup esu ts e s o a ty
N Epan Gauss12.5K .12 .32
25K 31 70
N Epan. Gauss.
25K .31 .7050K .46 1.1
100K 1 0 2100K 1.0 2200K 2 5400K 5 11400K 5 11800K 10 221.6M 23 51
![Page 159: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/159.jpg)
Observation: there’s a patternp[Gray and Moore 00]
k l d it ti t ( )∑∀ xqKq• kernel density estimator• n-point statistics
nonparametric Bayes classifier
( )∑ −∀j
jxqKq,
)()()( 321 rIrIrI ki
N
i
N
ij
N
ijkjkij <<<∑∑ ∑
≠ ≠≠
δδδ( ) ( )⎬⎫⎨
⎧∀ ∑∑ xqKxqKq max• nonparametric Bayes classifier
• support vector machine • nearest neighbor statistics j
k xqq −∀ minarg
( ) ( )⎭⎬
⎩⎨ −−∀ ∑∑
jj
ii xqKxqKq ,max,
( ) ( )⎭⎬⎫
⎩⎨⎧
−−∀ ∑∑j
ji
i xqKxqKq ,max,
• nearest neighbor statistics• Gaussian process regression• Bayesian inference
jj xqq −∀ minarg,
∫ dxxpxf )()(xK 1−
• Bayesian inference
generalized N-body problems multi-tree methods
∫ dxxpxf )()(
min
generalized N body problems multi tree methods
∑ 1−A ∫
![Page 160: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/160.jpg)
Science: Spiral/elliptical galaxies - WHY?
KDE on 100,000 galaxiesspiralspiral
ellipticalelliptical
First large-scale evidenceexplaining elliptical galaxiesexplaining elliptical galaxies.
[Balogh et al., MNRAS 2004]
![Page 161: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/161.jpg)
Science: Can we ‘see’ general relativity?
more fluxmore areaarea
13.5M galaxies, 195,000 quasarsg , , q
First observation of generalrelativity of this kindrelativity of this kind.
[Scranton, Nichol, Connolly, et al. in prep.]
![Page 162: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/162.jpg)
ExperimentsExperiments• Optimal bandwidth h* found by LSCVp y• Error relative to truth: maxerr=max |est – true| / true• Only require that 95% of points meet this tolerancey q p• Note that these are small datasets for manageability• Tweak parameters
– FFT tweak parameter M: M=16, double until error satisfied– IFGT tweak parameters K, ry, p: 1) ry=2.5, K=√N 2)
K=10√N ry=16 and doubled until error satisfied; hand-tuneK=10√N, ry=16 and doubled until error satisfied; hand-tune p for dataset: {8,8,5,3,2}
– Dualtree tweak parameter tau: tau=maxerr, double until error satisfiederror satisfied
– Dualtree auto: just give it maxerr
![Page 163: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/163.jpg)
ObservationsObservations
• FGT can’t use tree; FMM doesn’t apply hereFGT can t use tree; FMM doesn t apply here • like FMM on adaptive trees (general D):
– conjecture: O(NlogN)+O(N)conjecture: O(NlogN)+O(N)– works for all density estimation kernels– case 3 error controlcase 3 error control– simple, easy to program– cf. Appel’s algorithm (1981)pp g ( )
• we trade off continous sophistication for discrete sophisticationp
let’s compare…
![Page 164: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/164.jpg)
These were examples of…
Generalized N body problemsGeneralized N-body problemsAll-NN: },min,arg,{ δ ⋅∀All NN:
2-point: }),(,,{},,a g,{
wIr δδ
ΣΣ
3-point:
KDE: }}{;)({}),(,,,{
rKwIR
δδ⋅Σ∀
ΣΣΣ
KDE:
SPH: };),(,,{}}{;),(,,{twKrK
r
r
δδ
Σ∀⋅Σ∀
Multi-tree methods:General algorithmic framework and toolkit forGeneral algorithmic framework and toolkit for
such problems
![Page 165: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/165.jpg)
Ball-trees
O l ithOur algorithms can useany of these data structures
•Auton ball-trees III [Omohundro 91],[Uhlmann 91], [Moore 99]•Cover-trees [Alina [B.,Kakade,Langford 04]•Crust-trees [Yianilos 95],[Gray,Lee,Rotella,Moore 2005]
![Page 166: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/166.jpg)
Basic proximity problemsBasic proximity problems
• nearest-neighbor search jjk xq −minargnearest neighbor search
( di l) h
jj xqminarg
( )rxqIx <U• (radial) range search ( )rxqIx jj
j <−U
• (radial) range count ( )∑ <−j
j rxqI
![Page 167: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/167.jpg)
Exclusion and inclusionExclusion and inclusion,on multiple radii simultaneously.
min||x-xi|| < r1 => min||x-xi|| < r2
Use binary search to locate critical radius: O(logB)
![Page 168: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/168.jpg)
1. Nonparametric Bayes classifier1. Nonparametric Bayes classifier
Quasar density
Star density
Quasar density
sity
x)de
ns f(x
x
Optimal decision boundary
)|(ˆ)( CxfCP)|(ˆ)()|(ˆ)(
)|()()|(
2211
111 CxfCPCxfCP
CxfCPxCP
qq +=
![Page 169: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/169.jpg)
1. Nonparametric Bayes classifier1. Nonparametric Bayes classifierkernel sum decision problem
( ) ( ){ }( ) ( ) ( ) ( )∑∑ −=Φ−=Φ
jj
ii xqKqxqKq 21 ,
( ) ( ){ }qqq 21 ,max, ΦΦ∀
( )qhi2Φ( )qhiΦ ( )q2Φ
( )qloΦ
( )q1Φ
( )qlo1Φ
( )q2Φ
![Page 170: Computational Methods for Large-Scale Data Analysis · Hierarchical clustering O(N2) O(N3) O(N3) R. Scranton, U. Pitt Physics M. Balogh, U. Waterloo Physics I. Szapudi, U. Hawaii](https://reader035.fdocuments.us/reader035/viewer/2022062317/5edc9d88ad6a402d66675b3d/html5/thumbnails/170.jpg)
1. Nonparametric Bayes classifierkernel sum decision problem
1. Nonparametric Bayes classifier
( ) ( ){ }( ) ( ) ( ) ( )∑∑ −=Φ−=Φ
jj
ii xqKqxqKq 21 ,
( ) ( ){ }qqq 21 ,max, ΦΦ∀
( )qhi2Φ( )qhiΦ ( )q2Φ
( )qloΦ
( )q1ΦExact
( )qlo1Φ
( )q2Φ