Simulation Informatics: Analyzing Large Scientific Datasets
Transcript of a talk by David Gleich.
Simulation Informatics: Analyzing Large Datasets from Scientific Simulations
David F. Gleich, Purdue University, Computer Science Department
Paul G. Constantine, Stanford University
Joe Ruthruff & Jeremy Templeton, Sandia National Labs
CS&E Seminar David Gleich · Purdue 1
This talk is a story …
How I learned to stop worrying and love the simulation!
I asked … can we do UQ on PageRank?
PageRank by Google
[Figure: a small six-node web graph, nodes 1 through 6.]
The model:
1. Follow edges uniformly with probability α, and
2. randomly jump with probability 1 − α; we'll assume everywhere is equally likely.
The places we find the surfer most often are important pages.
David F. Gleich (Sandia) PageRank intro Purdue 5 / 36
Google’s PageRank
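The random-surfer model above can be sketched in a few lines. This is a minimal, hypothetical power-iteration implementation, not the talk's code; the function name, tolerance, and toy graph are all illustrative:

```python
import numpy as np

def pagerank(P, alpha=0.85, v=None, tol=1e-10, max_iter=1000):
    # power iteration for x = alpha*P*x + (1 - alpha)*v,
    # with P column-stochastic and v the teleportation distribution
    n = P.shape[0]
    v = np.ones(n) / n if v is None else v
    x = v.copy()
    for _ in range(max_iter):
        x_next = alpha * (P @ x) + (1 - alpha) * v
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x

# tiny 3-node cycle 1 -> 2 -> 3 -> 1; by symmetry every page scores 1/3
P = np.array([[0., 0., 1.],
              [1., 0., 0.],
              [0., 1., 0.]])
x = pagerank(P)
```

For a symmetric cycle the uniform vector is the fixed point for any α, which makes a convenient sanity check.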
Random alpha PageRank (RAPr)
Model PageRank as the random variable x(A), and look at E[x(A)] and Std[x(A)].
Note: "RAPr" sounds like "wrapper," not "rapper."
Gleich and Constantine, Workshop on Algorithms and Models for the Web Graph (WAW), 2007; Gleich and Constantine, submitted.
Explored in Constantine and Gleich, WAW 2007; and Constantine and Gleich, J. Internet Mathematics, 2011.
Random alpha PageRank, or PageRank meets UQ.
Which sensitivity?

(I − αP) x = (1 − α) v

Sensitivity to the links: examined and understood.
Sensitivity to the jump: examined, understood, and useful.
Sensitivity to α: less well understood.
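One way to probe the sensitivity to α is plain Monte Carlo over the RAPr model: draw α from a Beta distribution, solve each PageRank system, and average. A sketch under assumed names: `rapr_monte_carlo`, the Beta parameters, and the tiny cycle graph are illustrative, not taken from the talk.

```python
import numpy as np

def pagerank(P, alpha, v, tol=1e-12, max_iter=1000):
    # plain power iteration for x = alpha*P*x + (1 - alpha)*v
    x = v.copy()
    for _ in range(max_iter):
        x_next = alpha * (P @ x) + (1 - alpha) * v
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x

def rapr_monte_carlo(P, n_samples, a=2.0, b=16.0, seed=0):
    # draw alpha = 1 - Beta(a, b) so most mass sits near alpha ~ 0.9;
    # these (a, b) values are illustrative, not the paper's fitted parameters
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    v = np.ones(n) / n
    xs = np.array([pagerank(P, 1.0 - rng.beta(a, b), v)
                   for _ in range(n_samples)])
    return xs.mean(axis=0), xs.std(axis=0)

# on a 3-cycle the solution is uniform for every alpha, so Std should vanish
P = np.array([[0., 0., 1.], [1., 0., 0.], [0., 1., 0.]])
mean, std = rapr_monte_carlo(P, n_samples=200)
```

The convergence table that follows makes precise how slowly this 1/√N estimator converges compared to the alternatives.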
Convergence theory
Method                Conv.     Work required                  What is N?
Monte Carlo           1/√N      N PageRank systems             number of samples from A
Path damping          r^(N+2)   N + 1 matrix-vector products   terms of the Neumann series
(without Std[x(A)])
Gaussian quadrature   r^(2N)    N PageRank systems             number of quadrature points

Here l and r are parameters from the Beta(a, b, l, r) distribution for A.
Random alpha PageRank has a rigorous convergence theory.
Working with PageRank showed us how to treat UQ more generally …
Constantine, Gleich, and Iaccarino. Spectral Methods for Parameterized Matrix Equations, SIMAX, 2010.
Constantine, Gleich, and Iaccarino. A factorization of the spectral Galerkin system for parameterized matrix equations: derivation and applications, SISC 2011.
A(s) x(s) = b(s): a discretized PDE with explicit parameters s, and a parameterized solution x(s).

Sample: A(s_1) x(s_1) = b(s_1), …, A(s_N) x(s_N) = b(s_N), or project: A_N(s_1) x_N(s_1) = b_N(s_1), …

How to compute the Galerkin solution in a weakly intrusive manner.
We studied parameterized matrices.
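The "weakly intrusive" idea treats the deterministic solver as a black box evaluated at sample points. A minimal non-intrusive sampling sketch; the 2-by-2 toy system and the helper names are invented for illustration:

```python
import numpy as np

def collocation_solutions(A_of, b_of, s_points):
    # non-intrusive sampling: call a black-box solver at each parameter value
    return [np.linalg.solve(A_of(s), b_of(s)) for s in s_points]

# toy parameterized system (invented for illustration): A(s) = I + s*N
N = np.array([[0.0, 0.3],
              [0.2, 0.0]])
A_of = lambda s: np.eye(2) + s * N
b_of = lambda s: np.array([1.0, s])
xs = collocation_solutions(A_of, b_of, [0.0, 0.5, 1.0])
# at s = 0 the system is I x = [1, 0]
```

A Galerkin approach couples the sample points through a projected system; the papers cited above show how to factor that coupled system so it reuses the same black-box solves.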
Simulation: The Third Pillar of Science
21st-century science in a nutshell: experiments are not practical or feasible, so simulate things instead.
But do we trust the simulations? We're trying!
Model fidelity, Verification & Validation (V&V), Uncertainty Quantification (UQ).
The message: insight and confidence require multiple runs.
The problem: a simulation run ain't cheap!
Another problem: it's very hard to "modify" current codes.
Large-scale nonlinear, time-dependent heat transfer problem: 10^5 nodes, 10^3 time steps, 30 minutes on 16 cores.
Questions: What is the probability of failure? Which input values cause failure?
It's time to ask, "What can science learn from Google?" – Wired Magazine (2008)
21.1st-century science in a nutshell?
Simulations are too expensive. Let data provide a surrogate.
"We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot." – Wired (again)
Our approach: construct an interpolating reduced-order model from a budget-constrained ensemble of runs for uncertainty and optimization studies.
That is, we store the runs: supercomputer → data computing cluster → engineer.
Each multi-day HPC simulation generates gigabytes of data. A data cluster can hold hundreds or thousands of old simulations, enabling engineers to query and analyze months of simulation data for statistical studies and uncertainty quantification. We then build the interpolant from the pre-computed data.
Input parameters s; time history of simulation f.
The database: s_1 → f_1, s_2 → f_2, …, s_k → f_k.
The simulation as a vector:

f(s) = [ q(x_1, t_1, s), …, q(x_n, t_1, s), q(x_1, t_2, s), …, q(x_n, t_2, s), …, q(x_n, t_k, s) ]ᵀ

where each block q(x_1, t_j, s), …, q(x_n, t_j, s) is a single simulation at one time step.
The database as a matrix:

X = [ f(s_1)  f(s_2)  …  f(s_p) ]
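Assembling the database matrix is just stacking and concatenation. A small sketch, assuming each run is stored as a (time steps × mesh points) array; the function names are illustrative:

```python
import numpy as np

def run_to_vector(q):
    # q has shape (time steps, mesh points); row-major flattening gives
    # q(x1,t1), ..., q(xn,t1), q(x1,t2), ..., q(xn,tk) -- the stacked f(s)
    return q.reshape(-1)

def build_database(runs):
    # X = [ f(s1) f(s2) ... f(sp) ], one column per stored run
    return np.column_stack([run_to_vector(q) for q in runs])

rng = np.random.default_rng(0)
runs = [rng.random((3, 4)) for _ in range(5)]   # 5 runs, 3 steps, 4 mesh points
X = build_database(runs)                        # 12 rows, 5 columns
```

At the talk's large-scale example this X is 24 million rows by ~100 columns, which is why the tall-and-skinny factorizations of Part 2 matter.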
The interpolant
Motivation: let the data give you the basis, then find the right combination.

X = [ f(s_1)  f(s_2)  …  f(s_p) ],    f(s) ≈ Σ_{j=1}^{r} u_j α_j(s)

where the u_j are the left singular vectors of X! This idea was inspired by the success of other reduced-order models like POD, and Paul's residual-minimizing idea.
Why the SVD? Let's study a simple case. Split x and s:

X = [ g(x_i, s_j) ]  (an m × p matrix of samples)  =  U Σ Vᵀ,

g(x_i, s_j) = Σ_{ℓ=1}^{r} U_{i,ℓ} σ_ℓ V_{j,ℓ} = Σ_{ℓ=1}^{r} u_ℓ(x_i) σ_ℓ v_ℓ(s_j).

Treat each right singular vector as samples of an unknown basis function in a general parameter s:

g(x_i, s) = Σ_{ℓ=1}^{r} u_ℓ(x_i) σ_ℓ v_ℓ(s),    v_ℓ(s) ≈ Σ_{j=1}^{p} v_ℓ(s_j) φ_j^{(ℓ)}(s).

Interpolate v any way you wish.
Method summary
1. Compute the SVD of X.
2. Compute an interpolant of the right singular vectors.
3. Approximate a new value of f(s).
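These three steps can be sketched with NumPy, assuming a single scalar parameter s and piecewise-linear interpolation of the right singular vectors. All names here are illustrative; the talk does not prescribe an interpolation rule:

```python
import numpy as np

def build_model(X, s_train, r):
    # SVD-based interpolating reduced-order model for a 1-D parameter s:
    # alpha_j(s) = sigma_j * v_j(s), with v_j interpolated between samples
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    U, sigma, Vt = U[:, :r], sigma[:r], Vt[:r, :]
    order = np.argsort(s_train)
    s_sorted, V_sorted = s_train[order], Vt[:, order]
    def predict(s):
        v = np.array([np.interp(s, s_sorted, V_sorted[j]) for j in range(r)])
        return U @ (sigma * v)
    return predict

# smooth toy data: f(s) = sin(x + s) on a fixed grid x (exactly rank 2)
x = np.linspace(0, np.pi, 50)
s_train = np.linspace(0, 1, 9)
X = np.column_stack([np.sin(x + s) for s in s_train])
predict = build_model(X, s_train, r=3)
err = np.abs(predict(0.55) - np.sin(x + 0.55)).max()
```

For smooth data and a dense enough sample, the prediction error at a new parameter value is small; the quiz that follows shows why this breaks down for high-index modes.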
A quiz: which section would you rather try to interpolate, A or B?
[Figure: two sections of a sampled singular vector, labeled A and B.]
How predictable is a singular vector?
Folk theorem (O'Leary 2011): the singular vectors of a matrix of "smooth" data become more oscillatory as the index increases.
Implication: the gradient of the singular vectors increases as the index increases.
v_1(s), v_2(s), …, v_t(s): predictable.   v_{t+1}(s), …, v_r(s): unpredictable.
A refined method with an error model
Interpolate the predictable modes; don't even try to interpolate the unpredictable ones. Model them as noise, η_j ~ N(0, 1):

f(s) ≈ Σ_{j=1}^{t(s)} u_j α_j(s) + Σ_{j=t(s)+1}^{r} u_j σ_j η_j,

Variance[f] = diag( Σ_{j=t(s)+1}^{r} σ_j² u_j u_jᵀ ).

But now, how to choose t(s)?
Our current approach to choosing the predictability: t(s) is the largest 𝜏 such that

(1/σ_1) Σ_{i=1}^{𝜏} σ_i | ∂v_i/∂s | < threshold.
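One possible finite-difference implementation of this criterion for a 1-D parameter. The function name, the gradient estimate, and the threshold handling are my assumptions, not the authors':

```python
import numpy as np

def choose_t(sigma, V, s_train, threshold):
    # V has one right singular vector per row, sampled at s_train;
    # estimate |dv_i/ds| by finite differences, then take the largest tau
    # with (1/sigma_1) * sum_{i<=tau} sigma_i * max|dv_i/ds| < threshold
    grads = np.abs(np.gradient(V, s_train, axis=1)).max(axis=1)
    partial = np.cumsum(sigma * grads) / sigma[0]
    return int(np.searchsorted(partial, threshold))

sigma = np.array([1.0, 0.1, 0.01])
s_train = np.linspace(0.0, 1.0, 11)
V = np.vstack([np.ones(11),                   # flat: very predictable
               s_train,                       # gentle slope: still predictable
               np.sin(8 * np.pi * s_train)])  # oscillatory: unpredictable
t = choose_t(sigma, V, s_train, threshold=0.12)
```

The oscillatory third mode has a large estimated gradient, so it is the one pushed into the noise term.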
An experimental test case: a heat equation problem with two parameters that control the material properties.
Experiments: a 20-point Latin hypercube sample.
Our reduced-order model vs. the truth: where the error is the worst.
A large-scale example: nonlinear heat transfer model; 80k nodes, 300 time steps, 104 basis runs; SVD of a 24M × 104 data matrix; 500× reduction in wall-clock time (100× including the SVD).
PART 2
Tall-and-skinny QR (and SVD) on MapReduce
Quick review of QR
QR factorization: let A be m × n, real. Then A = QR, where Q is orthogonal (QᵀQ = I) and R is upper triangular (padded with zero rows in the full factorization, as the slide's block picture shows).
Using QR for regression: the least-squares solution of min ‖Ax − b‖ is given by the solution of Rx = Qᵀb.
QR is block normalization: "normalize" a vector usually generalizes to computing Q in the QR factorization.
Intro to MapReduce: originated at Google for indexing web pages and computing PageRank.
The idea: bring the computations to the data. Express algorithms in data-local operations. Implement one type of communication: shuffle. Shuffle moves all data with the same key to the same reducer.
[Figure: input splits stored in triplicate feed the mappers; map output is persisted to disk before the shuffle; reduce input and output live on disk.]
Data scalable. Fault tolerance by design.
Mesh point variance in MapReduce
[Figure: Run 1, Run 2, Run 3, each with time steps T=1, T=2, T=3, flowing into mappers (M) and reducers (R).]
1. Each mapper outputs the mesh points with the same key.
2. Shuffle moves all values from the same mesh point to the same reducer.
3. Reducers just compute a numerical variance.
Bring the computations to the data!
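The three steps can be mimicked in plain Python with an explicit shuffle, which makes the data flow concrete. A toy sketch, not Hadoop code; the run format and mesh-point keys are illustrative:

```python
from collections import defaultdict
import statistics

def mapper(run):
    # emit (mesh point, value); all runs use the same mesh-point keys
    for mesh_point, time, value in run:
        yield mesh_point, value

def shuffle(pairs):
    # group values by key, like Hadoop's shuffle
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reducer(key, values):
    # population variance across runs at one mesh point
    return key, statistics.pvariance(values)

runs = [
    [("x1", 1, 2.0), ("x2", 1, 5.0)],   # run 1
    [("x1", 1, 4.0), ("x2", 1, 5.0)],   # run 2
    [("x1", 1, 6.0), ("x2", 1, 5.0)],   # run 3
]
pairs = [kv for run in runs for kv in mapper(run)]
result = dict(reducer(k, vs) for k, vs in shuffle(pairs))
```

Mesh point x2 is identical in every run, so its variance is zero; x1 varies across runs.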
Communication-avoiding QR (Demmel et al. 2008)
Communication-avoiding TSQR: first, do QR factorizations of each local matrix; second, compute a QR factorization of the stacked "R" factors.
Fully serial TSQR: compute the QR of the first block, read the next block, update the QR, and so on.
Demmel et al. 2008. Communication-avoiding parallel and sequential QR.
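A serial TSQR over in-memory blocks is a few lines of NumPy, and its R factor matches a direct QR up to the sign of each row. A sketch; `serial_tsqr` and the block sizes are illustrative:

```python
import numpy as np

def serial_tsqr(blocks):
    # serial TSQR: QR the first block, then repeatedly stack the current
    # R on top of the next block and re-factor; returns R only
    R = np.linalg.qr(blocks[0], mode='r')
    for A_i in blocks[1:]:
        R = np.linalg.qr(np.vstack([R, A_i]), mode='r')
    return R

rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 5))
blocks = np.split(A, 10)              # ten 100-by-5 local matrices
R_tsqr = serial_tsqr(blocks)
R_full = np.linalg.qr(A, mode='r')
# the two R factors agree up to the sign of each row
```

This is exactly what each mapper in the slides does with its input split, with the reducer applying the same routine to the emitted R factors.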
Tall-and-skinny matrix storage in MapReduce: the key is an arbitrary row id, and the value is the array for a row. Each submatrix A_1, A_2, A_3, A_4 is an input split.
[Figure: Mapper 1 runs serial TSQR over blocks A_1 … A_4 and emits R_4; Mapper 2 runs serial TSQR over blocks A_5 … A_8 and emits R_8; Reducer 1 runs serial TSQR on R_4 and R_8 and emits the final R.]
Algorithm. Data: rows of a matrix. Map: QR factorization of rows. Reduce: QR factorization of rows.
Key limitations
Computes only R and not Q. We can get Q via Q = AR⁺ with another MapReduce iteration (we currently use this for computing the SVD), but it has dubious numerical stability; iterative refinement helps. We are working on better ways to compute Q (with Austin Benson, Jim Demmel).
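The Q = AR⁺ recovery with one refinement sweep can be sketched as follows. This is my reading of the slide's idea, not the authors' code; the sign fix keeps A ≈ QR after the refinement step:

```python
import numpy as np

def recover_q(A, R):
    # Q = A R^{-1}, computed with a solve rather than forming inv(R)
    Q = np.linalg.solve(R.T, A.T).T
    # one refinement sweep: factor Q = Q2 R2 and replace Q by Q2 = Q R2^{-1}
    R2 = np.linalg.qr(Q, mode='r')
    R2 = np.sign(np.diag(R2))[:, None] * R2   # force positive diagonal
    return np.linalg.solve(R2.T, Q.T).T

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
R = np.linalg.qr(A, mode='r')
Q = recover_q(A, R)
orth_err = np.linalg.norm(Q.T @ Q - np.eye(10))
```

For a well-conditioned A one sweep is enough; when R is nearly singular, Q = AR⁻¹ loses orthogonality, which is the instability the slide mentions.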
Full code in hadoopy:

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self, blocksize, isreducer):
        self.bsize = blocksize
        self.data = []
        if isreducer:
            self.__call__ = self.reducer
        else:
            self.__call__ = self.mapper

    def compress(self):
        R = numpy.linalg.qr(numpy.array(self.data), 'r')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])

    def collect(self, key, value):
        self.data.append(value)
        if len(self.data) > self.bsize * len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0, 2000000000)
            yield key, row

    def mapper(self, key, value):
        self.collect(key, value)

    def reducer(self, key, values):
        for value in values:
            self.mapper(key, value)

if __name__ == '__main__':
    mapper = SerialTSQR(blocksize=3, isreducer=False)
    reducer = SerialTSQR(blocksize=3, isreducer=True)
    hadoopy.run(mapper, reducer)
Lots of data? Add an iteration. Too many maps? Add an iteration!
[Figure, iteration 1: mappers 1-1 through 1-4 each run serial TSQR on a split of A and emit R_1, …, R_4; a shuffle sends these to reducers 1-1 through 1-3, which run serial TSQR and emit R_{2,1}, R_{2,2}, R_{2,3}. Iteration 2: an identity map and a single reducer 2-1 run serial TSQR and emit the final R.]
Summary of parameters (mrtsqr)
Blocksize: how many rows to read before computing a QR factorization, expressed as a multiple of the number of columns (see paper).
Splitsize: the size of each local matrix.
Reduction tree: the number of reducers and iterations to use.
[Figure: iterations 1, 2, and 3, each a map/shuffle/reduce pass over successively smaller stacks of R factors.]
Varying splitsize and the reduction tree (synthetic data):

Cols.  Iters.  Split (MB)  Maps  Secs.
  50     1        64       8000   388
  50     1       256       2000   184
  50     1       512       1000   149
  50     2        64       8000   425
  50     2       256       2000   220
  50     2       512       1000   191
1000     1       512       1000   666
1000     2        64       6000   590
1000     2       256       2000   432
1000     2       512       1000   337

Increasing split size improves performance (it accounts for Hadoop data movement). Increasing iterations helps for problems with many columns. (1000 columns with a 64-MB split size overloaded the single reducer.)
MapReduce TSQR summary: MapReduce is great for TSQR!
Data: a tall-and-skinny (TS) matrix by rows. Map: QR factorization of local rows. Reduce: QR factorization of local rows.
Input: a 500,000,000-by-100 matrix; each record is a 1-by-100 row; HDFS size 423.3 GB. Time to compute the norm of each column: 161 sec. Time to compute R in a QR factorization: 387 sec.
On a 64-node Hadoop cluster with 4x2TB disks, one Core i7-920, and 12 GB RAM per node.
Demmel et al. showed that this construction works to compute a QR factorization with minimal communication.
Our vision: to enable analysts and engineers to hypothesize from data computations instead of expensive HPC computations.
Paul G. Constantine (Stanford)
Sandia: Jeremy Templeton, Joe Ruthruff
… and you? …