CAQR (Demmel et al., 2008)


description

MapReduce-enabled Model Reduction for Large Scale Simulation Data. Joe Ruthruff (Sandia National Labs), Jeremy Templeton (Sandia National Labs), Paul G. Constantine (Stanford University), David F. Gleich (Purdue University).

Transcript of CAQR (Demmel et al., 2008)


MapReduce-enabled Model Reduction for Large Scale Simulation Data

Sandia National Laboratories is a multi-program laboratory operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin company, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Motivation

• Given input parameters, a physical simulation approximates a space/time dependent solution.
• Each solution is computationally expensive, so exhaustive parameter studies (e.g., UQ/SA/optimization) are infeasible.
• Cheaper reduced order models enable guesstimates for parameter inquiries.

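[Slide annotations: the solution depends on space, time, and the input parameters. The data come from a fixed space/time discretization evaluated over an experimental design in the parameters. The approximation is a truncated sum of amplitudes times space-time basis vectors times unknown parameter basis functions.]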

• The left singular vectors are the space-time basis.
• The singular values give the amplitudes.
• The truncation is determined by examining the decay of the singular values.
• Treat each right singular vector as samples from the unknown basis functions.
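The SVD step in the bullets above can be sketched in Python with NumPy. The snapshot matrix `F`, the grids `x` and `s`, and the 99.9% energy threshold are illustrative assumptions. The slides say only that the truncation comes from examining the decay of the singular values, so the threshold rule below is one possible stand-in, not the authors' criterion.

```python
import numpy as np

def truncated_svd(F, energy=0.999):
    """Truncate the SVD of F once the squared singular values capture `energy`."""
    U, sigma, Vt = np.linalg.svd(F, full_matrices=False)
    cumulative = np.cumsum(sigma**2) / np.sum(sigma**2)
    # First rank whose cumulative energy reaches the threshold (assumed rule).
    r = min(int(np.searchsorted(cumulative, energy)) + 1, len(sigma))
    return U[:, :r], sigma[:r], Vt[:r, :]

# Synthetic stand-in data: rows index the space/time grid, columns index runs.
x = np.linspace(0.0, 1.0, 200)   # hypothetical space/time grid
s = np.linspace(0.0, 1.0, 15)    # hypothetical parameter samples
F = (np.outer(np.sin(np.pi * x), 1.0 + s)
     + 0.1 * np.outer(np.sin(2 * np.pi * x), s**2)
     + 0.01 * np.outer(np.sin(3 * np.pi * x), np.cos(3 * s)))

U, sigma, Vt = truncated_svd(F)
# Columns of U: space-time basis. sigma: amplitudes.
# Rows of Vt: samples of the parameter basis functions at the points in s.
relative_error = np.linalg.norm(F - U @ np.diag(sigma) @ Vt) / np.linalg.norm(F)
```

With this rule, the squared relative Frobenius error of the truncated reconstruction is at most one minus the chosen energy fraction.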

Approximation Model

Tuning the Approximation

SVD

Interpolating the Parameter Basis

• Construct parameter basis functions by interpolating the samples.
• Can use any appropriate interpolation method (e.g., polynomials, radial basis functions, Gaussian process models, piecewise linear, splines, …)
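As a sketch of the interpolation step, the following uses piecewise linear interpolation (`numpy.interp`), one of the methods listed above, on the rows of the right singular vector matrix. The helper name `build_rom`, the one-dimensional parameter, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def build_rom(F, s_samples, r):
    """Return a predictor f(s) ~ sum_k u_k * sigma_k * v_k(s) with r terms."""
    U, sigma, Vt = np.linalg.svd(F, full_matrices=False)
    U, sigma, Vt = U[:, :r], sigma[:r], Vt[:r, :]

    def predict(s_new):
        # Row k of Vt holds samples of the k-th parameter basis function
        # at s_samples; interpolate each row at the new parameter value.
        v_new = np.array([np.interp(s_new, s_samples, Vt[k]) for k in range(r)])
        return U @ (sigma * v_new)

    return predict

# Synthetic snapshots that are linear in s, so linear interpolation is exact.
x = np.linspace(0.0, 1.0, 50)
s_samples = np.linspace(0.0, 1.0, 11)   # must be increasing for np.interp
a, b = np.sin(np.pi * x), x * (1.0 - x)
F = np.outer(a, 1.0 + s_samples) + np.outer(b, s_samples)

predict = build_rom(F, s_samples, r=2)
f_new = predict(0.37)                    # prediction at an unsampled parameter
```

For solutions that are not linear in the parameter, the interpolation error depends on how smooth each parameter basis function is between samples.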

[Figure: two sections, labeled A and B]

Which section would you rather try to interpolate, A or B?

Ordered by Increasing Oscillations

• From the SVD, it is natural to expect that the parameter bases will become more oscillatory as the index increases.
• Therefore, we expect the gradient between sample points to increase as the index increases.
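One way to check this expectation on real data is to measure the steepest finite-difference slope between neighbouring samples for each right singular vector. The sketch below assumes a one-dimensional parameter and uses a hypothetical helper, `max_sample_gradient`, on a hand-made two-row example. In practice the rows would come from the SVD of the snapshot matrix.

```python
import numpy as np

def max_sample_gradient(Vt, s_samples):
    """Largest absolute slope between neighbouring samples, for each row of Vt."""
    slopes = np.diff(Vt, axis=1) / np.diff(s_samples)
    return np.max(np.abs(slopes), axis=1)

s_samples = np.array([0.0, 1.0, 2.0])
Vt_demo = np.array([[0.0, 1.0, 2.0],    # smooth row: slope 1 everywhere
                    [0.0, 2.0, 0.0]])   # oscillating row: slopes +2 and -2
gradients = max_sample_gradient(Vt_demo, s_samples)   # array([1., 2.])
```

A row whose gradient is large relative to the sample spacing is harder to interpolate reliably.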

“Predictable” / “Unpredictable”

Space-time prediction variance: [equation missing from transcript]

The truncation is the largest [symbol missing] such that [condition missing from transcript]

Heat Conduction Example

Paul G. Constantine, Stanford University

David F. Gleich, Purdue University

Joe Ruthruff, Sandia National Labs

Jeremy Templeton, Sandia National Labs

SVD in MapReduce

http://github.com/dgleich/mrtsqr

Simulation Informatics!

SUPERCOMPUTER: Each HPC simulation generates gigabytes of data.

DATA CENTER: A data cluster can hold hundreds or thousands of simulations …

ENGINEER: … enabling engineers to query simulation data for statistical studies and uncertainty quantification.

• We implement the SVD via a communication-avoiding QR factorization for tall, skinny matrices in MapReduce.
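The idea behind a tall-and-skinny QR can be sketched in memory with NumPy: factor each row block independently (the "map" step), stack the small R factors, and factor the stack again (the "reduce" step). This is an illustrative single-machine sketch and is not the interface of the mrtsqr code. The function names and the block count are assumptions.

```python
import numpy as np

def tsqr_r(blocks):
    """Return the R factor of the row-stacked blocks without forming Q."""
    # "Map": an independent QR per row block, keeping only the small R.
    local_rs = [np.linalg.qr(block, mode="r") for block in blocks]
    # "Reduce": QR of the stacked R factors gives an R for the full matrix.
    return np.linalg.qr(np.vstack(local_rs), mode="r")

def tall_skinny_singular_values(A, n_blocks=8):
    """Singular values of a tall, skinny A via the SVD of its small R factor."""
    blocks = np.array_split(A, n_blocks, axis=0)
    R = tsqr_r(blocks)
    return np.linalg.svd(R, compute_uv=False)

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 6))      # stand-in for a tall, skinny data matrix
sigma_tsqr = tall_skinny_singular_values(A)
sigma_direct = np.linalg.svd(A, compute_uv=False)
```

Because A equals an orthonormal Q times R, the singular values of the small R factor match those of A. Only the R factors, each no larger than the column count squared, move between the two steps.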

Constantine and Gleich, Tall and Skinny QR Factorizations in MapReduce Architectures, 2011

• 80k nodes, 300 time steps
• 104 basis runs
• SVD of 24M × 104 data matrix (18 GB)
• 500x reduction in wall clock time
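The matrix dimensions follow from the figures above, as the short calculation below shows. The 8 bytes per entry (double precision) is an assumption, since the slides do not state the storage format. Under that assumption the size comes out close to the quoted figure of about 18 gigabytes.

```python
nodes = 80_000
time_steps = 300
runs = 104
bytes_per_entry = 8                      # assumption: float64 storage

rows = nodes * time_steps                # 24,000,000 rows, one per node and time step
size_bytes = rows * runs * bytes_per_entry
size_gib = size_bytes / 2**30            # about 18.6 GiB under the float64 assumption
```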

[Figure panels: Truth, ROM, Variance]