Posted: 21-Dec-2015
Distributed and Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence
Kalin Kanov
Department of Computer Science, Johns Hopkins University
Streaming Evaluation Method
• Linear data requirements of the computation allow for:
  – Incremental evaluation
  – Streaming over the data
  – Concurrent evaluation of batch queries
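Because each query result is a linear combination of stored values, it can be accumulated one data atom at a time, and every query that needs an atom can be served by the same read. The following is a minimal sketch of that idea, not the talk's implementation. The in-memory `atoms` sequence, the `queries` weight layout, and the name `stream_batch` are all hypothetical stand-ins for the database.

```python
from collections import defaultdict

def stream_batch(atoms, queries):
    """Evaluate every query in one pass over the atoms.

    atoms:   iterable of (atom_id, {cell: value}) in storage order (hypothetical layout)
    queries: {query_id: {atom_id: {cell: weight}}}, the linear weights each query
             applies to the cells of each atom it touches (hypothetical layout)
    """
    # Invert the batch: for each atom, which queries need it?
    needed_by = defaultdict(list)
    for qid, per_atom in queries.items():
        for atom_id in per_atom:
            needed_by[atom_id].append(qid)

    results = {qid: 0.0 for qid in queries}
    for atom_id, cells in atoms:                 # single streaming pass over the data
        for qid in needed_by.get(atom_id, ()):   # all queries sharing this atom
            weights = queries[qid][atom_id]
            # Incremental evaluation: add this atom's contribution to the partial sum.
            results[qid] += sum(w * cells[c] for c, w in weights.items())
    return results

# Two atoms, two queries; query "a" spans both atoms, query "b" needs only atom 1.
atoms = [(0, {0: 1.0, 1: 2.0}), (1, {0: 3.0})]
queries = {
    "a": {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0}},
    "b": {1: {0: 2.0}},
}
print(stream_batch(atoms, queries))
```

Each atom is read once regardless of how many queries reference it, which is the property the bullets above rely on.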
Motivation
• Heavy DB usage slows down the service by a factor of 10 to 20
• Query evaluation techniques adapted from simulation code do not access data coherently
• Substantial storage overhead incurred to localize each computation
• 95% of queries perform Lagrange Polynomial interpolation
Turbulence Database Cluster
MHD Database
• Stores velocity, magnetic field, magnetic vector potential and pressure fields
  – 10 attributes, 4 bytes each
  – 1024 time-steps over a 1024^3 grid
  – 40TB total size
• In order to reduce total amount of I/O:
  – Smaller atoms (4^3 voxels)
  – No replication
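The stated total size follows directly from the figures on this slide. A quick arithmetic check, assuming "TB" means binary terabytes (2^40 bytes) and that the 10 attributes are three components each for the three vector fields plus pressure; the variable names are illustrative only:

```python
ATTRIBUTES = 10            # assumed: 3 vector fields x 3 components + pressure
BYTES_PER_ATTRIBUTE = 4
TIME_STEPS = 1024
GRID = 1024                # grid points per side of the cube
ATOM_SIDE = 4              # voxels per side of a storage atom

points_per_step = GRID ** 3
total_bytes = ATTRIBUTES * BYTES_PER_ATTRIBUTE * TIME_STEPS * points_per_step
total_tib = total_bytes / 2 ** 40                    # size in binary terabytes
atoms_per_step = points_per_step // ATOM_SIDE ** 3   # 4^3 atoms tiling one time step

print(total_tib, atoms_per_step)
```

Under these assumptions the product comes to exactly 40 binary terabytes, and one time step of the grid is tiled by 2^24 atoms.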
Lagrange Polynomial Interpolation
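$$
f(x', y') = \sum_{j=1}^{N} l_y^{\,p - \frac{N}{2} + j}(y') \sum_{i=1}^{N} l_x^{\,n - \frac{N}{2} + i}(x') \, f\left(x_{n - \frac{N}{2} + i},\; y_{p - \frac{N}{2} + j}\right)
$$

Lagrange coefficients: the $l_x$ and $l_y$ factors. Data: the grid values $f(x_{n - N/2 + i}, y_{p - N/2 + j})$.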
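The double sum can be written out directly. The sketch below is a plain, direct evaluation for a uniform 2D grid, not the optimized database routine. It assumes that `n` and `p` are the indices of the grid cell containing the query point, that `f[i][j]` holds the value at `(i*dx, j*dx)`, and that the point is far enough from the boundary that no periodic wrap-around is needed. The function names are hypothetical.

```python
import math

def lagrange_coefficients(nodes, x):
    """Return l_k(x) for each node k: product over m != k of (x - x_m) / (x_k - x_m)."""
    coeffs = []
    for k, xk in enumerate(nodes):
        c = 1.0
        for m, xm in enumerate(nodes):
            if m != k:
                c *= (x - xm) / (xk - xm)
        coeffs.append(c)
    return coeffs

def interpolate_2d(f, dx, x, y, N=4):
    """N-point Lagrange interpolation of grid data f[i][j] = f(i*dx, j*dx) at (x, y)."""
    n = math.floor(x / dx)                            # cell index along x (assumed meaning of n)
    p = math.floor(y / dx)                            # cell index along y (assumed meaning of p)
    ix = [n - N // 2 + i for i in range(1, N + 1)]    # stencil indices n - N/2 + i
    iy = [p - N // 2 + j for j in range(1, N + 1)]    # stencil indices p - N/2 + j
    lx = lagrange_coefficients([i * dx for i in ix], x)
    ly = lagrange_coefficients([j * dx for j in iy], y)
    total = 0.0
    for cj, j in zip(ly, iy):                         # outer sum over j
        inner = sum(ci * f[i][j] for ci, i in zip(lx, ix))   # inner sum over i
        total += cj * inner
    return total

# A cubic test field is reproduced up to rounding by 4-point interpolation.
dx = 0.5
grid = [[(i * dx) ** 2 * (j * dx) + 3.0 for j in range(8)] for i in range(8)]
print(interpolate_2d(grid, dx, 1.7, 2.2))
```

The coefficients depend only on the query position, while the data values come from storage, which is why the two factors are labeled separately on the slide.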
Processing a Batch Query
Additional Optimizations
• Process the computation of values that are stored together concurrently
• Iterate in the appropriate order
• Compute the Lagrange coefficients with the procedures described by Purser and Leslie*
*R. J. Purser and L. M. Leslie. An Efficient Interpolation Procedure for High-Order Three-Dimensional Semi-Lagrangian Models. Monthly Weather Review, 119:2492–+, 1991.
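The first two optimizations can be illustrated by planning a batch before touching any data: group query points by the atoms their interpolation stencils overlap, then visit the atoms in storage order. The sketch below makes several assumptions not stated on the slide: a z-order (Morton) key stands in for the "appropriate order", query points are given by the integer grid cell `(n, p, q)` that contains them, and all indices are non-negative with no periodic wrap-around. The names `morton_key` and `plan_batch` are hypothetical, and the Purser and Leslie coefficient procedure is not shown.

```python
from collections import defaultdict

ATOM_SIDE = 4  # 4^3-voxel atoms, as in the MHD database

def morton_key(ax, ay, az, bits=8):
    """Interleave the bits of the atom coordinates (z-order); an assumed storage order."""
    key = 0
    for b in range(bits):
        key |= ((ax >> b) & 1) << (3 * b)
        key |= ((ay >> b) & 1) << (3 * b + 1)
        key |= ((az >> b) & 1) << (3 * b + 2)
    return key

def plan_batch(points, N=4):
    """Map each atom to the query points whose N^3 stencil touches it, in key order.

    points: {point_id: (n, p, q)}, the integer grid cell containing each query point
    """
    touched = defaultdict(set)
    for pid, cell in points.items():
        lo = [c - N // 2 + 1 for c in cell]    # first stencil index on each axis
        hi = [c + N // 2 for c in cell]        # last stencil index on each axis
        for ax in range(lo[0] // ATOM_SIDE, hi[0] // ATOM_SIDE + 1):
            for ay in range(lo[1] // ATOM_SIDE, hi[1] // ATOM_SIDE + 1):
                for az in range(lo[2] // ATOM_SIDE, hi[2] // ATOM_SIDE + 1):
                    touched[(ax, ay, az)].add(pid)
    # One read per atom, visited in key order, serving every point that needs it.
    return sorted(touched.items(), key=lambda kv: morton_key(*kv[0]))

# Point "a" fits in one atom; point "b" straddles two, one of them shared with "a".
for atom, pids in plan_batch({"a": (5, 5, 5), "b": (6, 5, 5)}):
    print(atom, sorted(pids))
```

Values that are stored together end up in the same plan entry, so they can be processed together from a single read.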
Experimental Evaluation
• Random workloads:
  – across the entire cube space
  – a 128^3 subset of the entire space
• Workload derived from the usage log of the Turbulence Database cluster
• Compare with:
  – Direct methods of evaluation
Setup
• Experimental version of the MHD database
  – ~300 timesteps of the velocity fields of the MHD DNS
  – Two 2.33 GHz dual quad-core Windows 2003 servers with SQL Server 2008 and 8GB of memory
  – Data tables striped across 7 disks
Questions/Comments