Grid Computing – Issues in Data grids and Solutions Sudhindra Rao.
Date posted: 20-Dec-2015
Grid Computing – Issues in Data grids
and Solutions
Sudhindra Rao
Grid Computing OSCAR Lab 2
Outline
• Grid computing: introduction
• Computational grids
• Data grids
• Data management
• Related work
• Technologies: JavaSpaces, OceanStore
• Our research plan
• Discussion
What is grid computing?
• Use a network of PCs
• Faster networks, cheaper PCs, lots of idle time
• Easy to build, maintain, and scale
• Generic solution for scientific and business problems alike
• Some form of grid computing: SETI@Home, Argonne National Lab, Google, etc.
Why today?
[Figure: drivers for grid computing. Capabilities: security, manageability, agility. Goals: efficiency, profitability, control. Other labels: uncertainty, complexity, distribution, new opportunities, world events, market dynamics, maturing technology, grid computing.]
Applications: data grids
• Geographic distribution of data
• Computations on large-scale data
Compute-intensive analytics
• Value at risk
• Credit risk
• Real-time risk management
• Automated trade programs
OLAP data analysis
• Anti-money laundering
• Credit card (risk and customer data mining)
• Billing
Data center operations
• In-process system migration
• High fault tolerance
• Geographic data center independence for failover and business applications
Compute utility services
• Data center compute farms
• Corporate compute utility services creating a low-cost infrastructure similar to the electric grid
Distributed Computing Evolution
[Figure: evolution of distributed computing. Labels: file sharing, CORBA, data translation; data queues, publish/subscribe, smart routing; pipes/sockets, clusters, data grids, utility service; middleware; client/server; grid computing.]
Compute grid
• Distributed pool of resources
• Completing a task for a user
• User requests and reserves resources
• Some kind of middleware manages resources and tasks
• Resilient and fault tolerant
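The request-and-reserve flow in the compute grid bullets can be illustrated with a minimal sketch. The names `Node`, `Middleware`, `reserve`, `release`, and `submit` are hypothetical and do not come from the slides or from any real grid middleware; first-fit placement is an assumption chosen only to keep the example short, and fault tolerance is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpus: int


@dataclass
class Middleware:
    """Hypothetical broker that tracks a pool of nodes and user reservations."""
    nodes: list
    reservations: dict = field(default_factory=dict)

    def reserve(self, user: str, cpus: int) -> str:
        # First-fit (an assumption): take the first node with enough free CPUs.
        for node in self.nodes:
            if node.free_cpus >= cpus:
                node.free_cpus -= cpus
                self.reservations[user] = (node, cpus)
                return node.name
        raise RuntimeError("no node can satisfy the request")

    def release(self, user: str) -> None:
        # Return the reserved CPUs to the node they came from.
        node, cpus = self.reservations.pop(user)
        node.free_cpus += cpus

    def submit(self, user: str, task, *args):
        # A task runs only for a user who holds a reservation.
        if user not in self.reservations:
            raise PermissionError(f"{user} has no reservation")
        return task(*args)
```

A user would call `reserve`, then `submit` one or more tasks, then `release`; a production middleware would also need timeouts, retries, and node failure handling.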
Data grid
[Figure: clients and a business app server, each connected to a server over a network pipe with 1-1 connectivity, backed by one data storage. Labels: compute grid, coordinating set of tasks; multiple applications/worker threads accessing a single datastore.]
Data grid: eliminates data access bottlenecks
[Figure labels: data storage; compute grid, coordinating set of tasks; data grid, manages data.]
Data grid architecture
• Mechanism neutrality
• Policy neutrality
• Compatibility with compute grid
• Uniformity with information infrastructure
Services
• Storage service
• Grid storage API
• Metadata service
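One way to picture the split between a storage service and a metadata service is the sketch below. It is illustrative only: `StorageService`, `InMemoryStorage`, `MetadataService`, and `fetch` are hypothetical names, and the in-memory backend stands in for whatever storage mechanism a real site would use. The abstract base class is meant to show mechanism neutrality, since callers see the same `put`/`get` calls regardless of backend.

```python
from abc import ABC, abstractmethod


class StorageService(ABC):
    """Hypothetical uniform storage API; each backend hides its mechanism."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStorage(StorageService):
    """Toy backend used only for illustration."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]  # raises KeyError if the key is absent


class MetadataService:
    """Maps a logical name to the sites registered as holding a replica."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, site):
        self._replicas.setdefault(logical_name, []).append(site)

    def locate(self, logical_name):
        return list(self._replicas.get(logical_name, []))


def fetch(logical_name, metadata, sites):
    """Read a logical file from the first registered site that really has it."""
    for site in metadata.locate(logical_name):
        try:
            return sites[site].get(logical_name)
        except KeyError:
            continue  # stale metadata or unknown site: try the next replica
    raise FileNotFoundError(logical_name)
```

Keeping location information in a separate service lets replicas move without changing the names that applications use.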
Data grid architecture
Expectations
• Coordination between compute and data grid
• Data delivery to facilitate task and resource management
• Sharing data distribution and location information
• Leveraging data locality
Guarantees
• Dependability
• Consistency
• Pervasiveness
• Security
• Inexpensive
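Leveraging data locality, as listed under the expectations, could look roughly like the following placement rule. This is a sketch under assumptions: the function name `pick_node`, the shape of `replica_map` and `sizes`, and the "most local bytes wins" heuristic are all hypothetical and not taken from the slides.

```python
def pick_node(task_inputs, replica_map, sizes):
    """Return the node that already holds the most input bytes for a task.

    task_inputs: names of the datasets the task reads
    replica_map: dataset name -> set of nodes holding a copy
    sizes:       dataset name -> size in bytes
    """
    local_bytes = {}
    for name in task_inputs:
        for node in replica_map.get(name, ()):
            local_bytes[node] = local_bytes.get(node, 0) + sizes[name]
    if not local_bytes:
        return None  # no replica anywhere; the caller must stage data first
    # Sorting first makes ties resolve to the alphabetically first node.
    return max(sorted(local_bytes), key=local_bytes.get)
```

The rule only works if the data grid shares its distribution and location information with the scheduler, which is why those two expectations appear together.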
Data delivery: QoS requirements
[Figure: data grid QoS (Level 0, Level 1) against application complexity (work, time, data, transactional). Application classes shown:
• Batch, synchronous, static data, nontransactional
• Atomic, synchronous, static data, nontransactional
• Atomic, asynchronous, static data, nontransactional
• Atomic, asynchronous, dynamic data, nontransactional
• Atomic, synchronous, static data, transactional
• Atomic, asynchronous, dynamic data, transactional
• Atomic, asynchronous, static data, transactional
• Batch, synchronous, static data, transactional
Example workloads: OLAP, real-time datamart, Monte Carlo simulation.]
Related Work
• Grid File System: provides primitives like a file system; Level 0 QoS
• NFSv4: high performance, extensible, secure; in the works
• Secure File System: self-certifying paths, unique identifiers, global namespace, key-based certification
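The idea of a self-certifying path is that the path name itself carries a fingerprint of the server's public key, so a client can check a server without a separate certificate authority. The sketch below is loosely modeled on that idea and is not the Secure File System's actual format: the SHA-256 hex digest, the byte layout being hashed, and the helper names `host_id`, `make_path`, and `verify` are all simplifying assumptions.

```python
import hashlib


def host_id(public_key: bytes, location: str) -> str:
    # Assumed encoding: SHA-256 hex digest over the location and the key.
    return hashlib.sha256(location.encode() + b"\0" + public_key).hexdigest()


def make_path(location: str, public_key: bytes, rest: str = "") -> str:
    # Assumed layout: /sfs/<location>:<host id><rest of path>
    return f"/sfs/{location}:{host_id(public_key, location)}{rest}"


def verify(path: str, presented_key: bytes) -> bool:
    """Check that the key a server presents matches the ID embedded in the path."""
    name = path.split("/")[2]                   # "<location>:<host id>"
    location, _, expected = name.rpartition(":")
    return host_id(presented_key, location) == expected
```

Because the identifier is derived from the key, the same path names the same server everywhere, which is what makes a global namespace possible without central registration.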
Technologies related to data grids - JavaSpaces
“Make Room for JavaSpaces, Part I: Ease the Development of Distributed Apps with JavaSpaces” - Eric Freeman and Susanne Hupfer
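JavaSpaces is built around a shared space into which processes write entries and from which they read or take entries that match a template. The sketch below imitates that model in Python for illustration; it is not the JavaSpaces API. The class name `Space`, the use of dictionaries as entries, and `None` as a wildcard are assumptions, and leases, transactions, and blocking timeouts are left out.

```python
class Space:
    """In-memory stand-in for a shared space of entries (plain dicts)."""

    def __init__(self):
        self._entries = []

    def write(self, entry: dict) -> None:
        # Store a copy so later changes by the writer do not leak in.
        self._entries.append(dict(entry))

    @staticmethod
    def _matches(entry: dict, template: dict) -> bool:
        # A template field set to None acts as a wildcard.
        return all(v is None or entry.get(k) == v for k, v in template.items())

    def read(self, template: dict):
        """Return a copy of the first matching entry, leaving it in the space."""
        for entry in self._entries:
            if self._matches(entry, template):
                return dict(entry)
        return None

    def take(self, template: dict):
        """Remove and return the first matching entry."""
        for i, entry in enumerate(self._entries):
            if self._matches(entry, template):
                return self._entries.pop(i)
        return None
```

A master could `write` task entries and workers could `take` them, which decouples producers from consumers in the same way the space-based model intends.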
OceanStore
• Global replication of data
• Promiscuously caches data
• Version-based archival storage
• Applications can control their consistency requirements to manage performance
• Internal event monitors analyze access patterns to move data and provide redundancy
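Version-based archival storage can be pictured as an append-only history per object, where an update never overwrites earlier data. The sketch below shows only that one idea; `VersionedStore` and its methods are hypothetical names, and nothing here reflects how OceanStore actually encodes, replicates, or places its versions.

```python
class VersionedStore:
    """Toy append-only store: every update creates a new version."""

    def __init__(self):
        self._versions = {}  # object name -> list of immutable byte strings

    def update(self, name: str, data: bytes) -> int:
        history = self._versions.setdefault(name, [])
        history.append(bytes(data))
        return len(history) - 1  # version number that was just written

    def read(self, name: str, version=None) -> bytes:
        history = self._versions[name]
        # With no version given, return the latest one.
        return history[-1] if version is None else history[version]
```

Because old versions stay readable, an application that tolerates stale data can read a cached earlier version, while one that needs the latest state asks for the newest.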
Grid Fabric - Integrasoft
• Business solution provided for financial institutions, share traders
• Designed to complement compute grid
• Works closely with compute grid to schedule tasks based on data availability
• Moves data closer to computation
SOA and Data grids
• Moore’s law and Metcalfe’s law
• Network-based computation and grid computing with SOA
• Intelligent infrastructure: SONA
[Figure: web services, business process, and data grid, connected by relations labeled "delivers", "has", "requires", and "state".]
Web 2.0
Our research – Motivation
Issues in data management
• Data tightly coupled to computation
• Data cached locally
• Distribution is haphazard and reuse is minimal
• Data pulled by computation, not delivered
• Mechanisms still improvise based on experience on smaller systems
Data Grid and DBMS
Grid DBMS
• Security
• Transparency
• Robustness
• Efficiency
• Intelligence
• Fragmentation
• Heterogeneity
Aspect               | DBMS               | Data regions
Schema               | Tables             | Ordered structure
Events               | Triggers           | Events
Optimizations        | Stored procedures  | Distributed procedures
Indexing             | Intra-table fields | Cross-structure
Locking              | Table/row level    | Data atom level
Relation             | Table joins        | Data atom
Query                | SQL                | Programmatic, string-based
Repeated data access | Indexes            | Tags
Data grids as extended DBMS
[Figure: data grid (eliminates data access bottlenecks); persistence mechanism with data regions; data storage. Legend: indicates replicas, relations.]
Datacentric grids
• Automated space management and garbage collection
• Space and data objects lifetime mechanism
• I/O allocation on storage system
• Estimating access from magnetic storage
• Co-scheduling of compute and storage resources
• Space reservation dilemma
• Thin clients
• Code mobility towards data
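A lifetime mechanism for data objects, combined with automated garbage collection, might be approximated with leases: each object is admitted with an expiry time and is reclaimed once that time passes unless it is renewed. The sketch below is a hypothetical illustration; `LeasedSpace`, its method names, and the policy of collecting before every admission are assumptions, and the clock is passed in explicitly so the behavior is deterministic.

```python
class LeasedSpace:
    """Hypothetical store whose objects expire unless their lease is renewed."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._objects = {}  # name -> (size, expires_at)

    def used(self) -> int:
        return sum(size for size, _ in self._objects.values())

    def collect(self, now: float) -> list:
        """Drop objects whose lease has expired and return their names."""
        expired = [n for n, (_, exp) in self._objects.items() if exp <= now]
        for name in expired:
            del self._objects[name]
        return expired

    def put(self, name: str, size: int, lifetime: float, now: float) -> None:
        self.collect(now)  # reclaim expired space before admitting new data
        if self.used() + size > self.capacity:
            raise RuntimeError("space reservation failed")
        self._objects[name] = (size, now + lifetime)

    def renew(self, name: str, lifetime: float, now: float) -> None:
        # Extend the lease of an object that is still present.
        size, _ = self._objects[name]
        self._objects[name] = (size, now + lifetime)
```

The failure branch in `put` is one face of the space reservation dilemma: reserving generously wastes storage, while reserving tightly makes later requests fail.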
Expected Results
• Can we move computation closer to data?
• Data grid with features of persistence?
• Performance improvement using tags?
• Loosely coupled data grid and compute grid?
• Scalability of unique naming in file systems?
Thank you!