R*: An Overview of the Architecture

R. Williams, et al., IBM Almaden Research Center


Page 1: R*: An Overview of the Architecture

R*: An Overview of the Architecture

R. Williams, et al.
IBM Almaden Research Center

Page 2: R*: An Overview of the Architecture

Outline

Environment and Data Definitions
Object Naming
Distributed Catalogs
Transaction Management and Commit Protocols
Query Preparation
Query Execution
SQL Additions and Changes

Page 3: R*: An Overview of the Architecture

Environment and Data Definitions

CICS as the underlying communication model
Data distribution:
  Dispersed
  Replicated
  Partitioned
    Horizontal
    Vertical
  Snapshot

Page 4: R*: An Overview of the Architecture

Figure 1 from paper

Page 5: R*: An Overview of the Architecture

Figure 21.4 from CS 432 text

Page 6: R*: An Overview of the Architecture

Object Naming

System Wide Names (SWN): USER @ USER_SITE.OBJECT_NAME @ BIRTH_SITE
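The four-part SWN format can be illustrated with a small parser. This is a minimal sketch, not R* code: the class `SystemWideName`, the function `parse_swn`, and the sample names are hypothetical, and it assumes that no component contains `@` and that the site name contains no `.`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemWideName:
    """Hypothetical container for the four SWN components."""
    user: str
    user_site: str
    object_name: str
    birth_site: str

    def __str__(self) -> str:
        return f"{self.user}@{self.user_site}.{self.object_name}@{self.birth_site}"


def parse_swn(text: str) -> SystemWideName:
    """Split 'USER @ USER_SITE.OBJECT_NAME @ BIRTH_SITE' into its parts."""
    compact = "".join(text.split())  # tolerate spaces around '@' and '.'
    # The last '@' separates the birth site from the rest.
    creator_part, sep, birth_site = compact.rpartition("@")
    if not sep:
        raise ValueError(f"missing '@': {text!r}")
    # The first '@' separates the user from 'USER_SITE.OBJECT_NAME'.
    user, sep, rest = creator_part.partition("@")
    if not sep:
        raise ValueError(f"expected two '@' separators: {text!r}")
    # Assumption: the first '.' ends the site name.
    user_site, sep, object_name = rest.partition(".")
    if not sep or not all((user, user_site, object_name, birth_site)):
        raise ValueError(f"malformed SWN: {text!r}")
    return SystemWideName(user, user_site, object_name, birth_site)
```

For example, `parse_swn("ALICE @ SITE_A.EMPLOYEES @ SITE_B")` yields a name whose `birth_site` is `SITE_B`, the site where the object was created.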

Page 7: R*: An Overview of the Architecture

Distributed Catalogs

Local site maintains objects in its database
Catalog entry may be cached
Entries are versioned
Catalog entry contents:
  SWN
  Type
  Format
  Access path
  Object ref (view)
  Statistics
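How cached, versioned catalog entries might interact can be sketched as follows. This is an illustration under stated assumptions, not the R* implementation: `CatalogEntry`, `SiteCatalog`, `CachingCatalog`, and the rule "a version mismatch means the cached copy is stale and must be refreshed" are all hypothetical names and simplifications.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CatalogEntry:
    swn: str           # system wide name of the object
    object_type: str   # e.g. "table" or "view"
    version: int       # bumped on every change to the entry


class SiteCatalog:
    """Authoritative catalog of the site that stores the objects."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def put(self, swn: str, object_type: str) -> CatalogEntry:
        old = self._entries.get(swn)
        version = old.version + 1 if old else 1
        entry = CatalogEntry(swn, object_type, version)
        self._entries[swn] = entry
        return entry

    def get(self, swn: str) -> CatalogEntry:
        return self._entries[swn]


class CachingCatalog:
    """Cache held at another site; entries may become stale."""

    def __init__(self, owner: SiteCatalog) -> None:
        self._owner = owner
        self._cache: Dict[str, CatalogEntry] = {}

    def lookup(self, swn: str) -> CatalogEntry:
        # Serve from the cache when possible; fetch from the owner otherwise.
        if swn not in self._cache:
            self._cache[swn] = self._owner.get(swn)
        return self._cache[swn]

    def is_current(self, entry: CatalogEntry) -> bool:
        # Compare the version that was used against the owner's version.
        current = self._owner.get(entry.swn)
        if current.version != entry.version:
            self._cache[entry.swn] = current  # refresh the stale copy
            return False
        return True
```

In this sketch a plan built from a cached entry would call `is_current` before running and be rebuilt when the check fails.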

Page 8: R*: An Overview of the Architecture

Transaction Management and Commit Protocol

Transaction number: SITE.SEQ_NUM (or SITE.TIME)

Two phase commit (2PC)
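The transaction number format and the two phases of 2PC can be sketched together. This is a bare-bones illustration of textbook two-phase commit with hypothetical classes (`Coordinator`, `Participant`); it omits logging, timeouts, and failure recovery, and is not meant to reproduce the R* protocol in detail.

```python
from enum import Enum
from itertools import count
from typing import Dict, List, Tuple


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """A site taking part in the transaction (hypothetical)."""

    def __init__(self, site: str, can_commit: bool = True) -> None:
        self.site = site
        self.can_commit = can_commit
        self.outcomes: Dict[str, str] = {}  # txn_id -> final outcome

    def prepare(self, txn_id: str) -> Vote:
        return Vote.YES if self.can_commit else Vote.NO

    def commit(self, txn_id: str) -> None:
        self.outcomes[txn_id] = "committed"

    def abort(self, txn_id: str) -> None:
        self.outcomes[txn_id] = "aborted"


class Coordinator:
    def __init__(self, site: str, participants: List[Participant]) -> None:
        self.site = site
        self.participants = participants
        self._seq = count(1)  # per-site sequence number

    def next_txn_id(self) -> str:
        # Transaction number in the SITE.SEQ_NUM form.
        return f"{self.site}.{next(self._seq)}"

    def run(self) -> Tuple[str, str]:
        txn_id = self.next_txn_id()
        # Phase 1: ask every participant to prepare and collect the votes.
        votes = [p.prepare(txn_id) for p in self.participants]
        # Phase 2: commit only if all voted YES, otherwise abort everywhere.
        if all(v is Vote.YES for v in votes):
            for p in self.participants:
                p.commit(txn_id)
            return txn_id, "committed"
        for p in self.participants:
            p.abort(txn_id)
        return txn_id, "aborted"
```

Prefixing the sequence number with the site name keeps transaction numbers unique across sites without any global coordination.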

Page 9: R*: An Overview of the Architecture

Query Preparation

Name resolution
Authorization check
Distributed compilation
Global plan generation/optimization
Local access path selection
Local optimization
Local view materialization

Page 10: R*: An Overview of the Architecture

Figure 2 from paper

Page 11: R*: An Overview of the Architecture

Cost Model

3 weighted components:
  I/O
  CPU
  Message
    # of messages sent
    # of bytes sent

Page 12: R*: An Overview of the Architecture

Query Execution

Synchronous vs asynchronous execution
Distributed concurrency control
Deadlock detection and resolution
Crash recovery

Page 13: R*: An Overview of the Architecture

Figure 3 from paper

Page 14: R*: An Overview of the Architecture

SQL Additions and Changes

DEFINE SYNONYM
DISTRIBUTE TABLE
  HORIZONTALLY
  VERTICALLY
  REPLICATED
DEFINE SNAPSHOT
REFRESH SNAPSHOT
MIGRATE TABLE

Page 15: R*: An Overview of the Architecture

R* Optimizer Validation and Performance Evaluation for Distributed Queries

Lothar F. Mackert
Guy M. Lohman

IBM Almaden Research Center

Page 16: R*: An Overview of the Architecture

Outline

Distributed Compilation/Optimization
Instrumentation
Experiments and Results

Page 17: R*: An Overview of the Architecture

Distributed Compilation/Optimization

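Issues:
  Join site
  Transfer methods:
    ship whole
    fetch matches

Cost model:

$$\text{Total cost} = w_{\text{cpu}} \cdot (\#\text{ of RSS calls}) + w_{\text{I/O}} \cdot (\#\text{ of page accesses}) + w_{\text{msg}} \cdot (\#\text{ of msgs sent}) + w_{\text{byte}} \cdot (\#\text{ of bytes sent})$$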

Page 18: R*: An Overview of the Architecture

Weights Estimation

CPU: inverse of MIPS
I/O: avg seek, latency, transfer time
MSG: # of instructions per msg
BYTE: effective transmission speed of network
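The weighted cost model, with one weight each for CPU, I/O, messages, and bytes, amounts to a four-term weighted sum. The sketch below shows that arithmetic only; the names `CostWeights`, `PlanCounts`, and `total_cost`, and all numeric values, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostWeights:
    cpu: float   # cost per RSS call
    io: float    # cost per page access
    msg: float   # cost per message sent
    byte: float  # cost per byte sent


@dataclass(frozen=True)
class PlanCounts:
    rss_calls: int
    page_accesses: int
    msgs_sent: int
    bytes_sent: int


def total_cost(w: CostWeights, c: PlanCounts) -> float:
    """Weighted sum of the four resource counts estimated for a plan."""
    return (
        w.cpu * c.rss_calls
        + w.io * c.page_accesses
        + w.msg * c.msgs_sent
        + w.byte * c.bytes_sent
    )


# Illustrative numbers only.
weights = CostWeights(cpu=1.0, io=10.0, msg=100.0, byte=0.5)
plan = PlanCounts(rss_calls=10, page_accesses=2, msgs_sent=3, bytes_sent=40)
print(total_cost(weights, plan))  # 10 + 20 + 300 + 20 = 350.0
```

Because the weights are per-site parameters, the same plan can rank differently on a fast network (small `msg` and `byte` weights) than on a slow one.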

Page 19: R*: An Overview of the Architecture

Figure 2 from paper

Page 20: R*: An Overview of the Architecture

Instrumentation

Distributed EXPLAIN
Distributed COLLECT COUNTERS
Force optimizer

Page 21: R*: An Overview of the Architecture

Experiment I

Transfer method
Merge-scan join of 2 tables:
  500 tuples in each table
  Project both tables: 50%
  100 different values for join attribute
  Join result: 2477 tuples

Page 22: R*: An Overview of the Architecture

Figure 4 from paper

Page 23: R*: An Overview of the Architecture

Figure 3 from paper

Page 24: R*: An Overview of the Architecture

Experiment II

Distributed vs local join
Join of 2 tables:
  1000 tuples in each table
  Project both tables: 50%
  3000 different values for join attribute

Page 25: R*: An Overview of the Architecture

Figure 5 from paper

Page 26: R*: An Overview of the Architecture

Figure 6 from paper

Page 27: R*: An Overview of the Architecture

Experiment III

Relative importance of cost components

Page 28: R*: An Overview of the Architecture

Figure 7, 8, 9, 10 from paper

Page 29: R*: An Overview of the Architecture

Experiment IV

Optimizer evaluation
Accurate estimates of # of msgs and bytes sent (<2% difference)
Better estimates when tables are more distributed

Page 30: R*: An Overview of the Architecture

Experiment V

Alternative distributed join methods:
  Dynamically created indexes
  Semijoins
  Bloomjoins

2 tables:
  1000 tuples for outer
  Inner varies from 100 to 6000 tuples
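The difference between a semijoin and a Bloomjoin can be sketched in a few lines: both reduce the inner table before shipping it, but the semijoin ships the exact set of join values while the Bloomjoin ships a compact bit vector that may admit false positives. Everything below is a simplified illustration; the `BloomFilter` class, the function names, and the filter size are assumptions rather than the paper's implementation.

```python
import hashlib
from typing import Dict, Iterator, List

Row = Dict[str, object]


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, value: object) -> Iterator[int]:
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: object) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def __contains__(self, value: object) -> bool:
        return all(self.bits[pos] for pos in self._positions(value))


def semijoin_reduce(outer: List[Row], inner: List[Row], key: str) -> List[Row]:
    # Ship the exact join-column values of the outer to the inner's site.
    join_values = {row[key] for row in outer}
    return [row for row in inner if row[key] in join_values]


def bloomjoin_reduce(outer: List[Row], inner: List[Row], key: str) -> List[Row]:
    # Ship only a bit vector; some non-matching inner rows may slip through.
    bloom = BloomFilter()
    for row in outer:
        bloom.add(row[key])
    return [row for row in inner if row[key] in bloom]


def hash_join(outer: List[Row], inner: List[Row], key: str) -> List[Row]:
    index: Dict[object, List[Row]] = {}
    for row in inner:
        index.setdefault(row[key], []).append(row)
    return [{**o, **i} for o in outer for i in index.get(o[key], [])]
```

Either reduced inner table gives the same final join result as the full inner table, because a Bloom filter never produces false negatives; the methods differ only in how much data crosses the network.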

Page 31: R*: An Overview of the Architecture

Figure 11, 12 from paper

Page 32: R*: An Overview of the Architecture

Other Experiments

Clustered index: Bloomjoins < Semijoins < R*

50% Projection:
  Site 1: Bloomjoins < Semijoins < R*
  Site 2: Bloomjoins < R* << Semijoins

Wider join column: Bloomjoins < R* << Semijoins