Grid Collector: Enabling File-Transparent Object Access For Analysis

8
Grid Collector: Enabling File- Transparent Object Access For Analysis Wei-Ming Zhang Kent State University John Wu, Alex Sim, Junmin Gu and Arie Shoshani Lawrence Berkeley National Lab In collaboration with Jerome Lauret, Victor Perevoztchikov, Valeri Faine, Jeff Porter, Sasha Vanyashin Brookhaven National Laboratory

description

Grid Collector: Enabling File-Transparent Object Access For Analysis. Wei-Ming Zhang Kent State University John Wu, Alex Sim, Junmin Gu and Arie Shoshani Lawrence Berkeley National Lab. In collaboration with Jerome Lauret, Victor Perevoztchikov, Valeri Faine, Jeff Porter, Sasha Vanyashin - PowerPoint PPT Presentation

Transcript of Grid Collector: Enabling File-Transparent Object Access For Analysis

Page 1: Grid Collector: Enabling File-Transparent Object Access For Analysis

Grid Collector: Enabling File-Transparent Object Access For Analysis

Wei-Ming Zhang

Kent State University

John Wu, Alex Sim, Junmin Gu and Arie Shoshani

Lawrence Berkeley National Lab

In collaboration with

Jerome Lauret, Victor Perevoztchikov,

Valeri Faine, Jeff Porter, Sasha Vanyashin

Brookhaven National Laboratory

Page 2: Grid Collector: Enabling File-Transparent Object Access For Analysis

June 2003 Grid Collector

Goals

• Transparent object access– No need for analysts to manage files and disk space– No need for analysts to access remote mass storage systems

• Select objects based on their attribute values– E.g., production=P03ia & numberOfPrimaryTracks>200

• Improve analysis system’s throughput by– Eliminating the need to read all objects in a file– Providing optimized disk space management and automatic

garbage collection– Automating the retrieval of files from remote storage systems

• Interactive analysis of data distributed on the GRID– Providing quick partial answers– Enabling users to transparently share files in disk caches

Page 3: Grid Collector: Enabling File-Transparent Object Access For Analysis

June 2003 Grid Collector

Previous WorkStorage Resource Access

Coordination System (STACS) and the GCA client software for STAR

Strength –

• Transparent event access for one storage site

• Efficient evaluation of selection conditions through bitmap index

• Interactive estimation of the selection size

Weakness –

• GCA client software was designed for Objectivity data

– Need to access ROOT files now

• STACS only accesses one HPSS– STAR data is to be distributed on

the Grid

QueryEstimator

(QE)

CacheManager

(CM)

QueryMonitor

(QM)

Query estimation /execution requests

file cachingrequest

CachingPolicy

Module

FileCatalog

(FC)

Bitmapindex

file purging

Disk Cache

file caching

User’sApplication

open,read,close

Page 4: Grid Collector: Enabling File-Transparent Object Access For Analysis

June 2003 Grid Collector

Grid Collector

Grid Collector is a collection of modules that include functionalities of STACS and GCA client

New features –• Integrate with STAR analysis framework to extract events from ROOT

files• Use Storage Resource Manager for disk (DRM) and HPSS (HRM)

– GRID enabled, capable of accessing multiple sites

• More efficient implementation of bitmap index

Disk Disk Cache

EventIterator

Analysis

LogicalRequest

BitmapIndex

Filescheduler

DRM

FileCatalog

HRM

Disk Disk Cache

HRM

Disk Disk Cache

BNL

LBNL

Page 5: Grid Collector: Enabling File-Transparent Object Access For Analysis

June 2003 Grid Collector

The Building Blocks• Bitmap Index

– Indexes each event– Efficient for partial range queries

• Storage Resource Manager– Manages disk cache– Automatic retrieval of needed files from the Grid

• File Scheduler– Coordinates file accesses

• File Catalog– Provides location information about files

• Index Feeder– Digests ROOT files to extract information about events (tags)

• Event Iterator– Feeds events to analysis code in a stream

Page 6: Grid Collector: Enabling File-Transparent Object Access For Analysis

June 2003 Grid Collector

Using Grid Collector

• Existing practice– Specify a list of files or directories containing the desired events– Analyze all events in the files

• Reading more events than needed

– Files have to be on disk before analysis• User has to manage the files and space

• All files have to be present at the same time

• Using Grid Collector– Specify the conditions characterizing the desired events, such as

“production=P03ia & numberOfPrimaryTracks>=200”– Analyze only events satisfying the conditions

• By reading only the events selected using the bitmap index

– Files are retrieved and managed by the Grid Collector• User does not have to know about the files

• Files are retrieved in a stream, reducing the disk space required

Page 7: Grid Collector: Enabling File-Transparent Object Access For Analysis

June 2003 Grid Collector

Detailed Use Case

• Using a sample analysis script called doEvents.C• Analyzing first 100 events from production P03ia with 200 or more

primary tracks– .x doEvents.C(100, “select production=P03ia &

numberOfPrimaryTracks>=200”)

• To analyze all events, set the first argument to a negative integer• To try different conditions without analyzing them, a separate

command is available

• Creating your own script to use the Grid Collector– Load StGridCollector library

– Create an object of type StGridCollector

– Initialize the object with a select statement

– Pass the object to StIOMaker just like a StFile object, the rest of the code is exactly the same as using StFile

Page 8: Grid Collector: Enabling File-Transparent Object Access For Analysis

June 2003 Grid Collector

Status and Future Plans

• Current state– Grid Collector is ready to be used– Currently (June, 2003), we are populating the bitmap index for a

STAR user (John Amonett, Kent State University) to do flow analyses

• Future plans– Speed up the index building process– Enable parallel and distributed analyses for large jobs– Provide capability for users to analyze events in a specified order– Make it into a Grid-enabled service– Collaborate with other experiments (?)

• Contact information– John Wu <[email protected]>– Wei-Ming Zhang <[email protected]>– Jerome Lauret <[email protected]>