
Overview of NERSC Analytics Program

Cecilia Aragon
NERSC Analytics Team
aragon@hpcrd.lbl.gov

NERSC User Group Meeting
September 17, 2007

NERSC Analytics Team Members

• Wes Bethel, Team Lead
• Cecilia Aragon
• Janet Jacobsen
• Peter Nugent
• Kurt Stockinger
• Gunther Weber

(~3 FTEs with 1 FTE to be hired)

What is the Analytics Program at NERSC?

• At NERSC, the Analytics Program is the confluence of several key technologies:
  – Data management
    • Data storage/retrieval/sharing/movement, data indexing/querying, format conversion.
  – Data analysis, exploration and visualization
    • Feature detection/tracking.
    • Statistical analysis.
    • Subsetting, filtering, partitioning.
    • Comparison: models to models, models to data, etc.
    • Interactive data exploration.
    • Visualization: visual analysis.
  – Workflow management
    • Systematic approach to “data processing pipelines,” especially those that use multiple distributed resources and automate scientific data processing activities.

Analytics Components

[Diagram: analytics components]

• Experiment, Simulation
• Data: raw data files, metadata, location transparency, single project, community repositories.
• Analysis: filter, feature detection, search, subset, transform, visualize.
• Results: images, movies, data, decisions, knowledge.
• Workflow
• High Performance I/O Libraries, Data Models

What we do for NERSC users

Mission statement: Facilitate NERSC User knowledge discovery through use, adaptation, extension, creation, application and deployment of a diverse array of technologies spanning the domains of

– data management
– data analysis and exploration
– visualization
– workflow management

What we do for NERSC users

• Generally, no off-the-shelf, general purpose solutions for Analytics exist.
  – The Analytics Program adapts, extends, integrates and sometimes creates technologies to meet user needs.
  – Consulting and collaborative projects with users in: visualization, data management, data exploration, data analysis, workflows.
  – Substantive impact on science comes through in-depth work with stakeholder/users.

• Contact us at [email protected]

NERSC Analytics Web Site

• http://www.nersc.gov/nusers/analytics/
  – Completely redesigned in March 2007
  – Response to 2006 User Survey (need for more web-based analytics documentation)

Resources of the Analytics Program: Personnel, Analytics System

• Team of six (~3 FTEs with 1 FTE to be hired) with experience spanning all aspects of analytics, high performance computing, and many science domains.

• DaVinci: SGI Altix – 32 processors, 192GB RAM, 40TB attached FC storage

– Architectural balance favors data intensive operations: large SMP memory, best I/O bandwidth on the floor at NERSC.

• Procurement process underway for new analytics machine (response to user predictions of substantial increase in data size over next two years).

Supported Science Areas

Supported Science Areas 2005-2007 (pie chart)

• Accelerator: 12%
• Astrophysics: 25%
• Chemistry: 6%
• Climate: 6%
• Combustion: 3%
• CS: 3%
• DOE: 3%
• Fusion: 21%
• Life Sciences: 9%
• Materials Sciences: 6%
• Math: 3%
• Nuclear Physics: 3%

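Analytics Customer Technology Matrix

[Table: science areas (rows) against technologies (columns: Visualization, Analysis, Workflow, SDM); one X per technology marked for that area]

• Accelerator: XXXX
• Astrophysics: XXXX
• Biology: XX
• Chemistry: X
• Climate: XXX
• Combustion: XX
• CS: XX
• Fusion: XXX
• Math: XXX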

Analytics Customers

• Samples of our work:
  – Climate
  – Fusion
  – Spectrum Synthesis
  – Laser Wakefield Particle Acceleration
  – Astrophysics
  – General purpose

Analysis of Climate Modeling: Automatic Feature Extraction by Blind Source Separation

Tropical storm visible in sea level pressure simulations at multiple time steps.

Images of extracted features: top ten independent components were extracted from set of all 8x8 subimages.

Extracted features can be used as templates for finding similar features.

In this case, the features were variations on rotating low-pressure systems. This was not assumed a priori.
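
The "set of all 8x8 subimages" mentioned above can be illustrated with a small sketch. This is not the team's code: the function name `extract_subimages`, the overlapping sliding-window choice, and the synthetic 10 x 12 grid are all assumptions made for illustration. The sketch covers only the patch-extraction step; the independent component analysis itself would typically be handed to a library routine (for example scikit-learn's `FastICA`), which is not shown here.

```python
def extract_subimages(field, size=8):
    """Return every size x size window of a 2D grid as a flat list of values.

    Assumption: windows overlap and slide one cell at a time. The slides do
    not say whether the original analysis used overlapping or tiled patches.
    """
    n_rows = len(field)
    n_cols = len(field[0]) if n_rows else 0
    patches = []
    for top in range(n_rows - size + 1):
        for left in range(n_cols - size + 1):
            # Flatten the window row by row into a single feature vector.
            patch = [field[top + r][left + c]
                     for r in range(size)
                     for c in range(size)]
            patches.append(patch)
    return patches


# Hypothetical stand-in for one time step of a sea level pressure field.
field = [[float(r * 12 + c) for c in range(12)] for r in range(10)]
patches = extract_subimages(field)
```

Each row of `patches` is one 64-element vector. Stacking these vectors gives the matrix a blind source separation method would decompose into components.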

Fusion: Comparative Analysis

• Science objective: compare experiment (SSPX) and simulation (NIMROD).

• Problem: data formats are incompatible with each other and with visual and comparative analysis tools.

• Solution #1: one-step conversion from NIMROD binary output to VisIt format (replaces a procedure consisting of about 10 steps).

• Solution #2: VisIt reader for SSPX data. Implement basic comparative visual analysis capabilities in VisIt.

Spectrum Synthesis

• NERSC Analytics contribution: visualization and analysis to test and confirm theory of new type of Type Ia SN – one having “Super-Chandrasekhar” mass.

• Top: brightness vs. velocity and amount of deceleration.

• Middle: velocity vs. mass and unburned carbon using a 1.4 solar mass model.

• Bottom: moving to a 2 solar mass model includes new observation.

PIC Simulation of Laser Wakefield Particle Acceleration

This image shows a horizontal slice through the electric field; the electrons are colored by the magnitude of the momentum. (AVS/Express)

This image uses volume rendering to show the plasma density field. (VisIt)

Accretion-Induced Collapse of White Dwarfs

• Data from 2D radiation-hydrodynamics simulations.

• Data include 68 scalar and 25 vector fields.

• Future simulations will be 3D.

[Figure panels: Entropy, Electron fraction, Mach number]

Analytics with Cooperative Funding – SNfactory

• New supernova data analysis and workflow visualization tools (Sunfall and SNwarehouse) have improved usability and situational awareness, and enabled faster and easier access to data for supernova scientists worldwide

• Advanced image processing (Fourier contour analysis) and machine learning techniques running on NERSC platforms have achieved a ~90% decrease in human workload in nightly supernova search (2.75 FTE)

Scientific Data Management

Storage Resource Manager (SRM) for distributed data management:

• integrated mechanism for transferring files from one location to another,

• uniform access to heterogeneous storage (disk, tape),

• fault tolerant.

FastBit for efficient indexing and querying.

HDF5 FastQuery combines bitmap indices with HDF5.
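
As a rough illustration of why bitmap indices suit this kind of querying, the sketch below builds a toy, uncompressed bitmap index in plain Python. It is not FastBit's or HDF5 FastQuery's API: the function names, the use of Python integers as bit vectors, and the sample columns `energy` and `species` are assumptions made for illustration only.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def query_range(index, low, high):
    """Return a bitmap of the rows whose value v satisfies low <= v <= high."""
    result = 0
    for value, bitmap in index.items():
        if low <= value <= high:
            result |= bitmap  # union of the per-value bitmaps
    return result


def rows_of(bitmap):
    """Expand a bitmap into a sorted list of row numbers."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical data: two columns of a small table.
energy = [3, 7, 7, 1, 9, 3]
species = [0, 1, 1, 0, 1, 1]

energy_index = build_bitmap_index(energy)
species_index = build_bitmap_index(species)

# "3 <= energy <= 7 AND species == 1" reduces to a bitwise AND of two bitmaps.
hits = query_range(energy_index, 3, 7) & species_index.get(1, 0)
matching_rows = rows_of(hits)
```

The point of the design is that a multi-column condition is answered with bitwise operations on precomputed bitmaps rather than a scan of the raw data. Production indexing tools add binning and bitmap compression, which this sketch omits.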

Improving Remote Display Performance

• Remote Analytics – improve performance of remote display through X11 protocol acceleration/proxies.
  – General purpose solution widely applicable to many different applications
  – Addresses a major concern of users
  – Conducted performance tests to evaluate various protocol acceleration technologies
  – Project scope and objectives documented on internal NERSC website; next step is coordination with other groups within NERSC
  – Intent is to deploy technology to accelerate performance of applications with remote display capability

NERSC Remote Licensing

• Objective:
  – Allow remote users to take advantage of (expensive) commercially licensed software.

• Implementation:
  – Consolidate all license serving inside NERSC to a central location.
  – Set up facility whereby remote users can “check out licenses” for use on their desktop machines.
  – Software supported: IDL, AVS, AVS/Express, CEI/Ensight Gold.

Questions?

http://www.nersc.gov/nusers/analytics/

Wes Bethel, NERSC Analytics Team Lead, [email protected]

Cecilia Aragon, [email protected]

Analytics Team, [email protected]