Transcript of "The Sharing and Training of HPC Resources at the University of Arkansas," Amy Apon, Ph.D., Oklahoma Supercomputing Symposium, October 4, 2006.

Page 1

The Sharing and Training of HPC Resources at the University of Arkansas

Amy Apon, Ph.D.
Oklahoma Supercomputing Symposium

October 4, 2006

Page 2

Amy Apon, Ph.D. ● University of Arkansas ● October 4, 2006 2

Outline of Talk
• HPC at the University of Arkansas
  – Current status
• A New Mechanism for Sharing Resources
  – AREON
• HPC Training
  – New course delivery via HDTV collaboration with LSU
• Collaboration opportunities and challenges
  – GPNGrid and SURAGrid
  – Resource allocation issues

Page 3

High Performance Computing Resources at the University of Arkansas

• Red Diamond supercomputer
  – NSF MRI grant, August 2004
    • Substantial University match
    • Substantial gift from Dell
  – First supercomputer in Arkansas
    • Number 379 on the Top 500 list, June 2005
    • 128 nodes (256 processors), 1.349 TFlops

Page 4

More Resources
• Prospero cluster
  – 30 dual-processor PIII nodes
  – SURAGrid resource
• Ace cluster
  – 4 dual-processor Opteron nodes
  – Our entry point to the GPNGrid/Open Science Grid
• Trillion cluster
  – 48 dual-processor Opteron nodes
  – Owned by Mechanical Engineering
  – About 1 TFlop

Page 5

How are we doing?

Page 6

We are seeing research results
• Computational Chemistry and Materials Science (NSF)
  – New formulas for new drugs
  – Nanomaterials
  – Chemistry, Physics, Mechanical Engineering
  – Over 95% of our usage is in these areas

Page 7

Research results in other areas, also
• Multiscale Modeling
• DNA Computing
• Middleware and HPC Infrastructure
  – Tools for managing data for large-scale applications (NSF)
  – Performance modeling of grid systems (Acxiom)

Page 8

We have done some significant upgrades
• For the first year we used SGE on half of the computer; the other half ran self-scheduled PVM jobs
• LSF scheduler installed May 2006
• About 60 users, about 10 very active users
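Submitting work through the LSF scheduler mentioned above looks roughly like the batch-script sketch below. This is an illustrative fragment only: the job name, queue name, slot count, and application binary are all assumptions, not the site's actual configuration.

```shell
#!/bin/sh
# Sketch of an LSF batch script; submitted with: bsub < run_job.sh
#BSUB -J chem_run            # job name (hypothetical)
#BSUB -q normal              # queue name (assumed)
#BSUB -n 16                  # number of processor slots requested
#BSUB -o chem_run.%J.out     # stdout file; %J expands to the LSF job ID
#BSUB -e chem_run.%J.err     # stderr file

# Launch the parallel application (hypothetical binary)
mpirun ./my_parallel_app
```

LSF reads the `#BSUB` directives from the script, so users can keep their resource requests versioned alongside the job itself rather than retyping command-line flags.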

Page 9

Thanks to LSF, we are busy

[Chart: LSF Daily Pending Parallel Job Statistics by Queue (jobs waiting)]

Page 10

And jobs have to wait

[Chart: LSF Hourly Turnaround Time of Normal Queue]

Page 11

We have something exciting to share

Page 12

[Map, 25-July-2006: AREON, the Arkansas Research and Education Optical Network, connecting Fayetteville, Fort Smith, Russellville, Conway, Little Rock, Arkadelphia, Magnolia, Monticello, Pine Bluff, and Jonesboro, with links to Tulsa, Dallas, Memphis, and Monroe]

We ARE ON!

Page 13

• The first bond issue (last fall) failed

• Governor Huckabee of Arkansas granted $6.4M (PI Zimmerman)

• Fiber for the MBO loop between Tulsa and Fayetteville is in place, and network hardware is being shipped

• The campus (last-mile) connections are in progress

All is on target for a demo to the Governor on 12/5/06!


Page 14

AREON: Arkansas Research and Education Optical Network

• This fall, Uark will have connectivity to Internet2 and the National Lambda Rail

• The bond issue is on the ballot again this coming fall

• If it passes, the other research institutions will be connected to AREON; we hope this happens!

• The timeframe for this is about a year and a half

Page 15

Opportunities for collaboration with OneNet, LEARN, LONI, GPN, and others

[Map: LEARN topology for the state of Texas (Dallas, Denton, Longview, Waco, College Station, Austin, San Antonio, Houston, Galveston, Beaumont, Corpus Christi, Lubbock, El Paso), showing the NLR topology, leased lambdas, LEARN sites, and metro interconnect cities]

Page 16

A Demonstration Application
• High Performance Computing: a new course in Spring 2007, in collaboration with LSU and Dr. Thomas Sterling
  – We are exploring new methods of course delivery using streaming high-definition TV
  – We expect about 40 students at five locations this time
  – Taught live via Access Grid and HDTV over AREON and LONI, …
  – A test run for future delivery of HPC education

Page 17

Collaboration via GPN Grid
– Active middleware collaboration for almost 3 years
– GPNGrid is in the process of applying to become a new Virtual Organization in the Open Science Grid
– Sponsored by the University of Nebraska–Lincoln; includes participants from Arkansas, UNL, Missouri, KU, KSU, OU
– Hardware grants from Sun and NSF provided 4 small Opteron clusters for the starting grid environment
– Applications are in the process of being defined

Page 18

Collaboration via SURA Grid
• Uark has a 30-node Pentium cluster in SURAGrid

• Some differences with GPN
  – CA is different
  – Account management and discovery stacks are different
  – AUP policy is different

• SURA Grid applications are increasing; Uark can run coastal modeling and is open to running other SURA applications

Page 19

More Collaboration Mechanisms

• Arkansas is participating in the recently awarded CI-TEAM award to OU, PI Neeman
  – Will deploy Condor across Oklahoma and with participating collaborators

• LSF MultiCluster provides another mechanism for collaboration

• AREON will give the University of Arkansas great bandwidth
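Once the Condor deployment mentioned above is in place, a collaborator submits work to the pool with a small submit description file. The sketch below is illustrative only; the file names and executable are hypothetical, not part of the actual deployment.

```
# Minimal Condor submit description file (all names are hypothetical)
universe   = vanilla          # plain serial job, no relinking required
executable = analyze.sh       # script or binary to run on a pool machine
arguments  = input.dat
output     = analyze.out      # stdout of the job
error      = analyze.err      # stderr of the job
log        = analyze.log      # Condor's own event log for this job
queue                         # submit one instance
```

Submitting with `condor_submit` then lets Condor match the job to any idle machine in the pool, which is what makes it attractive for harvesting cycles across campuses.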

Page 20

UofA Current HPC Challenges

• We have some I/O infrastructure challenges
  – The system was designed to have a large amount of storage, but it is not fast

• Supercomputing operations
  – AC, power, and UPS need to be upgraded

• Funding models for on-going operations
  – How will basic systems administration and a project director be funded?

Page 21

Collaboration and sharing bring a challenge
• Usage policies
  – How do you partition usage fairly among existing users?
  – How do you incorporate usage from new faculty?

The current policy uses fair-share scheduling:
Dynamic Priority = (# shares) / (# slots × F1 + cpu_time × F2 + run_time × F3)

Shares are divided among the largest user groups: chem, phys, others
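The fair-share formula above can be sketched in a few lines of Python. The factor values F1–F3 and the group numbers below are illustrative assumptions, not the site's actual tuning; the point is only that a group's priority falls as its resource consumption grows, so under-served groups get dispatched first.

```python
def dynamic_priority(shares, slots, cpu_time, run_time,
                     f1=1.0, f2=0.001, f3=0.001):
    """Fair-share dynamic priority, as on the slide:
    priority = shares / (slots*F1 + cpu_time*F2 + run_time*F3).
    Factor values f1..f3 are illustrative defaults, not site policy."""
    denom = slots * f1 + cpu_time * f2 + run_time * f3
    # A group with no current usage has nothing in the denominator,
    # so it gets top priority.
    return shares / denom if denom > 0 else float("inf")

# Two groups with equal shares (numbers are made up): the heavy
# consumer sinks in priority relative to the light one.
chem = dynamic_priority(shares=40, slots=64, cpu_time=100000, run_time=50000)
phys = dynamic_priority(shares=40, slots=8, cpu_time=2000, run_time=1000)
```

With these numbers, `phys` ends up with the higher dynamic priority, so its next job is dispatched before `chem`'s even though both groups hold the same share.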

Page 22

Collaboration and sharing bring a challenge
• Are max run times needed?
  – Almost everyone has them
  – Requires checkpointing of jobs, which is hard to do with our current I/O infrastructure
  – Requires user education and a change of culture

• Are user allocations and accounting of usage needed?

• Your suggestions here

Page 23

Questions?

Contact information:
http://hpc.uark.edu
http://comp.uark.edu/
[email protected]