Workload Management PBS Professional

Page 1: Workload Management PBS Professional

Workload Management PBS Professional Since 1999

We measure our success in terms of our customers’ business performance.

Page 2: Workload Management PBS Professional

Confidential information. Copyright © 2006 Altair Engineering, Inc. All rights reserved. 2

High Performance Computing

Finite Compute Resources
• Large financial investment

User Competition for Resources

Types of Jobs
• Computation too large for desktops
• Long-running jobs

Organized by Asking
• Are you done yet?
• What exactly are you using?
• What about my job?

[Figure: CPUs versus time chart showing the running job and Jobs 1, 2 and 3]

Page 3: Workload Management PBS Professional

Altair Engineering

A global software and services company focused on information analysis, visualization, product development and high performance computing

Founded in 1985

Privately held

~ 1,600 employees

40 offices: N.A., Europe, Asia/Pacific

Headquarters: Troy, MI USA

Next-generation Lighting Technology

On-demand Computing Technology

Business Intelligence Solutions

Enterprise CAE Software Suite

Product Innovation Consulting

Page 4: Workload Management PBS Professional

Life with PBS Professional

Focus on science, not computers
– Easy to use

Enables business continuity, bulletproof reliability and runs everywhere
– Hard to break

Automatically ensure the right work is done at the right time and eliminate waste
– Do more (with less)

Maximize ROI
– Keep track and plan

[Figure: CPUs versus time chart showing the running job and Jobs 1, 2, 3, 5 and 6]

Page 5: Workload Management PBS Professional

PBS Professional has advanced scheduling functionality and the reliability, availability and serviceability features to satisfy production enterprise customers with massive computational infrastructures.

Continued product/customer focus on technical HPC – the de facto workload management standard for Linux clusters and the largest, most complex workloads.

PBS Professional server/scheduler failover
Automatic job migration upon failures
Topology aware scheduling
Scheduler enhancements*

• Tunable formula

• Job submission hooks

• Standing reservations

• Backfill around 1-N large jobs

• Enhanced fair share

• Eligible time

v10.1

Node virtualization
Green computing
Central licensing server
Analytics and other web-based extras
Job arrays
Enhanced network debugging
Integration with most MPI packages*
Restrict user/node lock down*
Individual job “sand boxes”*

Meta-scheduling such as Windows HPC Server
Generic dynamic provisioning, finer control, address “black hole”

Page 6: Workload Management PBS Professional

Tunable Job Scheduling Formula

• Extend the scheduler's tunable formula to include additional mathematical operations; custom resources are currently viewable and requestable/modifiable by all users, operators, and managers.

• This formula may be used in conjunction with the standard PBS Professional scheduler.

• Example: Company A would like to create per-job coefficients in their formula which are set by system defaults and cannot be requested, modified, or viewed by the user.

– For example, A, B, C and D below would be these coefficient resources.
– A*(Queue Priority) + B*(Job Class Priority) + C*(CPUs) + D*(Queue Wait Time)
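The weighted sum above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not PBS Professional code: the coefficient values and the function name `job_priority` are made up for the example, and the slide itself only names the coefficients A, B, C and D.

```python
# Hypothetical system-default coefficients; the slide only names them A-D.
DEFAULT_COEFFS = {"A": 10.0, "B": 5.0, "C": 1.0, "D": 0.5}


def job_priority(queue_priority, job_class_priority, cpus, queue_wait_time,
                 coeffs=DEFAULT_COEFFS):
    """Evaluate A*(Queue Priority) + B*(Job Class Priority) + C*(CPUs)
    + D*(Queue Wait Time) for one job."""
    return (coeffs["A"] * queue_priority
            + coeffs["B"] * job_class_priority
            + coeffs["C"] * cpus
            + coeffs["D"] * queue_wait_time)


# One job: queue priority 2, class priority 1, 8 CPUs, waited 4 time units.
score = job_priority(2, 1, 8, 4)
```

Because the coefficients live in a table the caller does not supply, the sketch mirrors the idea that they are set by system defaults rather than by the user.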

Page 7: Workload Management PBS Professional

Tunable Job Scheduling Formula (more)

Customize job scheduling to whatever PERL allows

Define any policy – including on-the-fly “exceptions”

Simple formulas are very simple (big jobs go first)

ncpus * (walltime/3600.0)

Complex formulas are pretty simple too… (adds priority accrual for smaller jobs, high-priority queue, deferred queue, “run this job next”)

(ncpus * (walltime/3600.0)) * Wsize + (eligible_time/3600.0) * Wwait + special_p
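To see how the complex formula orders a queue, the sketch below evaluates it for three made-up jobs and sorts them. The weights (`w_size`, `w_wait`), the job values, and the dictionary layout are assumptions for illustration; only the formula itself comes from the slide, and this is not how the PBS scheduler is implemented.

```python
def formula(job, w_size=1.0, w_wait=10.0):
    """(ncpus * (walltime/3600.0)) * Wsize + (eligible_time/3600.0) * Wwait
    + special_p, with times given in seconds."""
    size_term = job["ncpus"] * (job["walltime"] / 3600.0)
    wait_term = job["eligible_time"] / 3600.0
    return size_term * w_size + wait_term * w_wait + job.get("special_p", 0.0)


# Hypothetical queue contents.
jobs = [
    {"name": "big", "ncpus": 64, "walltime": 7200, "eligible_time": 0},
    {"name": "small-waiting", "ncpus": 2, "walltime": 3600,
     "eligible_time": 36000},
    {"name": "run-next", "ncpus": 1, "walltime": 3600, "eligible_time": 0,
     "special_p": 1000.0},
]

# Highest value runs first.
ordered = [job["name"] for job in sorted(jobs, key=formula, reverse=True)]
```

With these numbers the "run this job next" boost dominates, the big job follows, and the small job climbs as its eligible time accrues.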

Page 8: Workload Management PBS Professional

Standing Reservations

Guarantee resources for recurring needs

pbs_rsub -R 0500 -E 0800 \
    -r "FREQ=DAILY;BYDAY=MO,TU,WE,TH,FR;UNTIL=20091231" \
    -l select=200:ncpus=2 -l place=scatter:excl

• Run the weather simulations from 5-8am every weekday morning
• Reserve the computing lab for classes on MWF 14:00-16:00
• Block out time for maintenance the first weekend of every month
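As a rough illustration of what the recurrence rule in the pbs_rsub example describes (daily, restricted to Monday through Friday, until an end date), the Python sketch below lists the matching dates. It uses only the standard library and does not parse the rule string or talk to PBS; the function name `weekday_occurrences` and the start date are assumptions for the example.

```python
from datetime import date, timedelta


def weekday_occurrences(start, until):
    """Yield every Monday-Friday date from start through until, inclusive."""
    day = start
    while day <= until:
        if day.weekday() < 5:  # 0 = Monday ... 4 = Friday
            yield day
        day += timedelta(days=1)


# Occurrences for the last days before UNTIL=20091231,
# starting from a hypothetical submission date.
dates = list(weekday_occurrences(date(2009, 12, 21), date(2009, 12, 31)))
```

Each date in the list corresponds to one 05:00-08:00 occurrence of the standing reservation.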

Page 9: Workload Management PBS Professional

Submission Filtering “Hooks”

Change/augment capabilities in the field, on-the-fly, without source

import pbs
e = pbs.event()
if e.job.Resource_List["walltime"] is None:
    e.reject("You must specify a walltime")

• Admission control – validate requests
• Allocation management
• On-the-fly tuning
• Custom logging, reporting, debugging, and even patches!
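The admission-control idea can be tried outside PBS with stand-in objects. The sketch below is an assumption-laden mock: `FakeJob`, `FakeEvent`, `Rejected`, and `submit` are hypothetical helpers written for this example and are not part of the PBS hook API; only the walltime check mirrors the snippet on the slide.

```python
class Rejected(Exception):
    """Raised by the stand-in event when a hook rejects a submission."""


class FakeJob:
    def __init__(self, **resources):
        # An unset walltime reads as None, matching the slide's `is None` check.
        self.Resource_List = {"walltime": None, **resources}


class FakeEvent:
    def __init__(self, job):
        self.job = job

    def reject(self, message):
        raise Rejected(message)


def require_walltime(e):
    """Same test as the slide's hook, written as a function of the event."""
    if e.job.Resource_List["walltime"] is None:
        e.reject("You must specify a walltime")


def submit(job, hooks):
    """Run each hook in turn; the first rejection stops the submission."""
    e = FakeEvent(job)
    try:
        for hook in hooks:
            hook(e)
    except Rejected as err:
        return f"rejected: {err}"
    return "accepted"


without_walltime = submit(FakeJob(), [require_walltime])
with_walltime = submit(FakeJob(walltime="01:00:00"), [require_walltime])
```

Keeping each policy as a small function makes it easy to chain several checks, which is the spirit of changing behavior in the field without touching the source.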

Page 10: Workload Management PBS Professional

Scheduling & Policy Enhancements

[Figure: used CPUs versus time, starting at “Now”, showing the running job and Jobs 1, 2 and 3]

Strict FIFO
• Inefficient scheduling algorithm

• Not ideal for large & small job mix

• Introduces utilization inefficiencies

Page 11: Workload Management PBS Professional

Scheduling & Policy Enhancements

[Figure: used CPUs versus time, starting at “Now”, showing the running job and Jobs 1, 2 and 3]

Strict Ordering With Backfilling
• Jobs Are Sorted & Order Is Maintained
• Jobs Small Enough Not To Affect Start Times Are Backfilled
• ‘Gaps’ Are Utilized Effectively
• Large/Starving Job Starts When Expected
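A simplified backfilling sketch in Python follows. It is not the PBS Professional scheduler's algorithm: it only decides which queued jobs may start right now, it assumes the top job fits on the machine at all, and the `Job` class, the function name `backfill_now`, and the sample numbers are invented for the example. A lower-priority job is backfilled only if it finishes before the top job's expected start, or if it fits in CPUs the top job will not need.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpus: int
    walltime: int  # requested run time in hours


def backfill_now(total_cpus, running, queue):
    """Return the names of queued jobs that can start at time 0.

    running: list of (end_time, cpus) for jobs already executing.
    queue:   jobs in priority order; the first one is the top job.
    """
    free = total_cpus - sum(cpus for _, cpus in running)
    running = list(running)
    pending = list(queue)
    started = []
    # Strict ordering: start jobs from the front while they fit.
    while pending and pending[0].cpus <= free:
        job = pending.pop(0)
        free -= job.cpus
        running.append((job.walltime, job.cpus))
        started.append(job.name)
    if not pending:
        return started
    top = pending[0]
    # Find when enough CPUs will be released for the top job.
    avail, top_start = free, 0
    for end, cpus in sorted(running):
        avail += cpus
        if avail >= top.cpus:
            top_start = end
            break
    spare = avail - top.cpus  # CPUs the top job will leave unused
    for job in pending[1:]:
        if job.cpus > free:
            continue
        if job.walltime <= top_start:
            free -= job.cpus  # done before the top job needs the CPUs
            started.append(job.name)
        elif job.cpus <= spare:
            free -= job.cpus  # runs alongside the top job
            spare -= job.cpus
            started.append(job.name)
    return started


# 8 CPUs, 6 busy until hour 4. Job 1 needs the whole machine.
queue = [Job("Job 1", 8, 10), Job("Job 2", 2, 3), Job("Job 3", 2, 6)]
can_start = backfill_now(8, [(4, 6)], queue)
```

Under strict FIFO nothing could start until Job 1 does; here the short Job 2 fills the gap without delaying Job 1, while Job 3 waits because it would run past Job 1's expected start.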

Page 12: Workload Management PBS Professional

Tightly Integrated MPI Packages

• Intel MPI on Linux
• MPICH 1.2.5, 1.2.6 on Linux 2.4 on Itanium 2, x86/AMD64/EM64T
• MPICH 1.2.5, 1.2.6 on Linux 2.6 on x86/AMD64/EM64T
• MPICH-GM/MX on Linux
• MPICH2 on Linux
• IBM POE on AIX 5.2
• IBM POE on HPS Switch (enhanced)
• IBM HPS via LoadLeveler
• HP MPI 1.08.03 on HP-UX 11 on PA-RISC & Itanium 2
• HP MPI 2.0.0 on Linux 2.4 & 2.6 on x86/AMD64/EM64T/Itanium 2
• LAM/MPI 6.5.9/7.0.6/7.1.1 on Linux 2.4/2.6 on x86/AMD64/EM64T/Itanium 2
• SGI MPI on Linux on IA64 (Altix / Itanium 2)
• SGI MPI (MPT) over Infiniband
• SGI MPI within and across Altix(es)
• SCALI MPI-Connect
• Open MPI (native)
• MVAPICH across Infiniband

Page 13: Workload Management PBS Professional

Restrict user-feature

PBS Head Node

PBS Professional provides guaranteed exclusive access to compute nodes and lets administrators ensure that end users cannot access nodes when jobs are not running

Page 14: Workload Management PBS Professional

PBS Per-Job Execution Directory – “Sandbox”

[Figure: Before PBS Professional v9.2 (Jobs 75, 29, 73, 88, 54, 42 and 61)]

Page 15: Workload Management PBS Professional

PBS Per-Job Execution Directory – “Sandbox”

[Figure: PBS Professional 9.2+ with sandbox=PRIVATE (Jobs 21, 42, 97 and 38)]

Page 16: Workload Management PBS Professional

PBS Per-Job Execution Directory – “Sandbox”

PBS creates a job-specific staging and execution directory for each running job (i.e., a job-private sandbox)

• Each job runs in that sandbox
• Files are staged in/out of the sandbox area
• Upon successful stage-out, the job-specific directory is deleted
– However, if stage-out fails, the job-specific directory and its contents are not deleted, so job results can still be retrieved.

sandbox=PRIVATE

No longer need a user’s home directory on every execution node
• Simply set $jobdir_root in the MOM’s config file

Able to run simultaneous instances of the same job
• Jobs using common names for input/output files won’t overwrite another job’s files.
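The sandbox lifecycle described above (create a private directory, run, stage out, delete only on success) can be mimicked with the Python standard library. This is a sketch of the idea, not how PBS implements it: `run_in_sandbox`, its callback parameters, and the directory prefix are hypothetical, and in PBS the behavior is selected with sandbox=PRIVATE and $jobdir_root rather than written by hand.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_sandbox(job_id, jobdir_root, work, stage_out):
    """Run `work` in a private per-job directory, then stage results out.

    Returns None when stage-out succeeded and the sandbox was deleted,
    or the sandbox path when stage-out failed and the files were kept.
    """
    sandbox = Path(tempfile.mkdtemp(prefix=f"pbs.{job_id}.", dir=jobdir_root))
    work(sandbox)
    try:
        stage_out(sandbox)
    except OSError:
        return sandbox  # keep results so they can still be retrieved
    shutil.rmtree(sandbox)
    return None
```

Two jobs that both write a file called output.dat get separate directories under the same root, so neither overwrites the other's files.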

Page 17: Workload Management PBS Professional

PBS Pro Global resellers and Strategic Partners

Page 18: Workload Management PBS Professional

Contact Information

Devin Jensen
• Director, Business Development
• Desk: (801) 653-2300
• Cell: (949) 322-6212
• Email: [email protected]

Stephen Gombosi
• Senior Analyst
• Desk: (303) 415-0327
• Cell: (303) 325-4336
• Email: [email protected]

Support:
• Level 1 Support
• Troy, MI, 12 hour weekdays
– Eastern Time Zone
– 0800-2000 or 8am to 8pm
• Support Phone:
– (248) 614 2425
• Email: [email protected]

Altair Engineering:
• (248) 614 2400

Website:
• http://www.altair.com
• http://www.pbspro.com

Page 19: Workload Management PBS Professional

Questions & Answers

Thank You!

Next Steps?
• Evaluation licenses
• Technical questions
• Case studies

Page 20: Workload Management PBS Professional

Summary of PBS Professional – Part I

• Node virtualization: everything is a resource, DMP or SMP

• Pull-based grid support: reserves control in peer-to-peer environment

• Restrict user/node lock down: code “hammer,” extremely configurable

• ACL (Access Control List): limits resource usage on a per user, group, host, or network domain level

• Soft Limits/Hard Limits: set per user

• Node Packing: minimizes the number of hosts used

• Resource Minimum/Maximum: set at server and queue levels

• Queue Routing: allows redirection of jobs to and from other queues

• Prime time/Non-prime time scheduling: allows adjusting of set policies

• Individual controls: allow users to determine account mapping and advance reservations

• Administration: easily configure queues, hosts, access lists, and resources

• Enhanced network debugging: set timeouts, frequencies, etc.

• Licenses: checked out by execution server, not the submission host; also keep expensive SW packages in use 24 x 7

Page 21: Workload Management PBS Professional

Summary of PBS Professional – Part II

• Job dependencies
• Job preemption and suspend/resume/checkpoint (per OS or application)
• Advance reservations/standing reservations
• Dynamic resources (per node and per server)
• Idle time detection
• Job arrays
• Fairshare or round robin scheduling
• Strict ordering/optimal backfilling
• Interactive or batch mode processing
• Topology-aware scheduling
• Command line interface or graphical user interface (GUI)
• Parallel jobs
• Dedicated time
• Finer granularity control on users/groups
• Meta scheduling to Windows HPC
• Job execution “sandboxes”
• File staging with wildcards
• Heterogeneous architecture (UNIX, Linux, and Windows)
• Decade + of development
• Source-code access

Page 22: Workload Management PBS Professional

GridWorks Analytics
PBS Professional Accounting Companion Product

• Generate reports automatically “out-of-the-box”, then customize...
• Understand usage trends for capacity planning
• Verify project planning assumptions to meet deadlines
• Extract accounting data for chargeback
• Track software license use to optimize ROI

Page 23: Workload Management PBS Professional

GridWorks Analytics Features

• Directly visualize PBS Professional accounting data
• Numerous graph types
• Secure, role-based access: users view their data; managers view group data
• Customize, save, and share reports or entire dashboards
• Drill-down to underlying data
• Publish via web and export to Excel

Page 24: Workload Management PBS Professional

GridWorks Analytics Architecture

• Modern drag-and-drop Web 2.0 technology
• Turn-key setup from download to display
• Aggregates data from multiple PBS Professional servers
• Historical data back as far as PBS Pro 5.3.x
• Plugs in to all major enterprise infrastructure

Page 26: Workload Management PBS Professional

What Is Personal PBS?

Desktop tool that turns your multi-core desktop into your own miniature compute farm…

Run HPC jobs locally for free

Drag-and-drop jobs to back-end PBS Professional systems

[Screenshot: Personal PBS window with labeled Applications Pane, Toolbar, Jobs Pane and Profiles Pane]

Page 27: Workload Management PBS Professional

Personal PBS Overview

• Drag-and-drop input to submit
• Recognizes application and fills in default job parameters
• Add new applications via wizard
• Monitor, manage, prioritize jobs
• Run jobs locally on your desktop
• Connect to enterprise clusters
• Output is automatically returned

[Figure: Personal PBS screenshot and diagram labeled Queues, My Jobs, Job List and Enterprise PBS Professional Cluster]