Chris Churchey Principal ATS Group, LLC churchey ... · © 2014 IBM Corporation Enterprise2014 GPFS...

Post on 13-Jul-2020


© 2014 IBM Corporation

Enterprise2014

GPFS with Flash840 on PureFlex and Power8 (AIX & Linux)

Chris Churchey – Principal ATS Group, LLC

churchey@TheATSgroup.com (610-574-0207)

October 2014


Why Monitor? (Clusters, Servers, Storage, Net, etc.)

Ensure the services and apps are available to our users (customers)

Ensure they perform optimally

Identify constraints, problems or configuration concerns

Learn from past behaviors and trends

Anticipate/Avoid capacity constraints vs. “reacting” to them and impact to users

It’s our job... I hope.


What to Monitor (for starters)

CPU

(User + System) >= 80%

Waiting on I/O >= 10% (possible I/O bottleneck)

Memory

Paging: Page-In/Swap-In >= 5 per second

Scan/Free Ratio >= 4 (thrashing)

Page/Swap Space Used >= 80% (> 90% is critical)

Huge/Large pages Allocated > 0 but Used = 0 (wasted memory)

Network & Fiber Adapters

Running-Speed = Supported-Speed

Read/Write Throughput >= 80% of Running-Speed

Load balanced across adapters

Tuning HBA Queue Depth and Transfer Size settings can give huge gains
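
As a rough illustration, the CPU and memory thresholds on this slide can be written as a single check over one node's metrics sample. This is only a sketch: the dictionary field names (cpu_user_pct, pages_scanned, and so on) are hypothetical and would need to be mapped to whatever your collector actually reports.

```python
def check_node(sample):
    """Return a list of warnings for one node's metrics sample (a plain dict).

    All field names are hypothetical; thresholds are the ones from the slide.
    """
    warnings = []
    if sample["cpu_user_pct"] + sample["cpu_sys_pct"] >= 80:
        warnings.append("CPU busy (user + system) >= 80%")
    if sample["cpu_iowait_pct"] >= 10:
        warnings.append("I/O wait >= 10%: possible I/O bottleneck")
    if sample["page_ins_per_sec"] >= 5:
        warnings.append("page-in/swap-in rate >= 5 per second")
    # Scan/free ratio = pages scanned per page freed; skip when nothing was freed.
    freed = sample["pages_freed"]
    if freed > 0 and sample["pages_scanned"] / freed >= 4:
        warnings.append("scan/free ratio >= 4: thrashing")
    if sample["swap_used_pct"] > 90:
        warnings.append("page/swap space used > 90%: critical")
    elif sample["swap_used_pct"] >= 80:
        warnings.append("page/swap space used >= 80%")
    return warnings


if __name__ == "__main__":
    busy = {"cpu_user_pct": 60, "cpu_sys_pct": 25, "cpu_iowait_pct": 3,
            "page_ins_per_sec": 0, "pages_scanned": 100, "pages_freed": 50,
            "swap_used_pct": 85}
    for line in check_node(busy):
        print(line)
```

The adapter checks (running speed, throughput, balance) would follow the same pattern with per-adapter fields.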


What to Monitor (continued)

Filesystems

Space Used >= 90% (traditional check)

Space Used >= 90% and Free < 1GB (produces fewer alerts)

“/” and “/var”: Space Used > 95% and Free < 512MB (critical)

I-nodes Used >= 90%
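
The point of combining a percentage with an absolute free-space floor is that a very large filesystem at 90% can still have hundreds of gigabytes free. A minimal sketch of the space rules above, assuming sizes are supplied in bytes (the function name and the "ok"/"warning"/"critical" labels are illustrative, and the i-node check is left out):

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def fs_alert(mount, total_bytes, free_bytes):
    """Classify a filesystem using percent-used AND absolute free space."""
    used_pct = 100.0 * (total_bytes - free_bytes) / total_bytes
    # Stricter rule for the root and /var filesystems.
    if mount in ("/", "/var") and used_pct > 95 and free_bytes < 512 * MIB:
        return "critical"
    # General rule: alert only when the filesystem is both full by
    # percentage and short on absolute free space.
    if used_pct >= 90 and free_bytes < 1 * GIB:
        return "warning"
    return "ok"


if __name__ == "__main__":
    print(fs_alert("/gpfs/big", 10 * 1024 * GIB, 800 * GIB))   # large FS, lots free
    print(fs_alert("/home", 5 * GIB, 400 * MIB))
    print(fs_alert("/", 4 * GIB, 100 * MIB))
```

On a live system the two byte counts could come from Python's shutil.disk_usage(path), which returns total, used and free.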

Disks

Write Size < 64KB and Writes/s > 20: expect Service Time < 1ms

SAN storage with write cache should today complete all small to medium-size writes in < 1ms on average

Tuning Queue Depth, Algorithm and Transfer Size settings can give huge gains

Processes

High CPU and/or Memory consumers

Runaway long-running processes

Long-running gradual memory growth (memory leak?)
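
One simple way to spot gradual memory growth is to fit a straight line through a process's memory samples and look at the slope. The sketch below uses an ordinary least-squares slope; the 1 MB-per-hour default is an arbitrary illustrative value, not a recommendation from the talk.

```python
def growth_slope(samples):
    """Least-squares slope of (time, value) pairs, in value units per time unit."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0


def looks_like_leak(samples, min_slope_mb_per_hour=1.0):
    """True when memory grows steadily at or above the given rate.

    samples: list of (hours_since_start, resident_memory_mb) tuples.
    """
    return growth_slope(samples) >= min_slope_mb_per_hour


if __name__ == "__main__":
    steady_growth = [(0, 100), (1, 102), (2, 104), (3, 106)]
    print(growth_slope(steady_growth), looks_like_leak(steady_growth))
```

A positive slope alone does not prove a leak (caches also grow), so a flagged process is a candidate to investigate, in the spirit of the slide's question mark.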


What to Monitor (continued)

GPFS

All previously listed, plus:

NSDs are distributed equally and balanced across NSD servers, unless you designated specific roles to NSD server pairs

GPFS-specific node and filesystem statistics on server and client nodes (mmpmon, etc.)

Special tuning cases arise with large clusters, millions to billions of files, and mixed large and small files; how those files are accessed often determines special design considerations

Use metadata-only NSDs on dedicated SSD or Flash disks with dedicated adapters, to keep short, IOPS-intensive access away from large-throughput I/O

Contact IBM or the Galileo Performance team for assistance

Worker Threads
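
For the mmpmon statistics mentioned above, a small parser is often the first step toward charting them. The sketch below assumes the parseable (-p) output style, in which a request name is followed by alternating "_keyword_ value" tokens; the sample line and all of its values are made up for illustration, so check the field list against your GPFS release before relying on it.

```python
def parse_mmpmon_line(line):
    """Parse one line of keyword/value output into a dict.

    Assumption: the first token is the request name and the remaining
    tokens alternate "_key_ value". Purely numeric values become ints.
    """
    tokens = line.split()
    record = {"request": tokens[0]}
    for key, value in zip(tokens[1::2], tokens[2::2]):
        record[key.strip("_")] = int(value) if value.isdigit() else value
    return record


if __name__ == "__main__":
    # Illustrative sample only; node, cluster and filesystem names are hypothetical.
    sample = ("_fs_io_s_ _n_ 10.0.0.1 _nn_ node1 _rc_ 0 _t_ 1413000000 "
              "_tu_ 0 _cl_ demo.cluster _fs_ gpfs1 _d_ 2 "
              "_br_ 6291456 _bw_ 314572800 _rdc_ 101 _wc_ 300")
    rec = parse_mmpmon_line(sample)
    print(rec["fs"], rec["br"], rec["bw"])
```

Because the counters are cumulative, rates come from the difference between two successive samples divided by the time between them.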


Daily Monitoring Steps (Methodology)

1. Cluster view: check the Dashboard

2. Identify candidates to investigate, e.g. “What to Monitor”

3. Follow the data: charts, views, and so on

4. View over a period of time

5. Determine usage mix and observed peaks

* Make it easy with Galileo Performance Explorer GPFS and Storage agents, and the new automated Analytics capability!


Cluster view

Three observations stand out immediately (they may be OK, or they may not be)


Investigate high CPU %Busy: which node?

Find out which node it is (Top: 1): gvicp8gpfsRH05. Let’s look at processes next.


Investigate high CPU %Busy: found the node, but which process?

Find which process(es) (Top: 2): “runaway” and “every2hrs”, with 3 and 1 threads respectively.

* Checked with the user: “runaway” is bad; “every2hrs” is scheduled (good).


Investigate high IO Wait: which node?

Find out which node it is: gvicp8gpfsaix04. Next, look at the node’s details.


Investigate high IO: found the node; is the problem the HBAs or the disks?

Found four HBAs: fcs0 and fcs1 at 500MB/s each, fcs2 at 100MB/s, fcs3 at 0.

* The problem was that fcs3 was not zoned. Corrected; let’s see what this improved.
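
The imbalance on this slide is the kind of thing a simple per-adapter comparison can catch automatically. A minimal sketch, assuming throughput is already available as MB/s per adapter; the 50%-of-busiest cutoff is an illustrative choice, not a value from the talk.

```python
def unbalanced_adapters(mb_per_sec, min_share_of_peak=0.5):
    """Return adapters that are idle or carry far less than the busiest one.

    mb_per_sec: dict of adapter name -> observed throughput in MB/s.
    """
    peak = max(mb_per_sec.values(), default=0)
    flagged = {}
    for name, rate in sorted(mb_per_sec.items()):
        if rate == 0:
            flagged[name] = "idle"
        elif rate < min_share_of_peak * peak:
            flagged[name] = "underused"
    return flagged


if __name__ == "__main__":
    before = {"fcs0": 500, "fcs1": 500, "fcs2": 100, "fcs3": 0}
    print(unbalanced_adapters(before))
```

With the figures above, fcs2 is reported as underused and fcs3 as idle, which points at a path or zoning problem rather than at the disks.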


Investigate high IO: found the node; is the problem the HBAs or the disks?

Corrected fcs3 zoning; fcs2 and fcs3 are now pushing 250MB/s each.


Investigate high IO: found the node; is the problem the HBAs or the disks?

Fixed zoning and increased IO throughput, BUT this now caused a memory paging problem.

* The old saying: fixing one performance problem often exposes another!


E.g. NSD Servers not Balanced (Clients constrained)

Looks like one NSD server is doing all the work (gvicp8gpfsaix01)


NSD Servers not Balanced (Clients constrained), continued

Identify which filesystem is heavily used, and the client node(s)


Round-Robin NSD Server-list to Balance load

Changed NSD Server Order to Balance between gvicp8gpfsaix01 and …aix02
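
The idea behind a round-robin server list is that each NSD names the same servers but in a rotated order, so the first-choice server differs from one NSD to the next. A minimal sketch that only computes the ordering; the NSD and server names are hypothetical, and applying the result to a cluster is left to the usual GPFS administration commands.

```python
def round_robin_server_lists(nsds, servers):
    """Rotate the server order per NSD so first-choice servers are spread evenly."""
    plan = {}
    for i, nsd in enumerate(nsds):
        shift = i % len(servers)
        plan[nsd] = servers[shift:] + servers[:shift]
    return plan


if __name__ == "__main__":
    # Hypothetical names for illustration.
    plan = round_robin_server_lists(
        ["nsd1", "nsd2", "nsd3", "nsd4"], ["serverA", "serverB"])
    for nsd, order in plan.items():
        print(nsd, ",".join(order))
```

With two servers and four NSDs, each server ends up first in the list for two NSDs instead of one server being first for all four.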


Switched the 2 Clients to Direct-attached Nodes

Data-intensive nodes can now go direct to storage: a major throughput improvement

Yes, an all-InfiniBand network would also be an option.


Galileo Analytics engine: minutes vs. the hours spent on the past 11 slides


Galileo Analytics engine: Booth #22


E.g. Seq. 50/50 Read/Write 256K 8-Threads V7K-SAS


E.g. Seq. 50/50 Read/Write 256K 8-Threads Flash-840


We are seeking Use-Cases for input to Galileo PE Analytics engine for ‘automation’

– Lessons Learned / Best Practices / Thresholds as well

We have an Innovation Center lab where we test, demo and showcase technology

– Ideas to demo, POC, verify claims, etc. you would like to see us perform and share!

support@GalileoSuite.com or sales@GalileoSuite.com or churchey@TheATSgroup.com

Please contact us!

Booth #22


Questions and Answers


We can help analyze and implement. Contact us!

Check out Galileo Performance Explorer™

– Visit Booth #22 for a hands-on demo

– Sign up for a trial at www.GalileoSuite.com

– Complimentary*, no-strings-attached 3 months of use for conference attendees

sales@GalileoSuite.com (484-320-4302)

www.GalileoSuite.com

* First time Galileo user


Referenced Material

Deploying a big data solution using IBM GPFS-FPO
http://public.dhe.ibm.com/common/ssi/ecm/en/dcw03051usen/DCW03051USEN.PDF

GPFS tuning guidelines for deploying SAS
http://www.sas.com/content/dam/SAS/en_us/doc/partners/ibm-gpfs-tuning-guidelines.pdf

GPFS Wiki – IBM DeveloperWorks
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29

GSS / ESS
https://www.ibm.com/developerworks/community/blogs/5things/entry/gpfs_storage_server?lang=en

Galileo Performance Explorer
http://www.GalileoSuite.com