
National Center for Atmospheric Research, Pittsburgh Supercomputing Center, National Center for Supercomputing Applications

Web100

Wendy Huntoon - PSC

Jim Ferguson - NCSA

I2 Members Meeting

May 2002

Outline

• Project Overview
– Motivation: What is the problem?
– Web100 Collaboration
• Progress to Date
– Standardization Process
– Code Release
• Code Capabilities
• Overview of Users
• Web100 Resources

Motivations: What’s the Problem?

• High performance flows slower than line rate
– Delays continue/increase even with higher bandwidth
• TCP tuning issues are non-trivial
• Poorly conceived stacks
• Router/switch buffer queues inadequate
• Slow start and AIMD algorithm
• Eliminate/dramatically reduce the “wizard gap”
• Need for kernel instrumentation set for TCP variables

The Wizard Gap

TCP over a long haul path

Year    Wizards     Non-wizards    Ratio
1988    1 Mb/s      300 kb/s       3:1
1991    10 Mb/s
1995    100 Mb/s
1999    1 Gb/s      3 Mb/s         300:1

Scientists/researchers not happy with this

TCP tuning is painful debugging

• All problems limit performance

– IP routing, long round trip times

– Improper MSS negotiations or path MTU discovery

– IP Packet reordering

– Packet losses, congestion, lame hardware

– TCP sender or receive buffer space

– Inefficient applications

• Any one problem can mask all the others and confound all but the best (and few) tuning gurus

• Need for better diagnostics and visibility into problems
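
To make the buffer-space and round-trip-time items above concrete, here is a minimal sketch of the standard window/RTT bound on TCP throughput. The 70 ms round trip time, the 64 KiB window, and the function names are illustrative assumptions, not figures from the Web100 project.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds


def required_window_bytes(rate_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return rate_bps * rtt_seconds / 8


# Hypothetical long haul path with a 70 ms round trip time.
rtt = 0.070

# A 64 KiB window caps the flow far below a 1 Gb/s line rate.
default_cap = max_throughput_bps(64 * 1024, rtt)
print(f"64 KiB window: {default_cap / 1e6:.1f} Mb/s")

# Buffer needed to sustain 1 Gb/s on the same path.
needed = required_window_bytes(1e9, rtt)
print(f"Window for 1 Gb/s: {needed / 1e6:.2f} MB")
```

Under these assumed numbers the untuned flow is limited to roughly 7.5 Mb/s no matter how fast the link is, which is the kind of gap a non-expert has little chance of diagnosing without better visibility.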

Goal and Method

• Make it “easy” (transparent) for non-experts to achieve higher throughput performance

• Enhance TCP capabilities with better (finer grain) kernel instrumentation and automatic controls

• Real time triage capability determines sender, receiver, and/or network bottlenecks

Why Focus on TCP

• TCP has an ideal vantage point into throughput problem space

• TCP can identify bottleneck subsystem(s)

• TCP already measures the network (to some extent)

• TCP can measure the application

• TCP can adjust itself (auto-tuning feedback)

Web100 Collaboration

• Funded by the NSF
– Currently Year 2 of a 3 Year grant.
– Cisco URP for initial seed funding.
• Collaborators
– PSC (Matt Mathis, R. Reddy, Janet Brown, John Heffner)
– NCAR (Peter O’Neil, Marla Meehl)
– NCSA (John Estabrook, Tanya Brethour, Stephen Engelhardt, Jim Ferguson)

What is in the code

• Web100 software consists of:
– TCP Kernel Instrument Set (TCP-KIS)
  • Instruments coded directly into the operating system kernel.
– Derived Instrument Set (DIS)
  • Information that is collected based on KIS parameters.
– Application Code
  • Tools, applications, etc. that use the information provided by the KIS and DIS.

Kernel Instrument Set

• Definition
– Set of instruments designed to collect as much of the information as possible to enable a user to isolate the performance problems of a TCP connection.
• How it is implemented
– Each instrument is a variable in a "stats" structure that is linked through the kernel socket structure.
– The Linux /proc interface is used to expose these instruments outside the kernel.
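
The implementation idea above can be modeled in a few lines. This is only a user-space sketch of the pattern, not the Web100 kernel code: the class names, field names, and text format are hypothetical stand-ins for the real "stats" structure and /proc files.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ConnectionStats:
    """Hypothetical per-connection instrument block (names are illustrative)."""
    pkts_out: int = 0
    bytes_out: int = 0
    retransmits: int = 0


@dataclass
class Socket:
    """Stand-in for the kernel socket structure; it links to its own stats."""
    local: tuple
    remote: tuple
    stats: ConnectionStats = field(default_factory=ConnectionStats)

    def send(self, payload: bytes, retransmit: bool = False) -> None:
        # Instruments are updated in the same code path that sends data.
        self.stats.pkts_out += 1
        self.stats.bytes_out += len(payload)
        if retransmit:
            self.stats.retransmits += 1


def export_stats(sock: Socket) -> str:
    """Render the instruments as name/value text, as a /proc-style file might."""
    return "\n".join(f"{name} {value}" for name, value in asdict(sock.stats).items())


sock = Socket(local=("10.0.0.1", 5001), remote=("10.0.0.2", 80))
sock.send(b"x" * 1460)
sock.send(b"x" * 1460, retransmit=True)
print(export_stats(sock))
```

Because each connection carries its own stats block, a reader outside the "kernel" only needs the socket to find every instrument for that connection.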

What is the TCP-KIS?

• TCP-KIS instruments group naturally into categories.
– Currently roughly 19 categories.
• Already more than 125 instruments have been developed.
• For each instrument:
– Precise (standards ready) definition.
– Instrument code in the kernel
– Implementation verification tests
  • Does the kernel implementation meet the definition?
• Prototype diagnostic tool(s) to demonstrate functionality and effectiveness.

TCP-KIS

• Basic instrumentation examples
• Connection ID: 5-tuple that uniquely identifies a connection.
• State: determines what protocol features or algorithms are enabled.
• Traffic out: aggregate statistics on packets and traffic sent out on a connection.

Local Sender Triage

• Group of instruments associated with the local sender.
– Determine what subsystems are throttling TCP data transmission.
– Three parallel sets of instruments that measure:
  • Receiver Window
  • Network Congestion
  • Sender's Availability
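
One way such triage instruments could be used is sketched below: given the time a connection spent limited by each of the three subsystems, report each share and the dominant limit. The function name, parameter names, and readings are assumptions made for illustration; they are not the actual TCP-KIS variable names.

```python
def triage(time_rwin: float, time_cwnd: float, time_sender: float) -> dict:
    """Return the share of time the sender spent limited by each subsystem."""
    total = time_rwin + time_cwnd + time_sender
    if total <= 0:
        raise ValueError("no limited time recorded")
    return {
        "receiver window": time_rwin / total,
        "network congestion": time_cwnd / total,
        "sender": time_sender / total,
    }


# Hypothetical readings, in seconds, for one connection.
shares = triage(time_rwin=42.0, time_cwnd=6.0, time_sender=12.0)
bottleneck = max(shares, key=shares.get)

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
print("dominant limit:", bottleneck)
```

With these made-up readings the receiver window dominates, which would point a user toward the receiver's buffer settings rather than the network path.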

Local Sender Groups

• Other groups of instruments associated with the Local Sender:

• Local Sender Congestion Model

• Local Sender Loss Model

• Local Sender Re-order Model

• Local Sender RTT

• Local Sender Segment Size

• Local Sender Bottlenecks

• Local Sender Tuning

Other Instruments

• Similar instruments for the Local Receiver.
• Observed Receiver instruments
– Often inferred from the data stream.
– E.g., Observed Receiver: the receiver's state is inferred from the ACK stream.
• Application Interface
– Future instruments to collect statistics on how the application is using the network.

Userland Distribution

• Released asynchronously with kernel distribution

• Currently at Alpha 1.1
– Version 1.2 release imminent
• Consists of
– The web100 library
– Command line utilities
– GUI utilities

Web100 Library

• Web100 kernel exposes critical TCP variables/instruments through /proc

• Web100 library provides the necessary access functions to access these variables/instruments

• Functions
– Read the value of a variable/instrument
– Snapshot of a group (facilitates atomic reading of a group of variables)
– Modify tunable variables (e.g., send buffer size)
– Etc.
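
The value of a group snapshot over reading variables one at a time can be illustrated with a small in-memory model. This is not the web100 library API; the class, method, and counter names are hypothetical, and a lock stands in for whatever mechanism provides atomicity.

```python
import copy
import threading


class InstrumentGroup:
    """Hypothetical in-memory group of counters updated by a sender."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {"bytes_out": 0, "pkts_out": 0}

    def record_send(self, nbytes: int) -> None:
        with self._lock:
            # Both counters change together under the lock.
            self._values["bytes_out"] += nbytes
            self._values["pkts_out"] += 1

    def read(self, name: str) -> int:
        """Read a single variable."""
        with self._lock:
            return self._values[name]

    def snapshot(self) -> dict:
        """Copy the whole group at once so the values are mutually consistent."""
        with self._lock:
            return copy.copy(self._values)


group = InstrumentGroup()
before = group.snapshot()
for _ in range(3):
    group.record_send(1460)
after = group.snapshot()

# Differences between two snapshots describe the interval between them.
delta = {name: after[name] - before[name] for name in after}
print(delta)
```

Taking two snapshots and subtracting them gives per-interval figures (bytes and packets sent in that window) that could not be trusted if each variable were read at a slightly different moment.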

Utilities

• Command line utilities
– Useful in batch scripts
– Serve as demo code for use of the web100 library
• GUI utilities
– Based on GTK+
– Useful for troubleshooting network applications
– Serve as examples for application developers

GUI Sample Screens – DTB

Connection Selector

Looking at a Variable

Timeline - Year 1

• Alpha code development

• Establish User Support

– www.web100.org

• Initial User Community– Very limited to begin with.

– Knowledgeable users, expected to provide technical input on the code.

– Understand and develop applications.

Timeline - Year 2

• Began standardization process.
– Develop MIB
– Submit to IETF
• Develop public code
– Fix bugs in alpha versions
– Add instrumentation
– Code release
• Continue code development
– Identify and add new instruments

Code Releases - To date

• Initial Release
– Alpha0.2, released May 23, 2001
– Alpha0.3, released Sept. 19, 2001
• Alpha 1.0: Separation of Kernel and Userland code
– Kernel Patch:
  • Alpha 1.1 for Linux 2.4.16, released March 18, 2002
  • Alpha 1.0, released March 1, 2002
  • Alpha 1.0, released February 26, 2002
– Userland:
  • Alpha 1.1, released February 28, 2002
  • Alpha 1.0, released February 26, 2002

Timeline - Year 3

• New pathprobe diagnostic tool (work in progress, unreleased).
• Add another 10-12 instruments.
• Review instruments and code with other wizards.
• Gain vendor support for ideas and code.
• Finalize IETF draft by December IETF meeting.

Milestones

• Over a year of ~30 alpha testers
– Including: SLAC, ORNL, LBNL, and universities
– www.net100.org
• Modified Linux kernel supports 2.4.16
• Separation between KIS and library functions
• draft-ietf-tsvwg-tcp-mib-extension-00.txt
• draft-ietf-ipngwg-rfc2012-update-01.txt

Web100 Collaborator Activity

• Rich Carlson, ANL
• Tom Dunnigan, ORNL
• Tom Hacker, U. of Michigan
• Doug Chang, SLAC
• Andreas Burkhardt & Matt Grob, Qualcomm
• Larry Dunn & Scott Dier, Cisco/U. of Minnesota
• Jason Lee, LBL

Collaborator Assistance

• Bugs!
– Kernel
– Utilities
– Release
• Request new features
• Review and criticize documentation
– Way too easy on us

Collaborator Activity

• Carlson/ANL working on a troubleshooting guide for LANs.

• Set up a network of 13 identically equipped PIII machines connected via a Cisco 5500 network switch, running Web100-enabled Linux.

• Introduces typical network faults (duplex mismatches, other config errors) and analyzes data for “signatures” of these faults.

• Modified Iperf 1.2 to collect variables and reverse flow.

Collaborator Activity

• Dunnigan/ORNL has found web100 helpful in seeing losses/retransmission and congestion avoidance parameters of individual TCP flows, and for tuning flows

• Has developed a Web100-enabled ttcp

• Has developed a daemon that logs web100 variables for designated paths when a flow closes

• Has developed an autotuning daemon that uses web100 to tune flows, including modifications to web100 to support "event notification", so the daemon knows when a new flow/socket is opened

Collaborator Activity

• Hacker/U.Michigan has been using the web100 software to help tune and diagnose end-to-end network performance problems across the U-M campus network as well as across Abilene for the Visible Human and Atlas projects at U-M.

• Chang/SLAC is looking to fix a performance problem between Linux and Solaris machines.

Collaborator Activity

• Qualcomm is using Web100 to measure TCP performance over certain types of high speed wireless links under development. Web100 is partially integrated into some other tools - in the sense that output reports are published automatically in a format similar to other tools Qualcomm uses.

• Dunn/Cisco is currently using Web100 for a class at U. Minnesota. Includes accounts on a test machine at NCSA.

Collaborator Activity

• Lee/LBL has obtained accounts at SLAC and ANL for WAN testing, and has co-located one of their machines in Washington D.C. to do testing over SuperNet. Still in the process of testing all this out.

• Keith Jackson at LBL has written Python wrappers to the Web100 calls using SWIG.

Web100 Summary

• Main WWW site: www.web100.org
• Freely available software distribution
– www.web100.org/download
– hundreds of downloads
• Please be cognizant of impacts on others
• Please use, test, provide feedback, contribute code
• IETF standards process to benefit all
• Attention turning to working with OS vendors to incorporate standards enhancements into their stacks