Jason Stowe, Condor Week 2009, April 22nd, 2009. Coming to Condor Week since 2005. Started as a User.
Jason Stowe
Condor Week 2009
April 22nd, 2009
Coming to Condor Week since 2005. Started as a User
Users hunger for features
AccountingGroups (2004/2005)
Configuration w/Pipes (2005/2006)
GroupResourcesUsed (2006/2007)
Condor in Cloud (2007/2008)
Resource Weights (2008/2009)
Based upon customer requests
Focus on software development for managing Condor at any scale, and provide services that complement the technology
Universities, Fortune 500s, Government Labs, and Small/Medium Businesses that use Condor
Users like Condor because...
It’s open, it works, flexible, (corporations) no lock-in
API/Operating System, and...
The Community
Today, let’s talk about a few challenges, solutions
War Story #1: Compute & Data
Whenever you find or solve a computation problem, you discover a data problem.
“Dark” or Latent, Unused Storage on any OS/Device
Empty space dispersed across machines in unusable sizes
“We need more filer space, but we have empty space on all our machines.”
So we looked at Hadoop
New type of storage: Aggregated or “Cloud” Storage
Block Store Architecture
But how do we use it?
1.5 years ago: It works well to access it in Java, but what about mounting?
So we tried WebDAV
Next up, open source FUSE driver
Need: Windows/Linux, Reliable, Large Files, scalable, and Read/Write
Mountable drivers: Linux (FUSE) / Windows (IFS)
CloudFS Architecture
When we rolled it out...
Customers Asked for Surprising Features
HTTP/REST Protocols similar to Amazon S3
Reasons:
Installing mountable driver across servers/workstations prohibitive
Want similar interface to various cloud storage providers => Internal Cloud
FTP Interface – Because it is simple!
Status Today
Mountable Multi-platform Drivers. Linux: SUSE 10, RHEL/CentOS 4&5, Windows 2k3+, OSX 10.3+
Encryption to avoid snooping sensitive data
Data Nodes built on Java: Linux, Windows, OSX, Solaris
RESTful Storage Service & FTP interface
Management interface for controlling storage features (Integrating with CycleServer)
Looking forward to condor_hadoop!
War Story #2: Cloud Calculations
Condor users: Peak vs. Median usage
Problem
Need for compute power comes up suddenly
Condor Users hunger for resources
Condor users balance “We need more servers for big runs” and “Our servers are 40% utilized”
Many ways to solve this problem using EC2
Use cases do exist for adding nodes to a local Condor pool using Amazon EC2
We favored entire pools in cloud
Data Scheduling, Performance issues
Run workflows faster using resources you could never buy...
can test CycleServer at a scale our users have and we don’t
Need 1000 node Condor Pool
Wait 15 minutes
Dynamic Resources => Pool can be sized to the jobs
1 core x 1000 hrs = 1000 core x 1 hr = ~$200
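The arithmetic on this slide can be sketched as below. This is a minimal illustration, assuming a flat rate of $0.20 per core-hour (the rate implied by the slide's ~$200 figure, not a quoted price) and ignoring data transfer, storage, and per-hour billing granularity. The function name and the rate constant are hypothetical.

```python
# Assumed flat rate, back-derived from "~$200 for 1000 core-hours" on the slide.
# It is not taken from any actual price list.
ASSUMED_RATE_PER_CORE_HOUR = 0.20


def pool_cost(cores: int, hours: float, rate: float = ASSUMED_RATE_PER_CORE_HOUR) -> float:
    """Cost of running `cores` cores for `hours` hours at a flat per-core-hour rate."""
    core_hours = cores * hours
    return core_hours * rate


serial = pool_cost(cores=1, hours=1000)    # one core for 1000 hours
parallel = pool_cost(cores=1000, hours=1)  # 1000 cores for one hour
print(f"serial: ${serial:.2f}, parallel: ${parallel:.2f}")
```

Under this flat-rate assumption the bill depends only on total core-hours, so for work that parallelizes well, the wide pool finishes sooner at the same cost.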
Sounds good, but how do we do this for a Workflow like BLAST?
From e-science 2008: For 64x the processors
Hadoop Running Blast: 57x
mpiBLAST: 52.4x
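To put the two speed-ups on a common footing, they can be expressed as parallel efficiency (speed-up divided by the processor multiple). The sketch below assumes "64x the processors" means ideal scaling would give a 64x speed-up over the same baseline; the helper name is illustrative.

```python
def parallel_efficiency(speedup: float, processor_multiple: int) -> float:
    """Fraction of ideal linear scaling achieved for a given processor multiple."""
    return speedup / processor_multiple


PROCESSOR_MULTIPLE = 64  # from the slide: "For 64x the processors"
reported_speedups = {"Hadoop running BLAST": 57.0, "mpiBLAST": 52.4}

efficiencies = {
    name: parallel_efficiency(speedup, PROCESSOR_MULTIPLE)
    for name, speedup in reported_speedups.items()
}
for name, eff in efficiencies.items():
    print(f"{name}: {eff:.1%} of ideal")
```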
High-CPU Amazon EC2 nodes have best price/performance
Scalability: 2x CPUs = 1.9825x; 64 CPUs = 60.7x Speed-up
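One way to read the two scalability figures is that the per-doubling factor compounds: going from 1 to 64 CPUs is six doublings, each contributing 1.9825x. The sketch below checks that reading. It is an interpretation of the slide, not a statement of how the numbers were measured.

```python
import math

PER_DOUBLING_SPEEDUP = 1.9825  # from the slide: 2x CPUs = 1.9825x
CPUS = 64

# 64 CPUs is six doublings from a single CPU (2**6 == 64).
doublings = int(math.log2(CPUS))

# Compound the per-doubling factor across all doublings.
projected_speedup = PER_DOUBLING_SPEEDUP ** doublings
efficiency = projected_speedup / CPUS

print(f"{doublings} doublings -> {projected_speedup:.1f}x speed-up "
      f"({efficiency:.1%} of ideal)")
```

The compounded value comes out at roughly 60.7x, which matches the slide's 64-CPU figure.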
Why High Throughput leads to Efficient Computing
Another User: Worked with Varian - Mass Spectrometers, Other High-Tech Lab Equipment
Problem: Coming up on a conference, needed to run a large simulation
Six Weeks on an internal Condor pool
Deployed a Condor pool in CycleCloud
Same 6-week Job
Ran < 1 Day
War Story #3: Management
Condor Tutorial mentions “Why use a personal Condor?”
i.e. Condor on few nodes...
Condor on 1 computer gets you policies, fault-tolerance, etc.
Similarly, management issues come up even on small pools
Collaborating with U. of W. Madison
Managing Configuration Files (our Config with Pipes CW2006)
Exploring ClassAds/LogFiles becomes problematic
Visualization, Reporting, etc.
Man-decades on development of tools to assist running Condor
Have demo against Madison pool
Come see me. We’d love more use cases
Questions? Thank you
For more information go to: http://www.cyclecomputing.com
We constantly see opportunities for talented Condor folks, so please feel free to contact us!
Jason Stowe
jstowe - cyclecomputing.com