Agenda
• What’s new in iCER (Wolfgang)
• What’s new in HPCC (Bill)
• Results of the recent cluster bid
• Discussion of buy-in (costs, scheduling)
• Other
New iCER Website
• Part of VPRGS
– News
– Showcased Projects
– Supported Funding
– Recent Publications
http://icer.msu.edu
User Dashboard
• Common Portal to User Resources
– FAQ
– Documentation
– Forums
– Research Opportunities
– Known Issues
http://wiki.hpcc.msu.edu
Current Research Opportunities
• NSF Postdoc Fellowships for Transformative Computational Science using CyberInfrastructure
• Website
– Proposals
– Classes
– Seminars
– Papers
– Jobs
http://wiki.hpcc.msu.edu
Postdoc Matching
• 50/50 match from iCER for a postdoc for large grant proposals (multi-investigator, interdisciplinary)
• Currently only three matches picked up
– Titus Brown
– Scott Pratt
– Eric Goodman
• Several other matches promised, but grants not decided yet
• More opportunities!
IGERT Grant Proposal
• Interdisciplinary graduate education in high-performance computing & science
• Big Data
• Leads:
– Dirk Colbry
– Bill Punch
BEACON
• NSF STC
– Funded, starting in June
– $5M/year for 5 years
• New joint space with iCER & HPCC
– First floor BPS
– Former BPS library space
Graphics Cluster
• 32-node cluster
• 2 x quad-core 2.4 GHz
• 18 GB RAM
• Two Nvidia M1060
• No Infiniband (Ethernet only)
Result of a Buyin
• 21 of the nodes were purchased with funds from users
• The cluster can be used by any HPCC user
Each nVidia Tesla M1060
• Streaming processor cores: 240
• Processor core frequency: 1.3 GHz
• Single-precision peak floating-point performance: 933 gigaflops
• Double-precision peak floating-point performance: 78 gigaflops
• Dedicated memory: 4 GB GDDR3
• Memory speed: 800 MHz
• Memory interface: 512-bit
• Memory bandwidth: 102 GB/sec
• System interface: PCIe
Example Script
#!/bin/bash --login
#PBS -l nodes=1:ppn=1:gfx10,walltime=01:00:00
#PBS -l advres=gpgpu.6364,gres=gpu:1

cd ${PBS_O_WORKDIR}
module load cuda
myprogram myarguments
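As a usage sketch, the script above would be submitted to the batch system with the standard PBS commands (the filename gpu_job.sh is just an illustration):

# Submit the GPU job script (the filename is illustrative)
qsub gpu_job.sh

# Check the job's status in the queue
qstat -u $USER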
CELL Processor
• 2 PlayStation 3s
• Running Linux
• For experimenting with CELL
• dev-cell08 and test-cell08 (see the web for more details)
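A minimal sketch of reaching the development machine for interactive experimentation (this assumes, as with other HPCC development nodes, that it is reached by ssh from within the system):

# From an HPCC login node, connect to the CELL development node
ssh dev-cell08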
Green Restrictions
• The machine Green is still up and running, especially after some problematic memory was removed
• It has mostly been replaced by the AMD fat nodes
• On April 1st, it will be reserved for jobs requesting 32 cores (or more) and/or 250 GB of memory (or more), as sketched below
• The hope is to help people running larger jobs
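A hedged sketch of a resource request that would still qualify for Green after April 1st (the exact feature names and limits are assumptions, and mylargejob stands in for your own program, as in the example script earlier; check the HPCC documentation for the final syntax):

#!/bin/bash --login
# Ask for 32 cores and 250 GB of memory, the new minimum for Green
#PBS -l nodes=1:ppn=32,mem=250gb,walltime=04:00:00

cd ${PBS_O_WORKDIR}
mylargejob myarguments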
HPCC Stats
• Ganglia (off the main web page, under Status) is back and working. It gives you a snapshot of the current state of the system
• We are nearly done with a database of all jobs that have been run, which can be queried for all kinds of information. It should be up in the next couple of weeks
How it was done
• HPCC submitted a Request for Quotes for a new cluster system.
• Targeted:
– Performance vs. power as the main concern
– Infiniband
– 3 GB of memory per core
– Approximately $500K of cluster
Results
• Received 13 bids from 8 vendors
• Found 3 options that were suitable for the power, space, cooling and performance we were looking for
• Looking for some guidance from you on a number of issues
Choice 1: Infiniband config
• Two ways to configure Infiniband:
– a series of smaller switches configured in a hierarchy (leaf switches)
– one big switch (director)
• Leaf switches are cheaper, but harder to expand (requires reconfiguration), with more wires and more points of failure
• A director is more expandable and convenient, but more expensive
Choice 2: Buyin Cost
• Buyin cost could reflect just the cost of the compute nodes themselves, with HPCC providing the infrastructure (switches, wires, racks, etc.)
• Buyin cost could reflect the total hardware cost
• Obviously, subsidizing costs means cheaper buyin costs but fewer general nodes
Remember
• HPCC is still subsidizing costs, even if the hardware is not subsidized
• It still must buy air-conditioning equipment, OS licenses, MOAB (scheduling) licenses, and software licenses (not to mention salaries and power)
• Combined, this “other” hardware will run to about $75K
• The scheduler is about $100K for 3 years
Some Issues
• 1 node = 8 cores, 1 chassis = 4 nodes
• Buyin will be at the chassis level (32 cores)
For 1024 cores
Vendor/config    Total    Per node, subsidized (per chassis)    Per node, full cost (per chassis)
Dell/leaf        $418K    $2,278 ($9,112)                       $3,260 ($13,040)
HP/leaf          $460K    $2,482 ($9,928)                       $3,594 ($14,376)
Dell/director    $523K    $2,278 ($9,112)                       $4,086 ($16,344)
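For example, under the subsidized Dell/leaf option a single buyin unit works out to 4 nodes x $2,278 = $9,112 for one 32-core chassis, which is the figure shown in parentheses.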
Scheduling
• We are working on some better scheduling methods. We think they have promise and would be very useful to the user base
• For the moment, it will be the Purdue model: we guarantee buyin users access to their nodes within 8 hours of a request. There is still a one-week maximum run time (though this can be changed)