Post on 07-May-2015
Current Trends in High Performance Computing
Dr. Putchong Uthayopas, Department Head, Department of Computer Engineering,
Faculty of Engineering, Kasetsart University, Bangkok, Thailand.
pu@ku.ac.th
I am pleased to be here!
Introduction
• High Performance Computing – an area of computing that involves the hardware and software that help solve large and complex problems fast
• Many applications
– Science and engineering research
• CFD, genomics, automobile design, drug discovery
– High-performance business analysis
• Knowledge discovery
• Risk analysis
• Stock portfolio management
– Business is moving more toward the analysis of data from data warehouses
Why do we need HPC?
• Change in scientific discovery – from experiment to simulation and visualization
• Critical need to solve ever larger problems
– Global climate modeling
– Life science
– Global warming
• Modern business needs
– Design of more complex machinery
– More complex electronics design
– Complex and large-scale financial system analysis
– More complex data analysis
Top 500: Fastest Computer on Our Planet
• List of the 500 most powerful supercomputers generated twice a year (June and November)
• Latest was announced in June 2012
Sequoia @ Lawrence Livermore Lab
• BlueGene/Q
• 34 login nodes – 48 CPUs/node, 64 GB RAM
• 98,304 compute nodes – 16 CPUs/node, 16 GB RAM
• IBM Power-based: 1,572,864 CPU cores, 1.6 PB RAM
• Peak performance: 20,132 TFlops
Performance Development
Projected Performance Development
Top 500: Application Area
Processors are just not running faster
• Processor speed kept increasing for the last 20 years
• Common techniques
– Smaller process technology
– Increased clock speed
– Improved microarchitecture
• Pentium, Pentium II, Pentium III, Pentium IV, Centrino, Core
Pitfalls
• Smaller process technology leads to denser transistors, but…
– Heat dissipation
– Noise – reduced voltage
• Increased clock speed – more power used, since CMOS consumes power only when switching
• Improved microarchitecture – small improvement for a much more complex design
• The only solution left is to use concurrency: doing many things at the same time
Parallel Computing
• Speeding up execution by splitting a task into many independent subtasks and running them on multiple processors or cores
– Break a large task into many small subtasks
– Execute these subtasks on multiple cores or processors
– Collect the results together
How to achieve concurrency
• Adding more concurrency into hardware
• Processor
• I/O
• Memory
• Adding more concurrency into software
– How to express parallelism better in software
• Adding more concurrency into algorithms
– How to do many things at the same time
– How to make people think in parallel
The coming (back) of multicore
Hybrid Architecture
Rationale for Hybrid Architecture
• Most scientific applications have fine-grained parallelism inside
– CFD, financial computation, image processing
• Energy efficiency
– Employing a large number of slow processors in parallel can help lower power consumption and heat
Two main approaches
• Using multithreading and scaled-down processors compatible with conventional processors
– Intel MIC
• Using a very large number of small processor cores in a SIMD model, evolving from graphics technology
– NVIDIA GPU
– AMD Fusion
Many Integrated Core Architecture
• An effort by Intel to add a large number of cores into a computing system
Multithreading Concept
Challenges
• A large number of cores will have to divide memory among them
– Much smaller memory per core
– Demands high memory bandwidth
• Still needs an effective fine-grained parallel programming model
• No free lunch: programmers have to do some work
What is GPU Computing?
Computing with CPU + GPU: Heterogeneous Computing
146X – Medical Imaging, U of Utah
36X – Molecular Dynamics, U of Illinois, Urbana
18X – Video Transcoding, Elemental Tech
50X – Matlab Computing, AccelerEyes
100X – Astrophysics, RIKEN
149X – Financial Simulation, Oxford
47X – Linear Algebra, Universidad Jaime
20X – 3D Ultrasound, Techniscan
130X – Quantum Chemistry, U of Illinois, Urbana
30X – Gene Sequencing, U of Maryland
Not 2x or 3x: speedups are 20x to 150x
CUDA Parallel Computing Architecture
• Parallel computing architecture and programming model
• Includes a C compiler plus support for OpenCL and DX11 Compute
• Architected to natively support all computational interfaces (standard languages and APIs)
ATI’s Compute “Solution”
Compiling C for CUDA Applications
[Figure: compilation flow – NVCC splits C for CUDA (key kernels) into CUDA object files and CPU code; the rest of the C application is compiled into CPU object files; the linker combines both into a CPU-GPU executable.]
Simple “C” Description For Parallelism
Standard C Code:

void saxpy_serial(int n, float a, float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a*x[i] + y[i];
}
// Invoke serial SAXPY kernel
saxpy_serial(n, 2.0, x, y);

Parallel C Code:

__global__ void saxpy_parallel(int n, float a, float *x, float *y)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a*x[i] + y[i];
}
// Invoke parallel SAXPY kernel with 256 threads/block
int nblocks = (n + 255) / 256;
saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);
Computational Finance
Source: CUDA SDK
Financial computing software vendors:
– SciComp: derivatives pricing modeling
– Hanweck: options pricing & risk analysis
– Aqumin: 3D visualization of market data
– Exegy: high-volume tickers & risk analysis
– QuantCatalyst: pricing & hedging engine
– Oneye: algorithmic trading
– Arbitragis Trading: trinomial options pricing
Ongoing work:
– LIBOR Monte Carlo market model
– Callable swaps and continuous-time finance
Source: SciComp
Weather, Atmospheric, & Ocean Modeling
Source: Michalakes, Vachharajani
CUDA-accelerated WRF available; other kernels in WRF being ported
Ongoing work:
– Tsunami modeling
– Ocean modeling
– Several CFD codes
Source: Matsuoka, Akiyama, et al
New emerging standards
• OpenCL
– Supported by many vendors, including Apple
– Targets both GPU-based SIMD and multithreading
– More complex to program than CUDA
• OpenACC
– A programming standard for parallel computing developed by Cray, CAPS, Nvidia and PGI
– Simplifies parallel programming of heterogeneous CPU/GPU systems
– Directive-based
Cluster computing
• The use of a large number of servers, linked on a high-speed local network, as one single large supercomputer
• A popular way of building supercomputers
• Software
– Cluster-aware OS
• Windows Compute Cluster Server 2008
• NPACI Rocks Linux
• Programming systems such as MPI
• Used mostly in computer-aided design, engineering, and scientific research
Comment
• Cluster computing is a very mature discipline
• We know how to build a sizable cluster very well
– Hardware integration
– Storage integration: Lustre, GPFS
– Schedulers: PBS, Torque, SGE, LSF
– Programming: MPI
– Distribution: ROCKS
• Clusters are the foundation fabric for grid and cloud
TERA Cluster
• 1 frontend (HP ProLiant DL360 G5 server) and 192 compute nodes
– Intel Xeon 3.2 GHz (dual-core, dual-processor)
– Memory: 4 GB (8 GB for frontend & InfiniBand nodes)
– 70x4 GB SCSI HDD (RAID1)
• 4 storage servers
– Lustre file system for the TERA cluster's storage
– Attached with Smart Array P400i controller for 5 TB of space
August 29, 2008 – TGCC 2008, Khon Kaen University, Thailand
[Figure: TERA cluster network – a 200-port Gigabit Ethernet switch with edge switches connects the frontends (Sunyata, Araya, WinHPC, TERA, Anatta, two spares), four file servers in a 5 TB Lustre storage tier, and the compute nodes (96 nodes + 16 spares, plus smaller groups of 4, 15, and 64 nodes); the cluster links to the KU fiber backbone (1 Gbps Ethernet/fiber) and on to UniNet at 2.5 Gbps, with 48 TB of storage.]
Grid Computing Technology
• Grid computing enables the virtualization of distributed computing and data resources such as processing, network bandwidth and storage capacity to create a single system image, granting users and applications seamless access to vast IT capabilities.
• Just as an Internet user views a unified instance of content via the Web, a grid user essentially sees a single, large virtual computer.
Grid Architecture
• Fabric Layer
– Protocols and interfaces that provide access to computing resources such as CPU and storage
• Connectivity Layer
– Protocols for grid-specific network transactions, such as security (GSI)
• Resource Layer
– Protocols to access a single resource from an application
• GRAM (Grid Resource Allocation Management)
• GridFTP (data access)
• Grid Resource Information Service
• Collective Layer
– Protocols that manage and access groups of resources
[Figure: layer stack – Application, Collective, Resources, Connectivity, Fabric]
Globus as Service-Oriented Infrastructure
[Figure: user applications access computers, storage, and specialized resources through uniform interfaces, security mechanisms, Web service transport, and monitoring; services include GRAM, GridFTP, DAIS (database access), Reliable File Transfer, MyProxy, and MDS-Index.]
Introduction to ThaiGrid
• A national project under the Software Industry Promotion Agency (Public Organization), Ministry of Information and Communication Technology
• Started in 2005 from 14 member organizations
• Expanded to 22 organizations in 2008
Thai Grid Infrastructure
[Figure: ThaiGrid network – 19 sites with about 1,000 CPU cores, linked at speeds from 155 Mbps to 2.5 Gbps.]
ThaiGrid Usage
• ThaiGrid provides about 290 years of computing time for members
– 9 years on the grid
– 280 years on TERA
• 41 projects from 8 areas are being supported on the teraflop machine
• More small projects on each machine
Medicinal Herb Research
• Partner
– Cheminformatics Center, Kasetsart University (Chak Sangma and team)
• Objective
– Using a 3D molecular database and virtual screening to verify traditional medicinal herbs
• Benefits
– Scientific proof of ancient traditional drugs
– Benefits poor people who still rely on drugs from medicinal herbs
– Potential benefit for the local pharmaceutical industry
[Figure: workflow – virtual screening, infrastructure, lab test]
NanoGrid
• Objective
– A platform that supports computational nanoscience research
• Technology used
– Accelrys Materials Studio
– Cluster schedulers: Sun Grid Engine and Torque
[Figure: the MS-Gateway connecting users to ThaiGrid computing resources in three steps]
Challenges
• Size and scale
• Manageability
– Deployment
– Configuration
– Operation
• Software and Hardware Compatibility
Grid System Architecture
• Clusters
– Satellite sets
• 16 clusters delivered from ThaiGrid for initial members
• Each composed of 5 nodes of IBM eServer xSeries 336
– Intel Xeon 2.8 GHz (dual processor)
– x86_64 architecture
– Memory: 4 GB (DDR2 SDRAM)
– Other sets
• Various types of servers and numbers of nodes
• Provided by member institutes of ThaiGrid
Grid as a Super Cluster
[Figure: compute nodes (C) behind head nodes (H) at each member site, connected over the research and education network (REN) to a central grid scheduler at GCC.]
Is grid still alive?
• Yes, grid is a useful technology for certain tasks
– BitTorrent as a massive file-exchange infrastructure
– The European Grid is using it to share LHC data
• Pitfalls of the grid
– The network is still not reliable and fast enough for long-term operation
– The multi-site, multi-authority concept makes it very complex for
• System management
• Security
• Users to really use the system
• The recent trend is to move to centralized clouds
What is Cloud Computing?
Source: Wikipedia (cloud computing)
[Figure: cloud providers – Google, Amazon, Yahoo, Microsoft, Salesforce]
Why Cloud Computing?
• The illusion of infinite computing resources available on demand, thereby eliminating the need for cloud computing users to plan far ahead for provisioning.
• The elimination of an up-front commitment by Cloud users, thereby allowing companies to start small and increase hardware resources only when there is an increase in their needs.
• The ability to pay for use of computing resources on a short-term basis as needed (e.g., processors by the hour and storage by the day) and release them as needed, thereby rewarding conservation by letting machines and storage go when they are no longer useful.
Source: “Above the Clouds: A Berkeley View of Cloud Computing”, RAD lab, UC Berkeley
Cloud Computing Explained
• SaaS (Software as a Service): applications delivered over the internet as services (e.g., Gmail)
• A cloud is the massive server and network infrastructure that serves SaaS to a large number of users
• The service being sold is called utility computing
Source: “Above the Clouds: A Berkeley View of Cloud Computing”, RAD lab, UC Berkeley
Enabling Technology for Cloud Computing
• Cluster and grid technology
– The ability to build a highly scalable computing system that consists of 100,000 to 1,000,000 nodes
• Service-oriented architecture
– Everything is a service
– Easy to build, distribute, and integrate into large-scale applications
• Web 2.0
– Powerful and flexible user interfaces for an internet-enabled world
Cloud Service Model
Cloud Computing Software Stack
Architecture of Service Oriented Cloud Computing Systems (SOCCS)
[Figure: SOCCS architecture – a user interface and cloud application running on CSM/DSS and CCR services, over operating systems and node hardware linked by an interconnection network.]
SOCCS can be constructed by combining CCR/DSS software to form a scalable service for a client application.
Cloud Service Management (CSM) acts as a resource management system that keeps track of the availability of services on the cloud.
Cloud System Configuration
A Proof-of-Concept Application
The Pickup and Delivery Problem with Time Windows (PDPTW) is the problem of serving a number of transportation requests with a limited number of vehicles.
The objective is to minimize the sum of the distances traveled by the vehicles and the sum of the time spent by each vehicle.
PDPTW on the cloud using SOCCS
The master/worker model is adopted as a framework for service interaction.
The algorithm is partitioned using a domain decomposition approach.
The cloud application controls the decomposition of the problem by sending each subproblem to a worker service and collecting the results back into the best answer.
[Figure: master/worker thread structure – the master thread enqueues the number of vehicles into a vehicle queue (port); worker threads from a thread pool take work from the dispatcher queue through the parallel runtime interface (arbiter), execute the PDPTW function, and send solutions to a solution queue (port); a gather-solution function collects the PDPTW solutions into the output.]
Results
Speedup on a single node with 4 cores
Results
Performance: speedup and efficiency derived from average runtime on 1, 2, 4, 8 and 16 compute nodes.
We are living in the world of Data
• Geophysical exploration
• Medical imaging
• Video surveillance
• Mobile sensors
• Gene sequencing
• Smart grids
• Social media
Big Data
“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”
Reference: “What is big data? An introduction to the big data landscape.”, Edd Dumbill, http://radar.oreilly.com/2012/01/what-is-big-data.html
The Value of Big Data
• Analytical use
– Big data analytics can reveal insights previously hidden by data too costly to process
• e.g., peer influence among customers, revealed by analyzing shoppers' transactional, social, and geographical data
– Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data
• Enabling new products
– Facebook has been able to craft a highly personalized user experience and create a new kind of advertising business
3 Characteristics of Big Data
Big Data Challenges
• Volume
– How to process data so big that it cannot be moved or stored
• Velocity
– A lot of data arrives so fast that it cannot be stored, such as web usage logs and internet and mobile messages; stream processing is needed to filter unused data or extract knowledge in real time
• Variety
– So many types of unstructured data formats make conventional databases useless
How to deal with big data
• Integration of
– Storage
– Processing
– Analysis algorithms
– Visualization
[Figure: big data pipeline – massive data and streams flow through stream processing into storage and parallel processing, then analysis, with results visualized.]
A New Approach For Distributed Big Data
Storage islands (before):
• Disparate systems
• Manual administration
• One tenant, many systems
• IT-provisioned storage
Single storage pool (after):
• Single system across locations (L.A., Boston, London)
• Automated policies
• Many tenants, one system
• Self-service access
Hadoop
• Hadoop is a platform for distributing computing problems across a number of servers, first developed and released as open source by Yahoo
– It implements the MapReduce approach pioneered by Google in compiling its search indexes
– A dataset is distributed among multiple servers and operated on: the “map” stage; the partial results are then recombined: the “reduce” stage
• Hadoop uses its own distributed filesystem, HDFS, which makes data available to multiple computing nodes
• The Hadoop usage pattern involves three stages:
– loading data into HDFS,
– MapReduce operations, and
– retrieving results from HDFS.
WHAT FACEBOOK KNOWS
http://www.facebook.com/data
Cameron Marlow calls himself Facebook's "in-house sociologist." He and his team can analyze essentially all the information the site gathers.
The Links of Love
• Often young women specify that they are “in a relationship” with their “best friend forever”
– Roughly 20% of all relationships for the 15-and-under crowd are between girls
– This number dips to 15% for 18-year-olds and is just 7% for 25-year-olds
• For anonymous US users who were over 18 at the start of the relationship
– the average of the shortest number of steps to get from any one U.S. user to any other individual is 16.7
– This is much higher than the 4.74 steps you'd need to go from any Facebook user to another through friendship, as opposed to romantic, ties
http://www.facebook.com/notes/facebook-data-team/the-links-of-love/10150572088343859
Graph showing the relationships of anonymous US users who were over 18 at the start of the relationship.
Why?
• Facebook can improve the user experience
– Make useful predictions about users' behavior
– Make better guesses about which ads you might be more or less open to at any given time
• Right before Valentine's Day this year, a blog post from the Data Science Team listed the songs most popular with people who had recently signaled on Facebook that they had entered or left a relationship
Data Tsunami
• The data flood is coming; nowhere to run now!
– Data is being generated anytime, anywhere, by anyone
– Data is moving in fast
– Data is too big to move, too big to store
• Better be prepared
– Use this to enhance your business and offer better services to customers
The Opportunities and Challenges ofExascale Computing
• A summary of findings from many workshops in the US
• Lists the issues that need to be overcome
• We will present only some of the challenges
Hardware Challenges
• Major improvements in hardware are needed
Power Challenge
• Power consumption of the computers is the largest hardware research challenge
• Today, power costs for the largest petaflop systems are in the range of $5-10M annually
• For an exascale system using current technology:
– the annual power cost to operate the system would be above $2.5B per year
– the power load would be over a gigawatt
• The target of 20 megawatts, identified in the DOE Technology Roadmap, is primarily based on keeping the operational cost of the system in some kind of feasible range
Memory Challenge
• The memory subsystem is too slow
Data Movement Challenge
System Resiliency Challenge
• For exascale systems, the number of system components will increase faster than component reliability, with mean-time-between-failure projections in minutes or seconds
• Exascale systems will experience various kinds of faults many times per day
– Systems running 100 million cores will continually see core failures, and the tools for dealing with them will have to be rethought
“Co-Design” Challenge
The Computer Science Challenges
• A programming model effort is a critical component
– Clock speeds will be flat or even dropping to save energy; all performance improvements within a chip will come from increased parallelism, along with shrinking memory per arithmetic unit
– There is a need for fine-grained parallelism and a programming model other than message passing or coarse-grained threads
Under the radar
• Mobile processors run supercomputers
• Hybrid war! GPU vs. MIC
• I/O goes solid state
• Programming standards war
– CUDA / OpenCL / OpenMP / OpenACC
Summary
• We are in a challenging world
• Demand for HPC systems and applications will increase
– Software tools, technology, and hardware are changing to catch up
• The greatest challenge is how to quickly develop software for the next generation of computing systems
THANK YOU