
Recent Efforts on the Ninf Project and the Asia-Pacific Grid (ApGrid)

Satoshi Matsuoka
Tokyo Inst. Technology/JST
[email protected]

SEKIGUCHI, Satoshi
Electrotechnical Laboratory, AIST (TACC), MITI
[email protected]

Several slides are courtesy of Grid people

What is ApGrid?
A meeting point for all Asia-Pacific HPCN researchers doing grid-related work
Communication channel to the Global Grid Forum and other grid communities
Pool for finding international project partners
Not a single-source-funded “project”!

APAN: http://apan.net

[Figure: APAN network map showing exchange points and access points (current status and planned) in South Korea, Japan, China, Hong Kong, Malaysia, Singapore, Indonesia, Australia, the Philippines and Thailand; TransPAC (100 Mbps) to North America (STAR TAP); links to Latin America and Europe; Australia-Japan link (1.5 Mbps Frame Relay); ACSys]

Success 2: Tsukuba Advanced Computing Center (TACC): SC99 HPC Games with Pittsburgh, Stuttgart, Manchester, etc.

[Figure: APAN, TACC]

National Backbones for Japanese Academia

nGrid/eGrid Partners

TACC: Tsukuba Advanced Computing Center
Osaka: Osaka University
RWCP: Real World Computing Partnership
TIT: Tokyo Institute of Technology
Waseda: Waseda University

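[Figure: network map of APAN Tokyo, RWCP, TIT, Waseda, Osaka and TACC on the IMnet, WIDE and SINET backbones, with TransPAC (100 Mbps) to STAR TAP (Chicago) and vBNS, and a 1.5 Mbps Frame Relay link to Australia; other links are labeled 384 Kbps, 1.5 Mbps, 10 Mbps, 100 Mbps, 135 Mbps and 155 Mbps]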

Super SINET
Similar to Internet2
10 Gbps backbone interconnecting major Japanese universities
10/2.4 Gbps link to each university
Collaboration with other national 10 Gbps backbone projects, e.g., the 10 Gbps backbone in the Tsukuba area

ApGrid Locations/Potential Partners

Japan: AIST/TACC/ETL (National Institute of Advanced Industrial Science and Technology), Tokyo Institute of Technology, Waseda U., Osaka U., Nara Advanced Institute of S & T, HEPL (DataGrid)

Australia: ANU, Monash U.

United States: PNNL

Korea (KORDIC), Singapore (NUS), Malaysia, Thailand, ROC, Hong Kong, Taiwan, other APAN members

ApGrid: Motivations (1)

Establish a region-wide testbed for global computing (Grid and/or Meta):
Disseminating research activities
Providing an easy-access environment for researchers, students, vendors, etc.
Improving interoperability of existing tools
Testbed for software development and trials to evaluate usability and archive performance numbers

Finding demonstrative applications

ApGrid: Motivations (2)

Create a community that both competes and collaborates with iGrid and eGrid, for:
Making international collaborations
Supporting and collaborating with network people, e.g., APAN, IMnet, etc.
Attempting to negotiate standardization based on real experience (in the Global Grid Forum)

Also, domestic (intra-country) service:
Nation-wide: several “non-cooperative” network communities; seeking governmental and/or industrial funding
Campus-wide: find volunteers among our friends

ApGrid Resources

Government labs and university supercomputing centers: MITI TACC, HEPL (DataGrid), etc.

Individual university labs: TITECH, Waseda Univ., Nara AIST, etc.

Network configuration at TACC (AIST Supercomputing Center)

[Figure: FE x 8, GbE; firewall; Internet (135 Mbps/2.4 Gbps); RS6000/SP/128 (200+ GFlops); SR8000/64 (512 GFlops); clusters]

TACC Resources

High Performance Computing System: SR8000, RS/6000 SP, UE10000, etc.; super clusters (Alpha EV56 x 40, EV6 x 256, …)

High Speed Network: Gigabit campus backbone; ATM Megalink national backbone connecting 15 national laboratories across Japan

High Speed Internet Access: IMnet 135 Mbps, STAR TAP 100 Mbps via TransPAC/APAN

Highly functional database: RIO DB. Visit http://www.aist.go.jp/RIODB

Hitachi SR8000/64 (sr8k)
PowerPC + PVP + HXB
64 nodes
512 Gflops (peak), 449.7 Gflops (Linpack)
512 GB memory
2D crossbar network
1.98 TB disks
R&D, parallel program development
+ 8-node front-end for interactive usage (e.g., global computing)

IBM RS/6000 SP (rssp)

Power3 SMP 2 CPU, Winterhawk
128 nodes
205 Gflops (peak), 149.36 Gflops (Linpack)
256 GB memory
High-speed switch
3.3 TB disks (873 GB user)
ISV applications' platform
+ 8-node front-end (4-way 350 MHz P3, WH-II)
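
The peak and Linpack figures quoted for these two TACC machines imply their Linpack efficiency (Rmax/Rpeak). Here is a minimal Python sketch of that arithmetic. It uses only the numbers on the slides.

```python
def linpack_efficiency(rmax_gflops: float, rpeak_gflops: float) -> float:
    """Fraction of theoretical peak sustained on Linpack (Rmax / Rpeak)."""
    if rpeak_gflops <= 0:
        raise ValueError("peak performance must be positive")
    return rmax_gflops / rpeak_gflops


# (Rmax, Rpeak) in Gflops, as quoted on the slides above.
TACC_MACHINES = {
    "Hitachi SR8000/64": (449.7, 512.0),
    "IBM RS/6000 SP (128 nodes)": (149.36, 205.0),
}

if __name__ == "__main__":
    for name, (rmax, rpeak) in TACC_MACHINES.items():
        print(f"{name}: {linpack_efficiency(rmax, rpeak):.1%}")
```

This prints about 87.8% for the SR8000 and 72.9% for the RS/6000 SP.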

Bamboo Alpha Cluster
256 Alpha EV6 500 MHz, 512 MB, 256 GFlops peak
Two-stage Gigabit Ethernet switch (Myrinet 2K?)
Special Compact-PCI packaging by Alta Tech.
Linux-based, commodity software (Beowulf)
$5 mil production cluster, operational RSN
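
The 256 GFlops peak figure follows from processor count × clock rate × floating-point operations per cycle. The sketch below assumes 2 flops/cycle for the Alpha EV6 (one add and one multiply issued per cycle). That per-cycle rate is an assumption and does not appear on the slide.

```python
def peak_gflops(processors: int, clock_mhz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = processors x clock x flops issued per cycle."""
    return processors * clock_mhz * flops_per_cycle / 1000.0  # MFlops -> GFlops


# Bamboo: 256 Alpha EV6 at 500 MHz (from the slide).
# ASSUMPTION: 2 floating-point operations per cycle per EV6.
BAMBOO_PEAK = peak_gflops(processors=256, clock_mhz=500.0, flops_per_cycle=2)

if __name__ == "__main__":
    print(f"Bamboo peak: {BAMBOO_PEAK:.0f} GFlops")  # 256 GFlops
```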

TITECH Matsuoka Lab. Grid Clusters

“Very” commodity clusters as Grid resources and research platform

Current: 2 clusters, 192 procs total

6 clusters, over 400 procs/400 GFlops by 1Q 2001, Gigabit linkage

The PRESTO Grid Clusters at Matsuoka Lab, TITECH, for 2000

Presto I: 64 PII-350, 256 MB/node; Linux + RWC SCore + our stuff; semi-production, parallel OR algorithm on the Grid

Presto II: 64 Celeron-900, 512 MB/node, multiple interconnects; Grid simulation, HP Java

Prospero: 64 nodes x 2-proc SMP PIII-824, 640 MB, 3-trunked 100Base-T (will be 192 procs RSN w/6 TB disks); general-purpose cluster research, Grid simulation, app. runs (incl. MCell over the Pacific)

Pronto: > 64 Athlon, > 1.1 GHz, > 512 MB DDR-DRAM, hybrid 1000/100Base-T; semi-production, 1Q2001

Porto: Plug & Play clustering; 32 high-performance notebooks (600 MHz Mobile Celeron)

Pinto: 16-32 node Alpha cluster; heterogeneous clustering over the Grid

Total > 400 nodes

Grid Cluster Research at TITECH

Grid simulation and performance benchmarking

Cluster federation w/Grid
Commodity high-performance networking
Incl. OpenMP (w/RWCP)
Fault tolerance and security
Dynamic Plug&Play cluster
Downloadable self-tuning Java libs and apps

Applications
Operations research/control
NetSolve MCell run resource (w/UCSD)

Java/Jini-based Grid & cluster computing
Migratory code
Jini-based cluster Grid services
Resource publication and resource discovery
JiPANG: Jini-based Grid portals architecture (w/UTK)

Performance portability
High-performance portable Java DSM
Open-ended, downloadable JIT compiler: the OpenJIT project (w/Fujitsu)

Bricks Grid Simulator (HPDC’99)

Consists of a simulated Global Computing Environment and a Scheduling Unit.

Allows simulation of various behaviors of:
resource scheduling algorithms
programming modules for scheduling
network topology of clients and servers
processing schemes for networks and servers (various queuing schemes)
using the Bricks script.

Makes benchmarks of existing global scheduling components available.

The Bricks Architecture

[Figure: Scheduling Unit (Scheduler, NetworkMonitor, ServerMonitor, ResourceDB, Predictor with NetworkPredictor and ServerPredictor) and Global Computing Environment (Client, Network, Server)]
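
The following much-simplified Python sketch shows how the Scheduling Unit and the simulated Global Computing Environment interact. It is not Bricks itself and does not use the Bricks script. The class names mirror components in the figure (Scheduler, ServerPredictor, NetworkPredictor, Server). Several choices are assumptions made for brevity:

* the greedy earliest-completion policy,
* the perfect predictors,
* the FIFO servers,
* the contention-free network.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Server:
    name: str
    speed: float             # work units processed per second
    bandwidth: float         # bytes per second on the client-to-server path
    busy_until: float = 0.0  # FIFO queue: time at which the server becomes idle


@dataclass
class Job:
    arrival: float  # time the client submits the job (s)
    data: float     # bytes shipped to the server before computing
    work: float     # work units to compute


class NetworkPredictor:
    """Assumes a contention-free link, so transfer time = size / bandwidth."""
    def transfer_time(self, server: Server, nbytes: float) -> float:
        return nbytes / server.bandwidth


class ServerPredictor:
    """Perfect predictor: reads the server's actual queue state."""
    def start_time(self, server: Server, ready: float) -> float:
        return max(ready, server.busy_until)


class Scheduler:
    """Greedy policy: choose the server with the earliest predicted completion."""
    def __init__(self, servers: List[Server]) -> None:
        self.servers = servers
        self.net = NetworkPredictor()
        self.srv = ServerPredictor()

    def predicted_finish(self, job: Job, server: Server) -> float:
        ready = job.arrival + self.net.transfer_time(server, job.data)
        return self.srv.start_time(server, ready) + job.work / server.speed

    def choose(self, job: Job) -> Server:
        return min(self.servers, key=lambda s: self.predicted_finish(job, s))


def simulate(jobs: List[Job], servers: List[Server]) -> List[Tuple[str, float]]:
    """Dispatch jobs in arrival order; return (server name, completion time)."""
    scheduler = Scheduler(servers)
    results = []
    for job in sorted(jobs, key=lambda j: j.arrival):
        server = scheduler.choose(job)
        # The predictors are exact, so the real completion time equals the
        # prediction; a noisier predictor would make the two diverge.
        finish = scheduler.predicted_finish(job, server)
        server.busy_until = finish
        results.append((server.name, finish))
    return results


if __name__ == "__main__":
    servers = [Server("fast", speed=100.0, bandwidth=10.0),
               Server("near", speed=50.0, bandwidth=100.0)]
    jobs = [Job(arrival=0.0, data=100.0, work=1000.0),
            Job(arrival=0.0, data=100.0, work=1000.0)]
    print(simulate(jobs, servers))  # [('fast', 20.0), ('near', 21.0)]
```

The second job goes to the slower but better-connected server because the fast one is already busy. This is the kind of trade-off a Bricks experiment would explore with realistic (and imperfect) predictors.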

Applications on PRESTO Clusters: Op. Research

SCRM (Generalized Quadratic Optimization Algorithm)

Iterative execution of multiple SDP solvers w/Ninf via master-worker

Some problems: 100-fold speedup on 128 procs (exec. time world record)

Other difficult OR problems also very positive -> larger executions on Cluster Federation resources

[Figure: Parallel execution of non-convex quadratic programs with the SCRM method on the PRESTO cluster; execution time (s, 0 to 8000) vs. #Processors (1, 2, 4, 8, 16, 32, 64) for NQP15_1.dat and NQP12_1.dat]
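
The master-worker pattern behind the SCRM runs can be sketched as follows. In the real system each task would be an SDP solve dispatched to a remote server through Ninf. Here a local thread pool stands in for the remote servers, and `solve_sdp_subproblem` is a hypothetical placeholder that only shrinks a bound. Only the control flow (fan out subproblems, gather, update, repeat) is meant to be representative. The last helper shows what the quoted 100-fold speedup on 128 processors means as parallel efficiency.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def solve_sdp_subproblem(task: Tuple[float, int]) -> float:
    """HYPOTHETICAL stand-in for one remote SDP solve (e.g. via a Ninf call).

    Returns a slightly tighter bound; a real solver would return the
    optimal value of an SDP relaxation.
    """
    bound, k = task
    return bound * (1.0 - 0.01 * (k + 1))


def master(initial_bound: float, n_subproblems: int,
           iterations: int, workers: int) -> float:
    """Master loop: fan out subproblems, gather results, update, repeat."""
    bound = initial_bound
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            tasks: List[Tuple[float, int]] = [(bound, k) for k in range(n_subproblems)]
            results = list(pool.map(solve_sdp_subproblem, tasks))  # waits for all workers
            bound = min(results)  # keep the smallest (tightest upper) bound this round
    return bound


def parallel_efficiency(speedup: float, processors: int) -> float:
    """Speedup divided by processor count."""
    return speedup / processors


if __name__ == "__main__":
    print(master(initial_bound=100.0, n_subproblems=4, iterations=2, workers=4))
    print(f"{parallel_efficiency(100, 128):.0%}")  # 78%
```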

Titanium Terascale Grid Cluster

Proposal for a 10 TF-scale “commodity” cluster at the TITECH computing center

2 x 500 Itanium-class “commodity” cluster on two TITECH campuses

Interconnected via 2.4 Gigabit WAN
Campus-wide usage with Grid software

Centerpiece of Grid infrastructure within the TITECH campus

ApGrid and Global Grid collaboration
2002-3? With restructuring of the computing center

Titanium Cluster Overview

Goal: construct as “cheap” as possible

Semi-reliable service

Use Grid technology to federate and manage the clusters

[Figure: Ookayama ⇔ Nagatsuta link (2.4G-10Gbps); connection to Grid infrastructure in Japan and abroad (NPACI/Alliance/IPG, J-Grid, E-Grid, etc.); distributed ImmersaDesk; Titanium cluster No. 1: 1024 processors, 100 TB storage; cluster OS/Grid middleware; campus Grid users; high-speed wireless LAN access points (classrooms, labs, etc.); free access to Grid resources for campus users; Ookayama campus Gigabit LAN]

ApGrid: Services (1)

Grid computing service
Deploy major grid software packages, ready to use:
Ninf v.2.0 (another talk)
Globus, NetSolve, NWS, Nimrod, Condor, Legion, etc.
MPICH-G(2), PACX-MPI, Harness, etc.

System resources
US220R x 2 CPU x 4 from ETL
ORIGIN 2000/16 CPU, J90/16 CPU, CS6400/64
SR8000/8 nodes, WH-II 8 nodes
Clusters (Pentium, Alpha), etc. in many places

ASP-Like ApGrid Ninf Service

[Figure: client call (“lapack”, “dgesv”, ...); 3-DNS names lapack.ApGrid.org, murata.ApGrid.org, lapack.eGrid.org; sites hpcc.gr.jp (192.50.75.0/24) and ninf.org (150.29.218.0/23, VIP 150.29.219.128), each with a BIG/IP selector/scheduler; resource DB (package, routine)]

Simpler architecture than the Ninf Metaserver
Limit the number of known servers
Load balancing with L4 switch technology
Central administration of servers and DB
Transaction support

Resource access and load balancing w/VIP
Different VIP per package, e.g. linpack.apgrid.org
Grouping of libraries via VIP
VIP expands the URL to the address of the appropriate server

Ninf 2.0/NetSolve, etc.
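
The VIP idea can be sketched in a few lines: one virtual address per library package, which the L4 switch expands to a concrete server. This is an illustration under assumptions, not the BIG/IP configuration or Ninf 2.0 code. The `ninf://host/routine` URL shape, the backend addresses (documentation-range IPs) and the least-connections policy are all hypothetical choices for the example. Only the package-per-VIP naming (e.g. lapack.ApGrid.org) comes from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List
from urllib.parse import urlsplit


@dataclass
class Backend:
    address: str      # real server address behind the VIP (hypothetical)
    active: int = 0   # open connections, as an L4 switch would track them


class VipBalancer:
    """Maps a per-package virtual host to the least-loaded real server."""

    def __init__(self, pools: Dict[str, List[Backend]]) -> None:
        self.pools = pools

    def resolve(self, url: str) -> str:
        parts = urlsplit(url)
        pool = self.pools.get(parts.hostname or "")  # hostname is lowercased
        if not pool:
            raise LookupError(f"no server pool for {parts.hostname!r}")
        backend = min(pool, key=lambda b: b.active)  # least-connections choice
        backend.active += 1
        return f"{parts.scheme}://{backend.address}{parts.path}"

    def release(self, address: str) -> None:
        """Call when a request finishes, so the connection count drops."""
        for pool in self.pools.values():
            for backend in pool:
                if backend.address == address and backend.active > 0:
                    backend.active -= 1
                    return


if __name__ == "__main__":
    lb = VipBalancer({"lapack.apgrid.org": [Backend("192.0.2.10"), Backend("192.0.2.11")]})
    print(lb.resolve("ninf://lapack.ApGrid.org/dgesv"))  # ninf://192.0.2.10/dgesv
    print(lb.resolve("ninf://lapack.ApGrid.org/dgesv"))  # ninf://192.0.2.11/dgesv
```

Keeping the server choice behind a virtual address means clients never need to know the server list. This is the simplification the slide contrasts with the Metaserver.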

ApGrid: Services (2)

Grid information service
Maintain name servers and databases

ASP-like portal service
Handling users, micro-economics

Grid security support service (planned)
PKI: Public Key Infrastructure
Certificate Authority

ApGrid Information Services

Resource info
Performance monitoring and archive

Would like to collaborate with other Grid partners

[Figure: ApGrid testbed: ApGrid nodes in Japan (APAN Tokyo, RWCP, TITECH, ETL/TACC, Osaka-U), each with NWS sensors and virtual/real clients; TransPAC (100 Mbps) and STAR TAP (Chicago) to US and European partners; ApGrid sites in Korea, Singapore, Australia, etc.]
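
The NWS sensors on the testbed feed performance forecasts. NWS is usually described as running several simple forecasters over a measurement series and reporting the one with the lowest error so far. The Python sketch below shows that selection idea with three forecasters. The forecaster set and the absolute-error metric are simplifying choices for illustration, not the actual NWS code.

```python
from statistics import median
from typing import Callable, Dict, List, Sequence, Tuple

Forecaster = Callable[[Sequence[float]], float]

# A deliberately small forecaster set (the real NWS uses more).
FORECASTERS: Dict[str, Forecaster] = {
    "last_value": lambda h: h[-1],
    "running_mean": lambda h: sum(h) / len(h),
    "median_of_last_5": lambda h: median(h[-5:]),
}


class AdaptiveForecaster:
    """Tracks each forecaster's cumulative absolute error and reports the best."""

    def __init__(self) -> None:
        self.history: List[float] = []
        self.errors: Dict[str, float] = {name: 0.0 for name in FORECASTERS}

    def update(self, measurement: float) -> None:
        # Score every forecaster's prediction for this measurement
        # before adding the measurement to the history.
        if self.history:
            for name, forecast in FORECASTERS.items():
                self.errors[name] += abs(forecast(self.history) - measurement)
        self.history.append(measurement)

    def forecast(self) -> Tuple[str, float]:
        if not self.history:
            raise ValueError("no measurements yet")
        best = min(self.errors, key=self.errors.__getitem__)
        return best, FORECASTERS[best](self.history)


if __name__ == "__main__":
    bw = AdaptiveForecaster()
    for sample in [10.0, 20.0, 10.0, 20.0, 10.0, 20.0]:  # e.g. Mbps readings
        bw.update(sample)
    print(bw.forecast())  # ('running_mean', 15.0)
```

On this oscillating series, "last value" is always wrong by 10, so the running mean wins. On a steadily drifting series the choice would shift toward the last value.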

ApGrid: Current Status

Just kicked off, and some of the resources are ready, but we still need to:
Hire people to maintain and install the regular services initially
Enroll more partners

Reserved: apgrid.org; Web site will open shortly

Find international partners
Create a much stronger relationship with APAN activities

Summary

Some success stories
Collaboration with application scientists
International collaborations
Osaka-U/UCSD (Globus)
NetSolve/Ninf collaboration
WGCC2000, Grid Forum, metacomputing WS
Government has funded several small projects

The Asia-Pacific Grid (ApGrid)
TACC is ready to provide computing resources
National, regional testbed
International collaboration efforts are a MUST!

TACC Overview

Missions
Providing world leadership in advanced computing science and technology through the development and application of computing science and engineering

Organization
Operated directly by MITI/AIST since 1981
2 executive, 7 technical, 2 admin + SEs
Annual budget 2,400M JPY (= 20M USD)
Incl. supercomputer rental, SE, network maintenance, electricity, etc.

Collaborative activities with partners
RWCP, Tsukuba Univ., NAL, JAERI, KEK, HLRS, CSAR, SDSC, UTK, LANL, NIST, ETHZ, ANU...

ITBL is NOT

ApGrid, nor Japan Grid, nor Tokyo Grid, nor Tsukuba Grid, nor…
An infrastructure-oriented project
An application-oriented project
An Earth Simulator-related project
A successor to RWCP
A Grid project
An internationally collaborative project
A domestically collaborative project
A huge project
A good project (at least in our opinion)

Then, what is IT?
Nobody really knows (or cares)
And thus its objective must be top secret (even to us)
Probably upgrades several supercomputer boxes (箱物, hakomono) and network links
(ゼネコン対策, i.e., spending aimed at keeping the general contractors busy)