RIKEN Integrated Cluster of Clusters System User's Guide
Transcript of RIKEN Integrated Cluster of Clusters System User's Guide
Version 1.22
Apr. 01, 2015
Advanced Center for Computing and
Communication
RIKEN
Copyright (C) RIKEN, Japan. All rights reserved.
Version management table
Version Revision date Change content
1.4 2010.02.19 1.1 “Outline of the System” modified
2.1.5 “Access to the RIKEN network from RICC” added
2.2 “Account and Authentication” modified
2.6 “Login environment” modified
3.1 “Available file area” modified
3.2.3 “Local disk area (work area)” modified
4. “How to create jobs” modified
5.1.1.3.1 “Example of chain job” added
5.1.1.4 “Major options for job submission” modified
5.1.1.6 “Software resource” modified
5.1.1.10 “Script file for Batch job” modified
5.1.2 “Confirm job information” modified
5.1.3 “Operate job” modified
5.2 “Interactive Job” modified
6.6 “FTL Syntax” modified
8 “How to use Archive system” modified
9.2 “RICC Mobile Portal” modified
10.1 “Product manual” modified
1.5 2010.03.17 5.1.3 “Operate job” modified
7.2 “Time measurement” modified
8 “How to use Archive system” modified
11.5.3 “Create script (Amber8 : MDGRAPE-3 Cluster)” modified
1.6 2010.04.01 1.1 “Outline of the System” modified
1.2 “Hardware outline” modified
5.1.1.3 “Function outline and Job submit command format” modified
5.1.1.5 “Hardware resources” modified
1.7 2010.06.07 3.1 “Available file area” modified
3.2.2 “Data area” modified
5.1.1.4 “Major options for job submission” modified
5.1.1.6 “Software resource” modified
5.1.1.8 “Major options for job submission command” modified
11.2.3 “Specify temporary directory” added
11.3.1 “Create script” modified
1.8 2010.07.30 3.2.2 “Data area” modified
7.2 “Debugger” added
8 “Tuning” added
1.9 2010.08.25 1.1 “Outline of the System” modified
1.2 “Hardware outline” modified
5.1.1.3 “Function outline and Job submit command format” modified
5.1.1.5 “Hardware resources” modified
5.1.1.6 “Software resources” modified
6.7 “FTL generating tool : ftlgen” added
12.2 GaussView modified
12.4 ANSYS modified
12.5 Amber modified
1.10 2010.11.19 1.3 “Available application and libraries” added
5.1.1.6 “Software resources” modified
12.3 “NBO for Gaussian” added
1.11 2011.05.02 5.1.5 “Confirm project user job information” added
6.6.12.5 “FTL variable” modified
12.1.1.6 “Use scratch area on the local disk of computing node” added
12.5 “ANSYS” modified
1.12 2011.08.17 5.1.1 “Submit batch job” modified
5.1.3.2 “Hardware resources” modified
5.2 “Interactive job” modified
1.13 2012.4.2 1.2.7 “Cluster for single jobs using SSD” added
1.5.1 “Available computation time” modified
3.1 “Available file area” modified
3.2.2 “Data area” modified
3.2.3 “Local disk area (work area)” modified
5.1.3.2 “Hardware resources” modified
5.1.3.3 “Software resource” modified
5.1.6.1 “Display resource information” modified
5.1.6.2 “Display usage of core” modified
12.1.1.1 “Create script” modified
1.14 2012.7.13 5.1.3.3 “Software resource” modified
12.6 “Amber” modified
1.15 2012.11.26 4.2 “Compilation / Linkage for GPGPU program” modified
1.16 2013.1.11 1.1 “Outline of the System” modified
1.2.2 “Multi-purpose Parallel Cluster” modified
5.1.3.2 “Hardware resources” modified
5.1.3.3 “Software resources” modified
12.7 “GAMESS” modified
1.17 2013.4.1 4.4.5 “IMSL Fortran Numerical Library” added
12.4 “ANSYS” modified
12.8 “MATLAB” added
1.18 2013.5.14 5.1.3.3 “Software resources” modified
12.9 “Q-Chem” added
1.19 2013.9.2 4.2 “Compilation / Linkage for GPGPU program” modified
5.1.3.2 “Hardware resources” modified
5.1.3.3 “Software resources” modified
5.1.6.1 “Display resource information” modified
12. “Application” removed (moved to RICC portal https://ricc.riken.jp)
1.20 2013.10.16 5.1.3.3 “Software resources” modified
5.1.6.1 “Display resource information” modified
1.21 2014.08.04 5.1.3.3 “Software resources” modified
1.22 2015.04.01 1. “Outline of the system” modified
2. “How to Access” modified
3. “File Area” modified
4. “How to create jobs” modified
5. “How to execute Job” modified
6. “FTL(File Transfer Language)” modified
9. “How to use Archive system” modified
Contents
Introduction
1. Outline of the System
1.1 Outline of the System
1.2 Hardware outline
1.3 Available application and libraries
1.4 Maintenance
1.5 Usage categories
2. How to Access
2.1 Login Flow
2.2 Account and Authentication
2.3 Update Password
2.4 Access RICC
2.5 Login environment
2.6 File transfer
3. File Area
3.1 Available file area
3.2 Type of available file area
4. How to create jobs
4.1 Outline of Compilation / Linkage
4.2 Compilation / Linkage for GPGPU program
4.3 Library management
4.4 Linkage of Math library
4.5 Job Freeze Function
5. How to execute Job
5.1 Batch job / Interactive batch job
5.2 Interactive Job
6. FTL (File Transfer Language)
6.1 Introduction
6.2 Transfer input file
6.3 Transfer input directory
6.4 Transfer output file
6.5 FTL Basic Directory
6.6 FTL Syntax
6.7 FTL generating tool : ftlgen
7. Development Environment
7.1 Endian conversion
7.2 Debugger
8. Tuning
8.1 Tuning overview
8.2 Time measurement
8.3 Program development support tool
8.4 Network topology
9. How to use Archive system
9.1 Configuration
9.2 pftp
9.3 hsi
9.4 htar
10. RICC Portal
10.1 RICC Portal
11. Manual
11.1 Product manual
Appendix
1. FTL Examples
1.1 Execute serial job
1.2 Execute parallel job
1.3 FTL basic directory (FTLDIR command)
1.4 Others
Introduction

In this User's Guide, we explain the usage of the Supercomputer System (RICC, RIKEN Integrated Cluster of Clusters) installed at RIKEN. Please read this document before you start using the system. This User's Guide is available for reference and download on the following homepage. The contents of this User's Guide are subject to change.
https://ricc.riken.jp
Shell scripts and other examples in this User's Guide are available in the following directory on RICC.
ricc.riken.jp:/usr/local/example
The RICC system operation schedule is announced on the following web page and the RICC user's mailing list. Furthermore, training classes are scheduled several times a year to provide technical support for using RICC. The class schedule is available on the following web page of the Advanced Center for Computing and Communication.
http://accc.riken.jp/riccinfo
Please send your inquiries on programming consultation, such as usage methods, debugging, parallelizing or tuning programs, and any other questions about RICC to the following e-mail address.
Email: [email protected]
No portion of this document may be copied, reproduced, or distributed in any way, or by any means, without permission.
1. Outline of the System
1.1 Outline of the System
RICC (RIKEN Integrated Cluster of Clusters) consists of two computing systems for different purposes (massively parallel computing and multi-purpose parallel computing), the Frontend system, a 2.2PB disk device and a 2PB tape library system. The Massively Parallel Cluster, the core of the system, is a PC cluster system of 3888 cores (peak performance 45.6 TFLOPS) for massively parallel computing. The Multi-purpose Parallel Cluster, equipped with GPU-type accelerators (peak performance 9.3 TFLOPS + 103 TFLOPS [single precision]), is for multi-purpose computing such as the execution of commercial or free applications.
Users are able to edit, compile and link programs, submit batch jobs and obtain computed results through the Login Server (ricc.riken.jp). Each computing server can also run interactive jobs, which are necessary for users to debug their programs. In addition, users can access the system from outside the RIKEN network through VPN and can use the system as if they were on the RIKEN network.
Users are able to log in to RICC on the RIKEN network by ssh, scp, etc. In addition, RICC provides a web portal site, RICC Portal, which allows users to access RICC with a web browser on their PC. Users are able to edit, compile and link programs, submit batch jobs and obtain computed results on RICC Portal.
In RICC, users’ home directories are located on the high-speed magnetic disk device. Users can access files in their home directories from the Login Server and the Multi-purpose Parallel Cluster. When executing batch jobs on the Massively Parallel Cluster, users need to transfer the necessary files from their home directories to the local disks of the Massively Parallel Cluster and return computed results back to their home directories. These operations can be performed easily by commands in the shell scripts used when submitting batch jobs.
All systems of RICC can be logged in to with the issued RICC user account, the RICC password and the passphrase of the public-key based authentication method. The passphrase can be generated on RICC Portal.
1.2 Hardware outline
PC Clusters consist of Massively Parallel Cluster [486 nodes (3888 cores)] and Multi-purpose Parallel
Cluster [100 nodes (800 cores)].
1.2.1 Massively Parallel Cluster
Computation performance
Intel Xeon X5570 (2.93GHz) 486 nodes (972 CPUs, 3888 cores)
Total peak performance: 2.93 GHz x 4 calculations x 4 cores x 972 CPUs = 45.6 TFLOPS
Memory
5.8TB (12GB x 486 nodes)
Memory bandwidth: 25.58GB/s = 1066MHz (DDR3-1066) x 8Byte x 3channels
Byte/FLOP: 0.54 (Byte/Flop) = 25.58GB/s / (2.93GHz x 4calculations x 4cores)
HDD
272TB((147GB × 3 + 73GB) × 436 + (147GB × 6 + 73GB) × 50)
Interconnect (DDR InfiniBand)
All 486 nodes with DDR InfiniBand HCA are configured as a computer network of two-way
communication with performance of 16 Gbps per way.
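The peak-performance, bandwidth and Byte/FLOP figures above follow directly from the clock, core and channel counts. A quick sketch that re-derives them from the values listed in this section (note the guide rounds 0.546 Byte/FLOP down to 0.54):

```shell
# Re-derive the Massively Parallel Cluster figures quoted above with awk.
awk 'BEGIN {
  peak = 2.93 * 4 * 4 * 972          # GHz x calc/cycle x cores/CPU x CPUs -> GFLOPS
  bw   = 1.066 * 8 * 3               # GHz x 8 Byte x 3 channels -> GB/s
  bf   = bw / (2.93 * 4 * 4)         # bandwidth per CPU clock-FLOP
  printf "peak %.1f TFLOPS, bandwidth %.2f GB/s, %.3f Byte/FLOP\n",
         peak / 1000, bw, bf
}'
# -> peak 45.6 TFLOPS, bandwidth 25.58 GB/s, 0.546 Byte/FLOP
```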
1.2.2 Multi-purpose Parallel Cluster
Computation performance
Intel Xeon X5570 (2.93GHz) 100 nodes (200 CPUs, 800 cores) + NVIDIA Tesla C2075 GPU type
accelerator x 100
Total peak performance: 2.93GHz x 4 calculations x 4 cores x 200 CPUs = 9.3 TFLOPS
1.03 TFLOPS (single precision) x 100 = 103 TFLOPS
Memory
2.3 TB (24GB x 100 nodes)
Memory bandwidth: 25.58GB/s = 1066MHz (DDR3-1066) x 8Byte x 3channels
Byte/FLOP: 0.54 (Byte/Flop) = 25.58GB/s / (2.93GHz x 4calculations x 4cores)
HDD
25.0TB (250GB x 100 nodes)
Interconnect (DDR InfiniBand)
All 100 nodes with DDR InfiniBand HCA are configured as a computer network of two-way
communication with performance of 16 Gbps per way.
1.2.3 Frontend system
The Frontend system is the first host to log in to when accessing RICC. It also provides the environment for program development and execution for the PC Clusters, MDGRAPE-3 Cluster and Large Memory Capacity Server. The Frontend system has 4 Login Servers, which are connected to 2 load balancers for redundancy and high availability.
1.2.4 Cluster for single jobs using SSD
This cluster, used via RICC, provides an environment for non-parallel jobs that require high-speed I/O.
Local disk area
SSD 360GB (30GB / core)
Interconnect for data transfer
QDR InfiniBand
1.3 Available application and libraries
Information about the available applications (Gaussian, ANSYS, Amber, etc.) and libraries (FFTW, GSL, HDF5, Python libraries, etc.) on RICC is published at the following URL.
https://ricc.riken.jp/cgi-bin/hpcportal.2.2/index.cgi?LMENU=SYSTEM
1.4 Maintenance
RICC basically operates 24 hours a day, 7 days a week, but emergency maintenance is performed when needed. We make every effort to inform users of maintenance in advance.
1.5 Usage categories
We have the following user categories. Users use RICC under one of these categories.
General Use
Quick Use
For more information, please refer to “4. Usage Categories” in the “RIKEN Supercomputer System Usage Policy”, which is available at the following URL.
http://accc.riken.jp/ricc/policy_e.html
1.5.1 Available computation time
Available computation time differs by project. Use the listcpu command to check the allotted computation time, the used computation time and the expiry date of the allotted computation time. When the used computation time reaches 100%, jobs cannot be submitted.
[explanation]
Limit(h) : Allotted computation time (unit: hour)
Used(h) : Used computation time (unit: hour)
Use(%) : Used computation time / Allotted computation time (unit: %)
Date of expiry : Expiry date of the allotted computation time
1.5.2 List Project number / Project name
Use listprj (or listproject) to list the Project number and Project name.
[username@ricc1:~] listcpu
[Q00100] Study of parallel programs <-- Project no./Project name
Limit(h) Used(h) Use(%) Date of expiry
----------------------------------------------------------------------
Total 402000.0 80400.0 20.0% 2016/03/31
+- mpc - 80000.0 - -
+- upc - 400.0 - -
+- ssc - 0.0 - -
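Use(%) is simply Used(h) divided by Limit(h); for the sample output above:

```shell
# Check the Use(%) column of the sample listcpu output: 80400h used of 402000h allotted.
awk 'BEGIN { printf "%.1f%%\n", 80400.0 / 402000.0 * 100 }'   # -> 20.0%
```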
[username@ricc1:~] listprj
Q00001(Quick) Study of massively parallel programs on RIKEN Cluster of
Clusters
G00001(General) Research of RICC
2. How to Access
2.1 Login Flow
The login flow for the RICC system, from account application to login, is as follows:
When the account is issued, an e-mail with the client certificate attached is sent. After installing the client certificate on your PC, access RICC Portal. You can log in to the front end servers via SSH by registering your ssh public key on RICC Portal.
Figure 2-1 Login Flow
2.1.1 Initial Settings
When accessing the system for the first time, log in to RICC Portal and make sure to complete the following initial settings:
2.1.2 Install Client Certificate
2.1.3 Generate / Register public key and private key
2.1.2 Install Client Certificate
2.1.2.1 Windows
Install the client certificate ACCC sent you by e-mail.
Double click the client certificate provided by ACCC. The Certificate Import Wizard starts. Click
"Next" button.
Figure 2-2 The first screen of "Certificate Import Wizard"
Figure 2-3 The second screen of "Certificate Import Wizard"
Figure 2-4 The third screen of "Certificate Import Wizard"
Figure 2-5 The fourth screen of "Certificate Import Wizard"
1. Enter the password for "Client Certificate" issued by ACCC.
2. Click "Next" button.
Figure 2-6 The fifth screen of "Certificate Import Wizard"
Figure 2-7 The sixth screen of "Certificate Import Wizard"
2.1.2.2 Mac
Install the client certificate ACCC sent you by e-mail.
Double click the client certificate provided by ACCC.
Figure 2-8 The first screen of "Certificate Import Wizard"
Enter the password for "Client Certificate" issued by ACCC.
2.1.3 Generate / Register public key and private key
When accessing RICC with a virtual terminal (ssh / scp, etc.), public-key based authentication is used whether you access from the RIKEN network or from outside the RIKEN network. Therefore, each user needs to register the public key on the Login Server and the private key on the PC / WS accessing RICC. The preparation flow is as follows.
(1) Access RICC Portal (refer to 2.1.3.1 Access RICC Portal)
(2) Generate and/or register a public key in either of the following ways.
A) Generate a public key on RICC Portal
Generate a public-key pair of a public key and a private key on RICC Portal, and
then store the private key into the terminal.
(refer to 2.1.3.2 Generate public key and private key on RICC Portal)
B) Generate a public key on the terminal (Mac, Linux, etc.) (for advanced users)
Generate a public-key pair of a public key and a private key on the terminal (Mac,
Linux, etc.) accessing RICC, and then register the public key on RICC Portal.
(refer to 2.1.3.3 Generate a public key on the terminal (Mac, Linux, etc.) (for
advanced users))
2.1.3.1 Access RICC Portal
RICC users access RICC Portal (following URL) to generate a public-key pair.
https://ricc.riken.jp
1. Select Client Certification
Click [OK]
Fig. 2-9 RICC Portal login window
1. Enter RICC user account
2. Enter RICC password
3. Click [LOGIN]
2.1.3.2 Generate public key and private key on RICC Portal
(1) At [Setting] – [Key Generation] menu, enter a public-key passphrase.
(Don’t forget the public-key passphrase)
Fig. 2-10 Public-key pair generation window
1. Click [Setting]
2. Click [Key Generation]
3. Enter a passphrase
4. Retype the same
passphrase
5. Select [SSH-2RSA]
6. Select OS(Software) type
7. Click [Generate Key]
Save the private key onto the PC.

[In case of Windows (for PuTTY, WinSCP)]
Fig. 2-11 Private key display window
1. The private key is displayed at the bottom of the window
2. Copy the private key strings
3. Save it onto the terminal as a text file
Note 1: Save the text file with one of the following editors and character codes.
- notepad: ANSI
- wordpad: text document
Note 2: The extension of the text file should be “ppk” (e.g. id_rsa.ppk)

[In case of Mac(OS X)/UNIX/Linux]
Fig. 2-12 Private key display window
1. The private key is displayed at the bottom of the window
2. Copy the private key strings
3. Save it onto the terminal as a text file
4. Change the permission of the file to 600
   e.g. $ chmod 600 ~/.ssh/id_rsa
(note) Save the private key file as ~/.ssh/id_rsa. If it is saved in another directory or under another name, specify the private key file when you access RICC with the ssh command as follows.
   e.g. $ ssh -i private-key-file -l RICC-account ricc.riken.jp

* Users can generate as many public-key pairs as they want. Generating a new public-key pair does not delete previously registered public keys.
2.1.3.3 Generate a public key on the terminal (Mac, Linux, etc.) (for advanced users)
(*) If you generated a public key as described in 2.1.3.2 Generate public key and private key on RICC Portal, please skip this section.
(1) Use the ssh-keygen command on the terminal to generate a public-key pair.
Mac (OS X): Start Terminal. Execute the ssh-keygen command.
UNIX / Linux: Start a terminal emulator. Execute the ssh-keygen command.
Fig. 2-13 Generate a public-key pair
1. Enter the ssh-keygen command
2. Press the return key (to save the key as a file other than ~/.ssh/id_rsa, enter a file name (Note))
3. Enter a passphrase
4. Retype the same passphrase
(Note) In that case, specify the private key file when you access RICC with the ssh command as follows.
Example)
$ ssh -i private-key-file -l RICC-account ricc.riken.jp
(2) Access RICC Portal (https://ricc.riken.jp) from a web browser. Move to the Key management window.
Fig. 2-14 Move to key management window
1. Click [Setting]
2. Click [Key Management]
3. Click [Update Public Key]
(3) Display the generated public key and register it on RICC Portal.
Mac(OS X): Start Terminal. Execute the cat command to display the public key.
UNIX / Linux: Start a terminal emulator. Execute the cat command to display the public key.
(Note) If the ssh-keygen command is executed with no argument at step (1), the public key is stored in the ~/.ssh/id_rsa.pub file.
Command example: $ cat ~/.ssh/id_rsa.pub
Fig. 2-15 Copy the content of the public key
1. Display the generated public key.
   $ cat "public-key-file"
2. Copy the content
Fig. 2-16 Register the public key
1. Paste the content of the public key.
2. Select key type
3. Click [save]
(4) Log out of RICC Portal.
Fig. 2-17 RICC Portal logout
Click [logout]
2.1.3.4 Delete registered public key
(1) Access RICC Portal (https://ricc.riken.jp) from a web browser. Move to the [Delete Public Key] window.
Fig. 2-18 Move to Delete Public Key window
1. Click [Setting]
2. Click [Key Management]
3. Click [Delete Public Key]
(2) Delete the registered public keys.
Fig. 2-19 Deletion of public keys window
Click [Delete All Keys]
* All the registered public keys are deleted.
(3) Log out of RICC Portal.
Fig. 2-20 RICC Portal logout
Click [logout]
2.1.4 Network Access
Destination hosts are as follows:

Host name (FQDN)    Purpose of access
ricc.riken.jp       Usual access
riccgv.riken.jp     GaussView use (note 1)

note 1 : On how to use GaussView, please refer to RICC Portal (https://ricc.riken.jp)

2.1.5 Available service
ssh/scp (Virtual terminal, file transfer)
https (RICC Portal, online manual)

2.1.6 Access to outside of the RIKEN network from RICC
When you access external systems from the RICC system, log in to the front end servers with SSH agent forwarding enabled (-A option).

[username@Your-PC ~]$ ssh -A -l username greatwave.riken.jp

After logging in to HOKUSAI-GreatWave, log in to the RICC front end servers with SSH agent forwarding enabled (-A option).

[username@greatwave:~]$ ssh -A -l username ricc.riken.jp
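The two-hop login above can also be captured once in a client-side ~/.ssh/config so that a single command reaches RICC. This is a sketch under assumptions: the Host aliases "gw" and "ricc" are our own, and the ProxyJump option requires OpenSSH 7.3 or later (older clients can use ProxyCommand instead).

```
# ~/.ssh/config on your PC -- hypothetical aliases "gw" and "ricc"
Host gw
    HostName greatwave.riken.jp
    User username
    ForwardAgent yes          # same effect as the -A option

Host ricc
    HostName ricc.riken.jp
    User username
    ForwardAgent yes
    ProxyJump gw              # hop through HOKUSAI-GreatWave first
```

With this in place, "ssh ricc" performs both hops.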
2.2 Account and Authentication
The account to access RICC, called the RICC user account, is the one the user specified in the application form. However, the password to enter differs by access method. The password for each access method is listed in Table 2-1 Access method list.
The RICC password is given to the user when the RICC user account is issued. Please change the initial RICC password after logging in to RICC Portal for the first time.

Table 2-1 Access method list
Access method     Protocol        Account                             Password
RICC Portal       https           RICC user account (specified in     RICC password
                                  the application form)
HPSS              pftp (note 1)   RICC user account                   RICC password
Virtual terminal  ssh (scp/sftp)  RICC user account                   Public-key passphrase
                                                                      (specified by user) (note 2)

note 1 : pftp is a special command to transfer files between users’ home directories and the Archive system. pftp is an enhanced command of ftp and can be used in the same way as ftp.
note 2 : The public-key passphrase is specified by the user when a pair of a public key and a private key is generated. A public/private key pair can be generated on RICC Portal. Please refer to 2.1.3 Generate / Register public key and private key.
2.3 Update Password
When logging in to RICC for the first time, make sure to update the initial RICC password on RICC Portal.
The password updating flow is as follows.
(1) Access RICC Portal (refer to 2.3.1 Access RICC Portal)
(2) Update the password (refer to 2.3.2 Password updating procedure)
2.3.1 Access RICC Portal
RICC users access RICC Portal (following URL) to update the initial password.
2.3.2 Password updating procedure
(1) Login RICC Portal
Fig. 2-21 RICC Portal login window
https://ricc.riken.jp
Select Client Certification
Fig. 2-22 RICC Portal login window
(2) At [Setting] – [Password Update] menu, update the initial password.
(If the initial password is not updated on RICC Portal, Password Update menu is shown just after
logging into RICC Portal.)
Fig. 2-23 Password update window
1. Click [Setting]
2. Click [Password Update]
3. Enter the initial password
4. Enter a new password
5. Retype the same password
6. Click [Update]
Condition of password:
- At least 6 characters
- Not simple
(e.g. dictionary word)
1. Enter RICC user account
2. Enter the initial password
3. Click [LOGIN]
(3) Confirm password was updated.
Fig. 2-24 Confirmation of password update window
(4) Logout RICC Portal
Fig. 2-25 RICC Portal logout
Click [logout]
Click [OK]
2.4 Access RICC
2.4.1 Login
Use the ssh service to log in to RICC from a PC / WS. The ssh command for UNIX / Mac (OS X) and PuTTY for Windows are recommended. PuTTY is available on the following website.
http://www.chiark.greenend.org.uk/~sgtatham/putty/
The host to access is as follows.
Host name (FQDN)
ricc.riken.jp
The login prompt varies with each login because the Login Servers (4 servers) are load-balanced by the load balancers.
A) For UNIX / Mac(OS X)
% ssh –l username greatwave.riken.jp
T The authenticity of host 'greatwave.riken.jp' can't be established. Displayed only
RSA key fingerprint is 26:8a:53:1e:d3:3f:ed:29:e0:a3:32:0d:d5:6e:1a:e2 . at first-time login.
Are you sure you want to continue connecting (yes/no)? yes <---------------------- Enter [yes]
Warning: Permanently added 'greatwave.riken.jp' (RSA) to the
list of known hosts.
Enter passphrase for key '/home/username/.ssh/id_rsa': ++++<---Enter the pablic-key passphrase
[username@greatwave1:~] ssh –l username ricc.riken.jp
The authenticity of host 'ricc.riken.jp' can't be established. Displayed only
RSA key fingerprint is 26:8a:53:1e:d3:3f:ed:29:e0:a3:32:0d:d5:6e:1a:e2 . at first-time login.
Are you sure you want to continue connecting (yes/no)? yes <---------------------- Enter [yes]
Warning: Permanently added 'ricc.riken.jp' (RSA) to the
list of known hosts.
Enter passphrase for key '/home/username/.ssh/id_rsa': ++++<---Enter the pablic-key passphrase
[username@ricc1:~]
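For convenience, the two-step login above (greatwave.riken.jp, then ricc.riken.jp) can be wrapped in a client-side OpenSSH configuration. This is a sketch, not from the guide: it assumes an OpenSSH client recent enough to support ProxyJump, and the alias name "ricc" is arbitrary.

```shell
# Hypothetical ~/.ssh/config fragment (client side).
# "ssh ricc" would then hop through greatwave.riken.jp automatically.
Host ricc
    HostName ricc.riken.jp
    User username
    ProxyJump greatwave.riken.jp
```

On older clients without ProxyJump, an equivalent ProxyCommand line can be used instead.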
B) For Windows
1. Specify the private key in a virtual terminal.
Fig. 2-26 Private key setting window
2. Access RICC with a virtual terminal.
Fig. 2-27 Virtual terminal (PuTTY) Session window
For PuTTY,
1. Go to [Connection] – [SSH] –
[Auth] menu
2. Click [Browse] and specify the
private key in
2.1.3
Generate / Register public key and
private key
For PuTTY,
1. Click [Session]
2. Enter following items
Host name: greatwave.riken.jp
Port: 22
Connection type: SSH
3. Enter a session name at [Saved
Sessions] (e.g.GreatWave)
4. Click [Save]
5. Click [Open]
3. At the first-time login, the following security alert window is shown. Click [Yes].
This alert is not shown at future logins.
Fig. 2-28 Virtual terminal (PuTTY) Security Alert window
4. Enter the RICC user account and the public-key passphrase.
Fig. 2-29 Virtual terminal login completion
2.4.2 Logout
Enter “exit” or “logout” at the prompt. The logout process may take a little time for post-processing (writing the
history file).
1. Enter RICC user account at
[login as]
2. Enter the public-key passphrase
2.5 Login environment
In RICC, bash or tcsh is available as the login shell. The default is bash. If you want to change it, please
contact the Advanced Center for Computing and Communication ([email protected]).
An environment setting file for using RICC is stored in your login directory.
(note) When adding paths to the environment variable PATH, append them to the end of PATH. Otherwise, you
may not be able to use the system properly.
Also, original skeleton files are available in the following directory of Login Server.
ricc.riken.jp:/usr/local/example/skel
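As a sketch of the note above, appending a directory to the end of PATH (rather than prepending it) keeps the system's own commands first in the search order. The directory "$HOME/bin" is a hypothetical example.

```shell
# Append a personal bin directory to the END of PATH, as the note advises;
# prepending could shadow system commands that RICC's environment relies on.
export PATH="${PATH}:${HOME}/bin"
```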
2.6 File transfer
2.6.1 File transfer of RICC
Use the ssh service to transfer files between RICC and a PC / WS. The scp (sftp) command for UNIX /
Mac (OS X) and WinSCP for Windows are recommended. WinSCP is available on the following
website.
http://winscp.net/eng/docs/
The host to access is as follows.
Host name (FQDN): ricc.riken.jp
A) For UNIX / Mac (OS X)
% scp local-file [email protected]:remote-dir
public key passphrase: ++++++++        Enter the public-key passphrase
file-name 100% |***********************| file-size
[username@greatwave1:~] scp local-file [email protected]:remote-dir
public key passphrase: ++++++++        Enter the public-key passphrase
B) For Windows
Log in to RICC with WinSCP. Files can be transferred by drag & drop after login.
1. Log in to RICC with WinSCP.
1. Click [New Site]
2. Enter the following items
Host name: greatwave.riken.jp
Port number: 22
User name: RICC user account
Password: Public-key passphrase
3. Click [Advanced]
Fig. 2-30 WinSCP Login window
4. Open [Authentication]
5. Enter the following item
Private key file: (the private key file)
6. Click [OK]
2. Files can be transferred by drag & drop.
Fig. 2-31 WinSCP after login window
Log in to GreatWave (greatwave.riken.jp) to transfer files to RICC:
[username@greatwave1:~] scp local-file [email protected]:remote-dir
public key passphrase: ++++++++        Enter the public-key passphrase
file-name 100% |***********************| file-size
Files can also be uploaded / downloaded via RICC Portal using a web browser.
However, the upload / download function of RICC Portal cannot transfer multiple files at once.
3. File Area
3.1 Available file area
Available file areas are as follows.

Area                    | Area name | Size                        | Device
home (note 1)           | /home     | 2.2PB (4TB/user)            | -
data                    | /data     | (4TB~52TB)/Project          | -
local disk (work area)  | /work     | depends on cluster (note 2) | computing node
archive                 | /arc      | 2PB                         | ReadOnly
Table 3-1 Available file area list

Note 1: The home area is limited to less than 500GB per user by quota.
Note 2: The local disk area of each cluster is limited as follows:
  Massively Parallel Cluster: 40GB/core
  Multi-purpose Parallel Cluster: 10GB/core
  Cluster for single jobs using SSD: 30GB/core
Available file areas for the nodes are as follows.

File area              | Login Server | Massively Parallel Cluster | Multi-purpose Parallel Cluster | Cluster for single jobs using SSD
home (note 3)          | O            | O                          | O                              | O
data                   | O            | O                          | O                              | O
local disk (work area) | -            | O (for prestaging)         | O (scratch area for job)       | O (scratch area for job)
archive                | O            | -                          | -                              | -
O: Available for use   -: Not available
Table 3-2 Available file area for nodes
3.2 Type of available file area
3.2.1 Home area
The home area is a 2.2PB shared file system located on the Disk Storage System.
The home area is accessible from the Login Server, Multi-purpose Parallel Cluster, and Cluster for single jobs
using SSD.
Intended purpose:
To store source programs, object files and execution modules
To store small amounts of data
Use of the home area is limited per user by quota.
3.2.2 Data area
The data area is a 2.2PB shared file system located on the Archive System.
The data area is accessible from the Login Server, Multi-purpose Parallel Cluster, and Cluster for single jobs
using SSD.
Intended purpose:
Data sharing between Project members
To store large amounts of data
3.2.3 Local disk area (work area)
The local disk area (work area) is a local file system on the PC Clusters and the Cluster for single jobs using SSD.
Intended purpose:
Staging area for jobs (FTL)
Scratch area while running jobs
The local disk area can be used by users' jobs; the files are deleted when the job finishes.
For the Massively Parallel Cluster, the area is limited to 40GB per core. The more cores a job
uses, the more capacity it can use. For example, a job using 4 cores can use up to 160GB.
For the Multi-purpose Parallel Cluster and the Cluster for single jobs using SSD, the area can be used as
scratch space while running jobs.
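The per-core arithmetic can be sketched in plain shell (40GB/core is the Massively Parallel Cluster limit from Table 3-1, note 2; the 4-core job is the guide's own example):

```shell
# Work-area capacity on the Massively Parallel Cluster:
# 40GB per core, multiplied by the number of cores the job uses.
per_core_gb=40
cores=4
echo "available work area: $(( per_core_gb * cores ))GB"   # 160GB for 4 cores
```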
4. How to create jobs
4.1 Outline of Compilation / Linkage
In RICC, programs are compiled and linked on the Login Server. Specify which machine to compile / link the
program for.
Compilation / linkage of GPGPU programs is done on the GPGPU Compile Server (accel). For more
information, please refer to 4.2 Compilation / Linkage for GPGPU program.
The format of compilation / linkage is as follows.
command machine-option [option] file [...]

command (serial / thread parallel)        | f77, f90, cc, c++
command (MPI parallel / XPFortran parallel) | mpif77, mpif90, mpicc, mpic++ / xpfrt
machine-option                            | -pc
option                                    | optional (options of each compiler)
file                                      | source file, object file
Table 4-1 Format of Compilation / Linkage
Options for PC Clusters are as follows.

Option | Meaning
-c     | output only object programs
-g     | generate debugging information in object programs
-I     | add a directory to the include file search path
-L     | add a directory to the list of directories in which the linker searches for libraries
-l     | search the library libname.so or libname.a
-o     | specify the name of the execution module
Table 4-2 Common option list
Common optimization options are as follows.

Common option | Meaning                                                                                    | Compiler option (Fujitsu / Intel)
-high (*1)    | Optimize for high-speed execution on the machine                                           | -Kfast / -O3 -ipo -no-prec-div -xHost
-middle       | In addition to basic optimization, loop unrolling, restructuring of nested loops, etc. are performed | -O2
-low          | Basic optimization                                                                         | -O1
-none         | No optimization                                                                            | -O0
Table 4-3 Common optimization option list
(*1) Specifying this option may give rise to side effects. Please pay attention.
Common options for thread parallelization are as follows.

Common option       | Meaning                                  | Fujitsu compiler | Intel compiler
-auto_parallel      | Perform auto parallelization             | -Kparallel       | -parallel
-auto_parallel_info | Display information on auto parallelization | -Kpmsg        | -par-report
-omp                | Enable OpenMP directives                 | -KOMP            | -openmp
Table 4-4 Common thread option list
The following libraries are available in RICC.

Machine     | Math library (serial)       | Parallel library | Math library (parallel)
PC Clusters | BLAS, LAPACK, SSL II, IMSL  | MPI, PVM         | ScaLAPACK, SSL II
Table 4-5 Available library

If the machine on which modules will run is specified in the CLTK user configuration file (${HOME}/.cltkrc),
the machine-option (-pc) can be omitted at compilation / linkage.
* An option on the command line has priority over one in the CLTK configuration file.
The parameter of the CLTK user configuration file is as follows.

Parameter           | Value | Meaning
CLTK_TARGET_MACHINE | pc    | generate modules for PC Clusters
Table 4-6 Parameter of CLTK user configuration file

Example of CLTK user configuration file:
CLTK_TARGET_MACHINE=pc

There are cautions on compilation / linkage of thread parallel programs and MPI parallel programs. For
more information, please refer to the product manuals. On how to refer to product manuals, please refer to
0
Manual.
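As a sketch, the configuration file can be created with a single shell command. It is written to a scratch directory here; on RICC the file would be ${HOME}/.cltkrc as described above.

```shell
# Create a CLTK user configuration file selecting PC Clusters as the target.
# (Scratch directory used for illustration; the real path is ${HOME}/.cltkrc.)
dir=$(mktemp -d)
printf 'CLTK_TARGET_MACHINE=pc\n' > "${dir}/.cltkrc"
cat "${dir}/.cltkrc"
```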
4.1.1 Compilation / Linkage for PC Clusters
Fujitsu compiler is used for PC Clusters.
4.1.1.1 Serial program
Use f77/f90/cc/c++ to compile / link serial programs for PC Clusters.
f77/f90/cc/c++ -pc [option] file [...]
1. Compile / link a Fortran77 program for PC Clusters. (optimization: high)
[username@ricc1:~] f77 -pc -high -o sample1.out sample1.f
2. Compile / link a C program for PC Clusters. (optimization: high)
[username@ricc1:~] cc -pc -high -o sample2.out sample2.c
4.1.1.2 Thread parallel program
Use f77/f90/cc/c++ to compile / link thread parallel programs for PC Clusters. Specify a common option in
Table 4-4 as thread-option.
f77/f90/cc/c++ -pc thread-option [option] file [...]
1. Compile / link a Fortran77 program for PC Clusters with auto parallelization.
[username@ricc1:~] f77 -pc -auto_parallel -o auto_para.out auto_para.f
2. Compile / link a C program including OpenMP for PC Clusters.
[username@ricc1:~] cc -pc -omp -o omp.out omp.c
4.1.1.3 MPI parallel program
Use mpif77/mpif90/mpicc/mpic++ to compile / link MPI parallel programs for PC Clusters.
mpif77/mpif90/mpicc/mpic++ -pc [option] file [...]
1. Compile / link an MPI Fortran77 program for PC Clusters.
[username@ricc1:~] mpif77 -pc -o mpi_sample1.out mpi_sample1.f
2. Compile / link an MPI C program for PC Clusters.
[username@ricc1:~] mpicc -pc -o mpi_sample2.out mpi_sample2.c
4.1.1.4 XPFortran parallel program
Use xpfrt to compile / link XPFortran (former VPP Fortran) parallel programs.
xpfrt [option] file [...]
1. Compile / link an XPFortran program.
[username@ricc1:~] xpfrt -o xpf.out xpf.f
4.2 Compilation / Linkage for GPGPU program
Compilation / Linkage for GPGPU programs (CUDA programs (*)) is done on GPGPU Compile Server
(accel).
(*) For more information on CUDA (Compute Unified Device Architecture), please refer to the following
web site (CUDA ZONE).
http://www.nvidia.com/object/cuda_home_new.html
1. Log in to GPGPU Compile Server (accel) from Login Server
[username@ricc1 ~] ssh accel
[username@upc0000 ~]
2. Compile / link GPGPU programs (CUDA C programs) (use of CUDA compiler)
[username@upc0000 ~] nvcc [OPTION] file [...]
or
[username@upc0000 ~] cc -nvidia [OPTION] file [...]
3. Compile / link GPGPU programs (CUDA Fortran programs) (use of PGI compiler)
[username@upc0000 ~] f90 [-pgi] [OPTION] file [...]
4. Compile / link GPGPU programs (CUDA MPI Fortran programs) (use of PGI compiler)
[username@upc0000 ~] mpif90 [-pgi] -ta=nvidia -Mcuda file [...]
(*) Without machine-option on accel, the PGI compiler is used by default.
Example job script file for GPGPU programs (CUDA programs):
[username@ricc1:~] vi go.sh
#!/bin/sh
#------ qsub option --------#
#MJS: -accel
#MJS: -cwd
#---- Program execution ----#
srun ./a.out
Example job script file for GPGPU programs (CUDA MPI Fortran programs):
[username@ricc1:~] vi go.sh
#!/bin/sh
#------ qsub option --------#
#MJS: -accelex
#MJS: -proc 2
#MJS: -cwd
#---- Program execution ----#
mpirun -np 2 ./multi.exe
[note]
Jobs can be submitted on the Login Server (ricc1-4).
Jobs cannot be submitted on the GPGPU Compile Server (accel).
Specify -accel as the hardware resource to submit jobs using GPGPU (CUDA programs).
When -accel is specified, the job consumes 1 CPU (4 cores) as resource.
Specify -accelex as the hardware resource if you want to use 1 node exclusively. In this case, each
process consumes 8 cores.
A job which uses 2 or more GPGPUs consumes 1 node (8 cores) per process.
4.3 Library management
The format of the archive command is as follows. Specify options of the ar command as option.
ar machine-option option archive [member...]
1. Create an archive for PC Clusters.
[username@ricc1:~] ar -pc cr libarchive.a sub1.o sub2.o sub3.o
4.4 Linkage of Math library
When the Math libraries are used with the Fujitsu C/C++ compiler, please also read the cautions in the
product manuals. On how to refer to product manuals, please refer to 0
Manual.
4.4.1 BLAS
Specify the -blas option to link the BLAS library. Specify the -blas_t option to link the thread-parallel
BLAS library.
1. Link the BLAS library for PC Clusters.
[username@ricc1:~] f77 -pc -blas -o blas.out blas.o
4.4.2 LAPACK
Specify the -lapack option to link the LAPACK library. Specify the -lapack_t option to link the
thread-parallel LAPACK library.
1. Link the LAPACK library for PC Clusters.
[username@ricc1:~] f77 -pc -lapack -o lapack.out lapack.o
4.4.3 ScaLAPACK
Specify the -scalapack option to link the ScaLAPACK library. Specify the -scalapack_t option to link the
thread-parallel ScaLAPACK library.
1. Link the ScaLAPACK library for PC Clusters.
[username@ricc1:~] mpif77 -pc -scalapack -o scalapack.out scalapack.o
4.4.4 SSLII
For PC Clusters, SSL II and C-SSL II are available. Specify the -SSL2 option to link the SSL II or C-SSL II
library.
1. Link an object compiled by Fortran77 for PC Clusters with the SSL II library.
[username@ricc1:~] f77 -pc -SSL2 -o ssl2.out ssl2_f.o
2. Link an object compiled by Fortran77 for PC Clusters with the thread-parallel SSL II library.
[username@ricc1:~] f77 -pc -SSL2 -o ssl2thread.out ssl2thread_f.o
3. Link an object compiled by C for PC Clusters with the SSL II library.
[username@ricc1:~] cc -pc -SSL2 -o ssl2.out ssl2_c.o
4. Link an object compiled by C for PC Clusters with the thread-parallel SSL II library.
[username@ricc1:~] cc -pc -SSL2 -o ssl2thread.out ssl2thread_c.o
4.5 Job Freeze Function
On RICC, the Job Freeze Function can save (to a file) the status of a running job that has not completed
before a halt of system operation. When system operation restarts, this function restores the job
from the file and then restarts it.
4.5.1 Jobs as the targets of the Job Freeze Function
Job Freeze applies to jobs that meet the following condition:
Program compiled by a Fujitsu compiler
To enable job freeze, the job program must be compiled by the Fujitsu compiler and
linked with the job freeze library. (The job freeze library is linked automatically by the compilers
described in "4.1.1 Compilation / Linkage for PC Clusters".)
4.5.2 Jobs excluded from the targets of the Job Freeze Function
The Job Freeze Function cannot always freeze every job. The jobs and internal information
described below are excluded from Job Freeze targets. An attempt to freeze or defrost such jobs may
fail. Even if such a job is frozen and defrosted successfully, its operation may be
unpredictable.
Job concerned with time
If a job using time information is frozen and defrosted, the time information for the period from the
end of freezing to the end of defrosting is lost. The same applies to jobs that use a timer
process.
Job concerned with the inode number of a file
For example, the inode number acquired with the system call stat(2) may change
between job freezing and job defrosting.
Job process whose standard output is redirected to a file
Since the file is overwritten after job defrosting, the job loses the output written before job freezing.
Job process whose standard input is redirected to a file
Since the file cannot be sought to the location set before job freezing, job operation after
defrosting is unpredictable.
Job cooperating or sharing a resource with an external process
When such a job is frozen, the external status related to the job cannot be saved by job freezing.
An example is a job that exchanges data with other jobs via files.
Job using a profiler
When a job uses a profiler, its status cannot be saved because the profiler may
communicate with processes outside the job. Freezing a job that uses a profiler fails.
Shell scripts
When a job uses a script language (perl, python, etc.), Job Freeze fails.
Job using an unsupported file system
The Job Freeze Function does not support the following file systems:
/dev, procfs, namefs, etc.
If the program uses any of these file systems when Job Freeze is performed, Job Freeze fails.
Job opening a directory
Freezing a job that currently has a directory open fails.
Job using the I/O event notification facility (epoll system call)
Freezing a job that uses the epoll system call fails.
Interactive job
Interactive batch job
Job not using srun, mpirun, xpfrun
The Job Freeze Function freezes processes launched by srun, mpirun or xpfrun. Other processes
are not frozen.
Job executing srun, mpirun, xpfrun repeatedly in a "for" or "while" statement
When defrosting a job, the job script is restarted. The Job Freeze Function saves the line number of
the job script and defrosts the job at that point. Therefore, when srun, mpirun or xpfrun is executed
repeatedly in a "for" or "while" statement, the Job Freeze Function may not work properly. However,
if the job script is written to save and restore its status at the point of job freezing, the Job Freeze
Function can work properly.
[username@ricc1:~] vi go.sh
#!/bin/bash
#------ qsub option --------#
#MJS: -pc
#MJS: -proc 16
#MJS: -time 10:00:00
#MJS: -eo
#MJS: -cwd
#----- FTL command -----#
#BEFORE: a.out
#BEFORE: input.1
#AFTER: output.*
#---- Program execution ----#
start=1
end=100
if [ -f ${QSUB_REQID}_index ]; then
start=`cat ${QSUB_REQID}_index`
fi
for (( i = $start; i <= $end; i = i + 1 )); do
echo $i > ${QSUB_REQID}_index
mpirun -stdinfile input.${i} ./a.out > output.${i}
cp output.${i} input.$((i+1))
done
rm ${QSUB_REQID}_index
5. How to execute Job
There are 3 types of job. (Refer to Table 5-1 Type of Job.)
For batch jobs, the necessary resources such as cores and memory for computing are allocated
exclusively. In addition to batch jobs, which do not need to receive input from a terminal, jobs which need
input from a terminal can be executed as interactive batch jobs.
Interactive jobs are executed sharing the resources reserved for interactive jobs by time-sharing.
Job type              | Purpose                                                                                          | Occupancy of resource                         | Start of execution
Batch job             | Execute a job as batch type                                                                      | Yes                                           | When resources are allocated
Interactive batch job | Execute a job as interactive type                                                                | Yes                                           | When resources are allocated
Interactive job       | Execute a program (debugging etc.) which is preferred to run immediately rather than to occupy cores / memory | No (time sharing with other interactive jobs) | Immediate
Table 5-1 Type of Job
Batch jobs are classified into 4 types by submission pattern.

Batch job type          | Purpose                                       | Procedure of submission
Normal batch job        | Execute a job in each script                  | 5.1.1.1 Submit Normal batch job
Chain job               | Execute a set of jobs in a specified order    | 5.1.1.2 Submit chain job
Bulk job                | Execute jobs from the same script, managed as one job | 5.1.1.3 Submit bulk job
Coupled calculation job | Execute a set of jobs started at the same time | 5.1.1.4 Submit coupled calculation job
Table 5-2 Type of Batch Job
5.1 Batch job / Interactive batch job
5.1.1 Submit batch job
5.1.1.1 Submit Normal batch job
Use the qsub command with a script file name as argument to submit batch jobs.
qsub [option] script-file [...]
Example) Batch job submission
% qsub go.sh                               Submit a batch job
Request 123777.jms submitted to MJS.
The above message (REQUEST-ID) is displayed at job submission.
If blank characters are included in the current directory path, an error message is displayed. In that
case, please rename the directory to remove the blank characters.
5.1.1.2 Submit chain job
Chain jobs are executed sequentially, in the order specified on the submit command line. Two or more of the
jobs are never executed at the same time.
Specify two or more script files separated by commas (,) without white space in the qsub command to submit
chain jobs.
qsub [option] script-file,script-file[,script-file[,...]]
When a job composing a chain job is cancelled by the qdel command, all subsequent jobs are also
cancelled.
Example of chain jobs which use the output file of one job as the input file of the next job.
Prepare scripts
Prepare scripts which transfer the output file (output.x) of the previous job by FTL and use it as the input file
of the next job. (go1.sh, go2.sh, go3.sh)
go1.sh
output file: output.1
#!/bin/sh
#MJS: -proc 8
#MJS: -time 1:00:00
#MJS: -eo
#MJS: -cwd
#BEFORE: a.out
#AFTER: output.1
mpirun ./a.out -o output.1

go2.sh
input file: output.1
output file: output.2
#!/bin/sh
#MJS: -proc 8
#MJS: -time 1:00:00
#MJS: -eo
#MJS: -cwd
#BEFORE: a.out
#BEFORE: output.1
#AFTER: output.2
mpirun ./a.out -i output.1 -o output.2

go3.sh
input file: output.2
output file: output.3
#!/bin/sh
#MJS: -proc 8
#MJS: -time 1:00:00
#MJS: -eo
#MJS: -cwd
#BEFORE: a.out
#BEFORE: output.2
#AFTER: output.3
mpirun ./a.out -i output.2 -o output.3

Submit chain job
Specify the prepared script files separated by commas without white space.
[username@ricc1:~] qsub go1.sh,go2.sh,go3.sh
5.1.1.3 Submit bulk job
A bulk job is a structure that allows execution of the same program with the same resources multiple
times with different input files. A bulk job can be submitted and controlled as a single unit. Each job
(subjob) in the bulk job shares the same bulk ID but has a unique bulk index.
Specify the "-B" option and the range of bulk index IDs, from start number <StartNO> to end number
<EndNO>, in the qsub command. The step of the bulk index ID can be specified with step number <StepNO>.
qsub -B <StartNO>-<EndNO>[:<StepNO>] [option] script-file
This facilitates handling of input and output: the environment variable MJS_BULKINDEX is available to
refer to the bulk index.
A bulk job, or a part of its subjobs, can be cancelled at once by specifying the bulk ID or bulk index IDs.
The bulk ID is set in the environment variable MJS_BULKID, and the bulk index ID in the environment
variable MJS_BULKINDEX. Input and output files can be switched using the bulk index ID.
Prepare input files
Prepare the input files used by each subjob.
Sub job [1] input file: input.1
Sub job [2] input file: input.2
Sub job [3] input file: input.3
Prepare script file
Prepare a script file for the bulk job. The bulk ID and bulk index of each subjob are set in the variables
MJS_BULKID and MJS_BULKINDEX.
#!/bin/sh
#MJS: -proc 8
#MJS: -time 1:00:00
#MJS: -eo
#MJS: -cwd
#BEFORE: a.out
#BEFORE: input.${MJS_BULKINDEX}
#AFTER: output.${MJS_BULKINDEX}
mpirun ./a.out -i input.${MJS_BULKINDEX} -o output.${MJS_BULKINDEX}
Submit Bulk job
Specify the "-B" option to submit the script as a bulk job.
[username@ricc1:~] qsub -B 1-3 go-bulkjob.sh
Bulk Request 145678.jms submitted to MJS.
For the above example, the bulk job is given bulk ID "145678" and each subjob is given bulk index ID
"1", "2" or "3". The environment variables and input / output file names of each subjob are as follows.

Bulk ID | Bulk Index ID | Environment variables                | Input file name | Output file name
145678  | 1             | MJS_BULKID=145678, MJS_BULKINDEX=1   | input.1         | output.1
145678  | 2             | MJS_BULKID=145678, MJS_BULKINDEX=2   | input.2         | output.2
145678  | 3             | MJS_BULKID=145678, MJS_BULKINDEX=3   | input.3         | output.3
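The <StepNO> field thins out the index range. As a sketch, the bulk index IDs that a submission like "qsub -B 1-9:2 go.sh" (hypothetical values) would assign can be enumerated in plain bash:

```shell
# Enumerate bulk index IDs for -B 1-9:2 (start 1, end 9, step 2).
start=1; end=9; step=2
indices=""
for (( i = start; i <= end; i += step )); do
  indices="${indices}${indices:+ }${i}"
done
echo "bulk indices: ${indices}"   # 1 3 5 7 9
```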
5.1.1.4 Submit coupled calculation job
Two or more jobs submitted as a coupled calculation job are started at the same time. Coupled calculation
jobs are not started until all of the jobs have been allocated computing resources.
Specify two or more script files separated by colons (:) to execute coupled calculation jobs (jobs that
start at the same time).
qsub [option] script-file:script-file[:script-file[:...]]
When a job composing a coupled calculation job is cancelled by the qdel command, the others are also
cancelled.
5.1.1.5 Confirm completion of Batch job
When a job completes, a standard output file and a standard error output file are created in the
directory where the job was submitted.
The standard output of the executed job is written to the standard output file. Error messages, if any
errors occur, are written to the standard error output file.
[Execution result files on PC Clusters and MDGRAPE-3 Cluster]
Request-name.oXXXXX.jms --- Standard output file
Request-name.eXXXXX.jms --- Standard error output file
(XXXXX is the REQUEST-ID displayed at job submission)
[Execution result files on Large Memory Capacity Server]
Request-name.oXXXXX --- Standard output file
Request-name.eXXXXX --- Standard error output file
(XXXXX is the REQUEST-ID displayed at job submission)
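The naming rule above can be sketched in shell; the request name and REQUEST-ID here are hypothetical values, not taken from a real submission:

```shell
# Build result-file names for a job on the PC Clusters, following the
# Request-name.oXXXXX.jms / Request-name.eXXXXX.jms rule.
req_name="go.sh"    # request name (defaults to the script name)
req_id="123777"     # REQUEST-ID shown at submission
stdout_file="${req_name}.o${req_id}.jms"
stderr_file="${req_name}.e${req_id}.jms"
echo "${stdout_file} ${stderr_file}"   # go.sh.o123777.jms go.sh.e123777.jms
```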
Caution
On the PC Clusters and MDGRAPE-3 Cluster, when several batch jobs that each write hundreds of
megabytes or more to standard output / error finish at the same time, transferring the standard
output / error files takes a long time because the load on the job scheduler becomes high. All users'
job termination processes are delayed by this.
Therefore, if the size of standard output / error is large, please redirect it to normal files as below.
Example 1: Redirect standard output / error to the file "$MJS_REQID.log"
Use bash as login shell:
#------- FTL command -------#
#AFTER:0: $MJS_REQID.log ! FTL command
#------- Program Execution -------#
mpirun ./a.out >> $MJS_REQID.log 2>&1   # Redirect output to $MJS_REQID.log
Use tcsh as login shell:
#------- FTL command -------#
#AFTER:0: $MJS_REQID.log ! FTL command
#------- Program Execution -------#
mpirun ./a.out >>& $MJS_REQID.log       # Redirect output to $MJS_REQID.log
Example 2: Suppress writing to standard output / error
Use bash as login shell:
mpirun ./a.out > /dev/null 2>&1
Use tcsh as login shell:
mpirun ./a.out >& /dev/null
5.1.2 Submit interactive batch job
Use the qsub command with the -i option (hyphen i) to execute jobs interactively. At job submission,
specify qsub options as arguments of the command.
qsub -i [option] [script file]
Example) Interactive batch job submission
% qsub -i -pc -proc 4 -mem 4048mb       Submit an interactive batch job
Request 123777.jms submitted to MJS.    Notification of submission completion
Request 123777.jms start                Notification of start of execution
[username@mpc0025 ~] mpirun ./a.out
(output of job execution)
[username@mpc0025 ~] exit               Notification of job completion
%                                       End job
If job submission is successful, the system returns a notification of submission completion at the prompt,
and the job waits on the terminal until it starts. When resources are allocated, a notification of start of
execution is displayed and the resources become available. If the user performs no operation on the
terminal for 10 minutes, the user is logged out automatically.
If blank characters are included in the current directory path, an error message is displayed. In that
case, please rename the directory to remove the blank characters.
5.1.3 Function outline and Job submit command format
(1) The resources are classified into the following three categories.
A) Basic resources : Number of cores, elapsed time, amount of memory
B) Hardware resources: Resources that depend on hardware
C) Software resources: Resources that depend on software (application such as ISV)
(2) Specify resources following “#MJS:” in scripts.
Example
#MJS: -pc -amber
#MJS: -proc 4 -mem 1024mb
#MJS: -time 12:00:00
(3) Don’t use colon (:) or comma (,) in a script file name because they have special meanings.
(4) Chain jobs, bulk jobs and coupled calculation jobs cannot be used as interactive batch jobs.
(5) A user can submit up to 500 jobs per project.
(6) A user can submit up to 5000 bulk jobs per project, including normal jobs.
(7) There are limits on the number of cores used at the same time.
General Use: Up to 3888 cores per project
Quick Use: Up to 256 cores per project
5.1.3.1 Major options for job submission
Major options for job submission command are as follows.
Command option                | Meaning
-proc <PROCNO>                | Specify the number of processes (cores) (default: 1)
-thread <THREADNO>            | Specify the number of threads (cores) (default: 1)
-mem[ory] <MEMSIZE>[kb|mb|gb] | Specify the amount of memory per process. Default unit: mb (PC Clusters / MDGRAPE-3 Cluster), gb (Large Memory Capacity Server)
-hdd <HDDSIZE>[kb|mb|gb]      | Specify the HDD size per process. Default unit: gb
-time <hh:mm:ss> | <sssss>    | Specify the running time (elapsed time). Format: hh(hours):mm(minutes):ss(seconds) or sssss(seconds). Default: refer to Table 5-5 Available hardware resource
-eo                           | Merge standard output and standard error output (default: not merged) (*) Invalid for interactive batch jobs
-oi                           | Output statistical information of the job to standard output (default: not output) (*) For interactive batch jobs, statistical information is output to a file
-mb                           | Send an email when the job starts (default: not sent)
-me                           | Send an email when the job ends (default: not sent)
-mu "email address"           | Email address (default: email address in the application form)
-r <REQNAME>                  | Specify a request name (default: script name or STDIN)
-rerun [ Y | N ]              | Specify whether the job restarts in case of trouble. Y: restart (default: N) (*) Invalid for interactive batch jobs
-chaindel [ Y | N ]           | Specify whether subsequent jobs are deleted when a chain job ends abnormally. Y: delete (default: Y) (*) Invalid for non-chain jobs
-cwd                          | Move to the directory where the job was submitted when the job starts (default: home directory)
-comp[iler] <COMPTYPE> | Specify the compiler that generated the modules [fj|intel|gcc|pgi|nvidia]. fj: Fujitsu compiler, intel: Intel compiler, gcc: GNU compiler, pgi: PGI compiler (using GPGPU), nvidia: CUDA compiler (using GPGPU). Default: fj (except for Large Memory Capacity Server), intel (Large Memory Capacity Server)
-para[llel] <PARALLEL> | Specify the parallel execution environment the modules were linked with [fjmpi|xpf|mpt|pvm]. fjmpi: Fujitsu MPI, xpf: XPFortran, mpt: Message Passing Toolkit, pvm: Parallel Virtual Machine. Default: fjmpi (except for Large Memory Capacity Server), mpt (Large Memory Capacity Server)
-project <PRJ-ID>      | Specify a project ID (for users who have two or more projects)
-B <N>-<M>[:S]         | Submit a batch job as the specified number of bulk jobs. (*) Invalid for interactive batch jobs. <N>: start number of bulk job, <M>: end number of bulk job, <S>: number of steps of bulk job (default: 1)
-bmb                   | Send an email when the first bulk subjob starts (default: not sent) (*) Invalid for non-bulk jobs
-bmab                  | Send an email when all bulk subjobs have started (default: not sent) (*) Invalid for non-bulk jobs
-bme                   | Send an email when the first bulk subjob ends (default: not sent) (*) Invalid for non-bulk jobs
-bmae                  | Send an email when all bulk subjobs have ended (default: not sent) (*) Invalid for non-bulk jobs
-fstype [ftl | share]  | Presence of FTL specification (default: share)
Table 5-3 Major options for job submit command
5.1.3.2 Hardware resources
Hardware resources to be specified are as follows.
Hardware resource Computing server system to use
-pc PC Clusters (Massively Parallel Cluster, Multi-purpose Parallel Cluster)
(*)
-mpc Massively Parallel Cluster
-upc Multi-purpose Parallel Cluster
-accel Multi-purpose Parallel Cluster (GPGPU)
-accelex Multi-purpose Parallel Cluster (GPGPU: 1node possession)
-ssc Cluster for single jobs using SSD
Table 5-4 Hardware resource list
(*) If the -pc option is specified and no software resource (e.g. -g03, -adf and so on) is specified
when submitting a job to PC Clusters, the job is executed on either the Massively Parallel Cluster or
the Multi-purpose Parallel Cluster.
However, because the home area (/home) and data area (/data) are not shared on the Massively
Parallel Cluster, files are transferred by FTL (refer to 6 FTL (File Transfer Language)) at the start/end
time. Therefore, the location of the output file differs between the Massively Parallel Cluster and
the Multi-purpose Parallel Cluster.
Example:
#!/bin/sh
#MJS: -pc
#MJS: -proc 1
#MJS: -eo
#MJS: -cwd
#FTLDIR: $MJS_CWD
srun ./a.out > output.log
The output.log will be created as follows:
* Executed on Massively Parallel Cluster: $MJS_CWD/REQUEST-ID/output.log.0
* Executed on Multi-purpose Parallel Cluster: $MJS_CWD/output.log
So, specify a hardware resource as follows:
* Execute a job on Massively Parallel Cluster: -mpc
* Execute a job on Multi-purpose Parallel Cluster: -upc
* Execute a job on Massively Parallel Cluster or Multi-purpose Parallel Cluster: -pc
The number of cores, amount of memory and elapsed time depend on the hardware resource.
Hardware resource(*1) / Number of available cores per job(*2) (Quick Use / General Use) /
Max. elapsed time to specify(*3) / per process(*4)
-pc (PC Clusters)
  Available cores per job(*2): 1~128 / 1~128, max. elapsed time 72 H
                               129~256 / 129~512, max. elapsed time 24 H
                               - / 513~3803, max. elapsed time 6 H
  Per process(*4), executed on Massively Parallel Cluster:
    memory default 1,200MB (max. 9,600MB), local disk default 40GB (max. 320GB)
  Per process(*4), executed on Multi-purpose Parallel Cluster:
    memory default 2,600MB (max. 20,800MB), local disk default 10GB (max. 80GB)
-mpc (Massively Parallel Cluster)
  Available cores per job(*2): 2~128 / 2~128, max. elapsed time 72 H
                               129~256 / 129~512, max. elapsed time 24 H
                               - / 513~8192, max. elapsed time 6 H
  Per process(*4): memory default 1,200MB (max. 9,600MB), local disk default 40GB (max. 320GB)
-upc/-accel/-accelex (Multi-purpose Parallel Cluster / GPGPU (*5))
  Available cores per job(*2): 1~128 / 1~128, max. elapsed time 72 H
                               129~256 / 129~512, max. elapsed time 24 H
                               - / 513~800, max. elapsed time 6 H
  Per process(*4): memory default 2,600MB (max. 20,800MB), local disk default 10GB (max. 80GB)
Table 5-5 Available hardware resource
Caution
(*1) Exactly one hardware resource must be specified. (Two or more hardware resources cannot be
specified.)
(*2) The number of available cores is the number of processes x the number of threads.
(*3) If the -time option is omitted when a job is submitted, the maximum elapsed time is set
according to the number of cores assigned to the job.
(*4) The amount of memory per process can be specified up to the maximum value for the hardware
resource in the table. When more than the default amount of memory is specified, computation
time is charged based on the number of cores occupied by the specified amount of memory.
(Example) A 2-core parallel job that specifies 30GB of memory per process on the Large Memory
Capacity Server:
Specified memory per process 30GB = default memory 15GB x 2, which is equivalent to 2 cores'
worth of memory.
Computation time of 2 cores x 2 = the job uses 4 cores' computation time.
(*5) On GPGPU, when -accelex is specified, the job exclusively occupies one node per process, so
computation time is charged for all cores of the nodes regardless of the specified number of cores.
When -accel is specified, the job occupies one CPU (4 cores).
(*6) This cluster has 12 cores per node unlike the other clusters. Please take care when you specify
a parallel number.
The Massively Parallel Cluster and Multi-purpose Parallel Cluster have 2 CPUs (4 cores/CPU) per
computing node. Jobs using 1 core share a CPU, but parallel jobs that specify two or more cores
occupy whole CPUs (4 cores each). Therefore, if a job occupies more cores than specified,
computation time is charged accordingly.
[Figure: CPU occupation. A job using 2 cores occupies a whole CPU (4 cores); the remaining cores
of that CPU are not available to other jobs, and computation time is charged according to the
number of occupied CPUs. Jobs using 1 core share a CPU (4 cores).]
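As a concrete illustration of the charging rule above, the following sketch estimates how many cores' computation time a job is charged. This is an illustration only, assuming the 4-cores-per-CPU occupation rule and the memory rule in note (*4); the actual accounting is performed by the system, and the function name is hypothetical.

```python
import math

CORES_PER_CPU = 4  # PC Clusters: 2 CPUs x 4 cores per computing node

def charged_cores(requested_cores, mem_per_proc_mb=None, default_mem_mb=1200):
    # Memory factor per note (*4): requesting more than the default amount
    # of memory charges extra cores' worth of time, rounded up.
    mem_factor = 1
    if mem_per_proc_mb is not None:
        mem_factor = math.ceil(mem_per_proc_mb / default_mem_mb)
    cores = requested_cores * mem_factor
    if cores == 1:
        return 1  # a single-core job with default memory shares a CPU
    # Parallel (or memory-heavy) jobs occupy whole CPUs of 4 cores each.
    return math.ceil(cores / CORES_PER_CPU) * CORES_PER_CPU
```

For the Large Memory Capacity Server example above (2 cores, 30GB per process, default 15GB), this gives 4 cores' computation time, matching the worked example.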
5.1.3.3 Software resource
Software resources to be specified are as follows.
Software resource  Execution software  Computing server system to use
-g03         Gaussian03      Multi-purpose Parallel Cluster, Cluster for single jobs using SSD, Large Memory Capacity Server
-g09         Gaussian09      Multi-purpose Parallel Cluster, Cluster for single jobs using SSD, Large Memory Capacity Server
-g03nbo      NBO 5.G         Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-g09nbo      NBO 5.9         Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-g09nbo6     NBO 6.0         Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-adf         ADF2013.01      Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-adf2010     ADF2010.02      Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-gamess      GAMESS(socket)  Massively Parallel Cluster, Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-gamess_mpi  GAMESS(socket)  Massively Parallel Cluster, Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-amber8      Amber8          MDGRAPE-3 Cluster
-amber10     Amber10         Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-amber11     Amber11         Multi-purpose Parallel Cluster, Cluster for single jobs using SSD, Multi-purpose Parallel Cluster (GPGPU)
-amber12     Amber12
-amber14     Amber14
-ansys       ANSYS           Multi-purpose Parallel Cluster
-clustalw    ClustalW        Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-blast       BLAST           Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-hmmer       HMMER           Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-fasta       FASTA           Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-cluster3    CLUSTER 3.0     Multi-purpose Parallel Cluster, Cluster for single jobs using SSD, Large Memory Capacity Server
-qchem       Q-Chem 4.1      Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
Table 5-6 Software resource list
Depending on the specified software resource, the number of processes or the elapsed time that
can be specified differs from the hardware-resource values. Available resources to specify are as
follows. ("**" means the same value as for the hardware resource shown in Table 5-5 Available
hardware resource.)
Software resource         Hardware resource to specify   Processes(*1)  Threads(*2)  Max. elapsed time  Memory
-g03/-g09                 -pc/-upc/-ssc                  1              **           **                 **
-g03nbo/-g09nbo/-g09nbo6  -pc/-upc/-ssc                  1              **           **                 **
-adf/-adf2010             -pc/-upc/-ssc                  **             1            **                 **
-gamess                   -pc/-mpc/-upc/-ssc             **             **           **                 **
-gamess_mpi               -pc/-mpc/-upc/-ssc             **             **           **                 **
-amber10                  -pc/-upc/-ssc                  **             1            **                 **
-amber11                  -pc/-upc/-accel/-accelex/-ssc  **             1            **                 **
-amber12                  -pc/-upc/-accel/-accelex/-ssc  **             1            **                 **
-amber14                  -pc/-upc/-accel/-accelex/-ssc  **             1            **                 **
-clustalw                 -pc/-upc/-ssc                  **             1            **                 **
-blast                    -pc/-upc/-ssc                  1              **           **                 **
-hmmer                    -pc/-upc/-ssc                  1              **           **                 **
-fasta                    -pc/-upc/-ssc                  1              **           **                 **
-cluster3                 -pc/-upc/-ssc                  1              1            **                 **
Table 5-7 Available software resource
---- Note ----
(*1) The number of processes generated for job execution, specified by the "-proc" qsub option.
(*2) The number of threads generated for job execution, specified by the "-thread" qsub option.
(*3) The number of ANSYS Solver licenses is 1. Therefore, only one job using ANSYS can be
executed at a time.
5.1.3.4 Other job properties
Job property Meaning How to specify
-pri Priority of a job
Specify 0 - 65535 (default: 100).
The larger the value, the higher the priority.
Example: #MJS: -pri 10000
-start_time Time when a job starts (*1)
Format [[YYYY/]MM/DD-]HH:MM
Example:
#MJS:-start_time 2009/10/01-09:00
Table 5-8 Job property list
Caution
(*1) If specified resources cannot be secured by specified time, status changes from WAIT(WIT) to
TIME OVER(TOV). Jobs in TOV status can be deleted but do not start.
5.1.3.5 Major options for job submission command
5.1.3.5.1 Specify a number of processes
The number of cores specified by PROC-NO is allocated as processes for a job. If this option is omitted,
PROC-NO is set to 1. Specify the number of processes to execute a parallel job with interprocess
communication, such as an MPI or XPFortran program.
Caution
If PROC-NO x THREAD-NO of "-thread <THREAD-NO>" in the next section exceeds the maximum
number of cores that can be specified, a job submission error occurs. Please specify a proper number
of cores when submitting a job.
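The caution above can be checked before submission. The following is a hypothetical helper, not part of the system's tools:

```python
def validate_cores(proc_no, thread_no, max_cores):
    """Return True if PROC-NO x THREAD-NO fits within the maximum number of
    cores for the hardware resource; qsub rejects the job otherwise."""
    return proc_no * thread_no <= max_cores
```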
5.1.3.5.2 Specify a number of threads
The number of cores specified by THREAD-NO is allocated as threads for a job. If this option is omitted,
THREAD-NO is set to 1. Specify the number of threads to execute a thread-parallel job.
-proc <PROC-NO>
-thread <THREAD-NO>
5.1.3.5.3 Specify amount of memory
The amount of memory per process specified by MEMSIZE is secured for job execution. Units that can
be specified are kb, mb or gb (default: mb for PC Clusters/MDGRAPE-3 Cluster, gb for Large Memory
Capacity Server). Blank characters must not be put between MEMSIZE and the unit. If this option is
omitted, the default amount of memory is set. (Please refer to Table 5-5 Available hardware resource.)
(Example 1) -mem 800mb --> 800MegaByte = 800 x 1024KiloByte = 800 x 1024 x 1024Byte
(Example 2) -mem 8gb --> 8GigaByte = 8 x 1024MegaByte = 8 x 1024 x 1024KiloByte
5.1.3.5.4 Specify HDD size (for PC Clusters, MDGRAPE-3 Cluster)
The HDD size per process specified by HDDSIZE is secured for job execution. Units that can be
specified are kb, mb or gb. Blank characters must not be put between HDDSIZE and the unit. This
option is for users who need a large local disk area. If this option is omitted, the default local disk
size is set.
(Example 1) -hdd 2000mb → 2000MegaByte = 2000 x 1024KiloByte = 2000 x 1024 x 1024Byte
(Example 2) -hdd 10gb → 10GigaByte = 10 x 1024MegaByte = 10 x 1024 x 1024KiloByte
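The unit arithmetic in the examples above can be sketched as a small converter. This is an illustrative helper (the function name is hypothetical); kb/mb/gb are binary units, matching 1 MB = 1024 KB in the examples.

```python
UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}

def parse_size_bytes(value):
    """Convert a MEMSIZE/HDDSIZE string such as '800mb' or '8gb' to bytes.
    A plain number is returned unchanged; the default unit depends on the
    system (mb or gb) and is left to the scheduler."""
    s = value.strip().lower()
    for unit, factor in UNITS.items():
        if s.endswith(unit):
            return int(s[: -len(unit)]) * factor
    return int(s)
```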
5.1.3.5.5 Specify elapsed time
The job is executed within the specified elapsed time. If the job does not end within the elapsed time,
it is forcibly deleted. This prevents a job from wasting resources when it goes into an infinite loop, etc.
If this option is omitted, the maximum elapsed time is set according to the number of cores assigned
to the job (refer to 5.1.3.2 Hardware resources, Table 5-5 Available hardware resource). The elapsed
time specified by ELAPSETIME uses the format HH:MM:SS (HH: hours, MM: minutes, SS: seconds) or
SSSSS (SSSSS: seconds).
(Example 1) -time 24:10:10 --> 24 hours 10 minutes 10 seconds
(Example 2) -time 3600 --> 3600 seconds
(Example 3) -time 59:01 --> 59 minutes and 1 second
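The ELAPSETIME formats above can be normalized to seconds as follows (an illustrative helper mirroring the three examples; not part of the system):

```python
def elapsetime_seconds(value):
    """Convert a '-time' value to seconds: HH:MM:SS, MM:SS, or plain SSSSS."""
    seconds = 0
    for part in value.split(":"):
        seconds = seconds * 60 + int(part)  # plain SSSSS has a single part
    return seconds
```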
[Backfill function]
The job scheduler determines job priorities based on the users' resource usage and decides which
job starts next. However, the backfill function lets the scheduler start lower-priority jobs as long as
they do not delay the highest-priority job. Therefore, a job may start earlier if a proper ELAPSETIME
is specified.
-mem <MEMSIZE>[kb|mb|gb]
-hdd <HDDSIZE>[kb|mb|gb]
-time <ELAPSETIME>
5.1.3.5.6 Merge standard output and standard error output
Merges the standard error output file into the standard output file. If this option is omitted, the
standard error output file and the standard output file are generated separately.
5.1.3.5.7 Send email at start of job
At the start of the job, an email is sent to the address in the application form.
5.1.3.5.8 Send email at end of job
At the end of the job, an email is sent to the address in the application form.
5.1.3.5.9 Specify Request name
The job is executed with the request name REQNAME. If this option is omitted, the request name is
the script file name. Blank characters must not be included in the request name.
5.1.3.5.10 Specify if subsequent jobs are deleted when the chain job ends abnormally
Specify whether subsequent jobs are deleted when the chain job ends abnormally (Y: delete, N: do
not delete).
Caution
A) If the option is omitted, the default is -chaindel Y (delete subsequent jobs). However, if -rerun Y
(rerun the job) is specified, the default is -chaindel N (execute subsequent jobs).
B) Job submission with both -rerun Y and -chaindel Y (delete subsequent jobs) fails with the
following error message:
qsub: ERROR: 0016: invalid options: cannot enable -chaindel Y and -rerun Y at the same time.
C) This option is valid for chain jobs. It is ignored for non chain jobs.
D) Jobs with -chaindel specified and jobs without it can be submitted together as a chain job.
-eo
-mb
-me
-r <REQNAME>
-chaindel [Y|N]
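The default-resolution rules in cautions A) and B) above can be summarized in a few lines. This is a sketch of the rules only, not the scheduler's actual code, and the function name is hypothetical:

```python
def effective_chaindel(chaindel=None, rerun="N"):
    """Resolve the effective -chaindel value per cautions A) and B).
    Raises ValueError for the invalid -chaindel Y / -rerun Y combination."""
    if chaindel == "Y" and rerun == "Y":
        raise ValueError(
            "cannot enable -chaindel Y and -rerun Y at the same time")
    if chaindel is None:
        # omitted: default Y, but N when -rerun Y is specified
        return "N" if rerun == "Y" else "Y"
    return chaindel
```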
5.1.3.5.11 Move to directory where job is submitted when job starts
A job executes its script in the home directory by default. Specify the -cwd option to execute the
script in the directory where the job was submitted.
5.1.3.5.12 Specify project number
Specify a project number for job execution. A project number is an ID issued when an application is
approved by the administrator. (Users who have only one project number do not need this option;
it is for users who have two or more projects.)
A default project number can be specified with the variable MJS_QSUB_PROJECT in the .cltkrc (CLTK
user configuration file) located in the home directory.
Example) Edit the .cltkrc file
[username@ricc1:~] vi $HOME/.cltkrc
MJS_QSUB_PROJECT = G00001 <-- Specify a default project number
5.1.3.5.13 Submit bulk jobs
Submit a batch job as the specified number of bulk jobs. Specify a range of Bulk Index IDs with
<N>-<M>. The step of Bulk Index IDs can be specified with S. Please refer to "5.1.1.3 Submit bulk job".
Example) Submit 50 subjobs as a bulk job
[username@ricc1:~] qsub -B 1-50 go.sh
Bulk Request 5381290.jms Submitted to MJS.
[username@ricc1:~] qstat
REQID NAME STAT ELAPSE START-TIME CORE
--------------------------------------------------------------------
5381290[1].jms go.sh RUN 00:00 06/09 14:37 1
5381290[2].jms go.sh RUN 00:00 06/09 14:37 1
5381290[3].jms go.sh RUN 00:00 06/09 14:37 1
5381290[4].jms go.sh RUN 00:00 06/09 14:37 1
5381290[5].jms go.sh RUN 00:00 06/09 14:37 1
5381290[6].jms go.sh RUN 00:00 06/09 14:37 1
5381290[7].jms go.sh RUN 00:00 06/09 14:37 1
5381290[8-50].jms go.sh QUE --:-- --/-- --:-- 1
-cwd
-project <PROJECT-NO>
-B <N>-<M>[:S]
Example) Submit 13 subjobs as a bulk job with a step number
[username@ricc1:~] qsub -B 1-25:2 go.sh
Bulk Request 5381341.jms Submitted to MJS.
[username@ricc1:~] qstat
REQID NAME STAT ELAPSE START-TIME CORE
--------------------------------------------------------------------
5381341[1].jms go.sh RUN 00:00 06/09 14:37 1
5381341[3].jms go.sh RUN 00:00 06/09 14:37 1
5381341[5].jms go.sh RUN 00:00 06/09 14:37 1
5381341[7].jms go.sh RUN 00:00 06/09 14:37 1
5381341[9].jms go.sh RUN 00:00 06/09 14:37 1
5381341[11].jms go.sh RUN 00:00 06/09 14:37 1
5381341[13].jms go.sh RUN 00:00 06/09 14:37 1
5381341[15,17,19,21,23,25].jms go.sh QUE --:-- --/-- --:-- 1
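The Bulk Index IDs generated by "-B <N>-<M>[:S]" can be computed as follows (an illustrative helper; the function name is hypothetical):

```python
def bulk_indices(spec):
    """Expand a '-B' range spec such as '1-50' or '1-25:2' into Bulk Index IDs."""
    rng, _, step = spec.partition(":")
    start, end = (int(x) for x in rng.split("-"))
    return list(range(start, end + 1, int(step) if step else 1))
```

For example, "1-25:2" yields the 13 odd indices 1, 3, ..., 25 shown in the qstat output above.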
5.1.3.5.14 Send email at start of first bulk subjob
When the first bulk subjob starts, an email is sent to the address in the application form.
5.1.3.5.15 Send email at start of all bulk subjobs
When all bulk subjobs have started, an email is sent to the address in the application form.
5.1.3.5.16 Send email at end of first bulk subjob
When the first bulk subjob ends, an email is sent to the address in the application form.
5.1.3.5.17 Send email at end of all bulk subjobs
When all bulk subjobs have ended, an email is sent to the address in the application form.
-bmb
-bmab
-bme
-bmae
5.1.3.6 Execution command
Batch jobs and interactive batch jobs execute the commands specified after the job submission
options and the resources to use.
5.1.3.6.1 Execution command for PC Clusters
For PC Clusters, the following commands are available.
Execution command (*1) Meaning
srun Serial program
Thread parallel program (maximum number of threads: 8)
mpirun MPI parallel program, Hybrid parallel program (MPI + thread)
xpfrun XPFortran parallel program
---- Note ----
(*1) Options of the execution commands are not necessary since the number of cores (MPI parallel
or thread parallel) and the resources to use are already specified at job submission.
Example 1) Execute a serial program
srun ./serial.out
Example 2) Execute an MPI parallel program
mpirun ./mpi.out
Example 3) Execute an XPFortran program
xpfrun ./xpf.out
5.1.3.6.2 Execution command for MDGRAPE-3 Cluster
The commands of PC Clusters are available.
5.1.3.6.3 Execution command for Large Memory Capacity Server
For the Large Memory Capacity Server, the following commands are available.
Execution command (*1) Meaning
srun Serial program
Thread parallel program (maximum number of threads: 32)
mpirun MPI parallel program, Hybrid parallel program (MPI + thread)
---- Note ----
(*1) Options of the execution commands are not necessary since the number of cores (MPI parallel
or thread parallel) and the resources to use are already specified at job submission.
Example 1) Execute a serial program
srun ./serial.out
Example 2) Execute an MPI parallel program
mpirun ./mpi.out
5.1.3.7 Script file for Batch job
Create scripts with vi, emacs, etc. to submit batch jobs. Except for the resource names (hardware
resources and software resources), script files can be used on all the computing server systems.
5.1.3.7.1 Script for PC Clusters
Since the Massively Parallel Cluster cannot access the home area, transfer execution files to the
computing nodes' local disk area before job execution by specifying the file transfer function (FTL)
in scripts. For FTL, please refer to 6 FTL (File Transfer Language).
The following explains a script for a job that needs these resources.
- Number of processes (cores) : 8 cores
- Amount of memory : 1200MB
- Elapsed time : 10 H
- Merge standard error output and standard output : Yes
- Restart the job in case of trouble : Yes
- Move to directory where job is submitted : Yes
Table 5-9 Script for PC Clusters
Don't change "#!/bin/sh" in the first line or the "#MJS:" job options in the 3rd to 9th lines because
these have special meanings.
[username@ricc1:~] vi go-pc.sh
#!/bin/sh
#------ qsub option --------#
#MJS: -pc Specify hardware resource
#MJS: -proc 8 Specify a number of processes
#MJS: -mem 1200mb Specify amount of memory
#MJS: -time 10:00:00 Specify elapsed time
#MJS: -eo Merge standard output / error output
#MJS: -rerun Y Restart job in case of trouble
#MJS: -cwd Move to directory where job is submitted
#------- FTL command -------#
#FTLDIR: $MJS_CWD Specify file transfer (Note)
#------- Program execution -------#
mpirun ./para.out Execute job
5.1.3.7.2 Script for Large Memory Capacity Server
The following explains a script for a job that needs these resources.
- Number of processes (cores) : 8 cores
- Amount of memory : 30GB
- Elapsed time : 10 H
- Merge standard error output and standard output : Yes
- Restart the job in case of trouble : Yes
- Move to directory where job is submitted : Yes
Note) Files in the directory where the job is submitted are automatically transferred to the computing
nodes' local disk area before job execution, and files in the directory on the computing nodes are
collected after job execution. Except for the Massively Parallel Cluster, this command is ignored.
When FTLDIR is used, unnecessary files may be transferred. Also, the existence of files is checked
after job execution even when there are no files to transfer. For large-scale parallel jobs the cost
may be high, so please use BEFORE and AFTER instead of FTLDIR.
For more information on BEFORE and AFTER, please refer to 6 FTL (File Transfer Language).
[username@ricc1:~] vi go-ax.sh
#!/bin/sh
#------ qsub option --------#
#MJS: -ax Specify hardware resource
#MJS: -proc 8 Specify a number of processes
#MJS: -mem 30gb Specify amount of memory
#MJS: -time 10:00:00 Specify elapsed time
#MJS: -eo Merge standard output / error output
#MJS: -rerun Y Restart job in case of trouble
#MJS: -cwd Move to directory where job is submitted
#------- FTL command -------#
#FTLDIR: $MJS_CWD
#------- Program execution -------#
mpirun ./para.out Execute job
Table 5-10 Script for Large Memory Capacity Server
Don't change "#!/bin/sh" in the first line or the "#MJS:" job options in the 3rd to 9th lines because
these have special meanings.
5.1.4 Confirm job information
Use the qstat command to confirm job information.
Option Meaning
(none) Display job information
-d In addition to job information, display the directory where the job was submitted
-m In addition to job information, display the amount of memory in use
-p In addition to job information, display the priority
-e Display completion information
-w Display reason for waiting
-project Display job of specified project (for users who have two or more projects)
5.1.4.1 qstat command
Display the user's own currently submitted job list.
REQID : REQUEST-ID
* Bulk job (running): "Bulk ID"."Bulk Index ID"
* Bulk job (waiting): "Bulk ID".["start of Bulk Index ID"-"end of Bulk Index ID"]
NAME: REQUEST-NAME (if omitted, the request name is the script file name)
STAT: Batch job status
( RUN: running, QUE: waiting to run, WIT: waiting for the specified start time,
HLD: hold(*1), END: end(*2), TOV: specified start time passed(*3))
ELAPSE: Elapsed time (HH:MM)
START-TIME: Time when the job starts (MM/DD HH:MM)
CORE: Number of allocated cores
(*1): State where the job is prevented from starting
(*2): Only for coupled calculation jobs
(*3): State where the job is held because it could not start at the specified start time.
qstat [-d|-m|-p|-e|-w|-project] [REQID]
[username@ricc1:~] qstat
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STAT ELAPSE START-TIME CORE
-------------------------------------------------------------------
12342.jms go.sh RUN 12:34 07/28 12:00 8
12348.jms go.sh QUE --:-- --/-- --:-- 8
12412[1].jms bulk.sh RUN 12:04 07/28 12:30 1
12412[2].jms bulk.sh RUN 12:04 07/28 12:30 1
12412[3-10].jms bulk.sh QUE --:-- --/-- --:-- 1
5.1.4.2 qstat command (directory where job is submitted)
With the -d option, the directory where the job was submitted is displayed in addition to job information.
[Meaning of additional information]
SUBMIT-DIR: Directory where job is submitted
5.1.4.3 qstat command (amount of memory in use)
With the -m option, memory information is displayed in addition to job information. Memory information
is updated periodically. The maximum amount of memory in use among all processes is displayed.
[Meaning of additional information]
MEM: Maximum amount of memory in use
5.1.4.4 qstat command (priority)
With the -p option, the priority is displayed in addition to job information.
[Meaning of additional information]
PRI: Job priority
[username@ricc1:~] qstat -m
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STATUS ELAPSE START-TIME CORE MEMORY
--------------------------------------------------------------------
12342.jms go.sh RUN 12:34 07/28 12:00 8 500M
12348.jms go.sh QUE --:-- --/-- --:-- 8 --
[username@ricc1:~] qstat -p
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STAT ELAPSE START-TIME CORE PRI
--------------------------------------------------------------------
12342.jms go.sh RUN 12:34 07/28 12:00 8 100
12348.jms go.sh QUE --:-- --/-- --:-- 8 100
[username@ricc1:~] qstat -d
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STATUS ELAPSE START-TIME CORE SUBMIT-DIR
--------------------------------------------------------------------
12342.jms go.sh RUN 12:34 07/28 12:00 8 $HOME/JOB
12348.jms go.sh QUE --:-- --/-- --:-- 8 $HOME/JOB
12412[1].jms bulk.sh RUN 12:04 07/28 12:30 1 $HOME/JOB
12412[2-10].jms bulk.sh QUE --:-- --/-- --:-- 1 $HOME/JOB
5.1.4.5 qstat command (finished job)
With the -e option, the list of finished jobs is displayed.
(*) If the amount of memory cannot be obtained, "-" is displayed.
[username@ricc1:~] qstat -e
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STATTIME ENDTIME CORE MEM(*) SUBMIT-DIR
--------------------------------------------------------------------
12321.jms go.sh 07/21 08:00 07/28 14:21 896 500M $HOME/JOB1
4649.ax go.sh 07/25 12:40 07/28 16:04 8 20G $HOME/axjob
12324.jms go.sh 07/28 12:00 07/28 13:09 896 500M $HOME/JOB
12412[1].jms bulk.sh 07/28 12:00 07/28 13:09 1 500M $HOME/JOB
12412[2].jms bulk.sh 07/28 12:00 07/28 13:09 1 500M $HOME/JOB
5.1.4.6 qstat command (reason for waiting)
For the top 10 priority jobs waiting to run, display the reason for waiting and the estimated time when
the jobs will start. The estimated time may differ from the actual start time because it depends on the
execution situation of other jobs, etc.
[Meaning of additional information]
MEM: Specified amount of memory (if not specified, "-" (hyphen) is displayed)
ESTIMATE: Estimated time when the job starts
< 6hrs The job will start within 6 hours.
<12hrs The job will start within 12 hours.
<24hrs The job will start within 24 hours.
< 3days The job will start within 3 days.
> 3days The job will start after 3 days.
REASON: Reason for waiting to run
Insufficient cores Cores are insufficient
Insufficient memory Amount of memory is insufficient
Insufficient license Licenses are insufficient
Otherjob booking cores Other jobs prevent the job from running
Chain job A chain job prevents the job from running
Upper limit of project The limit on the number of cores the project can
use at a time prevents the job from running
Over specified start time The job cannot run because the specified start
time has passed
[username@ricc1:~] qstat -w
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STAT CORE MEM ESTIMATE REASON
--------------------------------------------------------------------
13574.jms go.sh QUE 1024 -- < 6hrs Insufficient cores
13575.jms sim.sh QUE 8 -- > 24hrs Insufficient license
4695.ax go.sh QUE 16 -- < 12hrs Insufficient memory
13577.jms go.sh QUE 1024 -- > 24hrs Insufficient cores
13577.jms go-1.sh QUE 1024 -- > 3days Insufficient cores
13577.jms go-2.sh QUE 1024 -- < 3days Chain job
*****************************************************************************
The estimation time is transitorily changed by the job execution or submission.
*****************************************************************************
5.1.4.7 qstat command (project number)
Display jobs of the specified project number. This function is only available for users who have two or
more projects.
5.1.5 Display standard output / standard error output
Use the qcat command to display a submitted script file, or a running job's standard output file or
standard error output file.
Option Meaning
(none) Display running job's standard output file.
-e Display running job's standard error output file.
-o Display running job's standard output file.
-s Display job's script file.
5.1.6 Confirm resource information
Display information on usage of the system or available resources.
Option Meaning
-x Display available resources and limit.
-uc Display usage of cores in the system.
-um Display usage of memory on Large Memory Capacity Server.
qstat [-x|-uc|-um]
qcat [-o|-e|-s] REQID
[username@ricc1:~] qstat -project G00001
[G00001] Research of RICC <-- Display project number (G00001)
REQID NAME STAT ELAPSE START-TIME CORE
---------------------------------------------------------------------
12342.jms go.sh RUN 12:34 07/28 12:00 8
12348.jms go.sh QUE --:-- --/-- --:-- 8
5.1.6.1 Display resource information
Display the hardware and software resources that the user can specify for each project.
[username@ricc1:~] qstat -x
[Q00001] Study of massively parallel programs on RIKEN Cluster....
H_RESOURCE MAX_CORE/J MAX_CORE/P SUBMIT ELAPSE MEMORY RUN QUEUED
-----------------------------------------------------------------------
pc - 256 10/500 - - 0( 189) 0( 206)
+- mpc 256 - - 72H 10240mb 10( 180) -
+- upc 256 - - 72H 21200mb 0( 6) -
+- ssc 96 - - 72H 43200mb 0( 0) -
S_RESOURCE[pc(mpc)] MAX_PROC/J MAX_THREAD/J ELAPSE MEMORY
-----------------------------------------------------------------------
amber14 - 1 - -
g09 1 - - -
g09nbo 1 - - -
g09nbo6 1 - - -
gamess - - - -
gamess_mpi - - - -
S_RESOURCE[pc(upc)] MAX_PROC/J MAX_THREAD/J ELAPSE MEMORY
-----------------------------------------------------------------------
adf - 1 - -
adf2010 - 1 - -
adf2013 - 1 - -
adf2014 - 1 - -
amber10 - 1 - -
amber11 - 1 - -
amber12 - 1 - -
amber14 - 1 - -
blast 1 - - -
clustalw - 1 - -
cluster3 1 1 - -
fasta - - - -
g03 1 - - -
g03nbo 1 - - -
g09 1 - - -
g09nbo 1 - - -
g09nbo6 1 - - -
gamess - - - -
hmmer 1 - - -
visit 160 8 24H -
visitgpu - - - -
S_RESOURCE[pc(ssc)] MAX_PROC/J MAX_THREAD/J ELAPSE MEMORY
------------------------------------------------------------------------
adf - 1 - -
adf2010 - 1 - -
adf2013 - 1 - -
adf2014 - 1 - -
amber10 - 1 - -
amber11 - 1 - -
amber12 - 1 - -
amber14 - 1 - -
blast 1 - - -
clustalw - 1 - -
cluster3 1 1 - -
Item Meaning
H_RESOURCE Hardware resource
MAX_CORE/J Max. number of cores to specify per job (default value in
parenthesis)
MAX_CORE/P Max. number of cores the project can use at a time
SUBMIT Number of Submitted jobs / Max. number of jobs the project can
submit
ELAPSE Max. elapsed time to specify per job
MEMORY Max. amount of memory to specify per job (default value in
parenthesis)
RUN Number of running jobs of the project (total value in parenthesis)
QUEUED Number of waiting jobs of the project (total value in parenthesis)
S_RESOURCE[pc(mpc)] Available software resources for Massively Parallel Cluster
S_RESOURCE[pc(upc)] Available software resources for Multi-purpose Parallel Cluster
S_RESOURCE[pc(accel)] Available software resources for Multi-purpose Parallel Cluster (GPU)
S_RESOURCE[pc(accelex)] Available software resources for Multi-purpose Parallel Cluster (GPU)
MAX_PROC/J Max. number of processes to specify per job
MAX_THREAD/J Max. number of threads to specify per job
Application_NAME Software resource that has a limit on the number of concurrent uses
USE/MAX Number of using cores / Max. number of available cores
5.1.6.2 Display usage of core
Display the current usage of cores in each system. RATIO(USED/ALL) shows the usage ratio (%),
the number of cores in use and the maximum number of cores.
[username@ricc1:~] qstat -uc
The status of CORE use RATIO(USED/ALL)
------------------------------------------------------------------------
mpc *********************************------- 84.9%(3232/3888)
upc **************************************** 100.0%(0800/0800)
ssc *********************------------------- 54.6%(0118/0216)
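The RATIO bar above can be reproduced with a few lines. This is a toy rendering of the output format, assuming a 40-character bar; the function name is hypothetical and the real qstat may round differently:

```python
def usage_bar(used, total, width=40):
    """Render a core-usage bar like 'qstat -uc': asterisks for cores in use,
    hyphens for free cores, then the percentage and (used/total)."""
    filled = int(used * width / total) if total else 0
    pct = 100.0 * used / total if total else 0.0
    return "{}{} {:5.1f}%({:04d}/{:04d})".format(
        "*" * filled, "-" * (width - filled), pct, used, total)
```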
5.1.7 Confirm project user job information
Use the prjstat command to confirm the job lists of users who belong to the same project.
Option Meaning
-r Sort by request ID
5.1.7.1 prjstat command
Display the currently submitted job list of users in the same project.
Example: Display the job list of users who belong to project G00001
Item Meaning
USER Username submitted jobs
5.1.7.2 prjstat command (display job list in order of request ID)
With -p option, job list is displayed in order of request ID.
prjstat [-r]
[username@ricc1:~] prjstat
[G00001] Research of RICC
USER REQID NAME STAT ELAPSE START-TIME CORE
----------------------------------------------------------------------
userA 1234567.jms go.sh RUN 01:40 02/15 13:40 128
userB 1234599.jms test.sh RUN 01:46 02/15 13:34 8
userC 1234600.jms run.sh QUE --:-- --/-- --:-- 256
userC 1234601.jms run.sh QUE --:-- --/-- --:-- 512
userD 1234301.jms run-d.sh RUN 49:40 02/13 13:34 32
[username@ricc1:~] prjstat -r
[G00001] Research of RICC
REQID NAME USER STAT ELAPSE START-TIME CORE
----------------------------------------------------------------------
1234301.jms run-d.sh userD RUN 49:40 02/13 13:34 32
1234567.jms go.sh userA RUN 01:40 02/15 13:40 128
1234599.jms test.sh userB RUN 01:46 02/15 13:34 8
1234600.jms run.sh userC QUE --:-- --/-- --:-- 256
1234601.jms run.sh userC QUE --:-- --/-- --:-- 512
5.1.8 Operate job
5.1.8.1 Operate file of running job
These commands operate on files in the local disk area of the computing nodes, so they are available only for jobs running on the Massively Parallel Cluster.
5.1.8.1.1 Display file list on computing node
Use the qls command to display a running job's file list in the computing nodes' local disk area.
The ls command's options can be specified as OPTION.
Specify the REQUEST-ID and process number (@RankNO) to display the file list in the job execution directory.
Example: Display files in the job execution directory of rank 0 of REQUEST-ID 13562.jms
5.1.8.1.2 Get file of running job
Use the qget command to transfer files of a running job from the computing nodes' local disk area.
From the Login Server, get the files of the specified REQUEST-ID and process (@RankNO) from the computing nodes' local disk area to the home area.
Example) Get a file (resultfile) of rank 0 of a running job (REQUEST-ID 13579.jms)
qget REQID[@RankNO] SRC .. [DEST]
qls REQID [@RankNO] [OPTION]
[username@ricc ~] qls 13562.jms@0
result_file
exec.tar.gz
go.sh
[username@ricc ~]$ qls 13579.jms -l ← Confirm files in the job
total 48
-rwxr-xr-x 1 username group 46555 Jul 22 13:45 resultfile
[username@ricc ~]$ qget 13579.jms@0 result /tmp ← Execute qget command
[username@ricc ~]$ ls /tmp/result ← Confirm the files are transferred
/tmp/result
5.1.8.1.3 Put result file
Use the qput command to put files from the computing nodes' local disk area to other nodes or the home area.
Option Meaning
-del Delete source files after putting files.
The qput command can be invoked from a script.
Example) Put resultfile to the directory where the job was submitted, during the job execution.
qput [OPTION] [@SRC_Rank:] SRC DEST
qput [@SRC_Rank:] SRC @DEST_Rank_LIST
[username@ricc ~] vi go-qput.sh
#!/bin/sh
#------ Option Set for qsub command --------#
#MJS: -pc
#MJS: -proc 8
#MJS: -time 10:00:00
#MJS: -rerun Y
#MJS: -cwd
mpirun ./a.out > resultfile
qput resultfile $MJS_CWD ← Copy to the directory where the job is submitted
mpirun ./b.out
5.1.8.2 Cancel job
Use the qdel command to cancel a job. Use the qd command to cancel two or more jobs interactively.
Option Meaning
(none) Cancel a job
-K Cancel a job and delete standard output / error output file (except for Large
Memory Capacity Server).
-collect Cancel a job and collect files on computing nodes (except for Massively
Parallel Cluster).
5.1.8.2.1 Example of qdel command
Confirm the REQUEST-ID of the job to delete.
Specify the job's REQUEST-ID (REQID) as an argument of the qdel command.
If standard output / error output is not necessary on the PC Clusters, specify the -K option. Specify this option especially to cancel a job in EXITING status.
[username@ricc1:~] qstat
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STAT ELAPSE START-TIME CORES
--------------------------------------------------------------------
12342.jms go.sh RUN 12:34 07/28 12:00 8
12348.jms go.sh RUN 0:20 07/28 00:14 8
12356.jms go.sh RUN 0:05 07/28 00:29 8
12412[1].jms bulk.sh RUN 0:05 07/28 00:29 1
12412[2].jms bulk.sh RUN 0:05 07/28 00:29 1
12348[3-10].jms bulk.sh QUE --:-- --/-- --:-- 1
qdel [-K|-collect] REQID
qd [-K|-collect]
[username@ricc ~] qdel 12348.jms
Request 12348.jms has been deleted.
[username@ricc ~] qdel 5963.ax
Request 5963.ax has been deleted.
[username@ricc ~] qdel -K 12342.jms
Request 12342.jms has been deleted.
Specify the -collect option to cancel a job and collect the files the job generated on the computing nodes of the PC Clusters.
[username@ricc ~] qdel -collect 12356.jms
Request 12356.jms is running, and has been signalled.
Specify the Bulk ID to delete a whole bulk job.
Specify "Bulk ID"["Bulk Index ID"] to delete a single bulk job individually.
Specify "Bulk ID"["Bulk Index ID list"] to delete two or more bulk jobs at the same time.
[username@ricc1:~] qdel 12412.jms
Bulk Request 12412.jms has been deleted.
[username@ricc1:~] qdel 12412[1].jms
Request 12412[1].jms has been deleted.
[username@ricc1:~] qdel 12412[1,3,5-10].jms
Bulk Request 12412[1,3,5-10].jms has been deleted.
5.1.8.2.2 Example of qd command
Display the list of submitted jobs with the qd command.
Enter the NO of the job to cancel. To cancel two or more jobs, specify them separated by commas (,) or as a range using a hyphen (-). To cancel all jobs, enter "all". Enter "q" or "quit" to quit the qd command.
In the qd command, running bulk jobs are handled as individual jobs, while waiting bulk jobs are handled as a single job.
[username@ricc1:~] qd
NO REQID NAME STAT ELAPSE START-TIME CORES
-------------------------------------------------------------------
1 12342.jms go.sh RUN 12:34 07/28 12:00 8
2 12348.jms go.sh RUN 0:20 07/28 00:14 8
3 12412[1].jms bulk.sh RUN 0:05 07/28 00:29 1
4 12412[2].jms bulk.sh RUN 0:05 07/28 00:29 1
5 12348[3-10].jms bulk.sh QUE --:-- --/-- --:-- 1
qd: input NO: 2 3
qd: Are you sure? (yes|no)? yes
Request 12348.jms is running, and has been signalled.
Request 5963.ax has been deleted.
qd: Normal end
5.1.8.3 Delete completed job information
Specify the -e option to delete completed job information.
qdel -e REQID
qd -e
5.1.8.3.1 Example of qdel command
Display the list of completed jobs.
[username@ricc1:~] qstat -e
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID REQNAME START-TIME END-TIME CORES MEM SUBMIT-DIR
------------------------------------------------------------------
12321.jms go.sh 07/21 08:00 07/28 14:21 896 500M $HOME/JOB1
12324.jms go.sh 07/28 12:00 07/28 13:09 896 500M $HOME/JOB
Specify the REQUEST-ID (REQID) of the job to delete it from the list of completed jobs.
[username@ricc ~] qdel -e 12321.jms
Request 12321.jms was deleted from jobhistory-file.
(*) Up to 500 completed jobs are preserved. When the number of completed jobs reaches 500, the oldest job information is deleted.
5.1.8.3.2 Example of qd command
Specify the -e option to the qd command.
Enter the NO of the job to delete. To delete two or more entries, specify them separated by commas (,) or blank characters, or as a range using a hyphen (-). To delete all entries, enter "all". Enter "q" or "quit" to quit the qd command.
[username@ricc1:~] qd -e
NO REQID REQNAME START-TIME END-TIME CORES MEM SUBMIT-DIR
-----------------------------------------------------------------------
1 12321.jms go.sh 07/21 08:00 07/28 14:21 896 500M $HOME/JOB1
2 12322.jms go.sh 07/25 12:40 07/28 16:04 8 20G $HOME/JOB2
3 12324.jms go.sh 07/28 12:00 07/28 13:09 128 1.2G $HOME/JOB3
qd: input NO: 1 2
qd: Are you sure? (yes|no)? yes
Request 12321.jms was deleted from jobhistory-file.
Request 4649.ax was deleted from jobhistory-file.
qd: Normal end
5.1.8.4 Alter priority of job
Alter the priority of a submitted job.
Specify a priority from 0 to 65535 following the -p option. (default: 100)
The priority of a submitted bulk job cannot be changed.
qalter -p <PRIORITY> <REQID>
[username@ricc ~] qalter -p 200 12343.jms
Request 12343.jms was changed to priority(200).
5.2 Interactive Job
For interactive jobs, the following limits apply:

              Max. number of   Max. number of   Max. elapsed   Max. amount of
              processes        threads          time           memory
PC Clusters   32               8                4 hours        2GB
5.2.1 Interactive Job execution for PC Clusters
Use following commands on Login Server to execute interactive job for PC Clusters.
Execution command Meaning
srun Serial program
Thread parallel program (max. number of threads: 8)
mpirun MPI parallel program (max. number of processes: 32)
xpfrun XPFortran parallel program (max. number of processes: 32)
If programs are executed without the above commands, they run on the Login Server, which may adversely affect the system. Be sure to use the above commands when executing interactive jobs. To execute programs in script languages such as Perl or Python as interactive jobs, specify the -pc option. Furthermore, if a job requires keyboard input, specify -pty to turn off buffering of standard output.
example 1) Execute serial program (execution module) (buffering of standard output is off)
[username@ricc1:~] srun -pty ./serial.out
example 2) Execute serial program (script)
[username@ricc1:~] srun -pc ./serial.pl
example 3) Execute thread parallel program with 4 threads
[username@ricc1:~] srun -thread 4 ./thread.out
example 4) Execute MPI parallel program with 4 processes
[username@ricc1:~] mpirun -np 4 ./mpi.out
example 5) Execute XPFortran parallel program with 4 processes
[username@ricc1:~] xpfrun -np 4 ./mpi.out
Also, ISV applications such as Gaussian, ADF, and ANSYS (solver) cannot be executed as interactive jobs. Please execute them as batch jobs.
6. FTL (File Transfer Language)
6.1 Introduction
In RICC, the job execution area differs among clusters. The Multi-purpose Parallel Cluster uses the shared area for job execution. The Massively Parallel Cluster uses the local area of the computing nodes to realize fast I/O and to reduce access load as much as possible. Therefore, files necessary for a job need to be transferred from the home area of the Login Server to the computing nodes before the job runs. Likewise, computation results need to be transferred from the computing nodes back to the home area of the Login Server after the job finishes. FTL (File Transfer Language) is used for this file transfer. FTL commands are embedded in the script file for job execution or generated with the ftlgen command (refer to 6.7 FTL generating tool: ftlgen).
In addition, the processes of a parallel program cannot coordinate with each other through files, since the computing nodes do not share files.
Please specify the FTL option in the job scripts. (default: share)
ex1) File transfer by using FTL
#MJS: -fstype ftl
ex2) Do not use FTL option
#MJS: -fstype share
6.2 Transfer input file
To transfer input files, specify one or more following items.
Input files
RANK-LIST and Computing node's directory (optional)
Specify name of files to transfer as "Input files", rank (0 to number of processes -1) as "RANK-LIST"
and name of computing node's directory as "Computing node's directory".
If "RANK-LIST and Computing node's directory" is not specified, specified input files are transferred to
computing nodes in the same directory configuration of Login Server.
Fig. 6-1 Transfer input files from Login Server to computing nodes
Input files: /home/username/job/input
Rank: All ranks (Number of processes: n)
(The input file in the shared area of the Login Server is transferred to /home/username/job in the local area of every computing node, ranks 0 to n-1.)
6.3 Transfer input directory
To transfer input directories, specify one or more of the following items:
Input directories
RANK-LIST and Computing node's directory (optional)
Specify the names of the directories to transfer as "Input directories", the ranks (0 to number of processes -1) as "RANK-LIST", and the name of the computing node's directory as "Computing node's directory".
All files in the specified input directories are transferred.
If "RANK-LIST and Computing node's directory" is not specified, the specified input directories are transferred to the computing nodes in the same directory configuration as the Login Server.
Fig. 6-2 Transfer input directory from Login Server to computing nodes
Input directory: /home/username/job/bin
Rank: All ranks (Number of processes: n)
(The input directory in the shared area of the Login Server is transferred to /home/username/job/bin in the local area of every computing node.)
6.4 Transfer output file
To transfer output files, specify one or more of the following items:
Output files
RANK-LIST and Login Server's directory (optional)
Specify the names of the output files to transfer as "Output files", the ranks (0 to number of processes -1) as "RANK-LIST", and the name of the Login Server's directory as "Login Server's directory".
If "RANK-LIST and Login Server's directory" is not specified, the specified output files are transferred to the Login Server in the same directory configuration as the computing nodes.
Fig. 6-3 Transfer output files from computing nodes to Login Server
Output file: /home/username/job/output
Rank: All ranks (Number of processes: n)
(The output files output.0 ... output.n-1 in the local areas of the computing nodes are transferred to /home/username/job in the shared area of the Login Server.)
If the output files have the same name among computing nodes, it is possible to avoid overwriting by adding the rank number (the first rank number in a node) to the output file name.
Fig. 6-4 Transfer output files (in case of avoidance of overwriting)
Output file: /home/username/job/output
Rank: All ranks (Number of processes: n)
(Each node's output file is transferred with the first rank number of the node appended, e.g. output.0 and output.n-8, so same-named files are not overwritten on the Login Server.)
6.5 FTL Basic Directory
There is a simple way to transfer files. By simply specifying the name of the directory where the files to transfer are located as the FTL basic directory, the files are transferred as follows:
[before the job runs]
Files (not including directories) in the FTL basic directory on the Login Server
--> recognized as input files and transferred to the computing nodes
[after the job ends]
Files (not including directories) in the FTL basic directory on the computing nodes
--> recognized as output files and transferred to the Login Server
Fig. 6-5 Transfer input files by specifying FTL basic directory
FTL basic directory: /home/username/job
(All files in the FTL basic directory on the Login Server, e.g. input.0 ... input.n-1, are transferred to the same directory in the local area of every computing node.)
Transfer of output files by specifying the FTL basic directory works as follows.
1. A "ReqID" directory is created in the FTL basic directory on the Login Server.
2. Files in the FTL basic directory on the computing nodes are transferred into the "ReqID" directory on the Login Server.
3. The first rank number in the computing node is added to the output file name. (Note 1) (Note 2)
Note 1: It is possible to transfer output files with no rank number. However, output files with the same name are overwritten.
Note 2: You can specify which type of files to transfer: files newly created while the job is running, or files updated while the job is running. If neither type is specified, only newly created files are transferred.
Fig. 6-6 Transfer output files by specifying FTL basic directory
(Files in the FTL basic directory /home/username/job on each computing node are collected into the ReqID directory /home/username/job/ReqID on the Login Server, with the first rank number of the node appended, e.g. output.0.0, output.0.n-8, output.n-1.0, output.n-1.n-8.)
6.6 FTL Syntax
You can select "single line mode" or "multi-line mode" for FTL. Basically, you write only one FTL command in "single line mode" and two or more commands in "multi-line mode". An FTL sentence begins with the # character, which must be placed in the first column. If # is not in the first column, the line is not recognized as an FTL sentence.
Syntax of each mode is as follows.
Single line mode
Multi-line mode
The items in an FTL command are described as follows.
files
Specify the input / output file names to transfer.
Files must be regular files or symbolic links.
Directories, devices, sockets, and FIFOs cannot be specified.
Some meta characters are available.
Specify a relative path from the directory where the batch job is submitted.
Use the FTL variables $MJS_HOME or $MJS_DATA when specifying an absolute path from /home or /data. For FTL variables, please refer to 6.6.12.5 FTL variable.
RANK-LIST
Specify destination of input files and destination of output files by RANK-LIST. For more
information on RANK-LIST, please refer to 6.6.12.6 RANK-LIST
directory
Specify destination directory.
Meta characters are not available.
Specify relative path from a directory where a batch job is submitted.
Use FTL variables $MJS_HOME or $MJS_DATA when specifying absolute path from /home
or /data. On FTL variable, please refer to 6.6.12.5 FTL variable.
#FTL command: [RANK-LIST[@directory]:] files[, files... ]
#<FTL command>
# [RANK-LIST[@directory]:] files[, files... ]
# [RANK-LIST[@directory]:] files[, files... ]
#</FTL command>
The restrictions on FTL are as follows.
Files which are not contained in /home or /data cannot be specified in an FTL command.
Blank characters cannot be included in file names and directory names.
A batch job's standard / error output files (extension: .jms) and swap files (extension: .swp) are not transferred.
Multi-line mode cannot be nested.
Use the ftlchk command to check FTL syntax and the existence of files. For more information, please see ftlchk --man.
Example:
[username@ricc1 ~]$ ftlchk go.sh
=====================
FTL Analysis Result
=====================
Line Type TargetRank Stat SourcePath[Login] DestinationDir[Calc]
11 BEFORE 0-15 o $CWD/a.out --> $CWD
6.6.1 FTL Syntax (transfer input file)
Transfer input files from the Login Server to the computing nodes using the following syntax.
More than one command can be specified in a script file.
Use a comma as a separator to specify multiple files to transfer.
The destination of the files is determined by the set of RANK-LIST and the computing node's directory.
RANK-LIST and directory are optional. If "RANK-LIST and Computing node's directory" is not specified, the specified input files are transferred to the computing nodes in the same directory configuration as the Login Server.
6.6.1.1 Single line mode (#BEFORE)
6.6.1.2 Multi-line mode (#<BEFORE> - #</BEFORE>)
#BEFORE: [RANK-LIST[@computing node's directory]:] input file [...]
#<BEFORE>
#[RANK-LIST[@computing node's directory]:] input file [...]
#</BEFORE>
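For illustration (the file names a.out, input.dat, input.A, and input.B are hypothetical), a transfer can be written in either mode; the multi-line form below sends different files to ranks 0-7 and 8-15:

```sh
#BEFORE: a.out, input.dat

#<BEFORE>
# 0-7: input.A
# 8-15: input.B
#</BEFORE>
```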
6.6.2 FTL Syntax (transfer input directory)
Transfer input directories from the Login Server to the computing nodes using the following syntax.
More than one command can be specified in a script file.
Use a comma as a separator to specify multiple directories to transfer.
The destination of the input directories is determined by the set of RANK-LIST and the computing node's directory.
RANK-LIST and the computing node's directory are optional. If "RANK-LIST and Computing node's directory" is not specified, the specified input directories are transferred to the computing nodes in the same directory configuration as the Login Server.
6.6.2.1 Single line mode (#BEFORE_R)
6.6.2.2 Multi-line mode(#<BEFORE_R> - #</BEFORE_R>)
#BEFORE_R: [RANK-LIST[@computing node's directory]:] input directory [... ]
#<BEFORE_R>
#[RANK-LIST[@computing node's directory]:] input directory [...]
#</BEFORE_R>
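For illustration (the directory names bin and lib and the destination are hypothetical), transferring a bin directory to every node, and a lib directory into a work directory under /data for ranks 0-15, might look like:

```sh
#BEFORE_R: bin

#<BEFORE_R>
# 0-15@$MJS_DATA/work: lib
#</BEFORE_R>
```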
6.6.3 FTL Syntax (transfer output file)
Transfer output files from the computing nodes to the Login Server using the following syntax.
More than one command can be specified in a script file.
Use a comma as a separator to specify multiple files to transfer.
The destination of the output files is determined by the set of RANK-LIST and the Login Server's directory.
RANK-LIST and the Login Server's directory are optional. If "RANK-LIST and Login Server's directory" is not specified, the specified output files are transferred to the Login Server in the same directory configuration as the computing nodes.
6.6.3.1 Single line mode (#AFTER)
6.6.3.2 Multi-line mode (#<AFTER> - #</AFTER>)
#AFTER: [RANK-LIST[@Login Server's directory]:] output file [...]
#<AFTER>
#[RANK-LIST[@Login Server's directory]:] output file [...]
#</AFTER>
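For illustration (the file and directory names output.dat, log.txt, and results are hypothetical), collecting a single output file back to its original location, or several files into a results directory under the submission directory for ranks 0-15, might look like:

```sh
#AFTER: output.dat

#<AFTER>
# 0-15@$MJS_CWD/results: output.dat, log.txt
#</AFTER>
```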
6.6.4 FTL Syntax (avoid overwrite output file)
Add the rank number (the first rank number in a node) to output files transferred by the AFTER command using the following syntax. This avoids overwriting output files when output files on different computing nodes have the same name.
This command can be specified only once in a script file.
If this command is not specified, "off" is set for the flag.
This is valid for files (collected from 2 or more computing nodes) specified by the AFTER
command.
There is no multi-line mode.
6.6.4.1 Single line mode (#FTL_SUFFIX)
Item Value Meaning
flag
on Add rank number to output files
off Not add rank number to output files
Table 6-1 FTL_SUFFIX flag
#FTL_SUFFIX: flag
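As a sketch, FTL_SUFFIX combined with an AFTER command might be written as follows (the file name output.dat is illustrative, and the exact suffixed names are an assumption based on Fig. 6-4):

```sh
#FTL_SUFFIX: on
#AFTER: output.dat ! collected as output.dat.0, output.dat.8, ... (one per node)
```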
6.6.5 FTL Syntax (FTL basic directory)
Specify FTL basic directory by following syntax.
This command can be specified only once in a script file.
Only one FTL basic directory can be specified.
There is no multi-line mode for this command
Specify relative path from a directory where a batch job is submitted.
Use FTL variables $MJS_HOME or $MJS_DATA when specifying absolute path from /home or
/data. On FTL variable, please refer to 6.6.12.5 FTL variable.
6.6.5.1 Single line mode (#FTLDIR)
(note) When FTLDIR is used, unnecessary files may be transferred. Also, the existence of files is checked after job execution even when there is no file to be transferred. For large-scale parallel jobs these costs may be high, so please use BEFORE and AFTER instead of FTLDIR.
#FTLDIR: FTL basic directory
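A minimal sketch (the directory name is illustrative):

```sh
#FTLDIR: $MJS_HOME/job
```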
6.6.6 FTL Syntax (File collect type of FTL basic directory)
Specify the file collect type for output files to be transferred from the FTL basic directory on the computing nodes after the job finishes.
This command can be specified only once in a script file.
If this command is not specified, "new" is set for File collect type.
There is no multi-line mode for this command
6.6.6.1 Single line mode (#FTL_COLLECT_TYPE)
Item Value Meaning
file collect type
new Collect files which are not transferred at the start of job
mtime Collect updated files only while the job is running
Table 6-2 FTL_COLLECT_TYPE
#FTL_COLLECT_TYPE: file collect type
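A sketch combining the FTL basic directory with a collect type (the directory name is illustrative):

```sh
#FTLDIR: $MJS_HOME/job
#FTL_COLLECT_TYPE: mtime ! collect only files updated while the job was running
```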
6.6.7 FTL Syntax (Avoid adding rank number of FTL basic directory)
Avoid adding the rank number (the first rank number in a node) to output files transferred by the FTLDIR command using the following syntax. Output files will be overwritten when output files on different computing nodes have the same name.
This command can be specified only once in a script file.
If this command is not specified, "off" is set for the flag.
This is valid for files specified by the FTLDIR command.
There is no multi-line mode.
6.6.7.1 Single line mode (#FTL_NO_RANK)
Item Value Meaning
flag
on Do not add rank number to output files
off Add rank number to output files
Table 6-3 FTL_NO_RANK
#FTL_NO_RANK: flag
6.6.8 FTL Syntax (Rank Format)
Specify the number of digits of the rank number by the following syntax.
The rank number (RANK-LIST) added to file names is zero-padded to the specified number of digits. This is valid for FTL variables (*1), the FTLDIR command, and the AFTER command (when FTL_SUFFIX is set to on).
This command is valid for FTL commands (FTL variables, etc.) which have been specified before this command.
Specify a number from 0 to 9 as the number of digits.
There is no multi-line mode for this command.
(*1): For FTL variables, please refer to 6.6.12.5 FTL variable.
6.6.8.1 Single line mode (#FTL_RANK_FORMAT)
Item Value Meaning
number of
digits
0-9
0: no use of RANK FORMAT
a number of digits of RANK-LIST
Table 6-4 FTL_RANK_FORMAT
RANK-LIST   digits: not specified   digits: 1   digits: 2   digits: 3
1           1                       1           01          001
10          10                      10          10          010
100         100                     100         100         100
Table 6-5 RANK_FORMAT example
#FTL_RANK_FORMAT: number of digits
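For example (a sketch; the directory name is illustrative), zero-padding rank suffixes to three digits so that collected files sort naturally; the format command follows the FTLDIR command it applies to:

```sh
#FTLDIR: $MJS_HOME/job
#FTL_RANK_FORMAT: 3 ! rank suffixes are zero-padded, e.g. output.001
```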
6.6.9 FTL Syntax (make directory)
Make directories on all computing nodes running the job before the job starts, using the following syntax.
This command can be specified only once in a script file.
Use a comma as separator to specify multiple directories to make.
There is no multi-line mode for this command.
Specify relative path from a directory where a batch job is submitted.
Use FTL variables $MJS_HOME or $MJS_DATA when specifying absolute path from /home or
/data. On FTL variable, please refer to 6.6.12.5 FTL variable.
6.6.9.1 Single line mode (#FTL_MAKE_DIR)
#FTL_MAKE_DIR: directory [...]
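For example (the directory names scratch and tmp are illustrative), creating a scratch directory under /data and a tmp directory under the submission directory on every computing node:

```sh
#FTL_MAKE_DIR: $MJS_DATA/scratch, tmp
```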
6.6.10 FTL Syntax (statistic information output)
Output statistic information (file transfer time, number of files, file size) of the files transferred before and after the job execution to standard output at the end of the job, using the following syntax.
This command can be specified only once in a script file.
If this command is not specified, "off" is set for the flag.
There is no multi-line mode.
6.6.10.1 Single line mode (#FTL_STAT)
Item Value Meaning
flag
off No statistic information output
normal Normal mode.
Output statistic information to standard output.
detail
Detail mode.
In addition to Normal mode, output statistic information
of files transferred to each rank.
Table 6-6 FTL_STAT flag
#FTL_STAT: flag
6.6.10.2 Output format of statistic information
Output items
Item Meaning
ELAPSE(s) Elapsed time of file transfer (unit: second)
FILE_NUM Total number of transferred files
FILE_SIZE(KB) Total size of transferred files (unit: KB)
Table 6-7 Output item of statistic information
Normal mode format
Detail mode format
#=========== FTL STATISTICS INFORMATION =============#
---------------------- BEFORE ----------------------------------
ELAPSE(s) FILE_NUM FILE_SIZE(KB)
------------------------------------------------------------------------
TOTAL 60 30 16384
---------------------- AFTER ----------------------------------
ELAPSE(s) FILE_NUM FILE_SIZE(KB)
------------------------------------------------------------------------
TOTAL 10 30 16384
#=========================================================#
#=========== FTL STATISTICS INFORMATION =============#
---------------------- BEFORE ----------------------------------
ELAPSE(s) FILE_NUM FILE_SIZE(KB)
------------------------------------------------------------------------
TOTAL 60 10 100
RANK: 0-7 60 10 100
---------------------- AFTER ----------------------------------
ELAPSE(s) FILE_NUM FILE_SIZE(KB)
------------------------------------------------------------------------
TOTAL 60 10 10
RANK: 0-7 60 5 5
RANK: 8-15 60 5 5
#=========================================================#
6.6.11 FTL Syntax (output transferred file information)
Output information on the files transferred before the job execution and the files created while the job is running to standard output at the end of the job, using the following syntax.
This command can be specified only once in a script file.
If this command is not specified, "off" is set for the flag.
There is no multi-line mode.
Directories with no files are not displayed.
6.6.11.1 Single line mode (#FTL_INFO)
Item Value Meaning
flag
off No file information output
before Output information of files transferred before the job
starts.
after Output information of files (including transferred files
before the job starts) created while the job is running.
all
Output information of files transferred before the job
starts and files (including transferred files before the
job starts) created while the job is running.
Table 6-8 FTL_INFO flag
#FTL_INFO: flag
6.6.11.2 Output format of transferred file information
Output items
Item Meaning
TIME Access time of file (Month Date HH:MM)
SIZE(KB) File size (unit: KB)
FILE_NAME File name
Table 6-9 Output items of transferred file information
Output format
Output the file information of each rank. The output format with flag "all" is shown below. With flag "before", only part (*1) is output; with flag "after", only part (*2) is output.
#=============== FTL FILE INFORMATION ===============#
------------------- BEFORE ---------------------
[RANK: 0-7]
TIME SIZE(KB) FILE_NAME
--------------------------------------------------------
Jul 16 10:41 14246 /home/username/job/a.out
Jul 24 10:20 361 /home/username/job/input.1
[RANK: 8-16]
TIME SIZE(KB) FILE_NAME
--------------------------------------------------------
Jul 16 10:41 14246 /home/username/job/a.out
Jul 24 10:20 361 /home/username/job/input.2
------------------- AFTER ----------------------
[RANK: 0-7]
TIME SIZE(KB) FILE_NAME
---------------------------------------------------------
Jul 16 10:41 14246 /home/username/job/a.out
Jul 24 10:20 361 /home/username/job/input.1
Jul 24 10:25 361 /home/username/job/output
[RANK: 8-16]
TIME SIZE(KB) FILE_NAME
---------------------------------------------------------
Jul 16 10:41 14246 /home/username/job/a.out
Jul 24 10:20 361 /home/username/job/input.2
#=======================================================#
(*1)
(*2)
6.6.12 FTL Syntax (others)
6.6.12.1 Comment
Characters after an exclamation mark (!) are regarded as a comment.
Example
6.6.12.2 Blank line
Blank lines and lines of only # are ignored.
Example
6.6.12.3 Special character
Comma (,), colon (:), equal (=) and exclamation mark (!) are special characters in FTL commands.
Put a backslash before a special character when the special character is included in file name or
directory name.
Example
#<BEFORE>
#! this line is comment <-- This line is a comment
# a.out ! b.out <-- a.out is transferred, but b.out is not.
#</BEFORE>
#<BEFORE>
# <-- This line is ignored.
<-- This line is ignored.
# a.out
#</BEFORE>
#<BEFORE>
# a\:b.out <-- transfer a:b.out
#</BEFORE>
6.6.12.4 Meta character
The following meta characters are available in file names and directory names. However, they are not available in the directory portion of a file name.
Meta character Meaning
* Matches any (zero or more) characters
? Matches any single character
Table 6-10 Meta character list
Example
#<BEFORE>
# input.? <-- input.0, input.1, input.3 are transferred
# a* <-- a.out, a.1, a.2 are transferred
# bin/exe*/a.out <-- Meta characters are not available in the directory portion
#</BEFORE>
6.6.12.5 FTL variable
The following FTL variables are available in file names and directory names. However, they are not available in the directory portion of a file name or in the FTL_MAKE_DIR command.
Using FTL variables, input / output file transfer commands for MPI jobs can be specified easily.
Variable Meaning
$MJS_HOME Home directory path (/home/username)
$MJS_DATA Data directory path (/data/username)
$MJS_CWD Directory path where a job is submitted
$MJS_REQID REQUEST-ID. Available in file names in the AFTER command and in directory names.
$MJS_REQNAME REQUEST-NAME. Available in file names and directory names.
$MJS_BULKINDEX Bulk index ID. Available in file names and directory names.
$MPI_RANK MPI rank (from 0 to number of processes - 1)
$XPF_RANK XPF processor identification number (from 1 to number of processes)
Table 6-11 Environment variable list
Example 1
#BEFORE: input.$MPI_RANK
Example 2
#<AFTER>
# 0@$MJS_CWD: log/output <-- The log/output file on the MPI master node is transferred to the directory where the job is submitted.
#</AFTER>
With the above BEFORE command, if an MPI program of 16 processes is executed, input files are transferred to the 16 processes. Files /home/username/input.0 - input.7 are transferred to the first computing node and files /home/username/input.8 - input.15 are transferred to the second computing node, as indicated in Fig. 6-7.
Fig. 6-7 Example of input file transfer with FTL variable (diagram: the shared area /home/username on the Login Server holds input.0 - input.15; the local area of the computing node with ranks 0-7 receives input.0 - input.7, and the local area of the node with ranks 8-15 receives input.8 - input.15)
6.6.12.6 RANK-LIST
Specify the destination ranks of input files and the source ranks of output files using the following formats.
If ranks are specified redundantly, they are processed as if specified once.
If nonexistent ranks are specified, no file is transferred for those ranks.
If existent and nonexistent ranks are specified at the same time, files are transferred for the existent ranks but not for the nonexistent ranks.
Item Format Meaning
I 1 File transfer command for rank 1.
II 1-3 File transfer command for ranks 1 to 3.
III 1,3 File transfer command for ranks 1 and 3.
IV 1-3,5,7 File transfer command for ranks 1, 2, 3, 5 and 7 (combination of items II and III).
V * File transfer command for all computing nodes assigned to the job.
VI ALL File transfer command for all computing nodes assigned to the job.
VII MASTER File transfer command for the master node (rank 0) of the job.
Table 6-12 RANK-LIST format
The range of RANK-LIST for each job type is as follows.
Job type Range of RANK-LIST
Serial job 0
MPI parallel job 0 - (number of processes - 1)
OpenMP / auto parallel job 0
Hybrid job 0 - (number of processes - 1)
Table 6-13 Range of RANK-LIST for Job type
6.7 FTL generating tool : ftlgen
ftlgen generates FTL commands and job submission option lines interactively.
Option Meaning
-chk Execute the ftlchk command after creating the FTL command lines
-o <filename> Output the shell script to a file
[example] Execute the ftlgen command (the ftlgen command supports tab completion)
ftlgen <option>
[username@ricc1:~] ftlgen
MJS: Project id: G00001 ← Specified project-ID
MJS: Number of process(range: 1-8192, default: 1): 256 ← Specified process count
MJS: Number of thread(range: 1-8, default: 1): 1 ← Specified thread
MJS: Merge stderr to stdout ?('y' or 'n', default: 'y'): y
← Merge standard output/error output
MJS: Run on current working directory ?('y' or 'n', default: 'y'): y
← Specified Job execution directory
MJS: Other qsub options: -time 1:00:00 -mem 1.2GB
← Specified other qsub option
MJS: Executable module and command path: a.out
← Specified execution module
FTL(PRE): Are there any input file ?('y' or 'n'): y ← If input file exists
FTL(PRE): Input file or directory: input ← Specified input file
FTL(PRE): Destination rank number(0-255): * ← Specified rank number
FTL(PRE): Transfered Directory: [ENTER](skip) ← Destination directory (if omitted, the job execution directory is used)
FTL(PRE): Enter more ?('y' or 'n'): n ← If other input file exists
FTL(POST): Are there any output file ?('y' or 'n'): y ← If output file exists
FTL(POST): Output file: output.log ← Specified transfer output file
FTL(POST): Source rank number(0-255): * ← Specified rank number
FTL(POST): Transfered Directory: outputdir ← Specified destination
directory
FTL(POST): Enter more ?('y' or 'n'): n ← If other output file exists
#!/bin/sh ← output result
#--- qsub options ---#
#MJS: -project V10002
#MJS: -proc 256
#MJS: -thread 1
#MJS: -eo
#MJS: -cwd
#MJS: -time 1:00:00 -mem 1.2GB
#MJS: -compiler fj
#MJS: -parallel fjmpi
#--- FTL file information ---#
#BEFORE:*: $MJS_CWD/input
#AFTER:*@$MJS_CWD/outputdir: $MJS_CWD/output.log
#BEFORE:*: $MJS_CWD/a.out
#--- Job execution ---#
mpirun a.out
7. Development Environment
7.1 Endian conversion
7.1.1 Outline of endian
Endianness is the byte order in which a number consisting of multiple bytes is stored in memory. For example, when the two-byte value 0x1234 is stored, the method that stores 0x12 in the first byte and 0x34 in the second byte is called big endian. On the other hand, the method that stores 0x34 in the first byte and 0x12 in the second byte is called little endian.
7.1.2 Endian type of RSCC and RICC
RICC consists of little endian computers. However, RSCC (RIKEN Super Combined Cluster) consisted of both big endian and little endian computers, and big endian byte order was used for unformatted WRITE / READ statements. Therefore, please pay attention when reading Fortran unformatted output files (big endian) created on RSCC.
System Endian
RSCC Big endian
RICC Little endian
Table 7-1 Endian type of RSCC and RICC
7.1.3 Endian type
7.1.3.1 Fujitsu compiler
Specify the runtime option -Wl,-T to read or write big endian data with the Fujitsu compiler. With the -T option, logical type data, integer type data and IEEE floating-point data are converted between big endian and little endian in unformatted I/O statements.
example 1) Convert unit number 10 to little endian.
[username@ricc1:~] srun ./serial.out -Wl,-T10
example 2) Convert all unit numbers to little endian.
[username@ricc1:~] srun ./serial.out -Wl,-T
7.1.3.2 Intel compiler
Specify the environment variable F_UFMTENDIAN to read or write big endian data with the Intel compiler.
example 1) Convert unit number 10 to little endian.
[username@ricc1:~] export F_UFMTENDIAN=10
[username@ricc1:~] srun ./serial.out
example 2) Convert all unit numbers to little endian.
[username@ricc1:~] export F_UFMTENDIAN=big
[username@ricc1:~] srun ./serial.out
7.2 Debugger
The debugger enables the user to run a program under debugger control to verify its processing logic.
The following operations can be performed for serial and MPI programs written in Fortran and C/C++, and for XPFortran programs:
Application execution control
Setting of program execution stop positions
Evaluation and display of expressions and variables
Use of the calling stack
7.2.1 Preparation to use the debugger
The following two compilation options must be specified when you compile and link programs to debug.
-g
Produce debugging information. If this option is omitted, you cannot display the values of variables and so on.
-Ktl_trt
Link the tool runtime library. This option enables the debug, profiling and MPI trace functions at program execution. This option is effective by default.
7.2.2 Start debugger
Use the fdb command (CUI) to launch the debugger. For more information on the fdb command, please refer to the fdb man page. For information on the xfdb command (GUI), please refer to the "Debugger User's Guide".
[username@ricc1:~] f77 –pc –g –Ktl_trt sample.f
[username@ricc1:~] srun fdb a.out
FDB [Fujitsu Debugger for C/C++ and Fortran] Version 7.0MT/OMP
Please wait to analyze the DEBUG information.
fdb*
Start debugger
fdb* list
5 double INTEGER i
6 read(*,*) i
7 print *,' ** fortran77 output=',i
8 go to (10,20,30) , i
9 print *,' i=0',i
10 go to 90
12 10 print *,' i=',i
13 go to 90
14
fdb* break 10
Insert break point
#1 0x100000ad0 (MAIN__ + 0x118) at line 10 in /home/username/sample.f
fdb* show break
Num Address Specify Stop? Where
#1 0x0000000100000ad0 Enable Yes (MAIN__ + 0x118) at line 10 in
/home/username/sample.f
fdb* p i
insert print cmd (variable i)
Result = 123
fdb* c
Continue program: a.out
The program: a.out terminated.
8. Tuning
8.1 Tuning overview
Modifying a program so that it finishes execution faster is called tuning. Tuning a program involves a cycle of collecting tuning information, evaluating and analyzing performance, modifying the source code, and measuring performance again.
First, find the part of the program where most of the execution time is spent. Generally, a large tuning effect is achieved by speeding up that part.
There are the following methods to get execution time information.
Call subroutines which get time information in the program
Use the batch job option which collects statistics information
Use the profiler
Fig 8-1 Tuning overview (collect tuning information -> performance evaluation/analysis -> tuning (change compile options, modify source code) -> iterate)
8.2 Time measurement
8.2.1 Fortran program
The CPU_TIME subroutine returns CPU processing time in seconds.
example 1) Invoke the CPU_TIME subroutine
real(kind=8) start_time, stop_time
...
call cpu_time(start_time)
...portion to be measured
call cpu_time(stop_time)
write(6,*) "time = ", stop_time - start_time
8.2.2 C program
The clock function returns an approximate value of the processor time used by the program.
example 1) Invoke the clock function
#include <time.h>
clock_t start_time, stop_time;
start_time = clock();
...portion to be measured
stop_time = clock();
printf("time = %10.3f\n", (double)(stop_time - start_time) / CLOCKS_PER_SEC);
8.2.3 MPI program
Use the MPI_Wtime function to measure elapsed time. Invoke the MPI_Wtime function before and after the portion to be measured; the difference between the two values is the elapsed time.
example 1) Invoke the MPI_Wtime function (Fortran)
real(kind=8) start_time, stop_time
...
call mpi_barrier(mpi_comm_world, ierr)
start_time = mpi_wtime()
....portion to be measured
call mpi_barrier(mpi_comm_world, ierr)
stop_time = mpi_wtime()
if (myrank .eq. 0) then
write(6,*) "time = ", stop_time - start_time
end if
example 2) Invoke the MPI_Wtime function (C)
double start_time, stop_time;
...
MPI_Barrier(MPI_COMM_WORLD);
start_time = MPI_Wtime();
....portion to be measured
MPI_Barrier(MPI_COMM_WORLD);
stop_time = MPI_Wtime();
if (myrank == 0) {
printf("time = %lf\n", stop_time - start_time);
}
8.2.4 System resource statistics information of batch job
By specifying the -oi or -OI option when submitting a job, summary information and resource information for each computing node are written to the standard output file.
[username@ricc1:~] cat go.sh
#!/bin/sh
#MJS: -proc 8
#MJS: -cwd
#MJS: -eo
#MJS: -oi
#MJS: -time 1:00:00
#BEFORE: a.out
mpirun ./a.out
example 1) Used resource information (the standard output of batch request)
[username@ricc1:~] cat go.sh.o2733417.jms
(omitted)
Allocated Resource <- allocated resources of the entire job
Virtual Nodes : 8 Node
Before Free Memory
Total Large Page Memory : 0 Mbyte
Total Normal Page Memory : 10737418240 Byte
After Free Memory
Total Large Page Memory : 0 Mbyte
Total Normal Page Memory : 10737418240 Byte
CPUs : 8 CPU
Inter-Node Barrier : 0 Unit
Execmode : CHIP_SHare
Elapse time limit : 3600.000 sec
Used Resource <- used resources of the entire job
Total System CPU Time : 463 msec
Total User CPU Time : 470516 msec
Total Large Page Memory : 0 Mbyte
Total Normal Page Memory : 342716416 Byte
CPUs : 8 CPU
Inter-Node Barrier : 0 Unit
---------------------------------------
Virtual Node Information : NODE : mpc0448 <- computing node
Archi Information : PG
Allocated Resource <- allocated resources per process
Before Free Memory
Large Page Memory : 0 Mbyte
Normal Page Memory : 1342177280 Byte
After Free Memory
Large Page Memory : 0 Mbyte
Normal Page Memory : 1342177280 Byte
Free memory time : 0 msec
CPUs : 1 CPU
CPU time limit : UNLIMITED
Used Resource <- used resources per process
Large Page Memory : 0 Mbyte
Normal Page Memory : 50913280 Byte
CPUs : 1 CPU
CPU Time System time User time
Max CPU Time : 236 msec 59426 msec
Total CPU Time : 236 msec 59426 msec
SBID ChipID CPUID System time User time
0 0 0 : 236 msec 59426 msec
8.3 Program development support tool
8.3.1 Fujitsu compiler
8.3.1.1 Profiling function
The profiler is a tool for collecting information on application performance. To improve application performance, a usual and effective method is to find the location where much execution time is consumed and speed it up.
The profiler can output following information.
Time statistic information
Elapsed time, breakdown of user CPU time / system CPU time, etc.
Interprocess communication information
Time of interprocess communication and waiting to synchronize by MPI and XPFortran
MPI library elapsed time information
Elapsed time to execute MPI library
8.3.1.2 Collect profiling data
Use the srun/mpirun/xpfrun command with the -prof or -profopt option to collect profiling data.
With these options, the srun/mpirun/xpfrun commands invoke the fpcoll command internally.
For more information on the fpcoll command, please refer to the Profiler User's Guide. Use the -profopt option to specify arguments to the fpcoll command.
In interactive jobs, profiling data collection and profiler information output can be performed at the same time. In batch jobs, since the Massively Parallel Cluster does not have a shared area, profiling data collection and profiler information output cannot always be performed at the same time. In that case, it is necessary to perform profiling data collection and profiler information output separately.
example 1) Execute serial job by interactive job (-prof option)
[username@ricc1 ~]$ srun -prof ./stream
(execution result is skipped)
Fujitsu Performance Profiler Version 3.1
Measured time : Thu Jul 30 01:11:06 2009
CPU frequency : Process 0 2933 (MHz)
Type of program : SERIAL
Average at sampling interval : 11.0 (ms)
Measured range : All ranges
--------------------------------------------------------------
______________________________________________________________
Time statistics
Elapsed(s) User(s) System(s)
---------------------------------------------
28.4038 28.2500 0.1100 Application
---------------------------------------------
28.4038 28.2500 0.1100 Process 0
_________________________________________________________________
Procedures profile
**************************************************************
Application - procedures
**************************************************************
Cost % Start End
--------------------------------------------
2569 100.0000 -- -- Application
--------------------------------------------
2559 99.6107 127 285 main
10 0.3893 337 399 checkSTREAMresults
____________________________________________________________________
Lines profile
*****************************************************************
Application - lines
*****************************************************************
Cost % Line
-----------------------------------
2569 100.0000 -- Application
-----------------------------------
629 24.4842 251 main
example 2) Execute MPI parallel job by interactive job (-prof option)
Fujitsu Performance Profiler Version 3.1
Measured time : Thu Jul 30 01:15:59 2009
CPU frequency : Process 0 2933 (MHz)
Type of program : MPI
Average at sampling interval : 11.0 (ms)
Measured range : All ranges
-------------------------------------------------------------
_____________________________________________________________
Time statistics
Elapsed(s) User(s) System(s)
---------------------------------------------
1.0764 0.9159 0.0800 Application
---------------------------------------------
1.0764 0.9159 0.0800 Process 0
_________________________________________________________________
Communication profile
Elapsed(s) Communication(s) %
--------------------------------------------
1.0764 0.7343 68.2220 Application
--------------------------------------------
1.0764 0.7343 68.2220 Process 0
Send + Put
+--------------------------------------------------+
| ##########################| 52 % Process 0
+--------------------------------------------------+
Percentage of time waiting for a send and put
Received + Get
+--------------------------------------------------+
| ########| 16 % Process 0
+--------------------------------------------------+
Percentage of time waiting for a received and get
_________________________________________________________________
Procedures profile
**************************************************************
Application - procedures
**************************************************************
Cost % Start End
--------------------------------------------
83 100.0000 -- -- Application
--------------------------------------------
51 61.4458 -- -- __GI_memcpy
9 10.8434 369 397 IMB_ass_buf
7 8.4337 -- -- memcpy_nts_asm64a
3 3.6145 -- -- _LowLevel_MutexUnlock
2 2.4096 -- -- _LowLevel_Exchange4
1 1.2048 -- -- intra_Reduce
1 1.2048 -- -- mpigfc_
1 1.2048 -- -- PMPI_Sendrecv
1 1.2048 -- -- _GMP_StopSendTimer
1 1.2048 -- -- _GMP_Send
_________________________________________________________________
Loops profile
**************************************************************
Application - loops
**************************************************************
example 3) MPI parallel job by batch job
1. Specify the fpcoll command's options with the -profopt option to collect profiling data. Items of profiling data can be specified with the -I option. Profiling data is created in the directory specified by the -d option.
When executing an application on the Massively Parallel Cluster, transfer the profiling data to the Login Server by FTL.
$ cat go.sh
#!/bin/sh
#------- qsub option -------#
#MJS: -pc
#MJS: -proc 64
#MJS: -eo
#MJS: -time 10:00
#MJS: -cwd
#------- FTL command -------#
#BEFORE: a.out
#AFTER: ALL@${MJS_REQID}_prof:profile-data/*
#------- Program Execution -------#
mpirun -profopt "-C -Icpu,mpi -d profile-data" ./a.out
2. Use the fprof command to display profiler information. Items of profiling data to display can be specified with the -I option. Specify the directory of the profiling data with the -d option.
$ fprof -Impi -d 1417379.jms_prof
--------------------------------------------------------------------
Fujitsu Performance Profiler Version 3.1
Measured time : Wed Sep 2 15:50:18 2009
CPU frequency : Process 0 - 63 2933 (MHz)
Type of program : MPI
Average at sampling interval : 11.0 (ms)
Measured range : All ranges
---------------------------------------------------------------------
_____________________________________________________________________
Time statistics
Elapsed(s) User(s) System(s)
---------------------------------------------
59.3825 3701.1963 16.8100 Application
---------------------------------------------
59.3825 58.6720 0.1700 Process 14
59.3759 55.9310 0.3200 Process 25
59.3754 57.2190 0.4700 Process 36
59.3744 58.6740 0.1500 Process 50
59.3743 58.6620 0.1700 Process 13
59.3710 56.6740 0.2800 Process 42
59.3691 55.9420 0.4100 Process 24
59.3690 57.7630 0.2100 Process 34
59.3689 57.6010 0.2200 Process 18
59.3680 57.7970 0.3200 Process 8
59.3665 58.6170 0.2100 Process 48
59.3649 55.7580 0.4000 Process 27
59.3621 57.5100 0.2800 Process 32
59.3618 57.5390 0.2900 Process 16
59.3611 58.5850 0.2500 Process 12
59.3609 58.5590 0.1700 Process 47
_____________________________________________________________________
MPI libraries profile - based on the user procedure.
*********************************************************************
Application - MPI libraries
*********************************************************************
Elapsed(s) % Call to
---------------------------------------
59.3825 ---.---- ------------ Application
---------------------------------------
3.2200 5.4225 45312 jacobi_ ( 199 - 250)
2.6808 4.5144 226560 sendp1_ ( 577 - 629)
1.9381 3.2638 226560 sendp2_ ( 521 - 573)
0.7023 1.1827 226560 sendp3_ ( 465 - 517)
0.3752 0.6318 512 initcomm_ ( 254 - 332)
0.0840 0.1415 576 MAIN__ ( 38 - 142)
0.0000 0.0000 384 initmax_ ( 336 - 440)
8.4 Network topology
The network topology for the Massively Parallel Cluster, Multi-purpose Parallel Cluster and MDGRAPE-3 Cluster is a fat-tree topology, which consists of 60 leaf switches connecting the computing nodes and 2 spine switches connecting the leaf switches (refer to Fig. 8-2 Network topology outline diagram).
Each leaf switch has 24 ports: 20 of them are connected to computing nodes and 4 of them are connected to spine switches. Therefore, when the 20 computing nodes connected to the same leaf switch are concurrently communicating with computing nodes connected to other switches, the communication data of all 20 computing nodes must be transferred over 4 InfiniBand cables, and the network bandwidth can be limited to as little as 1/5.
Fig. 8-2 Network topology outline diagram
The RICC job scheduler allocates parallel jobs so as to minimize the number of leaf switches connecting the allocated computing nodes. However, the allocated computing nodes may be distributed across more leaf switches when the system usage ratio is high, because the computing nodes allocated to the next job depend on the jobs which finished previously.
This difference in the allocation of computing nodes may not have an impact on normal job execution, but it may have an impact on jobs with a high communication load, such as network communication benchmark tests.
9. How to use Archive system
To transfer files to the Archive system over the network, use the special file transfer commands (pftp, hsi and htar).
* pftp is an extended command of ordinary ftp. The usage of pftp is the same as ftp.
* hsi is an extended command of ordinary pftp. It can transfer directories.
* htar is an extended command of ordinary tar. The usage of htar is the same as tar.
* The size of a file is restricted to 1.22TB for pftp, hsi and htar. When transferring files to the Archive system by htar, all transferred files are archived as one htar format file. Therefore, the total size of the transferred files must be less than 1.22TB.
9.1 Configuration
If you use hsi or htar for the first time on RICC, or you use them after your RICC password has been updated, use the arc_keytab command to generate a Keytab file for authentication.
You do not need to generate the Keytab file again after that.
Example:
[username@ricc:~] arc_keytab
Getting a KEYTAB file for user: username
Please wait ....
...............
A KEYTAB file was generated successfully.
As in the example above, if "successfully" is displayed, the configuration is complete.
9.2 pftp
9.2.1 Get file
9.2.1.1 Login
[username@ricc:~] pftp arc
Using /opt/hpss/etc/HPSS.conf
Connected to arc.
220 hpcore FTP server (HPSS 7.1 PFTPD V1.1.1 Tue Jan 19 07:16:29
JST 2010) ready.
Parallel stripe width set to (1).
Name (arc:username): Enter return key
331 Password required for username.
Password:********* Enter RICC password
230 User username logged in as [email protected]
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> Login completed
9.2.1.2 Transfer file
ftp>pget file_name Enter file name
remote: file_name local: file_name
200 Command Complete (4104704, "file_name", 0, 1, 4194304, 0).
200 Command Complete.
150 Transfer starting.
226 Transfer Complete.(moved = 4104704).
4104704 bytes received in 0.1400 seconds (27.961 MBytes/sec)
200 Command Complete.
9.2.1.3 Confirm transferred file and Logout
ftp> !ls –la Confirm transferred files
-rw-r--r-- 1 username groupname 4104704 Jul 28 20:40 file_name
ftp> bye Logout
221 Goodbye.
[username@ricc:~]
9.3 hsi
9.3.1 Get file
9.3.1.1 Login
[username@ricc:~] hsi
Username: username UID: UID Acct: UID(UID) Copies: 1 Firewall: off
[hsi.3.5.3 Wed Jan 20 07:32:04 JST 2010]
A:[RICC]/home/username-> Login completed
9.3.1.2 Get file
A:[RICC]/home/username-> get -R testdir Enter file name
get '/home/username/testdir/testfile1' : /home/username/testdir/testfile1'
(2009/07/28 20:57:57 1048576 bytes, 7050.6 KBS )
get '/home/username/testdir/testfile2' :'/home/username/testdir/testfile2'
(2009/07/28 20:57:58 1048576 bytes, 10074.9 KBS )
get '/home/username/testdir/testfile3' :'/home/username/testdir/testfile3'
(2009/07/28 20:57:58 1048576 bytes, 17090.1 KBS )
9.3.1.3 Confirm got file and Logout
A:[RICC]/home/username-> !ls -l testdir Confirm got file
total 6144
-rw------- 1 username groupname 1048576 Jul 28 21:03 testfile1
-rw------- 1 username groupname 1048576 Jul 28 21:03 testfile2
-rw------- 1 username groupname 1048576 Jul 28 21:03 testfile3
A:[RICC]/home/username-> quit Logout
[username@ricc:~]
9.4 htar
The following restrictions apply to the htar command.
The size of a member file is up to 68,719,476,735 (64G - 1) bytes.
The number of member files in a tar file is up to 1 million.
The directory name is up to 154 characters and the file name is up to 99 characters when the path name of a member file is divided into a directory name and a file name.
(Example) Path name : /home/username/dir1/dir2/test.data
Directory name: /home/username/dir1/dir2
File name : test.data
The link name of a symbolic link is up to 99 characters.
9.4.1 Confirm put file
Confirm the contents of a tar file with the -tf option.
[username@ricc:~] htar -tf test.tar
......
HTAR: -rw-r--r-- username/groupname 1252 2004-06-18 09:45 work/test1
HTAR: -rw-r--r-- username/groupname 3390 2004-03-04 11:56 work /test2
HTAR: -rw-r--r-- username/groupname 20932 2004-11-09 17:49 work /test3
HTAR: HTAR SUCCESSFUL
To confirm the tar file name, log in with the hsi command and then use the ls command.
9.4.2 Get file
The usage is the same as the tar command. Extract files with the -xf option.
[username@ricc:~] htar -xf test.tar
HTAR: HTAR SUCCESSFUL
Files are extracted in the current directory.
10. RICC Portal
10.1 RICC Portal
10.1.1 URL to access
On RICC Portal, users can operate files on the Login Server, compile and link programs, and submit jobs to all computing server systems using a web interface.
Access the following URL to log in to RICC Portal.
https://ricc.riken.jp
10.1.2 How to Login
In the following login window, enter your RICC user account and RICC password, and then click the LOGIN button. When prompted, select the client certificate and click [OK].
After authentication completes, RICC Portal is available.
Fig. 10-1 RICC login window
For usage of RICC Portal, please click the help icons of each function or refer to the online manuals on RICC Portal.
11. Manual
Access RICC Portal from a web browser. For how to access RICC Portal, please refer to 10 RICC Portal.
After login, click the links under [Documentation] in the menu on the left to refer to the online manuals.
The available manuals are listed in the next section.
Fig. 11-1 RICC Portal online manual window (click [MAIN], then [Documentation] -> [Product Manual] to refer to the product manuals)
11.1 Product manual
11.1.1 Common
RICC Portal User's Guide
11.1.2 Language
Fortran User's Guide
Fortran Language Reference
Fortran Compiler Messages
Fortran Runtime Messages
C User's Guide
C++ User's Guide
C++ Compiler Feature
XPFortran User's Guide
MPI User's Guide
11.1.3 Programming Tools
Debugger User's Guide
MPI Tracer User's Guide
Programming Workbench User's Guide
Profiler User's Guide
11.1.4 Scientific Subroutine Library II (SSL II)
List of Subroutines
How to use SSL II
How to link-edit SSL II
SSLII User's Guide
SSLII Extended Capabilities User's Guide
SSLII Extended Capabilities User's Guide II
How to compile (Thread-Parallel Capabilities)
Thread-Parallel Capabilities User's Guide
How to compile(C language)
SSLII User's Guide(C language)
How to use C-SSL II
How to compile Thread-Parallel Capabilities (C language)
Thread-Parallel Capabilities User's Guide(C language)
How to compile (MPI)
MPI User's Guide
11.1.5 BLAS LAPACK ScaLAPACK
User's Guide
11.1.6 Intel Compiler
Fortran User's Guide
C User's Guide
Math Kernel Library(MKL) User's Guide
11.1.7 PGI Compiler
PGI User's Guide
PGI Tools Guide
PGI Fortran Reference
11.1.8 Message Passing Toolkit (MPT)
User's Guide
Appendix
1. FTL Examples
Job execution scripts using FTL are introduced in this appendix. FTL commands are shown in bold.
The following environment variables are used in this appendix.
Environment variable Value
$MJS_CWD /home/username/job
$MJS_DATA /data/username
$MJS_REQID REQUEST-ID of job
1.1 Execute serial job
1.1.1 sample 1 (transfer output file to job execution directory)
Content of job execution
Execute execution module a.out. Transfer output file to job execution directory.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file / directory (none)
Output file output
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 1 -eo
#MJS: -cwd
#BEFORE: a.out
srun ./a.out
#AFTER: output
Transfer input file: the execution module a.out in $MJS_CWD on the Login Server is transferred to $MJS_CWD on node 0 (rank 0).
Transfer output file: the file output in $MJS_CWD on node 0 (rank 0) is transferred to $MJS_CWD on the Login Server.
1.1.2 sample 2 (Transfer output file to /data)
Content of job execution
Execute the execution module a.out. Transfer the output file to /data/username/data.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file (none)
Output file output
Destination of output file $MJS_DATA/data
Job execution script
#!/bin/sh
#MJS: -proc 1 -eo
#MJS: -cwd
#BEFORE: a.out
srun ./a.out
#AFTER: 0@${MJS_DATA}/data: output
Transfer input file
[Figure: a.out is copied from $MJS_CWD on the Login Server to $MJS_CWD on node 0 (rank 0).]
Transfer output file
[Figure: output is copied from $MJS_CWD on node 0 (rank 0) to $MJS_DATA/data on the Login Server.]
1.1.3 sample 3 (Transfer directory)
Content of job execution
Transfer a directory containing the files necessary for job execution.
Execute the execution module a.out from the transferred bin directory. Transfer the output file to the job execution directory.
Item Value Remark
Job execution directory $MJS_CWD
Transfer directory bin
Input file (none)
output file output
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 1 -eo
#MJS: -cwd
#BEFORE_R: bin
srun ./bin/a.out
#AFTER: output
Transfer input directory
[Figure: the bin directory is copied from $MJS_CWD on the Login Server to $MJS_CWD on node 0 (rank 0).]
Transfer output file
[Figure: output is copied from $MJS_CWD on node 0 (rank 0) back to $MJS_CWD on the Login Server.]
1.2 Execute parallel job
1.2.1 sample 1 (16 cores in parallel job)
Content of job execution
Transfer the input file necessary for each rank. Execute the MPI execution module a.out as a
16-core parallel job. Transfer the output file of each rank to the job execution directory.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file input.0 - input.15 Input file differs for each rank
output file output.0 - output.15 Output file for each rank
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#<BEFORE>
# a.out
# 0: input.0, input.1, input.2, input.3, input.4, input.5, input.6, input.7
# 8: input.8, input.9, input.10, input.11, input.12, input.13, input.14, input.15
#</BEFORE>
mpirun ./a.out
#<AFTER>
# 0: output.0, output.1, output.2, output.3, output.4, output.5, output.6, output.7
# 8: output.8, output.9, output.10, output.11, output.12, output.13, output.14, output.15
#</AFTER>
Transfer input file
[Figure: a.out and input.0 - input.7 are copied from $MJS_CWD on the Login Server to $MJS_CWD on node 0 (ranks 0 - 7); a.out and input.8 - input.15 are copied to $MJS_CWD on node 1 (ranks 8 - 15).]
Transfer output file
[Figure: output.0 - output.7 from node 0 and output.8 - output.15 from node 1 are transferred back to $MJS_CWD on the Login Server.]
1.2.2 sample 2 (Use FTL variable)
Content of job execution
Use the FTL variable $MPI_RANK for the same case as 1.2.1 sample 1 (16 cores in parallel job).
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file input.0 - input.15 Input file differs for each rank
output file output.0 - output.15 Output file for each rank
Destination of output file $MJS_CWD
Job execution script
Transfer input file
It is the same as 1.2.1 sample 1 (16 cores in parallel job)
Transfer output file
It is the same as 1.2.1 sample 1 (16 cores in parallel job)
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#<BEFORE>
# a.out, input.$MPI_RANK
#</BEFORE>
mpirun ./a.out
#<AFTER>
# output.$MPI_RANK
#</AFTER>
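The per-rank file name input.$MPI_RANK in the #<BEFORE> directive means each rank fetches only its own input file. As an illustration (the file names and contents below are hypothetical), the sixteen input files referenced above could be generated on the login server with a plain shell loop before submitting the job:

```shell
#!/bin/sh
# Generate one input file per MPI rank (0-15), matching the
# input.$MPI_RANK names used by the #<BEFORE> directive.
# The file contents here are placeholders.
for rank in $(seq 0 15); do
    echo "parameters for rank ${rank}" > "input.${rank}"
done
```

FTL would then stage input.0 to rank 0, input.1 to rank 1, and so on.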
1.2.3 sample 3 (Transfer files of same file name avoiding overwriting)
Content of job execution
Execute the MPI execution module a.out as a 16-core parallel job. Transfer output files of the same
name from each rank to the job execution directory while avoiding overwriting.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file (none)
output file output One output file per rank
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#BEFORE: a.out
mpirun ./a.out
#FTL_SUFFIX: on
#AFTER: output
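With #FTL_SUFFIX: on, the first rank of each node is appended to the common file name before transfer, so the two nodes' output files do not collide on the login server. A minimal sketch of that renaming idea (the suffix rule follows the description in this sample; the rank value is illustrative):

```shell
#!/bin/sh
# Sketch of the rename that avoids overwriting: a node whose first
# rank is 8 has its local "output" copied back as "output.8".
node_first_rank=8
echo "result from node 1" > output
cp output "output.${node_first_rank}"   # transferred name: output.8
```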
Transfer input file
[Figure: a.out is copied from $MJS_CWD on the Login Server to $MJS_CWD on node 0 (ranks 0 - 7) and node 1 (ranks 8 - 15).]
Transfer output file
The rank number (the first rank of the node) is appended to the output file name before the file transfer.
[Figure: the output file on node 0 is transferred as output.0 and the output file on node 1 as output.8 to $MJS_CWD on the Login Server.]
1.2.4 sample 4 (Use rank format)
Content of job execution
Execute the MPI execution module a.out as a 16-core parallel job. Transfer the output file of each rank
(with the rank number formatted as three digits) to the job execution directory.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file (none)
output file output.000 - output.015 Output file for each rank
Destination of output file $MJS_CWD
Job execution script
Input file transfer
It is the same as 1.2.3 sample 3 (Transfer files of same file name avoiding overwriting).
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#BEFORE: a.out
mpirun ./a.out
#FTL_RANK_FORMAT: 3
#AFTER: output.$MPI_RANK
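#FTL_RANK_FORMAT: 3 makes $MPI_RANK expand as a zero-padded three-digit number (000 - 015). The equivalent formatting in plain shell, shown only to illustrate the resulting file names:

```shell
#!/bin/sh
# Zero-pad a rank number to three digits, mirroring how
# #FTL_RANK_FORMAT: 3 expands $MPI_RANK in the #AFTER directive.
rank=7
printf 'output.%03d\n' "$rank"   # prints output.007
```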
output file transfer
[Figure: output.000 - output.007 from node 0 (ranks 0 - 7) and output.008 - output.015 from node 1 (ranks 8 - 15) are transferred to $MJS_CWD on the Login Server.]
1.3 FTL basic directory (FTLDIR command)
1.3.1 sample 1 (16 cores in parallel job)
Content of job execution
Transfer the input file necessary for each rank. Execute the MPI execution module a.out as a
16-core parallel job. Transfer the output file of each rank to the job execution directory.
Item Value Remark
Job execution directory $MJS_CWD
FTL basic directory $MJS_CWD
Execution module a.out
Input file input
output file output.0 - output.15 Output file for each rank
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#FTLDIR: $MJS_CWD
mpirun ./a.out
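With #FTLDIR, only files newly created under the FTL basic directory during job execution are collected. One generic way to detect such files, shown here as an illustration rather than FTL's actual mechanism, is to snapshot the directory listing before the run and compare afterwards:

```shell
#!/bin/sh
# Detect files created during a job by comparing directory
# snapshots (illustration only; FTL's real mechanism is internal).
ls | sort > .before            # snapshot before the "job" (dotfiles are not listed)
echo "result" > output.0       # file created by the job
ls | sort > .after             # snapshot after the "job"
comm -13 .before .after        # prints only the new name: output.0
```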
Transfer input file
[Figure: a.out and input are copied from $MJS_CWD on the Login Server to $MJS_CWD on node 0 (ranks 0 - 7) and node 1 (ranks 8 - 15).]
Transfer output file
Only files newly created during job execution are transferred, and the rank number (the first rank of the node) is appended to each file name.
[Figure: output.0 - output.7 from node 0 and output.8 - output.15 from node 1 are transferred to $MJS_CWD/$MJS_REQID on the Login Server as output.0.0 - output.7.0 and output.8.8 - output.15.8.]
1.3.2 sample 2 (File collect type: mtime)
Content of job execution
Transfer the input file necessary for each rank. Execute the MPI execution module a.out as a
16-core parallel job. Transfer the output file of each rank to the job execution directory.
Item Value Remark
Job execution directory $MJS_CWD
FTL basic directory $MJS_CWD
Execution module a.out
Input file input
output file output.0 - output.15 (newly created for each rank), input (updated during job execution)
Destination of output file $MJS_CWD
Job execution script
Transfer input file
It is the same as 1.3.1 sample 1 (16 cores in parallel job).
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#FTL_COLLECT_TYPE: mtime
#FTLDIR: $MJS_CWD
mpirun ./a.out
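With #FTL_COLLECT_TYPE: mtime, files whose modification time changed during the job are collected, so an updated input file is transferred back along with the newly created outputs. A generic sketch of mtime-based detection (an illustration, not FTL's implementation) using a marker file:

```shell
#!/bin/sh
# Collect by modification time: anything modified after the marker
# is picked up, covering both new files and updated input files.
touch .job_start
sleep 1                        # ensure a distinct mtime on coarse filesystems
echo "result"  > output.0      # created during the job
echo "updated" >> input        # pre-existing file, updated in place
find . -type f -newer .job_start ! -name '.job_start'
```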
Transfer output file
[Figure: output.0 - output.7 and the updated input file from node 0, and output.8 - output.15 and the updated input file from node 1, are transferred to $MJS_CWD/$MJS_REQID on the Login Server as output.0.0 - output.7.0 and input.0, and output.8.8 - output.15.8 and input.8.]
1.4 Others
1.4.1 Execute job using temporary directory
Content of job execution
Execute the MPI execution module a.out as a 16-core parallel job. a.out requires a tmp directory under
the job execution directory of each rank.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file (none)
output file output One output file per rank
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#FTL_MAKE_DIR: $MJS_CWD/tmp
#BEFORE: a.out
mpirun ./a.out
#AFTER: output
Make directory
[Figure: the tmp directory is created under $MJS_CWD on node 0 (ranks 0 - 7) and on node 1 (ranks 8 - 15).]
Transfer input file
It is the same as 1.2.3 sample 3 (Transfer files of same file name avoiding overwriting).
Transfer output file
It is the same as 1.2.3 sample 3 (Transfer files of same file name avoiding overwriting).
1.4.2 Execute job using meta character
Content of job execution
Transfer the same input files to each rank. Execute the MPI execution module a.out as a 16-core
parallel job.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file input.0 - input.15 The same input files are necessary on each rank
output file output Only on the MPI master node
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#<BEFORE>
# a.out, input*
#</BEFORE>
mpirun ./a.out
#AFTER: 0@$MJS_CWD:output
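The pattern input* in the #<BEFORE> directive is a meta character (glob) that matches every file whose name begins with input, so all sixteen files are transferred to every rank. Plain shell globbing expands the same way, as this illustrative snippet shows:

```shell
#!/bin/sh
# The glob input* expands to all files starting with "input",
# matching the meta character transfer in the #<BEFORE> directive.
for i in $(seq 0 15); do touch "input.${i}"; done
set -- input*                  # expand the glob into $1..$16
echo "$# files matched"        # prints: 16 files matched
```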
Transfer input file
[Figure: a.out and the same set of files input.0 - input.15 are copied from $MJS_CWD on the Login Server to $MJS_CWD on node 0 (ranks 0 - 7) and node 1 (ranks 8 - 15).]
Transfer output file
[Figure: output is copied from $MJS_CWD on node 0 (ranks 0 - 7) back to $MJS_CWD on the Login Server.]