RIKEN Integrated Cluster of Clusters System User's Guide
Transcript of RIKEN Integrated Cluster of Clusters System User's Guide
Version 1.22
Apr. 01, 2015
Advanced Center for Computing and
Communication
RIKEN
Copyright (C) RIKEN, Japan. All rights reserved.
Version management table
Version Revision date Change content
1.4 2010.02.19 1.1 “Outline of the System” modified
2.1.5 “Access to the RIKEN network from RICC” added
2.2 “Account and Authentication” modified
2.6 “Login environment” modified
3.1 “Available file area” modified
3.2.3 “Local disk area (work area)” modified
4. “How to create jobs” modified
5.1.1.3.1 “Example of chain job” added
5.1.1.4 “Major options for job submission” modified
5.1.1.6 “Software resource” modified
5.1.1.10 “Script file for Batch job” modified
5.1.2 “Confirm job information” modified
5.1.3 “Operate job” modified
5.2 “Interactive Job” modified
6.6 “FTL Syntax” modified
8 “How to use Archive system” modified
9.2 “RICC Mobile Portal” modified
10.1 “Product manual” modified
1.5 2010.03.17 5.1.3 “Operate job” modified
7.2 “Time measurement” modified
8 “How to use Archive system” modified
11.5.3 “Create script (Amber8 : MDGRAPE-3 Cluster)” modified
1.6 2010.04.01 1.1 “Outline of the System” modified
1.2 “Hardware outline” modified
5.1.1.3 “Function outline and Job submit command format” modified
5.1.1.5 “Hardware resources” modified
1.7 2010.06.07 3.1 “Available file area” modified
3.2.2 “Data area” modified
5.1.1.4 “Major options for job submission” modified
5.1.1.6 “Software resource” modified
5.1.1.8 “Major options for job submission command” modified
11.2.3 “Specify temporary directory” added
11.3.1 “Create script” modified
1.8 2010.07.30 3.2.2 “Data area” modified
7.2 “Debugger” added
8 “Tuning” added
1.9 2010.08.25 1.1 “Outline of the System” modified
1.2 “Hardware outline” modified
5.1.1.3 “Function outline and Job submit command format” modified
5.1.1.5 “Hardware resources” modified
5.1.1.6 “Software resources” modified
6.7 “FTL generating tool : ftlgen” added
12.2 GaussView modified
12.4 ANSYS modified
12.5 Amber modified
1.10 2010.11.19 1.3 “Available application and libraries” added
5.1.1.6 “Software resources” modified
12.3 “NBO for Gaussian” added
1.11 2011.05.02 5.1.5 “Confirm project user job information” added
6.6.12.5 “FTL variable” modified
12.1.1.6 “Use scratch area on the local disk of computing node” added
12.5 “ANSYS” modified
1.12 2011.08.17 5.1.1 “Submit batch job” modified
5.1.3.2 “Hardware resources” modified
5.2 “Interactive job” modified
1.13 2012.4.2 1.2.7 “Cluster for single jobs using SSD” added
1.5.1 “Available computation time” modified
3.1 “Available file area” modified
3.2.2 “Data area” modified
3.2.3 “Local disk area (work area)” modified
5.1.3.2 “Hardware resources” modified
5.1.3.3 “Software resource” modified
5.1.6.1 “Display resource information” modified
5.1.6.2 “Display usage of core” modified
12.1.1.1 “Create script” modified
1.14 2012.7.13 5.1.3.3 “Software resource” modified
12.6 “Amber” modified
1.15 2012.11.26 4.2 “Compilation / Linkage for GPGPU program” modified
1.16 2013.1.11 1.1 “Outline of the System” modified
1.2.2 “Multi-purpose Parallel Cluster” modified
5.1.3.2 “Hardware resources” modified
5.1.3.3 “Software resources” modified
12.7 “GAMESS” modified
1.17 2013.4.1 4.4.5 “IMSL Fortran Numerical Library” added
12.4 “ANSYS” modified
12.8 “MATLAB” added
1.18 2013.5.14 5.1.3.3 “Software resources” modified
12.9 “Q-Chem” added
1.19 2013.9.2 4.2 “Compilation / Linkage for GPGPU program” modified
5.1.3.2 “Hardware resources” modified
5.1.3.3 “Software resources” modified
5.1.6.1 “Display resource information” modified
12. “Application” removed (moved to RICC portal https://ricc.riken.jp)
1.20 2013.10.16 5.1.3.3 “Software resources” modified
5.1.6.1 “Display resource information” modified
1.21 2014.08.04 5.1.3.3 “Software resources” modified
1.22 2015.04.01 1. “Outline of the system” modified
2. “How to Access” modified
3. “File Area” modified
4. “How to create jobs” modified
5. “How to execute Job” modified
6. “FTL(File Transfer Language)” modified
9. “How to use Archive system” modified
Contents
Introduction
1. Outline of the System
1.1 Outline of the System
1.2 Hardware outline
1.3 Available application and libraries
1.4 Maintenance
1.5 Usage categories
2. How to Access
2.1 Login Flow
2.2 Account and Authentication
2.3 Update Password
2.4 Access RICC
2.5 Login environment
2.6 File transfer
3. File Area
3.1 Available file area
3.2 Type of available file area
4. How to create jobs
4.1 Outline of Compilation / Linkage
4.2 Compilation / Linkage for GPGPU program
4.3 Library management
4.4 Linkage of Math library
4.5 Job Freeze Function
5. How to execute Job
5.1 Batch job / Interactive batch job
5.2 Interactive Job
6. FTL (File Transfer Language)
6.1 Introduction
6.2 Transfer input file
6.3 Transfer input directory
6.4 Transfer output file
6.5 FTL Basic Directory
6.6 FTL Syntax
6.7 FTL generating tool : ftlgen
7. Development Environment
7.1 Endian conversion
7.2 Debugger
8. Tuning
8.1 Tuning overview
8.2 Time measurement
8.3 Program development support tool
8.4 Network topology
9. How to use Archive system
9.1 Configuration
9.2 pftp
9.3 hsi
9.4 htar
10. RICC Portal
10.1 RICC Portal
11. Manual
11.1 Product manual
Appendix
1. FTL Examples
1.1 Execute serial job
1.2 Execute parallel job
1.3 FTL basic directory (FTLDIR command)
1.4 Others
Introduction

In this User's Guide, we explain the usage of the Supercomputer System (RICC, RIKEN Integrated Cluster of Clusters) installed at RIKEN. Please read this document before you start using the system. This User's Guide is available for reference and download on the following homepage. The contents of this User's Guide are subject to change.
https://ricc.riken.jp
Shell scripts and other examples in this User's Guide are available in the following directory on RICC.
ricc.riken.jp:/usr/local/example
The RICC system operation schedule is announced on the following web page and the RICC user's mailing list. Furthermore, training classes are scheduled several times a year to provide technical support for using RICC. The class schedule is available on the following web page of the Advanced Center for Computing and Communication.
http://accc.riken.jp/riccinfo
Please send your inquiries on programming consultation, such as usage methods, debugging, parallelizing or tuning programs, and any other questions about RICC to the following e-mail address.
Email: [email protected]
No portion of this document may be copied, reproduced, or distributed in any way, or by any means, without permission.
1. Outline of the System
1.1 Outline of the System
RICC (RIKEN Integrated Cluster of Clusters) consists of two computing systems for different purposes (massively parallel computing and multi-purpose parallel computing), the Frontend system, a 2.2PB disk device and a 2PB tape library system. The Massively Parallel Cluster, the core of the system, is a PC cluster system of 3888 cores (peak performance 45.6 TFLOPS) for massively parallel computing. The Multi-purpose Parallel Cluster, equipped with GPU-type accelerators (peak performance 9.3 TFLOPS + 103 TFLOPS [single precision]), is for multi-purpose computing such as the execution of commercial or free applications.
Users are able to edit, compile and link programs, submit batch jobs and obtain computed results through the Login Server (ricc.riken.jp). Each computing server can also run interactive jobs, which are necessary for users to debug their programs. In addition, users can access the system from outside the RIKEN network through VPN and can use the system as if they were on the RIKEN network.
Users are able to log in to RICC on the RIKEN network by ssh, scp, etc. In addition, RICC provides a web portal site, RICC Portal, which allows users to access RICC with a web browser on their PC. Users are able to edit, compile and link programs, submit batch jobs and obtain computed results on RICC Portal.
In RICC, users’ home directories are located on the high-speed magnetic disk device. Users can access files in their home directories from the Login Server and the Multi-purpose Parallel Cluster. When executing batch jobs on the Massively Parallel Cluster, users need to transfer the necessary files from their home directories to the local disks of the Massively Parallel Cluster and return computed results back to their home directories. These operations can be performed easily by commands in the shell scripts used when submitting batch jobs.
All systems of RICC can be logged in to with the issued RICC user account, the RICC password and the passphrase of the public-key based authentication method. The passphrase can be generated on RICC Portal.
1.2 Hardware outline
PC Clusters consist of Massively Parallel Cluster [486 nodes (3888 cores)] and Multi-purpose Parallel
Cluster [100 nodes (800 cores)].
1.2.1 Massively Parallel Cluster
Computation performance
Intel Xeon X5570 (2.93GHz) 486 nodes (972 CPUs, 3888 cores)
Total peak performance: 2.93 GHz x 4 calculations x 4 cores x 972 CPUs = 45.6 TFLOPS
Memory
5.8TB (12GB x 486 nodes)
Memory bandwidth: 25.58GB/s = 1066MHz (DDR3-1066) x 8Byte x 3channels
Byte/FLOP: 0.54 (Byte/Flop) = 25.58GB/s / (2.93GHz x 4calculations x 4cores)
HDD
272TB((147GB × 3 + 73GB) × 436 + (147GB × 6 + 73GB) × 50)
Interconnect (DDR InfiniBand)
All 486 nodes with DDR InfiniBand HCA are configured as a computer network of two-way
communication with performance of 16 Gbps per way.
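The peak-performance, bandwidth and Byte/FLOP figures above follow directly from the clock, core and channel counts. A quick sketch that re-derives them from the values listed in this section (note the guide rounds 0.546 Byte/FLOP down to 0.54):

```shell
# Re-derive the Massively Parallel Cluster figures quoted above with awk.
awk 'BEGIN {
  peak = 2.93 * 4 * 4 * 972          # GHz x calc/cycle x cores/CPU x CPUs -> GFLOPS
  bw   = 1.066 * 8 * 3               # GHz x 8 Byte x 3 channels -> GB/s
  bf   = bw / (2.93 * 4 * 4)         # bandwidth per CPU clock-FLOP
  printf "peak %.1f TFLOPS, bandwidth %.2f GB/s, %.3f Byte/FLOP\n",
         peak / 1000, bw, bf
}'
# -> peak 45.6 TFLOPS, bandwidth 25.58 GB/s, 0.546 Byte/FLOP
```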
1.2.2 Multi-purpose Parallel Cluster
Computation performance
Intel Xeon X5570 (2.93GHz) 100 nodes (200 CPUs, 800 cores) + NVIDIA Tesla C2075 GPU type
accelerator x 100
Total peak performance: 2.93GHz x 4 calculations x 4 cores x 200 CPUs = 9.3 TFLOPS
1.03 TFLOPS (single precision) x 100 = 103 TFLOPS
Memory
2.3 TB (24GB x 100 nodes)
Memory bandwidth: 25.58GB/s = 1066MHz (DDR3-1066) x 8Byte x 3channels
Byte/FLOP: 0.54 (Byte/Flop) = 25.58GB/s / (2.93GHz x 4calculations x 4cores)
HDD
25.0TB (250GB x 100 nodes)
Interconnect (DDR InfiniBand)
All 100 nodes with DDR InfiniBand HCA are configured as a computer network of two-way
communication with performance of 16 Gbps per way.
1.2.3 Frontend system
The Frontend system is the first host to log in to when accessing RICC. It also provides the environment for program development and execution for the PC Clusters, MDGRAPE-3 Cluster and Large Memory Capacity Server. The Frontend system has 4 Login Servers, which are connected to 2 load balancers for redundancy and high availability.
1.2.4 Cluster for single jobs using SSD
This cluster, used via RICC, provides an environment for non-parallel jobs that require high-speed I/O.
Local disk area
SSD 360GB (30GB / core)
Interconnect for data transfer
QDR InfiniBand
1.3 Available application and libraries
Information about the available applications (Gaussian, ANSYS, Amber, etc.) and libraries (FFTW, GSL, HDF5, Python libraries, etc.) on RICC is published at the following URL.
https://ricc.riken.jp/cgi-bin/hpcportal.2.2/index.cgi?LMENU=SYSTEM
1.4 Maintenance
RICC basically operates 24 hours a day, 7 days a week, but emergency maintenance is performed when needed. We make every effort to inform users of maintenance in advance.
1.5 Usage categories
We have the following user categories. Users use RICC under one of these categories.
General Use
Quick Use
For more information, please refer to “4. Usage Categories” in the “RIKEN Supercomputer System Usage Policy”, which is available at the following URL.
http://accc.riken.jp/ricc/policy_e.html
1.5.1 Available computation time
Available computation time differs by project. Use the listcpu command to check the allotted computation time, the used computation time and the expiry date of the allotted computation time. When the used computation time reaches 100%, jobs cannot be submitted.
[explanation]
Limit(h) : Allotted computation time (unit: hour)
Used(h) : Used computation time (unit: hour)
Use(%) : Used computation time / Allotted computation time (unit: %)
Date of expiry : Expiry date of the allotted computation time
1.5.2 List Project number / Project name
Use listprj (or listproject) to list the Project number and Project name.
[username@ricc1:~] listcpu
[Q00100] Study of parallel programs <-- Project no./Project name
Limit(h) Used(h) Use(%) Date of expiry
----------------------------------------------------------------------
Total 402000.0 80400.0 20.0% 2016/03/31
+- mpc - 80000.0 - -
+- upc - 400.0 - -
+- ssc - 0.0 - -
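Use(%) is simply Used(h) divided by Limit(h); for the sample output above:

```shell
# Check the Use(%) column of the sample listcpu output: 80400h used of 402000h allotted.
awk 'BEGIN { printf "%.1f%%\n", 80400.0 / 402000.0 * 100 }'   # -> 20.0%
```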
[username@ricc1:~] listprj
Q00001(Quick) Study of massively parallel programs on RIKEN Cluster of
Clusters
G00001(General) Research of RICC
2. How to Access
2.1 Login Flow
The login flow for the RICC system, from account application to login, is as follows:
When the account is issued, an e-mail with the client certificate attached is sent. After installing the client certificate on your PC, access RICC Portal. You can log in to the front end servers via SSH by registering your ssh public key on RICC Portal.
Figure 2-1 Login Flow
2.1.1 Initial Settings
When accessing the system for the first time, log in to RICC Portal and make sure to complete the following initial settings:
2.1.2 Install Client Certificate
2.1.3 Generate / Register public key and private key
2.1.2 Install Client Certificate
2.1.2.1 Windows
Install the client certificate ACCC sent you by e-mail.
Double click the client certificate provided by ACCC. The Certificate Import Wizard starts. Click
"Next" button.
Figure 2-2 The first screen of "Certificate Import Wizard"
Figure 2-3 The second screen of "Certificate Import Wizard"
Figure 2-4 The third screen of "Certificate Import Wizard"
Figure 2-5 The fourth screen of "Certificate Import Wizard"
1. Enter the password for "Client Certificate" issued by ACCC.
2. Click "Next" button.
Figure 2-6 The fifth screen of "Certificate Import Wizard"
Figure 2-7 The sixth screen of "Certificate Import Wizard"
2.1.2.2 Mac
Install the client certificate ACCC sent you by e-mail.
Double click the client certificate provided by ACCC.
Figure 2-8 The first screen of "Certificate Import Wizard"
Enter the password for "Client Certificate" issued by ACCC.
2.1.3 Generate / Register public key and private key
When accessing RICC with a virtual terminal (ssh / scp, etc.), public-key based authentication is used whether you access from the RIKEN network or from outside the RIKEN network. Therefore, each user needs to register the public key on the Login Server and the private key on the PC / WS accessing RICC. The preparation flow is as follows.
(1) Access RICC Portal (refer to 2.1.3.1 Access RICC Portal)
(2) Generate and/or register a public key in either of the following ways.
A) Generate a public key on RICC Portal
Generate a public-key pair of a public key and a private key on RICC Portal, and
then store the private key into the terminal.
(refer to 2.1.3.2 Generate public key and private key on RICC Portal)
B) Generate a public key on the terminal (Mac, Linux, etc.) (for advanced users)
Generate a public-key pair of a public key and a private key on the terminal (Mac,
Linux, etc.) accessing RICC, and then register the public key on RICC Portal.
(refer to 2.1.3.3 Generate a public key on the terminal (Mac, Linux, etc.) (for
advanced users))
2.1.3.1 Access RICC Portal
RICC users access RICC Portal (following URL) to generate a public-key pair.
https://ricc.riken.jp
1. Select Client Certification
Click [OK]
Fig. 2-9 RICC Portal login window
1. Enter RICC user account
2. Enter RICC password
3. Click [LOGIN]
2.1.3.2 Generate public key and private key on RICC Portal
(1) At [Setting] – [Key Generation] menu, enter a public-key passphrase.
(Don’t forget the public-key passphrase)
Fig. 2-10 Public-key pair generation window
1. Click [Setting]
2. Click [Key Generation]
3. Enter a passphrase
4. Retype the same
passphrase
5. Select [SSH-2RSA]
6. Select OS(Software) type
7. Click [Generate Key]
Save the private key onto the PC.

[In case of Windows (for PuTTY, WinSCP)]
Fig. 2-11 Private key display window
1. The private key is displayed at the bottom of the window
2. Copy the private key strings
3. Save it onto the terminal as a text file
Note 1: Save the text file with one of the following editors and character codes.
- notepad: ANSI
- wordpad: text document
Note 2: The extension of the text file should be “ppk” (e.g. id_rsa.ppk)

[In case of Mac(OS X)/UNIX/Linux]
Fig. 2-12 Private key display window
1. The private key is displayed at the bottom of the window
2. Copy the private key strings
3. Save it onto the terminal as a text file
4. Change the permission of the file to 600
   e.g. $ chmod 600 ~/.ssh/id_rsa
(note) Save the private key file as ~/.ssh/id_rsa. If it is saved in another directory or under another name, specify the private key file when you access RICC with the ssh command as follows.
   e.g. $ ssh -i private-key-file -l RICC-account ricc.riken.jp

* Users can generate as many public-key pairs as they want. Generating a new public-key pair does not delete previously registered public keys.
2.1.3.3 Generate a public key on the terminal (Mac, Linux, etc.) (for advanced users)
(*) If you generated a public key as described in 2.1.3.2 Generate public key and private key on RICC Portal, please skip this section.
(1) Use the ssh-keygen command on the terminal to generate a public-key pair.
Mac (OS X): Start Terminal. Execute the ssh-keygen command.
UNIX / Linux: Start a terminal emulator. Execute the ssh-keygen command.
Fig. 2-13 Generate a public-key pair
1. Enter the ssh-keygen command
2. Press the return key (to save the key as a file other than ~/.ssh/id_rsa, enter a file name (Note))
3. Enter a passphrase
4. Retype the same passphrase
(Note) In that case, specify the private key file when you access RICC with the ssh command as follows.
Example)
$ ssh -i private-key-file -l RICC-account ricc.riken.jp
(2) Access RICC Portal (https://ricc.riken.jp) from a web browser. Move to the Key management window.
Fig. 2-14 Move to key management window
1. Click [Setting]
2. Click [Key Management]
3. Click [Update Public Key]
(3) Display the generated public key and register it on RICC Portal.
Mac(OS X): Start Terminal. Execute the cat command to display the public key.
UNIX / Linux: Start a terminal emulator. Execute the cat command to display the public key.
(Note) If the ssh-keygen command is executed with no argument at step (1), the public key is stored in the ~/.ssh/id_rsa.pub file.
Command example: $ cat ~/.ssh/id_rsa.pub
Fig. 2-15 Copy the content of the public key
1. Display the generated public key.
   $ cat "public-key-file"
2. Copy the content
Fig. 2-16 Register the public key
1. Paste the content of the public key.
2. Select key type
3. Click [save]
(4) Log out of RICC Portal.
Fig. 2-17 RICC Portal logout
Click [logout]
2.1.3.4 Delete registered public key
(1) Access RICC Portal (https://ricc.riken.jp) from a web browser. Move to the [Delete Public Key] window.
Fig. 2-18 Move to Delete Public Key window
1. Click [Setting]
2. Click [Key Management]
3. Click [Delete Public Key]
(2) Delete the registered public keys.
Fig. 2-19 Deletion of public keys window
Click [Delete All Keys]
* All the registered public keys are deleted.
(3) Log out of RICC Portal.
Fig. 2-20 RICC Portal logout
Click [logout]
2.1.4 Network Access
Destination hosts are as follows:

Host name (FQDN)    Purpose of access
ricc.riken.jp       Usual access
riccgv.riken.jp     GaussView use (note 1)

note 1 : On how to use GaussView, please refer to RICC Portal (https://ricc.riken.jp)

2.1.5 Available service
ssh/scp (Virtual terminal, file transfer)
https (RICC Portal, online manual)

2.1.6 Access to outside of the RIKEN network from RICC
When you access external systems from the RICC system, log in to the front end servers with SSH agent forwarding enabled (-A option).

[username@Your-PC ~]$ ssh -A -l username greatwave.riken.jp

After logging in to HOKUSAI-GreatWave, log in to the RICC front end servers with SSH agent forwarding enabled (-A option).

[username@greatwave:~]$ ssh -A -l username ricc.riken.jp
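The two-hop login above can also be captured once in a client-side ~/.ssh/config so that a single command reaches RICC. This is a sketch under assumptions: the Host aliases "gw" and "ricc" are our own, and the ProxyJump option requires OpenSSH 7.3 or later (older clients can use ProxyCommand instead).

```
# ~/.ssh/config on your PC -- hypothetical aliases "gw" and "ricc"
Host gw
    HostName greatwave.riken.jp
    User username
    ForwardAgent yes          # same effect as the -A option

Host ricc
    HostName ricc.riken.jp
    User username
    ForwardAgent yes
    ProxyJump gw              # hop through HOKUSAI-GreatWave first
```

With this in place, "ssh ricc" performs both hops.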
2.2 Account and Authentication
The account to access RICC, called the RICC user account, is the one the user specified in the application form. However, the password to enter differs by access method. The password for each access method is listed in Table 2-1 Access method list.
The RICC password is given to the user when the RICC user account is issued. Please change the initial RICC password after logging in to RICC Portal for the first time.

Table 2-1 Access method list
Access method     Protocol        Account                             Password
RICC Portal       https           RICC user account (specified in     RICC password
                                  the application form)
HPSS              pftp (note 1)   RICC user account                   RICC password
Virtual terminal  ssh (scp/sftp)  RICC user account                   Public-key passphrase
                                                                      (specified by user) (note 2)

note 1 : pftp is a special command to transfer files between users’ home directories and the Archive system. pftp is an enhanced command of ftp and can be used in the same way as ftp.
note 2 : The public-key passphrase is specified by the user when a pair of a public key and a private key is generated. A public/private key pair can be generated on RICC Portal. Please refer to 2.1.3 Generate / Register public key and private key.
2.3 Update Password
When logging in to RICC for the first time, make sure to update the initial RICC password on RICC Portal.
The password updating flow is as follows.
(1) Access RICC Portal (refer to 2.3.1 Access RICC Portal)
(2) Update the password (refer to 2.3.2 Password updating procedure)
2.3.1 Access RICC Portal
RICC users access RICC Portal (following URL) to update the initial password.
2.3.2 Password updating procedure
(1) Login RICC Portal
Fig. 2-21 RICC Portal login window
https://ricc.riken.jp
Select Client Certification
Fig. 2-22 RICC Portal login window
(2) At [Setting] – [Password Update] menu, update the initial password.
(If the initial password is not updated on RICC Portal, Password Update menu is shown just after
logging into RICC Portal.)
Fig. 2-23 Password update window
1. Click [Setting]
2. Click [Password Update]
3. Enter the initial password
4. Enter a new password
5. Retype the same password
6. Click [Update]
Condition of password:
- At least 6 characters
- Not simple
(e.g. dictionary word)
1. Enter RICC user account
2. Enter the initial password
3. Click [LOGIN]
(3) Confirm password was updated.
Fig. 2-24 Confirmation of password update window
(4) Logout RICC Portal
Fig. 2-25 RICC Portal logout
Click [logout]
Click [OK]
2.4 Access RICC
2.4.1 Login
Use the ssh service to log in to RICC from a PC / WS. The ssh command for UNIX / Mac (OS X) and PuTTY for Windows are recommended. PuTTY is available on the following website.
http://www.chiark.greenend.org.uk/~sgtatham/putty/
The host to access is as follows.
Host name (FQDN)
ricc.riken.jp
The login prompt varies with each login because the Login Servers (4 servers) are load-balanced by the load balancers.
A) For UNIX / Mac(OS X)
% ssh –l username greatwave.riken.jp
T The authenticity of host 'greatwave.riken.jp' can't be established. Displayed only
RSA key fingerprint is 26:8a:53:1e:d3:3f:ed:29:e0:a3:32:0d:d5:6e:1a:e2 . at first-time login.
Are you sure you want to continue connecting (yes/no)? yes <---------------------- Enter [yes]
Warning: Permanently added 'greatwave.riken.jp' (RSA) to the
list of known hosts.
Enter passphrase for key '/home/username/.ssh/id_rsa': ++++<---Enter the pablic-key passphrase
[username@greatwave1:~] ssh –l username ricc.riken.jp
The authenticity of host 'ricc.riken.jp' can't be established. Displayed only
RSA key fingerprint is 26:8a:53:1e:d3:3f:ed:29:e0:a3:32:0d:d5:6e:1a:e2 . at first-time login.
Are you sure you want to continue connecting (yes/no)? yes <---------------------- Enter [yes]
Warning: Permanently added 'ricc.riken.jp' (RSA) to the
list of known hosts.
Enter passphrase for key '/home/username/.ssh/id_rsa': ++++<---Enter the pablic-key passphrase
[username@ricc1:~]
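For convenience, the two-step login above (greatwave.riken.jp, then ricc.riken.jp) can be wrapped in a client-side OpenSSH configuration. This is a sketch, not from the guide: it assumes an OpenSSH client recent enough to support ProxyJump, and the alias name "ricc" is arbitrary.

```shell
# Hypothetical ~/.ssh/config fragment (client side).
# "ssh ricc" would then hop through greatwave.riken.jp automatically.
Host ricc
    HostName ricc.riken.jp
    User username
    ProxyJump greatwave.riken.jp
```

On older clients without ProxyJump, an equivalent ProxyCommand line can be used instead.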
B) For Windows
1. Specify the private key in a virtual terminal.
Fig. 2-26 Private key setting window
2. Access RICC with a virtual terminal.
Fig. 2-27 Virtual terminal (PuTTY) Session window
For PuTTY,
1. Go to [Connection] – [SSH] –
[Auth] menu
2. Click [Browse] and specify the
private key in
2.1.3
Generate / Register public key and
private key
For PuTTY,
1. Click [Session]
2. Enter following items
Host name: greatwave.riken.jp
Port: 22
Connection type: SSH
3. Enter a session name at [Saved
Sessions] (e.g.GreatWave)
4. Click [Save]
5. Click [Open]
3. At the first-time login, the following security alert window is shown. Click [Yes].
This alert is not shown at future logins.
Fig. 2-28 Virtual terminal (PuTTY) Security Alert window
4. Enter the RICC user account and the public-key passphrase.
Fig. 2-29 Virtual terminal login completion
2.4.2 Logout
Enter “exit” or “logout” at the prompt. The logout process may take a little time for post-processing (writing the
history file).
1. Enter RICC user account at
[login as]
2. Enter the public-key passphrase
2.5 Login environment
In RICC, bash or tcsh is available as the login shell. The default is bash. If you want to change it, please
contact the Advanced Center for Computing and Communication ([email protected]).
An environment setting file for using RICC is stored in your login directory.
(note) When adding paths to the environment variable PATH, append them to the end of PATH. Otherwise, you
may not be able to use the system properly.
Also, original skeleton files are available in the following directory of Login Server.
ricc.riken.jp:/usr/local/example/skel
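As a sketch of the note above, appending a directory to the end of PATH (rather than prepending it) keeps the system's own commands first in the search order. The directory "$HOME/bin" is a hypothetical example.

```shell
# Append a personal bin directory to the END of PATH, as the note advises;
# prepending could shadow system commands that RICC's environment relies on.
export PATH="${PATH}:${HOME}/bin"
```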
2.6 File transfer
2.6.1 File transfer of RICC
Use the ssh service to transfer files between RICC and a PC / WS. The scp (sftp) command for UNIX /
Mac (OS X) and WinSCP for Windows are recommended. WinSCP is available on the following
website.
http://winscp.net/eng/docs/
The host to access is as follows.
Host name (FQDN): ricc.riken.jp
A) For UNIX / Mac (OS X)
% scp local-file [email protected]:remote-dir
public key passphrase: ++++++++        Enter the public-key passphrase
file-name 100% |***********************| file-size
[username@greatwave1:~] scp local-file [email protected]:remote-dir
public key passphrase: ++++++++        Enter the public-key passphrase
B) For Windows
Log in to RICC with WinSCP. Files can be transferred by drag & drop after login.
1. Log in to RICC with WinSCP.
1. Click [New Site]
2. Enter the following items
Host name: greatwave.riken.jp
Port number: 22
User name: RICC user account
Password: Public-key passphrase
3. Click [Advanced]
Fig. 2-30 WinSCP Login window
4. Open [Authentication]
5. Enter the following item
Private key file: (the private key file)
6. Click [OK]
2. Files can be transferred by drag & drop.
Fig. 2-31 WinSCP after login window
Log in to GreatWave (greatwave.riken.jp) to transfer files to RICC:
[username@greatwave1:~] scp local-file [email protected]:remote-dir
public key passphrase: ++++++++        Enter the public-key passphrase
file-name 100% |***********************| file-size
Files can also be uploaded / downloaded via RICC Portal using a web browser.
However, the upload / download function of RICC Portal cannot transfer multiple files at once.
3. File Area
3.1 Available file area
Available file areas are as follows.

Area                    | Area name | Size                        | Device
home (note 1)           | /home     | 2.2PB (4TB/user)            | -
data                    | /data     | (4TB~52TB)/Project          | -
local disk (work area)  | /work     | depends on cluster (note 2) | computing node
archive                 | /arc      | 2PB                         | ReadOnly
Table 3-1 Available file area list

Note 1: The home area is limited to less than 500GB per user by quota.
Note 2: The local disk area of each cluster is limited as follows:
  Massively Parallel Cluster: 40GB/core
  Multi-purpose Parallel Cluster: 10GB/core
  Cluster for single jobs using SSD: 30GB/core
Available file areas for the nodes are as follows.

File area              | Login Server | Massively Parallel Cluster | Multi-purpose Parallel Cluster | Cluster for single jobs using SSD
home (note 3)          | O            | O                          | O                              | O
data                   | O            | O                          | O                              | O
local disk (work area) | -            | O (for prestaging)         | O (scratch area for job)       | O (scratch area for job)
archive                | O            | -                          | -                              | -
O: Available for use   -: Not available
Table 3-2 Available file area for nodes
3.2 Type of available file area
3.2.1 Home area
The home area is a 2.2PB shared file system located on the Disk Storage System.
The home area is accessible from the Login Server, Multi-purpose Parallel Cluster, and Cluster for single jobs
using SSD.
Intended purpose:
To store source programs, object files and execution modules
To store small amounts of data
Use of the home area is limited per user by quota.
3.2.2 Data area
The data area is a 2.2PB shared file system located on the Archive System.
The data area is accessible from the Login Server, Multi-purpose Parallel Cluster, and Cluster for single jobs
using SSD.
Intended purpose:
Data sharing between Project members
To store large amounts of data
3.2.3 Local disk area (work area)
The local disk area (work area) is a local file system on the PC Clusters and the Cluster for single jobs using SSD.
Intended purpose:
Staging area for jobs (FTL)
Scratch area while running jobs
The local disk area can be used by users' jobs; the files are deleted when the job finishes.
For the Massively Parallel Cluster, the area is limited to 40GB per core. The more cores a job
uses, the more capacity it can use. For example, a job using 4 cores can use up to 160GB.
For the Multi-purpose Parallel Cluster and the Cluster for single jobs using SSD, the area can be used as
scratch space while running jobs.
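The per-core arithmetic can be sketched in plain shell (40GB/core is the Massively Parallel Cluster limit from Table 3-1, note 2; the 4-core job is the guide's own example):

```shell
# Work-area capacity on the Massively Parallel Cluster:
# 40GB per core, multiplied by the number of cores the job uses.
per_core_gb=40
cores=4
echo "available work area: $(( per_core_gb * cores ))GB"   # 160GB for 4 cores
```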
4. How to create jobs
4.1 Outline of Compilation / Linkage
In RICC, programs are compiled and linked on the Login Server. Specify which machine to compile / link the
program for.
Compilation / linkage of GPGPU programs is done on the GPGPU Compile Server (accel). For more
information, please refer to 4.2 Compilation / Linkage for GPGPU program.
The format of compilation / linkage is as follows.
command machine-option [option] file [...]

command (serial / thread parallel)        | f77, f90, cc, c++
command (MPI parallel / XPFortran parallel) | mpif77, mpif90, mpicc, mpic++ / xpfrt
machine-option                            | -pc
option                                    | optional (options of each compiler)
file                                      | source file, object file
Table 4-1 Format of Compilation / Linkage
Options for PC Clusters are as follows.

Option | Meaning
-c     | output only object programs
-g     | generate debugging information in object programs
-I     | add a directory to the include file search path
-L     | add a directory to the list of directories in which the linker searches for libraries
-l     | search the library libname.so or libname.a
-o     | specify the name of the execution module
Table 4-2 Common option list
Common optimization options are as follows.

Common option | Meaning                                                                                    | Compiler option (Fujitsu / Intel)
-high (*1)    | Optimize for high-speed execution on the machine                                           | -Kfast / -O3 -ipo -no-prec-div -xHost
-middle       | In addition to basic optimization, loop unrolling, restructuring of nested loops, etc. are performed | -O2
-low          | Basic optimization                                                                         | -O1
-none         | No optimization                                                                            | -O0
Table 4-3 Common optimization option list
(*1) Specifying this option may give rise to side effects. Please pay attention.
Common options for thread parallelization are as follows.

Common option       | Meaning                                  | Fujitsu compiler | Intel compiler
-auto_parallel      | Perform auto parallelization             | -Kparallel       | -parallel
-auto_parallel_info | Display information on auto parallelization | -Kpmsg        | -par-report
-omp                | Enable OpenMP directives                 | -KOMP            | -openmp
Table 4-4 Common thread option list
The following libraries are available in RICC.

Machine     | Math library (serial)       | Parallel library | Math library (parallel)
PC Clusters | BLAS, LAPACK, SSL II, IMSL  | MPI, PVM         | ScaLAPACK, SSL II
Table 4-5 Available library

If the machine on which modules will run is specified in the CLTK user configuration file (${HOME}/.cltkrc),
the machine-option (-pc) can be omitted at compilation / linkage.
* An option on the command line has priority over one in the CLTK configuration file.
The parameter of the CLTK user configuration file is as follows.

Parameter           | Value | Meaning
CLTK_TARGET_MACHINE | pc    | generate modules for PC Clusters
Table 4-6 Parameter of CLTK user configuration file

Example of CLTK user configuration file:
CLTK_TARGET_MACHINE=pc

There are cautions on compilation / linkage of thread parallel programs and MPI parallel programs. For
more information, please refer to the product manuals. On how to refer to product manuals, please refer to
0
Manual.
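As a sketch, the configuration file can be created with a single shell command. It is written to a scratch directory here; on RICC the file would be ${HOME}/.cltkrc as described above.

```shell
# Create a CLTK user configuration file selecting PC Clusters as the target.
# (Scratch directory used for illustration; the real path is ${HOME}/.cltkrc.)
dir=$(mktemp -d)
printf 'CLTK_TARGET_MACHINE=pc\n' > "${dir}/.cltkrc"
cat "${dir}/.cltkrc"
```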
4.1.1 Compilation / Linkage for PC Clusters
Fujitsu compiler is used for PC Clusters.
4.1.1.1 Serial program
Use f77/f90/cc/c++ to compile / link serial programs for PC Clusters.
f77/f90/cc/c++ -pc [option] file [...]
1. Compile / link a Fortran77 program for PC Clusters. (optimization: high)
[username@ricc1:~] f77 -pc -high -o sample1.out sample1.f
2. Compile / link a C program for PC Clusters. (optimization: high)
[username@ricc1:~] cc -pc -high -o sample2.out sample2.c
4.1.1.2 Thread parallel program
Use f77/f90/cc/c++ to compile / link thread parallel programs for PC Clusters. Specify a common option in
Table 4-4 as thread-option.
f77/f90/cc/c++ -pc thread-option [option] file [...]
1. Compile / link a Fortran77 program for PC Clusters with auto parallelization.
[username@ricc1:~] f77 -pc -auto_parallel -o auto_para.out auto_para.f
2. Compile / link a C program including OpenMP for PC Clusters.
[username@ricc1:~] cc -pc -omp -o omp.out omp.c
4.1.1.3 MPI parallel program
Use mpif77/mpif90/mpicc/mpic++ to compile / link MPI parallel programs for PC Clusters.
mpif77/mpif90/mpicc/mpic++ -pc [option] file [...]
1. Compile / link an MPI Fortran77 program for PC Clusters.
[username@ricc1:~] mpif77 -pc -o mpi_sample1.out mpi_sample1.f
2. Compile / link an MPI C program for PC Clusters.
[username@ricc1:~] mpicc -pc -o mpi_sample2.out mpi_sample2.c
4.1.1.4 XPFortran parallel program
Use xpfrt to compile / link XPFortran (former VPP Fortran) parallel programs.
xpfrt [option] file [...]
1. Compile / link an XPFortran program.
[username@ricc1:~] xpfrt -o xpf.out xpf.f
4.2 Compilation / Linkage for GPGPU program
Compilation / Linkage for GPGPU programs (CUDA programs (*)) is done on GPGPU Compile Server
(accel).
(*) For more information on CUDA (Compute Unified Device Architecture), please refer to the following
web site (CUDA ZONE).
http://www.nvidia.com/object/cuda_home_new.html
1. Log in to GPGPU Compile Server (accel) from Login Server
[username@ricc1 ~] ssh accel
[username@upc0000 ~]
2. Compile / link GPGPU programs (CUDA C programs) (use of CUDA compiler)
[username@upc0000 ~] nvcc [OPTION] file [...]
or
[username@upc0000 ~] cc -nvidia [OPTION] file [...]
3. Compile / link GPGPU programs (CUDA Fortran programs) (use of PGI compiler)
[username@upc0000 ~] f90 [-pgi] [OPTION] file [...]
4. Compile / link GPGPU programs (CUDA MPI Fortran programs) (use of PGI compiler)
[username@upc0000 ~] mpif90 [-pgi] -ta=nvidia -Mcuda file [...]
(*) Without machine-option on accel, the PGI compiler is used by default.
Example job script file for GPGPU programs (CUDA programs):
[username@ricc1:~] vi go.sh
#!/bin/sh
#------ qsub option --------#
#MJS: -accel
#MJS: -cwd
#---- Program execution ----#
srun ./a.out
Example job script file for GPGPU programs (CUDA MPI Fortran programs):
[username@ricc1:~] vi go.sh
#!/bin/sh
#------ qsub option --------#
#MJS: -accelex
#MJS: -proc 2
#MJS: -cwd
#---- Program execution ----#
mpirun -np 2 ./multi.exe
[note]
Jobs can be submitted on the Login Server (ricc1-4).
Jobs cannot be submitted on the GPGPU Compile Server (accel).
Specify -accel as the hardware resource to submit jobs using GPGPU (CUDA programs).
When -accel is specified, the job consumes 1 CPU (4 cores) as resource.
Specify -accelex as the hardware resource if you want to use 1 node exclusively. In this case, each
process consumes 8 cores.
A job which uses 2 or more GPGPUs consumes 1 node (8 cores) per process.
4.3 Library management
The format of the archive command is as follows. Specify options of the ar command as option.
ar machine-option option archive [member...]
1. Create an archive for PC Clusters.
[username@ricc1:~] ar -pc cr libarchive.a sub1.o sub2.o sub3.o
4.4 Linkage of Math library
When the Math libraries are used with the Fujitsu C/C++ compiler, please also read the cautions in the
product manuals. On how to refer to product manuals, please refer to 0
Manual.
4.4.1 BLAS
Specify the -blas option to link the BLAS library. Specify the -blas_t option to link the thread-parallel
BLAS library.
1. Link the BLAS library for PC Clusters.
[username@ricc1:~] f77 -pc -blas -o blas.out blas.o
4.4.2 LAPACK
Specify the -lapack option to link the LAPACK library. Specify the -lapack_t option to link the
thread-parallel LAPACK library.
1. Link the LAPACK library for PC Clusters.
[username@ricc1:~] f77 -pc -lapack -o lapack.out lapack.o
4.4.3 ScaLAPACK
Specify the -scalapack option to link the ScaLAPACK library. Specify the -scalapack_t option to link the
thread-parallel ScaLAPACK library.
1. Link the ScaLAPACK library for PC Clusters.
[username@ricc1:~] mpif77 -pc -scalapack -o scalapack.out scalapack.o
4.4.4 SSLII
For PC Clusters, SSL II and C-SSL II are available. Specify the -SSL2 option to link the SSL II or C-SSL II
library.
1. Link an object compiled by Fortran77 for PC Clusters with the SSL II library.
[username@ricc1:~] f77 -pc -SSL2 -o ssl2.out ssl2_f.o
2. Link an object compiled by Fortran77 for PC Clusters with the thread-parallel SSL II library.
[username@ricc1:~] f77 -pc -SSL2 -o ssl2thread.out ssl2thread_f.o
3. Link an object compiled by C for PC Clusters with the SSL II library.
[username@ricc1:~] cc -pc -SSL2 -o ssl2.out ssl2_c.o
4. Link an object compiled by C for PC Clusters with the thread-parallel SSL II library.
[username@ricc1:~] cc -pc -SSL2 -o ssl2thread.out ssl2thread_c.o
4.5 Job Freeze Function
On RICC, the Job Freeze Function can save (to a file) the status of a running job that has not completed
before a halt of system operation. When system operation restarts, this function restores the job
from the file and then restarts it.
4.5.1 Jobs as the targets of the Job Freeze Function
Job Freeze applies to jobs that meet the following condition:
Program compiled by a Fujitsu compiler
To enable job freeze, the job program must be compiled by the Fujitsu compiler and
linked with the job freeze library. (The job freeze library is linked automatically by the compilers
described in "4.1.1 Compilation / Linkage for PC Clusters".)
4.5.2 Jobs excluded from the targets of the Job Freeze Function
The Job Freeze Function cannot always freeze every job. The jobs and internal information
described below are excluded from Job Freeze targets. An attempt to freeze or defrost such jobs may
fail. Even if such a job is frozen and defrosted successfully, its operation may be
unpredictable.
Job concerned with time
If a job using time information is frozen and defrosted, the time information for the period from the
end of freezing to the end of defrosting is lost. The same applies to jobs that use a timer
process.
Job concerned with the inode number of a file
For example, the inode number acquired with the system call stat(2) may change
between job freezing and job defrosting.
Job process whose standard output is redirected to a file
Since the file is overwritten after job defrosting, the job loses the output written before job freezing.
Job process whose standard input is redirected to a file
Since the file cannot be sought to the location set before job freezing, job operation after
defrosting is unpredictable.
Job cooperating or sharing a resource with an external process
When such a job is frozen, the external status related to the job cannot be saved by job freezing.
An example is a job that exchanges data with other jobs via files.
Job using a profiler
When a job uses a profiler, its status cannot be saved because the profiler may
communicate with processes outside the job. Freezing a job that uses a profiler fails.
Shell scripts
When a job uses a script language (perl, python, etc.), Job Freeze fails.
Job using an unsupported file system
The Job Freeze Function does not support the following file systems:
/dev, procfs, namefs, etc.
If the program uses any of these file systems when Job Freeze is performed, Job Freeze fails.
Job opening a directory
Freezing a job that currently has a directory open fails.
Job using the I/O event notification facility (epoll system call)
Freezing a job that uses the epoll system call fails.
Interactive job
Interactive batch job
Job not using srun, mpirun, xpfrun
The Job Freeze Function freezes processes launched by srun, mpirun or xpfrun. Other processes
are not frozen.
Job executing srun, mpirun, xpfrun repeatedly in a "for" or "while" statement
When defrosting a job, the job script is restarted. The Job Freeze Function saves the line number of
the job script and defrosts the job at that point. Therefore, when srun, mpirun or xpfrun is executed
repeatedly in a "for" or "while" statement, the Job Freeze Function may not work properly. However,
if the job script is written to save and restore its status at the point of job freezing, the Job Freeze
Function can work properly.
[username@ricc1:~] vi go.sh
#!/bin/bash
#------ qsub option --------#
#MJS: -pc
#MJS: -proc 16
#MJS: -time 10:00:00
#MJS: -eo
#MJS: -cwd
#----- FTL command -----#
#BEFORE: a.out
#BEFORE: input.1
#AFTER: output.*
#---- Program execution ----#
start=1
end=100
if [ -f ${QSUB_REQID}_index ]; then
start=`cat ${QSUB_REQID}_index`
fi
for (( i = $start; i <= $end; i = i + 1 )); do
echo $i > ${QSUB_REQID}_index
mpirun -stdinfile input.${i} ./a.out > output.${i}
cp output.${i} input.$((i+1))
done
rm ${QSUB_REQID}_index
5. How to execute Job
There are 3 types of job. (Refer to Table 5-1 Type of Job.)
For batch jobs, the necessary resources such as cores and memory for computing are allocated
exclusively. In addition to batch jobs, which do not need to receive input from a terminal, jobs which need
input from a terminal can be executed as interactive batch jobs.
Interactive jobs are executed sharing the resources reserved for interactive jobs by time-sharing.
Job type              | Purpose                                                                                          | Occupancy of resource                         | Start of execution
Batch job             | Execute a job as batch type                                                                      | Yes                                           | When resources are allocated
Interactive batch job | Execute a job as interactive type                                                                | Yes                                           | When resources are allocated
Interactive job       | Execute a program (debugging etc.) which is preferred to run immediately rather than to occupy cores / memory | No (time sharing with other interactive jobs) | Immediate
Table 5-1 Type of Job
Batch jobs are classified into 4 types by submission pattern.

Batch job type          | Purpose                                       | Procedure of submission
Normal batch job        | Execute a job in each script                  | 5.1.1.1 Submit Normal batch job
Chain job               | Execute a set of jobs in a specified order    | 5.1.1.2 Submit chain job
Bulk job                | Execute jobs from the same script, managed as one job | 5.1.1.3 Submit bulk job
Coupled calculation job | Execute a set of jobs started at the same time | 5.1.1.4 Submit coupled calculation job
Table 5-2 Type of Batch Job
5.1 Batch job / Interactive batch job
5.1.1 Submit batch job
5.1.1.1 Submit Normal batch job
Use the qsub command with a script file name as argument to submit batch jobs.
qsub [option] script-file [...]
Example) Batch job submission
% qsub go.sh                               Submit a batch job
Request 123777.jms submitted to MJS.
The above message (REQUEST-ID) is displayed at job submission.
If blank characters are included in the current directory path, an error message is displayed. In that
case, please rename the directory to remove the blank characters.
5.1.1.2 Submit chain job
Chain jobs are executed sequentially, in the order specified on the submit command line. Two or more of the
jobs are never executed at the same time.
Specify two or more script files separated by commas (,) without white space in the qsub command to submit
chain jobs.
qsub [option] script-file,script-file[,script-file[,...]]
When a job composing a chain job is cancelled by the qdel command, all subsequent jobs are also
cancelled.
Example of chain jobs which use the output file of one job as the input file of the next job.
Prepare scripts
Prepare scripts which transfer the output file (output.x) of the previous job by FTL and use it as the input file
of the next job. (go1.sh, go2.sh, go3.sh)
go1.sh
output file: output.1
#!/bin/sh
#MJS: -proc 8
#MJS: -time 1:00:00
#MJS: -eo
#MJS: -cwd
#BEFORE: a.out
#AFTER: output.1
mpirun ./a.out -o output.1

go2.sh
input file: output.1
output file: output.2
#!/bin/sh
#MJS: -proc 8
#MJS: -time 1:00:00
#MJS: -eo
#MJS: -cwd
#BEFORE: a.out
#BEFORE: output.1
#AFTER: output.2
mpirun ./a.out -i output.1 -o output.2

go3.sh
input file: output.2
output file: output.3
#!/bin/sh
#MJS: -proc 8
#MJS: -time 1:00:00
#MJS: -eo
#MJS: -cwd
#BEFORE: a.out
#BEFORE: output.2
#AFTER: output.3
mpirun ./a.out -i output.2 -o output.3

Submit chain job
Specify the prepared script files separated by commas without white space.
[username@ricc1:~] qsub go1.sh,go2.sh,go3.sh
5.1.1.3 Submit bulk job
A bulk job is a structure that allows execution of the same program with the same resources multiple
times with different input files. A bulk job can be submitted and controlled as a single unit. Each job
(subjob) in the bulk job shares the same bulk ID but has a unique bulk index.
Specify the "-B" option and the range of bulk index IDs, from start number <StartNO> to end number
<EndNO>, in the qsub command. The step of the bulk index ID can be specified with step number <StepNO>.
qsub -B <StartNO>-<EndNO>[:<StepNO>] [option] script-file
This facilitates handling of input and output: the environment variable MJS_BULKINDEX is available to
refer to the bulk index.
A bulk job, or a part of its subjobs, can be cancelled at once by specifying the bulk ID or bulk index IDs.
The bulk ID is set in the environment variable MJS_BULKID, and the bulk index ID in the environment
variable MJS_BULKINDEX. Input and output files can be switched using the bulk index ID.
Prepare input files
Prepare the input files used by each subjob.
Sub job [1] input file: input.1
Sub job [2] input file: input.2
Sub job [3] input file: input.3
Prepare script file
Prepare a script file for the bulk job. The bulk ID and bulk index of each subjob are set in the variables
MJS_BULKID and MJS_BULKINDEX.
#!/bin/sh
#MJS: -proc 8
#MJS: -time 1:00:00
#MJS: -eo
#MJS: -cwd
#BEFORE: a.out
#BEFORE: input.${MJS_BULKINDEX}
#AFTER: output.${MJS_BULKINDEX}
mpirun ./a.out -i input.${MJS_BULKINDEX} -o output.${MJS_BULKINDEX}
Submit Bulk job
Specify the "-B" option to submit the script as a bulk job.
[username@ricc1:~] qsub -B 1-3 go-bulkjob.sh
Bulk Request 145678.jms submitted to MJS.
For the above example, the bulk job is given bulk ID "145678" and each subjob is given bulk index ID
"1", "2" or "3". The environment variables and input / output file names of each subjob are as follows.

Bulk ID | Bulk Index ID | Environment variables                | Input file name | Output file name
145678  | 1             | MJS_BULKID=145678, MJS_BULKINDEX=1   | input.1         | output.1
145678  | 2             | MJS_BULKID=145678, MJS_BULKINDEX=2   | input.2         | output.2
145678  | 3             | MJS_BULKID=145678, MJS_BULKINDEX=3   | input.3         | output.3
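The <StepNO> field thins out the index range. As a sketch, the bulk index IDs that a submission like "qsub -B 1-9:2 go.sh" (hypothetical values) would assign can be enumerated in plain bash:

```shell
# Enumerate bulk index IDs for -B 1-9:2 (start 1, end 9, step 2).
start=1; end=9; step=2
indices=""
for (( i = start; i <= end; i += step )); do
  indices="${indices}${indices:+ }${i}"
done
echo "bulk indices: ${indices}"   # 1 3 5 7 9
```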
5.1.1.4 Submit coupled calculation job
Two or more jobs submitted as a coupled calculation job are started at the same time. Coupled calculation
jobs are not started until all of the jobs have been allocated computing resources.
Specify two or more script files separated by colons (:) to execute coupled calculation jobs (jobs that
start at the same time).
qsub [option] script-file:script-file[:script-file[:...]]
When a job composing a coupled calculation job is cancelled by the qdel command, the others are also
cancelled.
5.1.1.5 Confirm completion of Batch job
When a job completes, a standard output file and a standard error output file are created in the
directory where the job was submitted.
The standard output of the executed job is written to the standard output file. Error messages, if any
errors occur, are written to the standard error output file.
[Execution result files on PC Clusters and MDGRAPE-3 Cluster]
Request-name.oXXXXX.jms --- Standard output file
Request-name.eXXXXX.jms --- Standard error output file
(XXXXX is the REQUEST-ID displayed at job submission)
[Execution result files on Large Memory Capacity Server]
Request-name.oXXXXX --- Standard output file
Request-name.eXXXXX --- Standard error output file
(XXXXX is the REQUEST-ID displayed at job submission)
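The naming rule above can be sketched in shell; the request name and REQUEST-ID here are hypothetical values, not taken from a real submission:

```shell
# Build result-file names for a job on the PC Clusters, following the
# Request-name.oXXXXX.jms / Request-name.eXXXXX.jms rule.
req_name="go.sh"    # request name (defaults to the script name)
req_id="123777"     # REQUEST-ID shown at submission
stdout_file="${req_name}.o${req_id}.jms"
stderr_file="${req_name}.e${req_id}.jms"
echo "${stdout_file} ${stderr_file}"   # go.sh.o123777.jms go.sh.e123777.jms
```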
Caution
On the PC Clusters and MDGRAPE-3 Cluster, when several batch jobs that each write hundreds of
megabytes or more to standard output / error finish at the same time, transferring the standard
output / error files takes a long time because the load on the job scheduler becomes high. All users'
job termination processes are delayed by this.
Therefore, if the size of standard output / error is large, please redirect it to normal files as below.
Example 1: Redirect standard output / error to the file "$MJS_REQID.log"
Use bash as login shell:
#------- FTL command -------#
#AFTER:0: $MJS_REQID.log ! FTL command
#------- Program Execution -------#
mpirun ./a.out >> $MJS_REQID.log 2>&1   # Redirect output to $MJS_REQID.log
Use tcsh as login shell:
#------- FTL command -------#
#AFTER:0: $MJS_REQID.log ! FTL command
#------- Program Execution -------#
mpirun ./a.out >>& $MJS_REQID.log       # Redirect output to $MJS_REQID.log
Example 2: Suppress writing to standard output / error
Use bash as login shell:
mpirun ./a.out > /dev/null 2>&1
Use tcsh as login shell:
mpirun ./a.out >& /dev/null
5.1.2 Submit interactive batch job
Use the qsub command with the -i option (hyphen i) to execute jobs interactively. At job submission,
specify qsub options as arguments of the command.
qsub -i [option] [script file]
Example) Interactive batch job submission
% qsub -i -pc -proc 4 -mem 4048mb       Submit an interactive batch job
Request 123777.jms submitted to MJS.    Notification of submission completion
Request 123777.jms start                Notification of start of execution
[username@mpc0025 ~] mpirun ./a.out
(output of job execution)
[username@mpc0025 ~] exit               Notification of job completion
%                                       End job
If job submission is successful, the system returns a notification of submission completion at the prompt,
and the job waits on the terminal until it starts. When resources are allocated, a notification of start of
execution is displayed and the resources become available. If the user performs no operation on the
terminal for 10 minutes, the user is logged out automatically.
If blank characters are included in the current directory path, an error message is displayed. In that
case, please rename the directory to remove the blank characters.
5.1.3 Function outline and Job submit command format
(1) The resources are classified into the following three categories.
A) Basic resources : Number of cores, elapsed time, amount of memory
B) Hardware resources: Resources that depend on hardware
C) Software resources: Resources that depend on software (application such as ISV)
(2) Specify resources following “#MJS:” in scripts.
Example
#MJS: -pc -amber
#MJS: -proc 4 -mem 1024mb
#MJS: -time 12:00:00
(3) Don’t use colon (:) or comma (,) in a script file name because they have special meanings.
(4) Chain jobs, bulk jobs and coupled calculation jobs cannot be used as interactive batch jobs.
(5) A user can submit up to 500 jobs per project.
(6) A user can submit up to 5000 bulk jobs per project, including normal jobs.
(7) There are limits on the number of cores used at the same time.
General Use: Up to 3888 cores per project
Quick Use: Up to 256 cores per project
5.1.3.1 Major options for job submission
Major options for job submission command are as follows.
Command option                | Meaning
-proc <PROCNO>                | Specify the number of processes (cores) (default: 1)
-thread <THREADNO>            | Specify the number of threads (cores) (default: 1)
-mem[ory] <MEMSIZE>[kb|mb|gb] | Specify the amount of memory per process. Default unit: mb (PC Clusters / MDGRAPE-3 Cluster), gb (Large Memory Capacity Server)
-hdd <HDDSIZE>[kb|mb|gb]      | Specify the HDD size per process. Default unit: gb
-time <hh:mm:ss> | <sssss>    | Specify the running time (elapsed time). Format: hh(hours):mm(minutes):ss(seconds) or sssss(seconds). Default: refer to Table 5-5 Available hardware resource
-eo                           | Merge standard output and standard error output (default: not merged) (*) Invalid for interactive batch jobs
-oi                           | Output statistical information of the job to standard output (default: not output) (*) For interactive batch jobs, statistical information is output to a file
-mb                           | Send an email when the job starts (default: not sent)
-me                           | Send an email when the job ends (default: not sent)
-mu "email address"           | Email address (default: email address in the application form)
-r <REQNAME>                  | Specify a request name (default: script name or STDIN)
-rerun [ Y | N ]              | Specify whether the job restarts in case of trouble. Y: restart (default: N) (*) Invalid for interactive batch jobs
-chaindel [ Y | N ]           | Specify whether subsequent jobs are deleted when a chain job ends abnormally. Y: delete (default: Y) (*) Invalid for non-chain jobs
-cwd                          | Move to the directory where the job was submitted when the job starts (default: home directory)
-comp[iler] <COMPTYPE> | Specify the compiler that generated the modules [fj|intel|gcc|pgi|nvidia]. fj: Fujitsu compiler, intel: Intel compiler, gcc: GNU compiler, pgi: PGI compiler (using GPGPU), nvidia: CUDA compiler (using GPGPU). Default: fj (except for Large Memory Capacity Server), intel (Large Memory Capacity Server)
-para[llel] <PARALLEL> | Specify the parallel execution environment the modules were linked with [fjmpi|xpf|mpt|pvm]. fjmpi: Fujitsu MPI, xpf: XPFortran, mpt: Message Passing Toolkit, pvm: Parallel Virtual Machine. Default: fjmpi (except for Large Memory Capacity Server), mpt (Large Memory Capacity Server)
-project <PRJ-ID>      | Specify a project ID (for users who have two or more projects)
-B <N>-<M>[:S]         | Submit a batch job as the specified number of bulk jobs. (*) Invalid for interactive batch jobs. <N>: start number of bulk job, <M>: end number of bulk job, <S>: number of steps of bulk job (default: 1)
-bmb                   | Send an email when the first bulk subjob starts (default: not sent) (*) Invalid for non-bulk jobs
-bmab                  | Send an email when all bulk subjobs have started (default: not sent) (*) Invalid for non-bulk jobs
-bme                   | Send an email when the first bulk subjob ends (default: not sent) (*) Invalid for non-bulk jobs
-bmae                  | Send an email when all bulk subjobs have ended (default: not sent) (*) Invalid for non-bulk jobs
-fstype [ftl | share]  | Presence of FTL specification (default: share)
Table 5-3 Major options for job submit command
5.1.3.2 Hardware resources
Hardware resources to be specified are as follows.
Hardware resource Computing server system to use
-pc PC Clusters (Massively Parallel Cluster, Multi-purpose Parallel Cluster)
(*)
-mpc Massively Parallel Cluster
-upc Multi-purpose Parallel Cluster
-accel Multi-purpose Parallel Cluster (GPGPU)
-accelex Multi-purpose Parallel Cluster (GPGPU: 1node possession)
-ssc Cluster for single jobs using SSD
Table 5-4 Hardware resource list
(*) If the -pc option is specified and no software resource (e.g. -g03, -adf and so on) is specified
when submitting a job to PC Clusters, the job is executed on either the Massively Parallel Cluster or
the Multi-purpose Parallel Cluster.
However, because the home area (/home) and data area (/data) are not shared on the Massively
Parallel Cluster, files are transferred by FTL (refer to 6 FTL (File Transfer Language)) at the start/end
time. Therefore, the location of the output file differs between the Massively Parallel Cluster and
the Multi-purpose Parallel Cluster.
Example:
#!/bin/sh
#MJS: -pc
#MJS: -proc 1
#MJS: -eo
#MJS: -cwd
#FTLDIR: $MJS_CWD
srun ./a.out > output.log
The output.log will be created as follows:
* Executed on Massively Parallel Cluster: $MJS_CWD/REQUEST-ID/output.log.0
* Executed on Multi-purpose Parallel Cluster: $MJS_CWD/output.log
So, specify a hardware resource as follows:
* Execute a job on Massively Parallel Cluster: -mpc
* Execute a job on Multi-purpose Parallel Cluster: -upc
* Execute a job on Massively Parallel Cluster or Multi-purpose Parallel Cluster: -pc
The number of cores, amount of memory and elapsed time depend on the hardware resource.
Hardware resource(*1) / Number of available cores per job(*2) (Quick Use / General Use) /
Max. elapsed time to specify(*3) / per process(*4)
-pc (PC Clusters)
  Available cores per job(*2): 1~128 / 1~128, max. elapsed time 72 H
                               129~256 / 129~512, max. elapsed time 24 H
                               - / 513~3803, max. elapsed time 6 H
  Per process(*4), executed on Massively Parallel Cluster:
    memory default 1,200MB (max. 9,600MB), local disk default 40GB (max. 320GB)
  Per process(*4), executed on Multi-purpose Parallel Cluster:
    memory default 2,600MB (max. 20,800MB), local disk default 10GB (max. 80GB)
-mpc (Massively Parallel Cluster)
  Available cores per job(*2): 2~128 / 2~128, max. elapsed time 72 H
                               129~256 / 129~512, max. elapsed time 24 H
                               - / 513~8192, max. elapsed time 6 H
  Per process(*4): memory default 1,200MB (max. 9,600MB), local disk default 40GB (max. 320GB)
-upc/-accel/-accelex (Multi-purpose Parallel Cluster / GPGPU (*5))
  Available cores per job(*2): 1~128 / 1~128, max. elapsed time 72 H
                               129~256 / 129~512, max. elapsed time 24 H
                               - / 513~800, max. elapsed time 6 H
  Per process(*4): memory default 2,600MB (max. 20,800MB), local disk default 10GB (max. 80GB)
Table 5-5 Available hardware resource
Caution
(*1) Exactly one hardware resource must be specified. (Two or more hardware resources cannot be
specified.)
(*2) The number of available cores is the number of processes x the number of threads.
(*3) If the -time option is omitted when a job is submitted, the maximum elapsed time is set
according to the number of cores assigned to the job.
(*4) The amount of memory per process can be specified up to the maximum value for the hardware
resource in the table. When more than the default amount of memory is specified, computation
time is charged based on the number of cores occupied by the specified amount of memory.
(Example) A 2-core parallel job that specifies 30GB of memory per process on the Large Memory
Capacity Server:
Specified memory per process 30GB = default memory 15GB x 2, which is equivalent to 2 cores'
worth of memory.
Computation time of 2 cores x 2 = the job uses 4 cores' computation time.
(*5) On GPGPU, when -accelex is specified, the job exclusively occupies one node per process, so
computation time is charged for all cores of the nodes regardless of the specified number of cores.
When -accel is specified, the job occupies one CPU (4 cores).
(*6) This cluster has 12 cores per node unlike the other clusters. Please take care when you specify
a parallel number.
The Massively Parallel Cluster and Multi-purpose Parallel Cluster have 2 CPUs (4 cores/CPU) per
computing node. Jobs using 1 core share a CPU, but parallel jobs that specify two or more cores
occupy whole CPUs (4 cores each). Therefore, if a job occupies more cores than specified,
computation time is charged accordingly.
[Figure: CPU occupation. A job using 2 cores occupies a whole CPU (4 cores); the remaining cores
of that CPU are not available to other jobs, and computation time is charged according to the
number of occupied CPUs. Jobs using 1 core share a CPU (4 cores).]
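As a concrete illustration of the charging rule above, the following sketch estimates how many cores' computation time a job is charged. This is an illustration only, assuming the 4-cores-per-CPU occupation rule and the memory rule in note (*4); the actual accounting is performed by the system, and the function name is hypothetical.

```python
import math

CORES_PER_CPU = 4  # PC Clusters: 2 CPUs x 4 cores per computing node

def charged_cores(requested_cores, mem_per_proc_mb=None, default_mem_mb=1200):
    # Memory factor per note (*4): requesting more than the default amount
    # of memory charges extra cores' worth of time, rounded up.
    mem_factor = 1
    if mem_per_proc_mb is not None:
        mem_factor = math.ceil(mem_per_proc_mb / default_mem_mb)
    cores = requested_cores * mem_factor
    if cores == 1:
        return 1  # a single-core job with default memory shares a CPU
    # Parallel (or memory-heavy) jobs occupy whole CPUs of 4 cores each.
    return math.ceil(cores / CORES_PER_CPU) * CORES_PER_CPU
```

For the Large Memory Capacity Server example above (2 cores, 30GB per process, default 15GB), this gives 4 cores' computation time, matching the worked example.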
5.1.3.3 Software resource
Software resources to be specified are as follows.
Software resource  Execution software  Computing server system to use
-g03         Gaussian03      Multi-purpose Parallel Cluster, Cluster for single jobs using SSD, Large Memory Capacity Server
-g09         Gaussian09      Multi-purpose Parallel Cluster, Cluster for single jobs using SSD, Large Memory Capacity Server
-g03nbo      NBO 5.G         Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-g09nbo      NBO 5.9         Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-g09nbo6     NBO 6.0         Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-adf         ADF2013.01      Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-adf2010     ADF2010.02      Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-gamess      GAMESS(socket)  Massively Parallel Cluster, Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-gamess_mpi  GAMESS(socket)  Massively Parallel Cluster, Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-amber8      Amber8          MDGRAPE-3 Cluster
-amber10     Amber10         Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-amber11     Amber11         Multi-purpose Parallel Cluster, Cluster for single jobs using SSD, Multi-purpose Parallel Cluster (GPGPU)
-amber12     Amber12
-amber14     Amber14
-ansys       ANSYS           Multi-purpose Parallel Cluster
-clustalw    ClustalW        Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-blast       BLAST           Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-hmmer       HMMER           Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-fasta       FASTA           Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
-cluster3    CLUSTER 3.0     Multi-purpose Parallel Cluster, Cluster for single jobs using SSD, Large Memory Capacity Server
-qchem       Q-Chem 4.1      Multi-purpose Parallel Cluster, Cluster for single jobs using SSD
Table 5-6 Software resource list
Depending on the specified software resource, the number of processes or the elapsed time that
can be specified differs from the hardware-resource values. Available resources to specify are as
follows. ("**" means the same value as for the hardware resource shown in Table 5-5 Available
hardware resource.)
Software resource         Hardware resource to specify   Processes(*1)  Threads(*2)  Max. elapsed time  Memory
-g03/-g09                 -pc/-upc/-ssc                  1              **           **                 **
-g03nbo/-g09nbo/-g09nbo6  -pc/-upc/-ssc                  1              **           **                 **
-adf/-adf2010             -pc/-upc/-ssc                  **             1            **                 **
-gamess                   -pc/-mpc/-upc/-ssc             **             **           **                 **
-gamess_mpi               -pc/-mpc/-upc/-ssc             **             **           **                 **
-amber10                  -pc/-upc/-ssc                  **             1            **                 **
-amber11                  -pc/-upc/-accel/-accelex/-ssc  **             1            **                 **
-amber12                  -pc/-upc/-accel/-accelex/-ssc  **             1            **                 **
-amber14                  -pc/-upc/-accel/-accelex/-ssc  **             1            **                 **
-clustalw                 -pc/-upc/-ssc                  **             1            **                 **
-blast                    -pc/-upc/-ssc                  1              **           **                 **
-hmmer                    -pc/-upc/-ssc                  1              **           **                 **
-fasta                    -pc/-upc/-ssc                  1              **           **                 **
-cluster3                 -pc/-upc/-ssc                  1              1            **                 **
Table 5-7 Available software resource
---- Note ----
(*1) The number of processes generated for job execution, specified by the "-proc" qsub option.
(*2) The number of threads generated for job execution, specified by the "-thread" qsub option.
(*3) The number of ANSYS Solver licenses is 1. Therefore, only one job using ANSYS can be
executed at a time.
5.1.3.4 Other job properties
Job property Meaning How to specify
-pri Priority of a job
Specify 0 - 65535 (default: 100).
The larger the value, the higher the priority.
Example: #MJS: -pri 10000
-start_time Time when a job starts (*1)
Format [[YYYY/]MM/DD-]HH:MM
Example:
#MJS:-start_time 2009/10/01-09:00
Table 5-8 Job property list
Caution
(*1) If specified resources cannot be secured by specified time, status changes from WAIT(WIT) to
TIME OVER(TOV). Jobs in TOV status can be deleted but do not start.
5.1.3.5 Major options for job submission command
5.1.3.5.1 Specify a number of processes
The number of cores specified by PROC-NO is allocated as processes for a job. If this option is omitted,
PROC-NO is set to 1. Specify the number of processes to execute a parallel job with interprocess
communication, such as an MPI or XPFortran program.
Caution
If PROC-NO x THREAD-NO of "-thread <THREAD-NO>" in the next section exceeds the maximum
number of cores that can be specified, a job submission error occurs. Please specify a proper number
of cores when submitting a job.
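The caution above can be checked before submission. The following is a hypothetical helper, not part of the system's tools:

```python
def validate_cores(proc_no, thread_no, max_cores):
    """Return True if PROC-NO x THREAD-NO fits within the maximum number of
    cores for the hardware resource; qsub rejects the job otherwise."""
    return proc_no * thread_no <= max_cores
```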
5.1.3.5.2 Specify a number of threads
The number of cores specified by THREAD-NO is allocated as threads for a job. If this option is omitted,
THREAD-NO is set to 1. Specify the number of threads to execute a thread-parallel job.
-proc <PROC-NO>
-thread <THREAD-NO>
5.1.3.5.3 Specify amount of memory
The amount of memory per process specified by MEMSIZE is secured for job execution. Units that can
be specified are kb, mb or gb (default: mb for PC Clusters/MDGRAPE-3 Cluster, gb for Large Memory
Capacity Server). Blank characters must not be put between MEMSIZE and the unit. If this option is
omitted, the default amount of memory is set. (Please refer to Table 5-5 Available hardware resource.)
(Example 1) -mem 800mb --> 800MegaByte = 800 x 1024KiloByte = 800 x 1024 x 1024Byte
(Example 2) -mem 8gb --> 8GigaByte = 8 x 1024MegaByte = 8 x 1024 x 1024KiloByte
5.1.3.5.4 Specify HDD size (for PC Clusters, MDGRAPE-3 Cluster)
The HDD size per process specified by HDDSIZE is secured for job execution. Units that can be
specified are kb, mb or gb. Blank characters must not be put between HDDSIZE and the unit. This
option is for users who need a large local disk area. If this option is omitted, the default local disk
size is set.
(Example 1) -hdd 2000mb → 2000MegaByte = 2000 x 1024KiloByte = 2000 x 1024 x 1024Byte
(Example 2) -hdd 10gb → 10GigaByte = 10 x 1024MegaByte = 10 x 1024 x 1024KiloByte
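The unit arithmetic in the examples above can be sketched as a small converter. This is an illustrative helper (the function name is hypothetical); kb/mb/gb are binary units, matching 1 MB = 1024 KB in the examples.

```python
UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}

def parse_size_bytes(value):
    """Convert a MEMSIZE/HDDSIZE string such as '800mb' or '8gb' to bytes.
    A plain number is returned unchanged; the default unit depends on the
    system (mb or gb) and is left to the scheduler."""
    s = value.strip().lower()
    for unit, factor in UNITS.items():
        if s.endswith(unit):
            return int(s[: -len(unit)]) * factor
    return int(s)
```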
5.1.3.5.5 Specify elapsed time
The job is executed within the specified elapsed time. If the job does not end within the elapsed time,
it is forcibly deleted. This prevents a job from wasting resources when it goes into an infinite loop, etc.
If this option is omitted, the maximum elapsed time is set according to the number of cores assigned
to the job (refer to 5.1.3.2 Hardware resources, Table 5-5 Available hardware resource). The elapsed
time specified by ELAPSETIME uses the format HH:MM:SS (HH: hours, MM: minutes, SS: seconds) or
SSSSS (SSSSS: seconds).
(Example 1) -time 24:10:10 --> 24 hours 10 minutes 10 seconds
(Example 2) -time 3600 --> 3600 seconds
(Example 3) -time 59:01 --> 59 minutes and 1 second
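The ELAPSETIME formats above can be normalized to seconds as follows (an illustrative helper mirroring the three examples; not part of the system):

```python
def elapsetime_seconds(value):
    """Convert a '-time' value to seconds: HH:MM:SS, MM:SS, or plain SSSSS."""
    seconds = 0
    for part in value.split(":"):
        seconds = seconds * 60 + int(part)  # plain SSSSS has a single part
    return seconds
```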
[Backfill function]
The job scheduler determines job priorities based on the users' resource usage and decides which
job starts next. However, the backfill function lets the scheduler start lower-priority jobs as long as
they do not delay the highest-priority job. Therefore, a job may start earlier if a proper ELAPSETIME
is specified.
-mem <MEMSIZE>[kb|mb|gb]
-hdd <HDDSIZE>[kb|mb|gb]
-time <ELAPSETIME>
5.1.3.5.6 Merge standard output and standard error output
Merges the standard error output file into the standard output file. If this option is omitted, the
standard error output file and the standard output file are generated separately.
5.1.3.5.7 Send email at start of job
At the start of the job, an email is sent to the address in the application form.
5.1.3.5.8 Send email at end of job
At the end of the job, an email is sent to the address in the application form.
5.1.3.5.9 Specify Request name
The job is executed with the request name REQNAME. If this option is omitted, the request name is
the script file name. Blank characters must not be included in the request name.
5.1.3.5.10 Specify if subsequent jobs are deleted when the chain job ends abnormally
Specify whether subsequent jobs are deleted when the chain job ends abnormally (Y: delete, N: do
not delete).
Caution
A) If the option is omitted, the default is -chaindel Y (delete subsequent jobs). However, if -rerun Y
(rerun the job) is specified, the default is -chaindel N (execute subsequent jobs).
B) Job submission with both -rerun Y and -chaindel Y (delete subsequent jobs) fails with the
following error message:
qsub: ERROR: 0016: invalid options: cannot enable -chaindel Y and -rerun Y at the same time.
C) This option is valid for chain jobs. It is ignored for non chain jobs.
D) Jobs with -chaindel specified and jobs without it can be submitted together as a chain job.
-eo
-mb
-me
-r <REQNAME>
-chaindel [Y|N]
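The default-resolution rules in cautions A) and B) above can be summarized in a few lines. This is a sketch of the rules only, not the scheduler's actual code, and the function name is hypothetical:

```python
def effective_chaindel(chaindel=None, rerun="N"):
    """Resolve the effective -chaindel value per cautions A) and B).
    Raises ValueError for the invalid -chaindel Y / -rerun Y combination."""
    if chaindel == "Y" and rerun == "Y":
        raise ValueError(
            "cannot enable -chaindel Y and -rerun Y at the same time")
    if chaindel is None:
        # omitted: default Y, but N when -rerun Y is specified
        return "N" if rerun == "Y" else "Y"
    return chaindel
```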
5.1.3.5.11 Move to directory where job is submitted when job starts
A job executes its script in the home directory by default. Specify the -cwd option to execute the
script in the directory where the job was submitted.
5.1.3.5.12 Specify project number
Specify a project number for job execution. A project number is an ID issued when an application is
approved by the administrator. (Users who have only one project number do not need this option;
it is for users who have two or more projects.)
A default project number can be specified with the variable MJS_QSUB_PROJECT in the .cltkrc (CLTK
user configuration file) located in the home directory.
Example) Edit the .cltkrc file
[username@ricc1:~] vi $HOME/.cltkrc
MJS_QSUB_PROJECT = G00001 <-- Specify a default project number
5.1.3.5.13 Submit bulk jobs
Submit a batch job as the specified number of bulk jobs. Specify a range of Bulk Index IDs with
<N>-<M>. The step of Bulk Index IDs can be specified with S. Please refer to "5.1.1.3 Submit bulk job".
Example) Submit 50 subjobs as a bulk job
[username@ricc1:~] qsub -B 1-50 go.sh
Bulk Request 5381290.jms Submitted to MJS.
[username@ricc1:~] qstat
REQID NAME STAT ELAPSE START-TIME CORE
--------------------------------------------------------------------
5381290[1].jms go.sh RUN 00:00 06/09 14:37 1
5381290[2].jms go.sh RUN 00:00 06/09 14:37 1
5381290[3].jms go.sh RUN 00:00 06/09 14:37 1
5381290[4].jms go.sh RUN 00:00 06/09 14:37 1
5381290[5].jms go.sh RUN 00:00 06/09 14:37 1
5381290[6].jms go.sh RUN 00:00 06/09 14:37 1
5381290[7].jms go.sh RUN 00:00 06/09 14:37 1
5381290[8-50].jms go.sh QUE --:-- --/-- --:-- 1
-cwd
-project <PROJECT-NO>
-B <N>-<M>[:S]
Example) Submit 13 subjobs as a bulk job with a step number
[username@ricc1:~] qsub -B 1-25:2 go.sh
Bulk Request 5381341.jms Submitted to MJS.
[username@ricc1:~] qstat
REQID NAME STAT ELAPSE START-TIME CORE
--------------------------------------------------------------------
5381341[1].jms go.sh RUN 00:00 06/09 14:37 1
5381341[3].jms go.sh RUN 00:00 06/09 14:37 1
5381341[5].jms go.sh RUN 00:00 06/09 14:37 1
5381341[7].jms go.sh RUN 00:00 06/09 14:37 1
5381341[9].jms go.sh RUN 00:00 06/09 14:37 1
5381341[11].jms go.sh RUN 00:00 06/09 14:37 1
5381341[13].jms go.sh RUN 00:00 06/09 14:37 1
5381341[15,17,19,21,23,25].jms go.sh QUE --:-- --/-- --:-- 1
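The Bulk Index IDs generated by "-B <N>-<M>[:S]" can be computed as follows (an illustrative helper; the function name is hypothetical):

```python
def bulk_indices(spec):
    """Expand a '-B' range spec such as '1-50' or '1-25:2' into Bulk Index IDs."""
    rng, _, step = spec.partition(":")
    start, end = (int(x) for x in rng.split("-"))
    return list(range(start, end + 1, int(step) if step else 1))
```

For example, "1-25:2" yields the 13 odd indices 1, 3, ..., 25 shown in the qstat output above.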
5.1.3.5.14 Send email at start of first bulk subjob
When the first bulk subjob starts, an email is sent to the address in the application form.
5.1.3.5.15 Send email at start of all bulk subjobs
When all bulk subjobs have started, an email is sent to the address in the application form.
5.1.3.5.16 Send email at end of first bulk subjob
When the first bulk subjob ends, an email is sent to the address in the application form.
5.1.3.5.17 Send email at end of all bulk subjobs
When all bulk subjobs have ended, an email is sent to the address in the application form.
-bmb
-bmab
-bme
-bmae
5.1.3.6 Execution command
Batch jobs and interactive batch jobs execute the commands specified after the job submission
options and the resources to use.
5.1.3.6.1 Execution command for PC Clusters
For PC Clusters, the following commands are available.
Execution command (*1) Meaning
srun Serial program
Thread parallel program (maximum number of threads: 8)
mpirun MPI parallel program, Hybrid parallel program (MPI + thread)
xpfrun XPFortran parallel program
---- Note ----
(*1) Options of the execution commands are not necessary since the number of cores (MPI parallel
or thread parallel) and the resources to use are already specified at job submission.
Example 1) Execute a serial program
srun ./serial.out
Example 2) Execute an MPI parallel program
mpirun ./mpi.out
Example 3) Execute an XPFortran program
xpfrun ./xpf.out
5.1.3.6.2 Execution command for MDGRAPE-3 Cluster
The commands of PC Clusters are available.
5.1.3.6.3 Execution command for Large Memory Capacity Server
For the Large Memory Capacity Server, the following commands are available.
Execution command (*1) Meaning
srun Serial program
Thread parallel program (maximum number of threads: 32)
mpirun MPI parallel program, Hybrid parallel program (MPI + thread)
---- Note ----
(*1) Options of the execution commands are not necessary since the number of cores (MPI parallel
or thread parallel) and the resources to use are already specified at job submission.
Example 1) Execute a serial program
srun ./serial.out
Example 2) Execute an MPI parallel program
mpirun ./mpi.out
5.1.3.7 Script file for Batch job
Create scripts with vi, emacs, etc. to submit batch jobs. Except for the resource names (hardware
resources and software resources), script files can be used on all the computing server systems.
5.1.3.7.1 Script for PC Clusters
Since the Massively Parallel Cluster cannot access the home area, transfer execution files to the
computing nodes' local disk area before job execution by specifying the file transfer function (FTL)
in scripts. For FTL, please refer to 6 FTL (File Transfer Language).
The following explains a script for a job that needs these resources.
- Number of processes (cores) : 8 cores
- Amount of memory : 1200MB
- Elapsed time : 10 H
- Merge standard error output and standard output : Yes
- Restart the job in case of trouble : Yes
- Move to directory where job is submitted : Yes
Table 5-9 Script for PC Clusters
Don't change "#!/bin/sh" in the first line or the "#MJS:" job options in the 3rd to 9th lines because
these have special meanings.
[username@ricc1:~] vi go-pc.sh
#!/bin/sh
#------ qsub option --------#
#MJS: -pc Specify hardware resource
#MJS: -proc 8 Specify a number of processes
#MJS: -mem 1200mb Specify amount of memory
#MJS: -time 10:00:00 Specify elapsed time
#MJS: -eo Merge standard output / error output
#MJS: -rerun Y Restart job in case of trouble
#MJS: -cwd Move to directory where job is submitted
#------- FTL command -------#
#FTLDIR: $MJS_CWD Specify file transfer (Note)
#------- Program execution -------#
mpirun ./para.out Execute job
5.1.3.7.2 Script for Large Memory Capacity Server
The following explains a script for a job that needs these resources.
- Number of processes (cores) : 8 cores
- Amount of memory : 30GB
- Elapsed time : 10 H
- Merge standard error output and standard output : Yes
- Restart the job in case of trouble : Yes
- Move to directory where job is submitted : Yes
Note) Files in the directory where the job is submitted are automatically transferred to the computing
nodes' local disk area before job execution, and files in the directory on the computing nodes are
collected after job execution. Except for the Massively Parallel Cluster, this command is ignored.
When FTLDIR is used, unnecessary files may be transferred. Also, the existence of files is checked
after job execution even when there are no files to transfer. For large-scale parallel jobs the cost
may be high, so please use BEFORE and AFTER instead of FTLDIR.
For more information on BEFORE and AFTER, please refer to 6 FTL (File Transfer Language).
[username@ricc1:~] vi go-ax.sh
#!/bin/sh
#------ qsub option --------#
#MJS: -ax Specify hardware resource
#MJS: -proc 8 Specify a number of processes
#MJS: -mem 30gb Specify amount of memory
#MJS: -time 10:00:00 Specify elapsed time
#MJS: -eo Merge standard output / error output
#MJS: -rerun Y Restart job in case of trouble
#MJS: -cwd Move to directory where job is submitted
#------- FTL command -------#
#FTLDIR: $MJS_CWD
#------- Program execution -------#
mpirun ./para.out Execute job
Table 5-10 Script for Large Memory Capacity Server
Don't change "#!/bin/sh" in the first line or the "#MJS:" job options in the 3rd to 9th lines because
these have special meanings.
5.1.4 Confirm job information
Use the qstat command to confirm job information.
Option Meaning
(none) Display job information
-d In addition to job information, display the directory where the job was submitted
-m In addition to job information, display the amount of memory in use
-p In addition to job information, display the priority
-e Display completion information
-w Display reason for waiting
-project Display job of specified project (for users who have two or more projects)
5.1.4.1 qstat command
Display the user's own currently submitted job list.
REQID : REQUEST-ID
* Bulk job (running): "Bulk ID"."Bulk Index ID"
* Bulk job (waiting): "Bulk ID".["start of Bulk Index ID"-"end of Bulk Index ID"]
NAME: REQUEST-NAME (if omitted, the request name is the script file name)
STAT: Batch job status
( RUN: running, QUE: waiting to run, WIT: waiting for the specified start time,
HLD: hold(*1), END: end(*2), TOV: specified start time passed(*3))
ELAPSE: Elapsed time (HH:MM)
START-TIME: Time when the job starts (MM/DD HH:MM)
CORE: Number of allocated cores
(*1): State where the job is prevented from starting
(*2): Only for coupled calculation jobs
(*3): State where the job is held because it could not start at the specified start time.
qstat [-d|-m|-p|-e|-w|-project] [REQID]
[username@ricc1:~] qstat
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STAT ELAPSE START-TIME CORE
-------------------------------------------------------------------
12342.jms go.sh RUN 12:34 07/28 12:00 8
12348.jms go.sh QUE --:-- --/-- --:-- 8
12412[1].jms bulk.sh RUN 12:04 07/28 12:30 1
12412[2].jms bulk.sh RUN 12:04 07/28 12:30 1
12412[3-10].jms bulk.sh QUE --:-- --/-- --:-- 1
5.1.4.2 qstat command (directory where job is submitted)
With the -d option, the directory where the job was submitted is displayed in addition to job information.
[Meaning of additional information]
SUBMIT-DIR: Directory where job is submitted
5.1.4.3 qstat command (amount of memory in use)
With the -m option, memory information is displayed in addition to job information. Memory information
is updated periodically. The maximum amount of memory in use among all processes is displayed.
[Meaning of additional information]
MEM: Maximum amount of memory in use
5.1.4.4 qstat command (priority)
With the -p option, the priority is displayed in addition to job information.
[Meaning of additional information]
PRI: Job priority
[username@ricc1:~] qstat -m
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STATUS ELAPSE START-TIME CORE MEMORY
--------------------------------------------------------------------
12342.jms go.sh RUN 12:34 07/28 12:00 8 500M
12348.jms go.sh QUE --:-- --/-- --:-- 8 --
[username@ricc1:~] qstat -p
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STAT ELAPSE START-TIME CORE PRI
--------------------------------------------------------------------
12342.jms go.sh RUN 12:34 07/28 12:00 8 100
12348.jms go.sh QUE --:-- --/-- --:-- 8 100
[username@ricc1:~] qstat -d
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STATUS ELAPSE START-TIME CORE SUBMIT-DIR
--------------------------------------------------------------------
12342.jms go.sh RUN 12:34 07/28 12:00 8 $HOME/JOB
12348.jms go.sh QUE --:-- --/-- --:-- 8 $HOME/JOB
12412[1].jms bulk.sh RUN 12:04 07/28 12:30 1 $HOME/JOB
12412[2-10].jms bulk.sh QUE --:-- --/-- --:-- 1 $HOME/JOB
5.1.4.5 qstat command (finished job)
With the -e option, the list of finished jobs is displayed.
(*) If the amount of memory cannot be obtained, "-" is displayed.
[username@ricc1:~] qstat -e
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STATTIME ENDTIME CORE MEM(*) SUBMIT-DIR
--------------------------------------------------------------------
12321.jms go.sh 07/21 08:00 07/28 14:21 896 500M $HOME/JOB1
4649.ax go.sh 07/25 12:40 07/28 16:04 8 20G $HOME/axjob
12324.jms go.sh 07/28 12:00 07/28 13:09 896 500M $HOME/JOB
12412[1].jms bulk.sh 07/28 12:00 07/28 13:09 1 500M $HOME/JOB
12412[2].jms bulk.sh 07/28 12:00 07/28 13:09 1 500M $HOME/JOB
5.1.4.6 qstat command (reason for waiting)
For the top 10 priority jobs waiting to run, display the reason for waiting and the estimated time when
the jobs will start. The estimated time may differ from the actual start time because it depends on the
execution situation of other jobs, etc.
[Meaning of additional information]
MEM: Specified amount of memory (if not specified, "-" (hyphen) is displayed)
ESTIMATE: Estimated time when the job starts
< 6hrs The job will start within 6 hours.
<12hrs The job will start within 12 hours.
<24hrs The job will start within 24 hours.
< 3days The job will start within 3 days.
> 3days The job will start after 3 days.
REASON: Reason for waiting to run
Insufficient cores Cores are insufficient
Insufficient memory Amount of memory is insufficient
Insufficient license Licenses are insufficient
Otherjob booking cores Other jobs prevent the job from running
Chain job A chain job prevents the job from running
Upper limit of project The limit on the number of cores the project can
use at a time prevents the job from running
Over specified start time The job cannot run because the specified start
time has passed
[username@ricc1:~] qstat -w
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STAT CORE MEM ESTIMATE REASON
--------------------------------------------------------------------
13574.jms go.sh QUE 1024 -- < 6hrs Insufficient cores
13575.jms sim.sh QUE 8 -- > 24hrs Insufficient license
4695.ax go.sh QUE 16 -- < 12hrs Insufficient memory
13577.jms go.sh QUE 1024 -- > 24hrs Insufficient cores
13577.jms go-1.sh QUE 1024 -- > 3days Insufficient cores
13577.jms go-2.sh QUE 1024 -- < 3days Chain job
*****************************************************************************
The estimation time is transitorily changed by the job execution or submission.
*****************************************************************************
5.1.4.7 qstat command (project number)
Display jobs of the specified project number. This function is only available for users who have two or
more projects.
5.1.5 Display standard output / standard error output
Use the qcat command to display a submitted script file, or a running job's standard output file or
standard error output file.
Option Meaning
(none) Display running job's standard output file.
-e Display running job's standard error output file.
-o Display running job's standard output file.
-s Display job's script file.
5.1.6 Confirm resource information
Display information on usage of the system or available resources.
Option Meaning
-x Display available resources and limit.
-uc Display usage of cores in the system.
-um Display usage of memory on Large Memory Capacity Server.
qstat [-x|-uc|-um]
qcat [-o|-e|-s] REQID
[username@ricc1:~] qstat -project G00001
[G00001] Research of RICC <-- Display project number (G00001)
REQID NAME STAT ELAPSE START-TIME CORE
---------------------------------------------------------------------
12342.jms go.sh RUN 12:34 07/28 12:00 8
12348.jms go.sh QUE --:-- --/-- --:-- 8
5.1.6.1 Display resource information
Display the hardware and software resources that the user can specify for each project.
[username@ricc1:~] qstat -x
[Q00001] Study of massively parallel programs on RIKEN Cluster....
H_RESOURCE MAX_CORE/J MAX_CORE/P SUBMIT ELAPSE MEMORY RUN QUEUED
-----------------------------------------------------------------------
pc - 256 10/500 - - 0( 189) 0( 206)
+- mpc 256 - - 72H 10240mb 10( 180) -
+- upc 256 - - 72H 21200mb 0( 6) -
+- ssc 96 - - 72H 43200mb 0( 0) -
S_RESOURCE[pc(mpc)] MAX_PROC/J MAX_THREAD/J ELAPSE MEMORY
-----------------------------------------------------------------------
amber14 - 1 - -
g09 1 - - -
g09nbo 1 - - -
g09nbo6 1 - - -
gamess - - - -
gamess_mpi - - - -
S_RESOURCE[pc(upc)] MAX_PROC/J MAX_THREAD/J ELAPSE MEMORY
-----------------------------------------------------------------------
adf - 1 - -
adf2010 - 1 - -
adf2013 - 1 - -
adf2014 - 1 - -
amber10 - 1 - -
amber11 - 1 - -
amber12 - 1 - -
amber14 - 1 - -
blast 1 - - -
clustalw - 1 - -
cluster3 1 1 - -
fasta - - - -
g03 1 - - -
g03nbo 1 - - -
g09 1 - - -
g09nbo 1 - - -
g09nbo6 1 - - -
gamess - - - -
hmmer 1 - - -
visit 160 8 24H -
visitgpu - - - -
S_RESOURCE[pc(ssc)] MAX_PROC/J MAX_THREAD/J ELAPSE MEMORY
------------------------------------------------------------------------
adf - 1 - -
adf2010 - 1 - -
adf2013 - 1 - -
adf2014 - 1 - -
amber10 - 1 - -
amber11 - 1 - -
amber12 - 1 - -
amber14 - 1 - -
blast 1 - - -
clustalw - 1 - -
cluster3 1 1 - -
Item Meaning
H_RESOURCE Hardware resource
MAX_CORE/J Max. number of cores to specify per job (default value in
parenthesis)
MAX_CORE/P Max. number of cores the project can use at a time
SUBMIT Number of Submitted jobs / Max. number of jobs the project can
submit
ELAPSE Max. elapsed time to specify per job
MEMORY Max. amount of memory to specify per job (default value in
parenthesis)
RUN Number of running jobs of the project (total value in parenthesis)
QUEUED Number of waiting jobs of the project (total value in parenthesis)
S_RESOURCE[pc(mpc)] Available software resources for Massively Parallel Cluster
S_RESOURCE[pc(upc)] Available software resources for Multi-purpose Parallel Cluster
S_RESOURCE[pc(accel)] Available software resources for Multi-purpose Parallel Cluster (GPU)
S_RESOURCE[pc(accelex)] Available software resources for Multi-purpose Parallel Cluster (GPU)
MAX_PROC/J Max. number of processes to specify per job
MAX_THREAD/J Max. number of threads to specify per job
Application_NAME Software resource that has a limit on the number of concurrent uses
USE/MAX Number of using cores / Max. number of available cores
5.1.6.2 Display usage of core
Display the current usage of cores in each system. RATIO(USED/ALL) shows the usage ratio (%),
the number of cores in use and the maximum number of cores.
[username@ricc1:~] qstat -uc
The status of CORE use RATIO(USED/ALL)
------------------------------------------------------------------------
mpc *********************************------- 84.9%(3232/3888)
upc **************************************** 100.0%(0800/0800)
ssc *********************------------------- 54.6%(0118/0216)
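The RATIO bar above can be reproduced with a few lines. This is a toy rendering of the output format, assuming a 40-character bar; the function name is hypothetical and the real qstat may round differently:

```python
def usage_bar(used, total, width=40):
    """Render a core-usage bar like 'qstat -uc': asterisks for cores in use,
    hyphens for free cores, then the percentage and (used/total)."""
    filled = int(used * width / total) if total else 0
    pct = 100.0 * used / total if total else 0.0
    return "{}{} {:5.1f}%({:04d}/{:04d})".format(
        "*" * filled, "-" * (width - filled), pct, used, total)
```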
5.1.7 Confirm project user job information
Use the prjstat command to confirm the job lists of users who belong to the same project.
Option Meaning
-r Sort by request ID
5.1.7.1 prjstat command
Display the currently submitted job list of users in the same project.
Example: Display the job list of users who belong to project G00001
Item Meaning
USER Username submitted jobs
5.1.7.2 prjstat command (display job list in order of request ID)
With -p option, job list is displayed in order of request ID.
prjstat [-r]
[username@ricc1:~] prjstat
[G00001] Research of RICC
USER REQID NAME STAT ELAPSE START-TIME CORE
----------------------------------------------------------------------
userA 1234567.jms go.sh RUN 01:40 02/15 13:40 128
userB 1234599.jms test.sh RUN 01:46 02/15 13:34 8
userC 1234600.jms run.sh QUE --:-- --/-- --:-- 256
userC 1234601.jms run.sh QUE --:-- --/-- --:-- 512
userD 1234301.jms run-d.sh RUN 49:40 02/13 13:34 32
[username@ricc1:~] prjstat -r
[G00001] Research of RICC
REQID NAME USER STAT ELAPSE START-TIME CORE
----------------------------------------------------------------------
1234301.jms run-d.sh userD RUN 49:40 02/13 13:34 32
1234567.jms go.sh userA RUN 01:40 02/15 13:40 128
1234599.jms test.sh userB RUN 01:46 02/15 13:34 8
1234600.jms run.sh userC QUE --:-- --/-- --:-- 256
1234601.jms run.sh userC QUE --:-- --/-- --:-- 512
5.1.8 Operate job
5.1.8.1 Operate file of running job
These commands operate on files in the local disk area of the computing nodes, so they are available only for jobs running on the Massively Parallel Cluster.
5.1.8.1.1 Display file list on computing node
Use the qls command to display a running job's file list in the computing nodes' local disk area.
The ls command's options can be specified as OPTION.
Specify the REQUEST-ID and process number (@RankNO) to display the file list in the job execution directory.
Example: Display files in the job execution directory of rank 0 of REQUEST-ID 13562.jms
5.1.8.1.2 Get file of running job
Use the qget command to transfer files of a running job from the computing nodes' local disk area.
From the Login Server, get the files of the specified REQUEST-ID and process (@RankNO) from the computing nodes' local disk area to the home area.
Example) Get a file (resultfile) of rank 0 of a running job (REQUEST-ID 13579.jms)
qget REQID[@RankNO] SRC .. [DEST]
qls REQID [@RankNO] [OPTION]
[username@ricc ~] qls 13562.jms@0
result_file
exec.tar.gz
go.sh
[username@ricc ~]$ qls 13579.jms -l ← Confirm files in the job
total 48
-rwxr-xr-x 1 username group 46555 Jul 22 13:45 resultfile
[username@ricc ~]$ qget 13579.jms@0 result /tmp ← Execute qget command
[username@ricc ~]$ ls /tmp/result ← Confirm the files are transferred
/tmp/result
5.1.8.1.3 Put result file
Use the qput command to put files from the computing nodes' local disk area to other nodes or the home area.
Option Meaning
-del Delete source files after putting files.
The qput command can be invoked from a script.
Example) Put resultfile to the directory where the job was submitted, during the job execution.
qput [OPTION] [@SRC_Rank:] SRC DEST
qput [@SRC_Rank:] SRC @DEST_Rank_LIST
[username@ricc ~] vi go-qput.sh
#!/bin/sh
#------ Option Set for qsub command --------#
#MJS: -pc
#MJS: -proc 8
#MJS: -time 10:00:00
#MJS: -rerun Y
#MJS: -cwd
mpirun ./a.out > resultfile
qput resultfile $MJS_CWD ← Copy to the directory where the job is submitted
mpirun ./b.out
5.1.8.2 Cancel job
Use the qdel command to cancel a job. Use the qd command to cancel two or more jobs interactively.
Option Meaning
(none) Cancel a job
-K Cancel a job and delete standard output / error output file (except for Large
Memory Capacity Server).
-collect Cancel a job and collect files on computing nodes (except for Massively
Parallel Cluster).
5.1.8.2.1 Example of qdel command
Confirm the REQUEST-ID of the job to delete.
Specify the job's REQUEST-ID (REQID) as an argument of the qdel command.
If standard output / error output is not necessary on the PC Clusters, specify the -K option. Specify this option especially to cancel a job in EXITING status.
[username@ricc1:~] qstat
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID NAME STAT ELAPSE START-TIME CORES
--------------------------------------------------------------------
12342.jms go.sh RUN 12:34 07/28 12:00 8
12348.jms go.sh RUN 0:20 07/28 00:14 8
12356.jms go.sh RUN 0:05 07/28 00:29 8
12412[1].jms bulk.sh RUN 0:05 07/28 00:29 1
12412[2].jms bulk.sh RUN 0:05 07/28 00:29 1
12348[3-10].jms bulk.sh QUE --:-- --/-- --:-- 1
qdel [-K|-collect] REQID
qd [-K|-collect]
[username@ricc ~] qdel 12348.jms
Request 12348.jms has been deleted.
[username@ricc ~] qdel 5963.ax
Request 5963.ax has been deleted.
[username@ricc ~] qdel -K 12342.jms
Request 12342.jms has been deleted.
Specify the -collect option to cancel a job and collect the files the job generated on the computing nodes of the PC Clusters.
[username@ricc ~] qdel -collect 12356.jms
Request 12356.jms is running, and has been signalled.
Specify the Bulk ID to delete a whole bulk job.
Specify "Bulk ID"["Bulk Index ID"] to delete a single bulk job individually.
Specify "Bulk ID"["Bulk Index ID list"] to delete two or more bulk jobs at the same time.
[username@ricc1:~] qdel 12412.jms
Bulk Request 12412.jms has been deleted.
[username@ricc1:~] qdel 12412[1].jms
Request 12412[1].jms has been deleted.
[username@ricc1:~] qdel 12412[1,3,5-10].jms
Bulk Request 12412[1,3,5-10].jms has been deleted.
5.1.8.2.2 Example of qd command
Display the list of submitted jobs with the qd command.
Enter the NO of the job to cancel. To cancel two or more jobs, specify them separated by commas (,) or as a range using a hyphen (-). To cancel all jobs, enter "all". Enter "q" or "quit" to quit the qd command.
In the qd command, running bulk jobs are handled as individual jobs, while waiting bulk jobs are handled as a single job.
[username@ricc1:~] qd
NO REQID NAME STAT ELAPSE START-TIME CORES
-------------------------------------------------------------------
1 12342.jms go.sh RUN 12:34 07/28 12:00 8
2 12348.jms go.sh RUN 0:20 07/28 00:14 8
3 12412[1].jms bulk.sh RUN 0:05 07/28 00:29 1
4 12412[2].jms bulk.sh RUN 0:05 07/28 00:29 1
5 12348[3-10].jms bulk.sh QUE --:-- --/-- --:-- 1
qd: input NO: 2 3
qd: Are you sure? (yes|no)? yes
Request 12348.jms is running, and has been signalled.
Request 5963.ax has been deleted.
qd: Normal end
5.1.8.3 Delete completed job information
Specify the -e option to delete completed job information.
qdel -e REQID
qd -e
5.1.8.3.1 Example of qdel command
Display the list of completed jobs.
[username@ricc1:~] qstat -e
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID REQNAME START-TIME END-TIME CORES MEM SUBMIT-DIR
------------------------------------------------------------------
12321.jms go.sh 07/21 08:00 07/28 14:21 896 500M $HOME/JOB1
12324.jms go.sh 07/28 12:00 07/28 13:09 896 500M $HOME/JOB
Specify the REQUEST-ID (REQID) of the job to delete it from the list of completed jobs.
[username@ricc ~] qdel -e 12321.jms
Request 12321.jms was deleted from jobhistory-file.
(*) Up to 500 completed jobs are preserved. When the number of completed jobs reaches 500, the oldest job information is deleted.
5.1.8.3.2 Example of qd command
Specify the -e option to the qd command.
Enter the NO of the job to delete. To delete two or more entries, specify them separated by commas (,) or blank characters, or as a range using a hyphen (-). To delete all entries, enter "all". Enter "q" or "quit" to quit the qd command.
[username@ricc1:~] qd -e
NO REQID REQNAME START-TIME END-TIME CORES MEM SUBMIT-DIR
-----------------------------------------------------------------------
1 12321.jms go.sh 07/21 08:00 07/28 14:21 896 500M $HOME/JOB1
2 12322.jms go.sh 07/25 12:40 07/28 16:04 8 20G $HOME/JOB2
3 12324.jms go.sh 07/28 12:00 07/28 13:09 128 1.2G $HOME/JOB3
qd: input NO: 1 2
qd: Are you sure? (yes|no)? yes
Request 12321.jms was deleted from jobhistory-file.
Request 4649.ax was deleted from jobhistory-file.
qd: Normal end
5.1.8.4 Alter priority of job
Alter the priority of a submitted job.
Specify a priority from 0 to 65535 following the -p option. (default: 100)
The priority of a submitted bulk job cannot be changed.
qalter -p <PRIORITY> <REQID>
[username@ricc ~] qalter -p 200 12343.jms
Request 12343.jms was changed to priority(200).
5.2 Interactive Job
For interactive jobs, the following limits apply:

              Max. number of   Max. number of   Max. elapsed   Max. amount of
              processes        threads          time           memory
PC Clusters   32               8                4 hours        2GB
5.2.1 Interactive Job execution for PC Clusters
Use following commands on Login Server to execute interactive job for PC Clusters.
Execution command Meaning
srun Serial program
Thread parallel program (max. number of threads: 8)
mpirun MPI parallel program (max. number of processes: 32)
xpfrun XPFortran parallel program (max. number of processes: 32)
If programs are executed without the above commands, they run on the Login Server, which may adversely affect the system. Be sure to use the above commands when executing interactive jobs. To execute programs in script languages such as Perl or Python as interactive jobs, specify the -pc option. Furthermore, if a job requires keyboard input, specify -pty to turn off buffering of standard output.
example 1) Execute serial program (execution module) (buffering of standard output is off)
[username@ricc1:~] srun -pty ./serial.out
example 2) Execute serial program (script)
[username@ricc1:~] srun -pc ./serial.pl
example 3) Execute thread parallel program with 4 threads
[username@ricc1:~] srun -thread 4 ./thread.out
example 4) Execute MPI parallel program with 4 processes
[username@ricc1:~] mpirun -np 4 ./mpi.out
example 5) Execute XPFortran parallel program with 4 processes
[username@ricc1:~] xpfrun -np 4 ./mpi.out
Also, ISV applications such as Gaussian, ADF, and ANSYS (solver) cannot be executed as interactive jobs. Please execute them as batch jobs.
6. FTL (File Transfer Language)
6.1 Introduction
In RICC, the job execution area differs among clusters. The Multi-purpose Parallel Cluster uses the shared area for job execution. The Massively Parallel Cluster uses the local area of the computing nodes to realize fast I/O and to reduce access load as much as possible. Therefore, files necessary for a job need to be transferred from the home area of the Login Server to the computing nodes before the job runs. Likewise, computation results need to be transferred from the computing nodes back to the home area of the Login Server after the job finishes. FTL (File Transfer Language) is used for this file transfer. FTL commands are embedded in the script file for job execution or generated with the ftlgen command (refer to 6.7 FTL generating tool: ftlgen).
In addition, the processes of a parallel program cannot coordinate with each other through files, since the computing nodes do not share files.
Please specify the FTL option in the job scripts. (default: share)
ex1) File transfer by using FTL
#MJS: -fstype ftl
ex2) Do not use FTL option
#MJS: -fstype share
6.2 Transfer input file
To transfer input files, specify one or more following items.
Input files
RANK-LIST and Computing node's directory (optional)
Specify name of files to transfer as "Input files", rank (0 to number of processes -1) as "RANK-LIST"
and name of computing node's directory as "Computing node's directory".
If "RANK-LIST and Computing node's directory" is not specified, specified input files are transferred to
computing nodes in the same directory configuration of Login Server.
Fig. 6-1 Transfer input files from Login Server to computing nodes
Input files: /home/username/job/input
Rank: All ranks (Number of processes: n)
(The input file in the shared area of the Login Server is transferred to /home/username/job in the local area of every computing node, ranks 0 to n-1.)
6.3 Transfer input directory
To transfer input directories, specify one or more of the following items:
Input directories
RANK-LIST and Computing node's directory (optional)
Specify the names of the directories to transfer as "Input directories", the ranks (0 to number of processes -1) as "RANK-LIST", and the name of the computing node's directory as "Computing node's directory".
All files in the specified input directories are transferred.
If "RANK-LIST and Computing node's directory" is not specified, the specified input directories are transferred to the computing nodes in the same directory configuration as the Login Server.
Fig. 6-2 Transfer input directory from Login Server to computing nodes
Input directory: /home/username/job/bin
Rank: All ranks (Number of processes: n)
(The input directory in the shared area of the Login Server is transferred to /home/username/job/bin in the local area of every computing node.)
6.4 Transfer output file
To transfer output files, specify one or more of the following items:
Output files
RANK-LIST and Login Server's directory (optional)
Specify the names of the output files to transfer as "Output files", the ranks (0 to number of processes -1) as "RANK-LIST", and the name of the Login Server's directory as "Login Server's directory".
If "RANK-LIST and Login Server's directory" is not specified, the specified output files are transferred to the Login Server in the same directory configuration as the computing nodes.
Fig. 6-3 Transfer output files from computing nodes to Login Server
Output file: /home/username/job/output
Rank: All ranks (Number of processes: n)
(The output files output.0 ... output.n-1 in the local areas of the computing nodes are transferred to /home/username/job in the shared area of the Login Server.)
If the output files have the same name among computing nodes, it is possible to avoid overwriting by adding the rank number (the first rank number in a node) to the output file name.
Fig. 6-4 Transfer output files (in case of avoidance of overwriting)
Output file: /home/username/job/output
Rank: All ranks (Number of processes: n)
(Each node's output file is transferred with the first rank number of the node appended, e.g. output.0 and output.n-8, so same-named files are not overwritten on the Login Server.)
6.5 FTL Basic Directory
There is a simple way to transfer files. By simply specifying the name of the directory where the files to transfer are located as the FTL basic directory, the files are transferred as follows:
[before the job runs]
Files (not including directories) in the FTL basic directory on the Login Server
--> recognized as input files and transferred to the computing nodes
[after the job ends]
Files (not including directories) in the FTL basic directory on the computing nodes
--> recognized as output files and transferred to the Login Server
Fig. 6-5 Transfer input files by specifying FTL basic directory
FTL basic directory: /home/username/job
(All files in the FTL basic directory on the Login Server, e.g. input.0 ... input.n-1, are transferred to the same directory in the local area of every computing node.)
Transfer of output files by specifying the FTL basic directory works as follows.
1. A "ReqID" directory is created in the FTL basic directory on the Login Server.
2. Files in the FTL basic directory on the computing nodes are transferred into the "ReqID" directory on the Login Server.
3. The first rank number in the computing node is added to the output file name. (Note 1) (Note 2)
Note 1: It is possible to transfer output files with no rank number. However, output files with the same name are overwritten.
Note 2: You can specify which type of files to transfer: files newly created while the job is running, or files updated while the job is running. If neither type is specified, only newly created files are transferred.
Fig. 6-6 Transfer output files by specifying FTL basic directory
(Files in the FTL basic directory /home/username/job on each computing node are collected into the ReqID directory /home/username/job/ReqID on the Login Server, with the first rank number of the node appended, e.g. output.0.0, output.0.n-8, output.n-1.0, output.n-1.n-8.)
6.6 FTL Syntax
You can select "single line mode" or "multi-line mode" for FTL. Basically, you write only one FTL command in "single line mode" and two or more commands in "multi-line mode". An FTL sentence begins with the # character, which must be placed in the first column. If # is not in the first column, the line is not recognized as an FTL sentence.
Syntax of each mode is as follows.
Single line mode
Multi-line mode
The items in an FTL command are described as follows.
files
Specify the input / output file names to transfer.
Files must be regular files or symbolic links.
Directories, devices, sockets, and FIFOs cannot be specified.
Some meta characters are available.
Specify a relative path from the directory where the batch job is submitted.
Use the FTL variables $MJS_HOME or $MJS_DATA when specifying an absolute path from /home or /data. For FTL variables, please refer to 6.6.12.5 FTL variable.
RANK-LIST
Specify destination of input files and destination of output files by RANK-LIST. For more
information on RANK-LIST, please refer to 6.6.12.6 RANK-LIST
directory
Specify destination directory.
Meta characters are not available.
Specify relative path from a directory where a batch job is submitted.
Use FTL variables $MJS_HOME or $MJS_DATA when specifying absolute path from /home
or /data. On FTL variable, please refer to 6.6.12.5 FTL variable.
#FTL command: [RANK-LIST[@directory]:] files[, files... ]
#<FTL command>
# [RANK-LIST[@directory]:] files[, files... ]
# [RANK-LIST[@directory]:] files[, files... ]
#</FTL command>
The restrictions on FTL are as follows.
Files which are not contained in /home or /data cannot be specified in an FTL command.
Blank characters cannot be included in file names and directory names.
A batch job's standard / error output files (extension: .jms) and swap files (extension: .swp) are not transferred.
Multi-line mode cannot be nested.
Use the ftlchk command to check FTL syntax and the existence of files. For more information, please see ftlchk --man.
Example:
[username@ricc1 ~]$ ftlchk go.sh
=====================
FTL Analysis Result
=====================
Line Type TargetRank Stat SourcePath[Login] DestinationDir[Calc]
11 BEFORE 0-15 o $CWD/a.out --> $CWD
6.6.1 FTL Syntax (transfer input file)
Transfer input files from the Login Server to the computing nodes using the following syntax.
More than one command can be specified in a script file.
Use a comma as a separator to specify multiple files to transfer.
The destination of the files is determined by the set of RANK-LIST and the computing node's directory.
RANK-LIST and directory are optional. If "RANK-LIST and Computing node's directory" is not specified, the specified input files are transferred to the computing nodes in the same directory configuration as the Login Server.
6.6.1.1 Single line mode (#BEFORE)
6.6.1.2 Multi-line mode (#<BEFORE> - #</BEFORE>)
#BEFORE: [RANK-LIST[@computing node's directory]:] input file [...]
#<BEFORE>
#[RANK-LIST[@computing node's directory]:] input file [...]
#</BEFORE>
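For illustration (the file names a.out, input.dat, input.A, and input.B are hypothetical), a transfer can be written in either mode; the multi-line form below sends different files to ranks 0-7 and 8-15:

```sh
#BEFORE: a.out, input.dat

#<BEFORE>
# 0-7: input.A
# 8-15: input.B
#</BEFORE>
```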
6.6.2 FTL Syntax (transfer input directory)
Transfer input directories from the Login Server to the computing nodes using the following syntax.
More than one command can be specified in a script file.
Use a comma as a separator to specify multiple directories to transfer.
The destination of the input directories is determined by the set of RANK-LIST and the computing node's directory.
RANK-LIST and the computing node's directory are optional. If "RANK-LIST and Computing node's directory" is not specified, the specified input directories are transferred to the computing nodes in the same directory configuration as the Login Server.
6.6.2.1 Single line mode (#BEFORE_R)
6.6.2.2 Multi-line mode(#<BEFORE_R> - #</BEFORE_R>)
#BEFORE_R: [RANK-LIST[@computing node's directory]:] input directory [... ]
#<BEFORE_R>
#[RANK-LIST[@computing node's directory]:] input directory [...]
#</BEFORE_R>
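For illustration (the directory names bin and lib and the destination are hypothetical), transferring a bin directory to every node, and a lib directory into a work directory under /data for ranks 0-15, might look like:

```sh
#BEFORE_R: bin

#<BEFORE_R>
# 0-15@$MJS_DATA/work: lib
#</BEFORE_R>
```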
6.6.3 FTL Syntax (transfer output file)
Transfer output files from the computing nodes to the Login Server using the following syntax.
More than one command can be specified in a script file.
Use a comma as a separator to specify multiple files to transfer.
The destination of the output files is determined by the set of RANK-LIST and the Login Server's directory.
RANK-LIST and the Login Server's directory are optional. If "RANK-LIST and Login Server's directory" is not specified, the specified output files are transferred to the Login Server in the same directory configuration as the computing nodes.
6.6.3.1 Single line mode (#AFTER)
6.6.3.2 Multi-line mode (#<AFTER> - #</AFTER>)
#AFTER: [RANK-LIST[@Login Server's directory]:] output file [...]
#<AFTER>
#[RANK-LIST[@Login Server's directory]:] output file [...]
#</AFTER>
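For illustration (the file and directory names output.dat, log.txt, and results are hypothetical), collecting a single output file back to its original location, or several files into a results directory under the submission directory for ranks 0-15, might look like:

```sh
#AFTER: output.dat

#<AFTER>
# 0-15@$MJS_CWD/results: output.dat, log.txt
#</AFTER>
```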
6.6.4 FTL Syntax (avoid overwrite output file)
Add the rank number (the first rank number in a node) to output files transferred by the AFTER command using the following syntax. This avoids overwriting output files when output files on different computing nodes have the same name.
This command can be specified only once in a script file.
If this command is not specified, "off" is set for the flag.
This is valid for files (collected from 2 or more computing nodes) specified by the AFTER
command.
There is no multi-line mode.
6.6.4.1 Single line mode (#FTL_SUFFIX)
Item Value Meaning
flag
on Add rank number to output files
off Not add rank number to output files
Table 6-1 FTL_SUFFIX flag
#FTL_SUFFIX: flag
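As a sketch, FTL_SUFFIX combined with an AFTER command might be written as follows (the file name output.dat is illustrative, and the exact suffixed names are an assumption based on Fig. 6-4):

```sh
#FTL_SUFFIX: on
#AFTER: output.dat ! collected as output.dat.0, output.dat.8, ... (one per node)
```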
6.6.5 FTL Syntax (FTL basic directory)
Specify FTL basic directory by following syntax.
This command can be specified only once in a script file.
Only one FTL basic directory can be specified.
There is no multi-line mode for this command
Specify relative path from a directory where a batch job is submitted.
Use FTL variables $MJS_HOME or $MJS_DATA when specifying absolute path from /home or
/data. On FTL variable, please refer to 6.6.12.5 FTL variable.
6.6.5.1 Single line mode (#FTLDIR)
(note) When FTLDIR is used, unnecessary files may be transferred. Also, the existence of files is checked after job execution even when there is no file to be transferred. For large-scale parallel jobs these costs may be high, so please use BEFORE and AFTER instead of FTLDIR.
#FTLDIR: FTL basic directory
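A minimal sketch (the directory name is illustrative):

```sh
#FTLDIR: $MJS_HOME/job
```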
6.6.6 FTL Syntax (File collect type of FTL basic directory)
Specify the file collect type for output files to be transferred from the FTL basic directory on the computing nodes after the job finishes.
This command can be specified only once in a script file.
If this command is not specified, "new" is set for File collect type.
There is no multi-line mode for this command
6.6.6.1 Single line mode (#FTL_COLLECT_TYPE)
Item Value Meaning
file collect type
new Collect files which are not transferred at the start of job
mtime Collect updated files only while the job is running
Table 6-2 FTL_COLLECT_TYPE
#FTL_COLLECT_TYPE: file collect type
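A sketch combining the FTL basic directory with a collect type (the directory name is illustrative):

```sh
#FTLDIR: $MJS_HOME/job
#FTL_COLLECT_TYPE: mtime ! collect only files updated while the job was running
```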
6.6.7 FTL Syntax (Avoid adding rank number of FTL basic directory)
Avoid adding the rank number (the first rank number in a node) to output files transferred by the FTLDIR command using the following syntax. Output files will be overwritten when output files on different computing nodes have the same name.
This command can be specified only once in a script file.
If this command is not specified, "off" is set for the flag.
This is valid for files specified by the FTLDIR command.
There is no multi-line mode.
6.6.7.1 Single line mode (#FTL_NO_RANK)
Item Value Meaning
flag
on Do not add rank number to output files
off Add rank number to output files
Table 6-3 FTL_NO_RANK
#FTL_NO_RANK: flag
6.6.8 FTL Syntax (Rank Format)
Specify the number of digits of the rank number by the following syntax.
The rank number (RANK-LIST) added to file names is zero-padded to the specified number of digits. This is valid for FTL variables (*1), the FTLDIR command, and the AFTER command (when FTL_SUFFIX is set to on).
This command is valid for FTL commands (FTL variables, etc.) which have been specified before this command.
Specify a number from 0 to 9 as the number of digits.
There is no multi-line mode for this command.
(*1): For FTL variables, please refer to 6.6.12.5 FTL variable.
6.6.8.1 Single line mode (#FTL_RANK_FORMAT)
Item Value Meaning
number of
digits
0-9
0: no use of RANK FORMAT
a number of digits of RANK-LIST
Table 6-4 FTL_RANK_FORMAT
RANK-LIST   digits: not specified   digits: 1   digits: 2   digits: 3
1           1                       1           01          001
10          10                      10          10          010
100         100                     100         100         100
Table 6-5 RANK_FORMAT example
#FTL_RANK_FORMAT: number of digits
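For example (a sketch; the directory name is illustrative), zero-padding rank suffixes to three digits so that collected files sort naturally; the format command follows the FTLDIR command it applies to:

```sh
#FTLDIR: $MJS_HOME/job
#FTL_RANK_FORMAT: 3 ! rank suffixes are zero-padded, e.g. output.001
```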
6.6.9 FTL Syntax (make directory)
Make directories on all computing nodes running the job before the job starts, using the following syntax.
This command can be specified only once in a script file.
Use a comma as separator to specify multiple directories to make.
There is no multi-line mode for this command.
Specify relative path from a directory where a batch job is submitted.
Use FTL variables $MJS_HOME or $MJS_DATA when specifying absolute path from /home or
/data. On FTL variable, please refer to 6.6.12.5 FTL variable.
6.6.9.1 Single line mode (#FTL_MAKE_DIR)
#FTL_MAKE_DIR: directory [...]
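For example (the directory names scratch and tmp are illustrative), creating a scratch directory under /data and a tmp directory under the submission directory on every computing node:

```sh
#FTL_MAKE_DIR: $MJS_DATA/scratch, tmp
```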
6.6.10 FTL Syntax (statistic information output)
Output statistic information (file transfer time, number of files, file size) of the files transferred before and after the job execution to standard output at the end of the job, using the following syntax.
This command can be specified only once in a script file.
If this command is not specified, "off" is set for the flag.
There is no multi-line mode.
6.6.10.1 Single line mode (#FTL_STAT)
Item Value Meaning
flag
off No statistic information output
normal Normal mode.
Output statistic information to standard output.
detail
Detail mode.
In addition to Normal mode, output statistic information
of files transferred to each rank.
Table 6-6 FTL_STAT flag
#FTL_STAT: flag
6.6.10.2 Output format of statistic information
Output items
Item Meaning
ELAPSE(s) Elapsed time of file transfer (unit: second)
FILE_NUM Total number of transferred files
FILE_SIZE(KB) Total size of transferred files (unit: KB)
Table 6-7 Output item of statistic information
Normal mode format
Detail mode format
#=========== FTL STATISTICS INFORMATION =============#
---------------------- BEFORE ----------------------------------
ELAPSE(s) FILE_NUM FILE_SIZE(KB)
------------------------------------------------------------------------
TOTAL 60 30 16384
---------------------- AFTER ----------------------------------
ELAPSE(s) FILE_NUM FILE_SIZE(KB)
------------------------------------------------------------------------
TOTAL 10 30 16384
#=========================================================#
#=========== FTL STATISTICS INFORMATION =============#
---------------------- BEFORE ----------------------------------
ELAPSE(s) FILE_NUM FILE_SIZE(KB)
------------------------------------------------------------------------
TOTAL 60 10 100
RANK: 0-7 60 10 100
---------------------- AFTER ----------------------------------
ELAPSE(s) FILE_NUM FILE_SIZE(KB)
------------------------------------------------------------------------
TOTAL 60 10 10
RANK: 0-7 60 5 5
RANK: 8-15 60 5 5
#=========================================================#
6.6.11 FTL Syntax (output transferred file information)
Output information on the files transferred before the job execution and the files created while the job is running to standard output at the end of the job, using the following syntax.
This command can be specified only once in a script file.
If this command is not specified, "off" is set for the flag.
There is no multi-line mode.
Directories with no files are not displayed.
6.6.11.1 Single line mode (#FTL_INFO)
Item Value Meaning
flag
off No file information output
before Output information of files transferred before the job
starts.
after Output information of files (including transferred files
before the job starts) created while the job is running.
all
Output information of files transferred before the job
starts and files (including transferred files before the
job starts) created while the job is running.
Table 6-8 FTL_INFO flag
#FTL_INFO: flag
6.6.11.2 Output format of transferred file information
Output items
Item Meaning
TIME Access time of file (Month Date HH:MM)
SIZE(KB) File size (unit: KB)
FILE_NAME File name
Table 6-9 Output items of transferred file information
Output format
Output the file information of each rank. The output format with flag "all" is shown below. With flag "before", only part (*1) is output; with flag "after", only part (*2) is output.
#=============== FTL FILE INFORMATION ===============#
------------------- BEFORE ---------------------
[RANK: 0-7]
TIME SIZE(KB) FILE_NAME
--------------------------------------------------------
Jul 16 10:41 14246 /home/username/job/a.out
Jul 24 10:20 361 /home/username/job/input.1
[RANK: 8-16]
TIME SIZE(KB) FILE_NAME
--------------------------------------------------------
Jul 16 10:41 14246 /home/username/job/a.out
Jul 24 10:20 361 /home/username/job/input.2
------------------- AFTER ----------------------
[RANK: 0-7]
TIME SIZE(KB) FILE_NAME
---------------------------------------------------------
Jul 16 10:41 14246 /home/username/job/a.out
Jul 24 10:20 361 /home/username/job/input.1
Jul 24 10:25 361 /home/username/job/output
[RANK: 8-16]
TIME SIZE(KB) FILE_NAME
---------------------------------------------------------
Jul 16 10:41 14246 /home/username/job/a.out
Jul 24 10:20 361 /home/username/job/input.2
#=======================================================#
(*1)
(*2)
6.6.12 FTL Syntax (others)
6.6.12.1 Comment
Characters after an exclamation mark (!) are regarded as a comment.
Example
6.6.12.2 Blank line
Blank lines and lines of only # are ignored.
Example
6.6.12.3 Special character
Comma (,), colon (:), equal (=) and exclamation mark (!) are special characters in FTL commands.
Put a backslash before a special character when the special character is included in file name or
directory name.
Example
#<BEFORE>
#! this line is comment <-- This line is a comment
# a.out ! b.out <-- a.out is transferred, but b.out is not.
#</BEFORE>
#<BEFORE>
# <-- This line is ignored.
<-- This line is ignored.
# a.out
#</BEFORE>
#<BEFORE>
# a\:b.out <-- transfer a:b.out
#</BEFORE>
6.6.12.4 Meta character
The following meta characters are available in file names and directory names. However, they are not available in the directory portion of a file name.
Meta character Meaning
* Matches any (zero or more) characters
? Matches any single character
Table 6-10 Meta character list
Example
#<BEFORE>
# input.? <-- input.0, input.1, input.3 are transferred
# a* <-- a.out, a.1, a.2 are transferred
# bin/exe*/a.out <-- Meta characters are not available in the directory portion
#</BEFORE>
6.6.12.5 FTL variable
The following FTL variables are available in file names and directory names. However, they are not available in the directory portion of a file name or in the FTL_MAKE_DIR command.
Using FTL variables, input / output file transfer commands for MPI jobs can be specified easily.
Variable Meaning
$MJS_HOME Home directory path (/home/username)
$MJS_DATA Data directory path (/data/username)
$MJS_CWD Directory path where a job is submitted
$MJS_REQID REQUEST-ID. Available in file names in the AFTER command and in directory names.
$MJS_REQNAME REQUEST-NAME. Available in file names and directory names.
$MJS_BULKINDEX Bulk index ID. Available in file names and directory names.
$MPI_RANK MPI rank (from 0 to number of processes - 1)
$XPF_RANK XPF processor identification number (from 1 to number of processes)
Table 6-11 Environment variable list
Example 1
#BEFORE: input.$MPI_RANK
Example 2
#<AFTER>
# 0@$MJS_CWD: log/output <-- The log/output file on the MPI master node is transferred to the directory where the job is submitted.
#</AFTER>
With the above BEFORE command, if an MPI program of 16 processes is executed, input files are transferred to the 16 processes. Files /home/username/input.0 - input.7 are transferred to the first computing node and files /home/username/input.8 - input.15 are transferred to the second computing node, as indicated in Fig. 6-7.
Fig. 6-7 Example of input file transfer with FTL variable (diagram: the shared area /home/username on the Login Server holds input.0 - input.15; the local area of the computing node with ranks 0-7 receives input.0 - input.7, and the local area of the node with ranks 8-15 receives input.8 - input.15)
6.6.12.6 RANK-LIST
Specify the destination ranks of input files and the source ranks of output files using the following formats.
If ranks are specified redundantly, they are processed as if specified once.
If nonexistent ranks are specified, no file is transferred for those ranks.
If existent and nonexistent ranks are specified at the same time, files are transferred for the existent ranks but not for the nonexistent ranks.
Item Format Meaning
I 1 File transfer command for rank 1.
II 1-3 File transfer command for ranks 1 to 3.
III 1,3 File transfer command for ranks 1 and 3.
IV 1-3,5,7 File transfer command for ranks 1, 2, 3, 5 and 7 (combination of items II and III).
V * File transfer command for all computing nodes assigned to the job.
VI ALL File transfer command for all computing nodes assigned to the job.
VII MASTER File transfer command for the master node (rank 0) of the job.
Table 6-12 RANK-LIST format
The range of RANK-LIST for each job type is as follows.
Job type Range of RANK-LIST
Serial job 0
MPI parallel job 0 - (number of processes - 1)
OpenMP / auto parallel job 0
Hybrid job 0 - (number of processes - 1)
Table 6-13 Range of RANK-LIST for Job type
6.7 FTL generating tool : ftlgen
ftlgen generates FTL commands and job submission option lines interactively.
Option Meaning
-chk Execute the ftlchk command after creating the FTL command lines
-o <filename> Output the shell script to a file
[example] Execute the ftlgen command (the ftlgen command supports tab completion)
ftlgen <option>
[username@ricc1:~] ftlgen
MJS: Project id: G00001 ← Specified project-ID
MJS: Number of process(range: 1-8192, default: 1): 256 ← Specified process count
MJS: Number of thread(range: 1-8, default: 1): 1 ← Specified thread
MJS: Merge stderr to stdout ?('y' or 'n', default: 'y'): y
← Merge standard output/error output
MJS: Run on current working directory ?('y' or 'n', default: 'y'): y
← Specified Job execution directory
MJS: Other qsub options: -time 1:00:00 -mem 1.2GB
← Specified other qsub option
MJS: Executable module and command path: a.out
← Specified execution module
FTL(PRE): Are there any input file ?('y' or 'n'): y ← If input file exists
FTL(PRE): Input file or directory: input ← Specified input file
FTL(PRE): Destination rank number(0-255): * ← Specified rank number
FTL(PRE): Transfered Directory: [ENTER](skip) ← Destination directory (if omitted, the job execution directory is used)
FTL(PRE): Enter more ?('y' or 'n'): n ← If other input file exists
FTL(POST): Are there any output file ?('y' or 'n'): y ← If output file exists
FTL(POST): Output file: output.log ← Specified transfer output file
FTL(POST): Source rank number(0-255): * ← Specified rank number
FTL(POST): Transfered Directory: outputdir ← Specified destination
directory
FTL(POST): Enter more ?('y' or 'n'): n ← If other output file exists
#!/bin/sh ← output result
#--- qsub options ---#
#MJS: -project V10002
#MJS: -proc 256
#MJS: -thread 1
#MJS: -eo
#MJS: -cwd
#MJS: -time 1:00:00 -mem 1.2GB
#MJS: -compiler fj
#MJS: -parallel fjmpi
#--- FTL file information ---#
#BEFORE:*: $MJS_CWD/input
#AFTER:*@$MJS_CWD/outputdir: $MJS_CWD/output.log
#BEFORE:*: $MJS_CWD/a.out
#--- Job execution ---#
mpirun a.out
7. Development Environment
7.1 Endian conversion
7.1.1 Outline of endian
Endianness is the byte order in which a number consisting of multiple bytes is stored in memory. For example, when the two-byte value 0x1234 is stored, the method that stores 0x12 in the first byte and 0x34 in the second byte is called big endian. On the other hand, the method that stores 0x34 in the first byte and 0x12 in the second byte is called little endian.
7.1.2 Endian type of RSCC and RICC
RICC consists of little endian computers. However, RSCC (RIKEN Super Combined Cluster) consisted of both big endian and little endian computers, and big endian byte order was used for unformatted WRITE / READ statements. Therefore, please pay attention when reading Fortran unformatted output files (big endian) created on RSCC.
System Endian
RSCC Big endian
RICC Little endian
Table 7-1 Endian type of RSCC and RICC
7.1.3 Endian type
7.1.3.1 Fujitsu compiler
Specify the runtime option -Wl,-T to read or write big endian data with the Fujitsu compiler. With the -T option, logical type data, integer type data and IEEE floating-point data are converted between big endian and little endian in unformatted I/O statements.
example 1) Convert unit number 10 to little endian.
[username@ricc1:~] srun ./serial.out -Wl,-T10
example 2) Convert all unit numbers to little endian.
[username@ricc1:~] srun ./serial.out -Wl,-T
7.1.3.2 Intel compiler
Specify the environment variable F_UFMTENDIAN to read or write big endian data with the Intel compiler.
example 1) Convert unit number 10 to little endian.
[username@ricc1:~] export F_UFMTENDIAN=10
[username@ricc1:~] srun ./serial.out
example 2) Convert all unit numbers to little endian.
[username@ricc1:~] export F_UFMTENDIAN=big
[username@ricc1:~] srun ./serial.out
7.2 Debugger
The debugger enables the user to run a program under debugger control to verify its processing logic.
The following operations can be performed for serial and MPI programs written in Fortran and C/C++, and for XPFortran programs:
Application execution control
Setting of program execution stop positions
Evaluation and display of expressions and variables
Use of the calling stack
7.2.1 Preparation to use the debugger
The following two compilation options must be specified when you compile and link programs to debug.
-g
Produce debugging information. If this option is omitted, you cannot display the values of variables and so on.
-Ktl_trt
Link the tool runtime library. This option enables the debug, profiling and MPI trace functions at program execution. This option is effective by default.
7.2.2 Start debugger
Use the fdb command (CUI) to launch the debugger. For more information on the fdb command, please refer to the fdb man page. For information on the xfdb command (GUI), please refer to the "Debugger User's Guide".
[username@ricc1:~] f77 –pc –g –Ktl_trt sample.f
[username@ricc1:~] srun fdb a.out
FDB [Fujitsu Debugger for C/C++ and Fortran] Version 7.0MT/OMP
Please wait to analyze the DEBUG information.
fdb*
Start debugger
fdb* list
5 double INTEGER i
6 read(*,*) i
7 print *,' ** fortran77 output=',i
8 go to (10,20,30) , i
9 print *,' i=0',i
10 go to 90
12 10 print *,' i=',i
13 go to 90
14
fdb* break 10
Insert break point
#1 0x100000ad0 (MAIN__ + 0x118) at line 10 in /home/username/sample.f
fdb* show break
Num Address Specify Stop? Where
#1 0x0000000100000ad0 Enable Yes (MAIN__ + 0x118) at line 10 in
/home/username/sample.f
fdb* p i
insert print cmd (variable i)
Result = 123
fdb* c
Continue program: a.out
The program: a.out terminated.
8. Tuning
8.1 Tuning overview
Modifying a program so that it finishes execution faster is called tuning. Tuning a program involves a cycle of collecting tuning information, evaluating and analyzing performance, modifying the source code, and measuring performance again.
First, find the part of the program where most of the execution time is spent. Generally, a large tuning effect is achieved by speeding up that part.
There are the following methods to get execution time information.
Call subroutines which get time information in the program
Use the batch job option which collects statistics information
Use the profiler
Fig 8-1 Tuning overview (collect tuning information -> performance evaluation/analysis -> tuning (change compile options, modify source code) -> iterate)
8.2 Time measurement
8.2.1 Fortran program
The CPU_TIME subroutine returns CPU processing time in seconds.
example 1) Invoke the CPU_TIME subroutine
real(kind=8) start_time, stop_time
...
call cpu_time(start_time)
...portion to be measured
call cpu_time(stop_time)
write(6,*) "time = ", stop_time - start_time
8.2.2 C program
The clock function returns an approximate value of the processor time used by the program.
example 1) Invoke the clock function
#include <time.h>
clock_t start_time, stop_time;
start_time = clock();
...portion to be measured
stop_time = clock();
printf("time = %10.3f\n", (double)(stop_time - start_time) / CLOCKS_PER_SEC);
8.2.3 MPI program
Use the MPI_Wtime function to measure elapsed time. Invoke the MPI_Wtime function before and after the portion to be measured; the difference between the two values is the elapsed time.
example 1) Invoke the MPI_Wtime function (Fortran)
real(kind=8) start_time, stop_time
...
call mpi_barrier(mpi_comm_world, ierr)
start_time = mpi_wtime()
....portion to be measured
call mpi_barrier(mpi_comm_world, ierr)
stop_time = mpi_wtime()
if (myrank .eq. 0) then
write(6,*) "time = ", stop_time - start_time
end if
example 2) Invoke the MPI_Wtime function (C)
double start_time, stop_time;
...
MPI_Barrier(MPI_COMM_WORLD);
start_time = MPI_Wtime();
....portion to be measured
MPI_Barrier(MPI_COMM_WORLD);
stop_time = MPI_Wtime();
if (myrank == 0) {
printf("time = %lf\n", stop_time - start_time);
}
8.2.4 System resource statistics information of batch job
By specifying the -oi or -OI option when submitting a job, summary information and resource information for each computing node are written to the standard output file.
[username@ricc1:~] cat go.sh
#!/bin/sh
#MJS: -proc 8
#MJS: -cwd
#MJS: -eo
#MJS: -oi
#MJS: -time 1:00:00
#BEFORE: a.out
mpirun ./a.out
example 1) Used resource information (the standard output of batch request)
[username@ricc1:~] cat go.sh.o2733417.jms
(omitted)
Allocated Resource <- allocated resources of the entire job
Virtual Nodes : 8 Node
Before Free Memory
Total Large Page Memory : 0 Mbyte
Total Normal Page Memory : 10737418240 Byte
After Free Memory
Total Large Page Memory : 0 Mbyte
Total Normal Page Memory : 10737418240 Byte
CPUs : 8 CPU
Inter-Node Barrier : 0 Unit
Execmode : CHIP_SHare
Elapse time limit : 3600.000 sec
Used Resource <- used resources of the entire job
Total System CPU Time : 463 msec
Total User CPU Time : 470516 msec
Total Large Page Memory : 0 Mbyte
Total Normal Page Memory : 342716416 Byte
CPUs : 8 CPU
Inter-Node Barrier : 0 Unit
---------------------------------------
Virtual Node Information : NODE : mpc0448 <- computing node
Archi Information : PG
Allocated Resource <- allocated resources per process
Before Free Memory
Large Page Memory : 0 Mbyte
Normal Page Memory : 1342177280 Byte
After Free Memory
Large Page Memory : 0 Mbyte
Normal Page Memory : 1342177280 Byte
Free memory time : 0 msec
CPUs : 1 CPU
CPU time limit : UNLIMITED
Used Resource <- used resources per process
Large Page Memory : 0 Mbyte
Normal Page Memory : 50913280 Byte
CPUs : 1 CPU
CPU Time System time User time
Max CPU Time : 236 msec 59426 msec
Total CPU Time : 236 msec 59426 msec
SBID ChipID CPUID System time User time
0 0 0 : 236 msec 59426 msec
8.3 Program development support tool
8.3.1 Fujitsu compiler
8.3.1.1 Profiling function
The profiler is a tool for collecting information on application performance. To improve application performance, a usual and effective method is to find the location where much execution time is consumed and speed it up.
The profiler can output following information.
Time statistic information
Elapsed time, breakdown of user CPU time / system CPU time, etc.
Interprocess communication information
Time of interprocess communication and waiting to synchronize by MPI and XPFortran
MPI library elapsed time information
Elapsed time to execute MPI library
8.3.1.2 Collect profiling data
Use the srun/mpirun/xpfrun command with the -prof or -profopt option to collect profiling data.
With these options, the srun/mpirun/xpfrun commands invoke the fpcoll command internally.
For more information on the fpcoll command, please refer to the Profiler User's Guide. Use the -profopt option to specify arguments to the fpcoll command.
In interactive jobs, profiling data collection and profiler information output can be performed at the same time. In batch jobs, since the Massively Parallel Cluster does not have a shared area, profiling data collection and profiler information output cannot always be performed at the same time. In that case, it is necessary to perform profiling data collection and profiler information output separately.
example 1) Execute serial job by interactive job (-prof option)
[username@ricc1 ~]$ srun -prof ./stream
(execution result is skipped)
Fujitsu Performance Profiler Version 3.1
Measured time : Thu Jul 30 01:11:06 2009
CPU frequency : Process 0 2933 (MHz)
Type of program : SERIAL
Average at sampling interval : 11.0 (ms)
Measured range : All ranges
--------------------------------------------------------------
______________________________________________________________
Time statistics
Elapsed(s) User(s) System(s)
---------------------------------------------
28.4038 28.2500 0.1100 Application
---------------------------------------------
28.4038 28.2500 0.1100 Process 0
_________________________________________________________________
Procedures profile
**************************************************************
Application - procedures
**************************************************************
Cost % Start End
--------------------------------------------
2569 100.0000 -- -- Application
--------------------------------------------
2559 99.6107 127 285 main
10 0.3893 337 399 checkSTREAMresults
____________________________________________________________________
Lines profile
*****************************************************************
Application - lines
*****************************************************************
Cost % Line
-----------------------------------
2569 100.0000 -- Application
-----------------------------------
629 24.4842 251 main
example 2) Execute MPI parallel job by interactive job (-prof option)
Fujitsu Performance Profiler Version 3.1
Measured time : Thu Jul 30 01:15:59 2009
CPU frequency : Process 0 2933 (MHz)
Type of program : MPI
Average at sampling interval : 11.0 (ms)
Measured range : All ranges
-------------------------------------------------------------
_____________________________________________________________
Time statistics
Elapsed(s) User(s) System(s)
---------------------------------------------
1.0764 0.9159 0.0800 Application
---------------------------------------------
1.0764 0.9159 0.0800 Process 0
_________________________________________________________________
Communication profile
Elapsed(s) Communication(s) %
--------------------------------------------
1.0764 0.7343 68.2220 Application
--------------------------------------------
1.0764 0.7343 68.2220 Process 0
Send + Put
+--------------------------------------------------+
| ##########################| 52 % Process 0
+--------------------------------------------------+
Percentage of time waiting for a send and put
Received + Get
+--------------------------------------------------+
| ########| 16 % Process 0
+--------------------------------------------------+
Percentage of time waiting for a received and get
_________________________________________________________________
Procedures profile
**************************************************************
Application - procedures
**************************************************************
Cost % Start End
--------------------------------------------
83 100.0000 -- -- Application
--------------------------------------------
51 61.4458 -- -- __GI_memcpy
9 10.8434 369 397 IMB_ass_buf
7 8.4337 -- -- memcpy_nts_asm64a
3 3.6145 -- -- _LowLevel_MutexUnlock
2 2.4096 -- -- _LowLevel_Exchange4
1 1.2048 -- -- intra_Reduce
1 1.2048 -- -- mpigfc_
1 1.2048 -- -- PMPI_Sendrecv
1 1.2048 -- -- _GMP_StopSendTimer
1 1.2048 -- -- _GMP_Send
_________________________________________________________________
Loops profile
**************************************************************
Application - loops
**************************************************************
example 3) MPI parallel job by batch job
1. Specify the fpcoll command's options with the -profopt option to collect profiling data. Items of profiling data can be specified with the -I option. Profiling data is created in the directory specified by the -d option.
When executing an application on the Massively Parallel Cluster, transfer the profiling data to the Login Server by FTL.
$ cat go.sh
#!/bin/sh
#------- qsub option -------#
#MJS: -pc
#MJS: -proc 64
#MJS: -eo
#MJS: -time 10:00
#MJS: -cwd
#------- FTL command -------#
#BEFORE: a.out
#AFTER: ALL@${MJS_REQID}_prof:profile-data/*
#------- Program Execution -------#
mpirun -profopt "-C -Icpu,mpi -d profile-data" ./a.out
2. Use the fprof command to display profiler information. Items of profiling data to display can be specified with the -I option. Specify the directory of the profiling data with the -d option.
$ fprof -Impi -d 1417379.jms_prof
--------------------------------------------------------------------
Fujitsu Performance Profiler Version 3.1
Measured time : Wed Sep 2 15:50:18 2009
CPU frequency : Process 0 - 63 2933 (MHz)
Type of program : MPI
Average at sampling interval : 11.0 (ms)
Measured range : All ranges
---------------------------------------------------------------------
_____________________________________________________________________
Time statistics
Elapsed(s) User(s) System(s)
---------------------------------------------
59.3825 3701.1963 16.8100 Application
---------------------------------------------
59.3825 58.6720 0.1700 Process 14
59.3759 55.9310 0.3200 Process 25
59.3754 57.2190 0.4700 Process 36
59.3744 58.6740 0.1500 Process 50
59.3743 58.6620 0.1700 Process 13
59.3710 56.6740 0.2800 Process 42
59.3691 55.9420 0.4100 Process 24
59.3690 57.7630 0.2100 Process 34
59.3689 57.6010 0.2200 Process 18
59.3680 57.7970 0.3200 Process 8
59.3665 58.6170 0.2100 Process 48
59.3649 55.7580 0.4000 Process 27
59.3621 57.5100 0.2800 Process 32
59.3618 57.5390 0.2900 Process 16
59.3611 58.5850 0.2500 Process 12
59.3609 58.5590 0.1700 Process 47
_____________________________________________________________________
MPI libraries profile - based on the user procedure.
*********************************************************************
Application - MPI libraries
*********************************************************************
Elapsed(s) % Call to
---------------------------------------
59.3825 ---.---- ------------ Application
---------------------------------------
3.2200 5.4225 45312 jacobi_ ( 199 - 250)
2.6808 4.5144 226560 sendp1_ ( 577 - 629)
1.9381 3.2638 226560 sendp2_ ( 521 - 573)
0.7023 1.1827 226560 sendp3_ ( 465 - 517)
0.3752 0.6318 512 initcomm_ ( 254 - 332)
0.0840 0.1415 576 MAIN__ ( 38 - 142)
0.0000 0.0000 384 initmax_ ( 336 - 440)
8.4 Network topology
The network topology for the Massively Parallel Cluster, Multi-purpose Parallel Cluster and MDGRAPE-3 Cluster is a fat-tree topology, which consists of 60 leaf switches connecting the computing nodes and 2 spine switches connecting the leaf switches (refer to Fig. 8-2 Network topology outline diagram).
Each leaf switch has 24 ports: 20 of them are connected to computing nodes and 4 of them are connected to spine switches. Therefore, when the 20 computing nodes connected to the same leaf switch are concurrently communicating with computing nodes connected to other switches, the communication data of all 20 computing nodes must be transferred over 4 InfiniBand cables, and the network bandwidth can be limited to as little as 1/5.
Fig. 8-2 Network topology outline diagram
The RICC job scheduler allocates parallel jobs so as to minimize the number of leaf switches connecting the allocated computing nodes. However, the allocated computing nodes may be distributed across more leaf switches when the system usage ratio is high, because the computing nodes allocated to the next job depend on the jobs which finished previously.
This difference in the allocation of computing nodes may not have an impact on normal job execution, but it may have an impact on jobs with a high communication load, such as network communication benchmark tests.
9. How to use Archive system
To transfer files to the Archive system over the network, use the special file transfer commands (pftp, hsi and htar).
* pftp is an extended command of ordinary ftp. The usage of pftp is the same as ftp.
* hsi is an extended command of ordinary pftp. It can transfer directories.
* htar is an extended command of ordinary tar. The usage of htar is the same as tar.
* The size of a file is restricted to 1.22TB for pftp, hsi and htar. When transferring files to the Archive system by htar, all transferred files are archived as one htar format file. Therefore, the total size of the transferred files must be less than 1.22TB.
9.1 Configuration
If you use hsi or htar for the first time on RICC, or you use them after your RICC password has been updated, use the arc_keytab command to generate a Keytab file for authentication.
You do not need to generate the Keytab file again after that.
Example:
[username@ricc:~] arc_keytab
Getting a KEYTAB file for user: username
Please wait ....
...............
A KEYTAB file was generated successfully.
As in the example above, if "successfully" is displayed, the configuration is complete.
9.2 pftp
9.2.1 Get file
9.2.1.1 Login
[username@ricc:~] pftp arc
Using /opt/hpss/etc/HPSS.conf
Connected to arc.
220 hpcore FTP server (HPSS 7.1 PFTPD V1.1.1 Tue Jan 19 07:16:29
JST 2010) ready.
Parallel stripe width set to (1).
Name (arc:username): Enter return key
331 Password required for username.
Password:********* Enter RICC password
230 User username logged in as [email protected]
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> Login completed
9.2.1.2 Transfer file
ftp>pget file_name Enter file name
remote: file_name local: file_name
200 Command Complete (4104704, "file_name", 0, 1, 4194304, 0).
200 Command Complete.
150 Transfer starting.
226 Transfer Complete.(moved = 4104704).
4104704 bytes received in 0.1400 seconds (27.961 MBytes/sec)
200 Command Complete.
9.2.1.3 Confirm transferred file and Logout
ftp> !ls –la Confirm transferred files
-rw-r--r-- 1 username groupname 4104704 Jul 28 20:40 file_name
ftp> bye Logout
221 Goodbye.
[username@ricc:~]
9.3 hsi
9.3.1 Get file
9.3.1.1 Login
[username@ricc:~] hsi
Username: username UID: UID Acct: UID(UID) Copies: 1 Firewall: off
[hsi.3.5.3 Wed Jan 20 07:32:04 JST 2010]
A:[RICC]/home/username-> Login completed
9.3.1.2 Get file
A:[RICC]/home/username-> get -R testdir Enter file name
get '/home/username/testdir/testfile1' : /home/username/testdir/testfile1'
(2009/07/28 20:57:57 1048576 bytes, 7050.6 KBS )
get '/home/username/testdir/testfile2' :'/home/username/testdir/testfile2'
(2009/07/28 20:57:58 1048576 bytes, 10074.9 KBS )
get '/home/username/testdir/testfile3' :'/home/username/testdir/testfile3'
(2009/07/28 20:57:58 1048576 bytes, 17090.1 KBS )
9.3.1.3 Confirm got file and Logout
A:[RICC]/home/username-> !ls -l testdir Confirm got file
total 6144
-rw------- 1 username groupname 1048576 Jul 28 21:03 testfile1
-rw------- 1 username groupname 1048576 Jul 28 21:03 testfile2
-rw------- 1 username groupname 1048576 Jul 28 21:03 testfile3
A:[RICC]/home/username-> quit Logout
[username@ricc:~]
9.4 htar
The following restrictions apply to the htar command.
The size of a member file is up to 68,719,476,735 (64G - 1) bytes.
The number of member files in a tar file is up to 1 million.
The directory name is up to 154 characters and the file name is up to 99 characters when the path name of a member file is divided into a directory name and a file name.
(Example) Path name : /home/username/dir1/dir2/test.data
Directory name: /home/username/dir1/dir2
File name : test.data
The link name of a symbolic link is up to 99 characters.
9.4.1 Confirm put file
Confirm the contents of a tar file with the -tf option.
[username@ricc:~] htar -tf test.tar
......
HTAR: -rw-r--r-- username/groupname 1252 2004-06-18 09:45 work/test1
HTAR: -rw-r--r-- username/groupname 3390 2004-03-04 11:56 work /test2
HTAR: -rw-r--r-- username/groupname 20932 2004-11-09 17:49 work /test3
HTAR: HTAR SUCCESSFUL
To confirm the tar file name, log in with the hsi command and then use the ls command.
9.4.2 Get file
The usage is the same as the tar command. Extract files with the -xf option.
[username@ricc:~] htar -xf test.tar
HTAR: HTAR SUCCESSFUL
Files are extracted in the current directory.
10. RICC Portal
10.1 RICC Portal
10.1.1 URL to access
On RICC Portal, users can operate files on the Login Server, compile and link programs, and submit jobs to all computing server systems using a web interface.
Access the following URL to log in to RICC Portal.
https://ricc.riken.jp
10.1.2 How to Login
In the following login window, enter your RICC user account and RICC password, and then click the LOGIN button. When prompted, select the client certificate and click [OK].
After authentication completes, RICC Portal is available.
Fig. 10-1 RICC login window
For usage of RICC Portal, please click the help icons of each function or refer to the online manuals on RICC Portal.
11. Manual
Access RICC Portal from a web browser. For how to access RICC Portal, please refer to 10 RICC Portal.
After login, click the links under [Documentation] in the menu on the left to refer to the online manuals.
The available manuals are listed in the next section.
Fig. 11-1 RICC Portal online manual window (click [MAIN], then [Documentation] -> [Product Manual] to refer to the product manuals)
11.1 Product manual
11.1.1 Common
RICC Portal User's Guide
11.1.2 Language
Fortran User's Guide
Fortran Language Reference
Fortran Compiler Messages
Fortran Runtime Messages
C User's Guide
C++ User's Guide
C++ Compiler Feature
XPFortran User's Guide
MPI User's Guide
11.1.3 Programming Tools
Debugger User's Guide
MPI Tracer User's Guide
Programming Workbench User's Guide
Profiler User's Guide
11.1.4 Scientific Subroutine Library II (SSL II)
List of Subroutines
How to use SSL II
How to link-edit SSL II
SSLII User's Guide
SSLII Extended Capabilities User's Guide
SSLII Extended Capabilities User's Guide II
How to compile (Thread-Parallel Capabilities)
Thread-Parallel Capabilities User's Guide
How to compile(C language)
SSLII User's Guide(C language)
How to use C-SSL II
How to compile Thread-Parallel Capabilities (C language)
Thread-Parallel Capabilities User's Guide(C language)
How to compile (MPI)
MPI User's Guide
11.1.5 BLAS LAPACK ScaLAPACK
User's Guide
11.1.6 Intel Compiler
Fortran User's Guide
C User's Guide
Math Kernel Library(MKL) User's Guide
11.1.7 PGI Compiler
PGI User's Guide
PGI Tools Guide
PGI Fortran Reference
11.1.8 Message Passing Toolkit (MPT)
User's Guide
Appendix
1. FTL Examples
Job execution scripts using FTL are introduced in this appendix. FTL commands are shown in bold.
The following environment variables are used in this appendix.
Environment variable Value
$MJS_CWD /home/username/job
$MJS_DATA /data/username
$MJS_REQID REQUEST-ID of job
1.1 Execute serial job
1.1.1 sample 1 (transfer output file to job execution directory)
Content of job execution
Execute execution module a.out. Transfer output file to job execution directory.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file / directory (none)
Output file output
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 1 -eo
#MJS: -cwd
#BEFORE: a.out
srun ./a.out
#AFTER: output
Transfer input file: the execution module a.out in $MJS_CWD on the Login Server is transferred to $MJS_CWD on node 0 (rank 0).
Transfer output file: the file output in $MJS_CWD on node 0 (rank 0) is transferred to $MJS_CWD on the Login Server.
1.1.2 sample 2 (Transfer output file to /data)
Content of job execution
Execute the execution module a.out. Transfer the output file to /data/username/data.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file (none)
Output file output
Destination of output file $MJS_DATA/data
Job execution script
#!/bin/sh
#MJS: -proc 1 -eo
#MJS: -cwd
#BEFORE: a.out
srun ./a.out
#AFTER: 0@${MJS_DATA}/data: output
Transfer input file
[Figure: a.out is copied from $MJS_CWD on the Login Server to $MJS_CWD on node 0 (rank 0).]
Transfer output file
[Figure: output is copied from $MJS_CWD on node 0 (rank 0) to $MJS_DATA/data on the Login Server.]
1.1.3 sample 3 (Transfer directory)
Content of job execution
Transfer a directory containing the files necessary for job execution.
Execute the execution module a.out from the transferred bin directory. Transfer the output file to the job execution directory.
Item Value Remark
Job execution directory $MJS_CWD
Transfer directory bin
Input file (none)
output file output
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 1 -eo
#MJS: -cwd
#BEFORE_R: bin
srun ./bin/a.out
#AFTER: output
Transfer input directory
[Figure: the bin directory is copied from $MJS_CWD on the Login Server to $MJS_CWD on node 0 (rank 0).]
Transfer output file
[Figure: output is copied from $MJS_CWD on node 0 (rank 0) back to $MJS_CWD on the Login Server.]
1.2 Execute parallel job
1.2.1 sample 1 (16 cores in parallel job)
Content of job execution
Transfer the input file necessary for each rank. Execute the MPI execution module a.out as a
16-core parallel job. Transfer the output file of each rank to the job execution directory.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file input.0 - input.15 Input file differs for each rank
output file output.0 - output.15 Output file for each rank
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#<BEFORE>
# a.out
# 0: input.0, input.1, input.2, input.3, input.4, input.5, input.6, input.7
# 8: input.8, input.9, input.10, input.11, input.12, input.13, input.14, input.15
#</BEFORE>
mpirun ./a.out
#<AFTER>
# 0: output.0, output.1, output.2, output.3, output.4, output.5, output.6, output.7
# 8: output.8, output.9, output.10, output.11, output.12, output.13, output.14, output.15
#</AFTER>
Transfer input file
[Figure: a.out and input.0 - input.7 are copied from $MJS_CWD on the Login Server to $MJS_CWD on node 0 (ranks 0 - 7); a.out and input.8 - input.15 are copied to $MJS_CWD on node 1 (ranks 8 - 15).]
Transfer output file
[Figure: output.0 - output.7 from node 0 and output.8 - output.15 from node 1 are transferred back to $MJS_CWD on the Login Server.]
1.2.2 sample 2 (Use FTL variable)
Content of job execution
Use the FTL variable $MPI_RANK for the same case as 1.2.1 sample 1 (16 cores in parallel job).
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file input.0 - input.15 Input file differs for each rank
output file output.0 - output.15 Output file for each rank
Destination of output file $MJS_CWD
Job execution script
Transfer input file
It is the same as 1.2.1 sample 1 (16 cores in parallel job)
Transfer output file
It is the same as 1.2.1 sample 1 (16 cores in parallel job)
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#<BEFORE>
# a.out, input.$MPI_RANK
#</BEFORE>
mpirun ./a.out
#<AFTER>
# output.$MPI_RANK
#</AFTER>
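The per-rank file name input.$MPI_RANK in the #<BEFORE> directive means each rank fetches only its own input file. As an illustration (the file names and contents below are hypothetical), the sixteen input files referenced above could be generated on the login server with a plain shell loop before submitting the job:

```shell
#!/bin/sh
# Generate one input file per MPI rank (0-15), matching the
# input.$MPI_RANK names used by the #<BEFORE> directive.
# The file contents here are placeholders.
for rank in $(seq 0 15); do
    echo "parameters for rank ${rank}" > "input.${rank}"
done
```

FTL would then stage input.0 to rank 0, input.1 to rank 1, and so on.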
1.2.3 sample 3 (Transfer files of same file name avoiding overwriting)
Content of job execution
Execute the MPI execution module a.out as a 16-core parallel job. Transfer output files of the same
name from each rank to the job execution directory while avoiding overwriting.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file (none)
output file output One output file per rank
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#BEFORE: a.out
mpirun ./a.out
#FTL_SUFFIX: on
#AFTER: output
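With #FTL_SUFFIX: on, the first rank of each node is appended to the common file name before transfer, so the two nodes' output files do not collide on the login server. A minimal sketch of that renaming idea (the suffix rule follows the description in this sample; the rank value is illustrative):

```shell
#!/bin/sh
# Sketch of the rename that avoids overwriting: a node whose first
# rank is 8 has its local "output" copied back as "output.8".
node_first_rank=8
echo "result from node 1" > output
cp output "output.${node_first_rank}"   # transferred name: output.8
```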
Transfer input file
[Figure: a.out is copied from $MJS_CWD on the Login Server to $MJS_CWD on node 0 (ranks 0 - 7) and node 1 (ranks 8 - 15).]
Transfer output file
The rank number (the first rank of the node) is appended to the output file name before the file transfer.
[Figure: the output file on node 0 is transferred as output.0 and the output file on node 1 as output.8 to $MJS_CWD on the Login Server.]
1.2.4 sample 4 (Use rank format)
Content of job execution
Execute the MPI execution module a.out as a 16-core parallel job. Transfer the output file of each rank
(with the rank number formatted as three digits) to the job execution directory.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file (none)
output file output.000 - output.015 Output file for each rank
Destination of output file $MJS_CWD
Job execution script
Input file transfer
It is the same as 1.2.3 sample 3 (Transfer files of same file name avoiding overwriting).
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#BEFORE: a.out
mpirun ./a.out
#FTL_RANK_FORMAT: 3
#AFTER: output.$MPI_RANK
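#FTL_RANK_FORMAT: 3 makes $MPI_RANK expand as a zero-padded three-digit number (000 - 015). The equivalent formatting in plain shell, shown only to illustrate the resulting file names:

```shell
#!/bin/sh
# Zero-pad a rank number to three digits, mirroring how
# #FTL_RANK_FORMAT: 3 expands $MPI_RANK in the #AFTER directive.
rank=7
printf 'output.%03d\n' "$rank"   # prints output.007
```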
output file transfer
[Figure: output.000 - output.007 from node 0 (ranks 0 - 7) and output.008 - output.015 from node 1 (ranks 8 - 15) are transferred to $MJS_CWD on the Login Server.]
1.3 FTL basic directory (FTLDIR command)
1.3.1 sample 1 (16 cores in parallel job)
Content of job execution
Transfer the input file necessary for each rank. Execute the MPI execution module a.out as a
16-core parallel job. Transfer the output file of each rank to the job execution directory.
Item Value Remark
Job execution directory $MJS_CWD
FTL basic directory $MJS_CWD
Execution module a.out
Input file input
output file output.0 - output.15 Output file for each rank
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#FTLDIR: $MJS_CWD
mpirun ./a.out
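With #FTLDIR, only files newly created under the FTL basic directory during job execution are collected. One generic way to detect such files, shown here as an illustration rather than FTL's actual mechanism, is to snapshot the directory listing before the run and compare afterwards:

```shell
#!/bin/sh
# Detect files created during a job by comparing directory
# snapshots (illustration only; FTL's real mechanism is internal).
ls | sort > .before            # snapshot before the "job" (dotfiles are not listed)
echo "result" > output.0       # file created by the job
ls | sort > .after             # snapshot after the "job"
comm -13 .before .after        # prints only the new name: output.0
```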
Transfer input file
[Figure: a.out and input are copied from $MJS_CWD on the Login Server to $MJS_CWD on node 0 (ranks 0 - 7) and node 1 (ranks 8 - 15).]
Transfer output file
Only files newly created during job execution are transferred, and the rank number (the first rank of the node) is appended to each file name.
[Figure: output.0 - output.7 from node 0 and output.8 - output.15 from node 1 are transferred to $MJS_CWD/$MJS_REQID on the Login Server as output.0.0 - output.7.0 and output.8.8 - output.15.8.]
1.3.2 sample 2 (File collect type: mtime)
Content of job execution
Transfer the input file necessary for each rank. Execute the MPI execution module a.out as a
16-core parallel job. Transfer the output file of each rank to the job execution directory.
Item Value Remark
Job execution directory $MJS_CWD
FTL basic directory $MJS_CWD
Execution module a.out
Input file input
output file output.0 - output.15 (newly created for each rank), input (updated during job execution)
Destination of output file $MJS_CWD
Job execution script
Transfer input file
It is the same as 1.3.1 sample 1 (16 cores in parallel job).
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#FTL_COLLECT_TYPE: mtime
#FTLDIR: $MJS_CWD
mpirun ./a.out
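With #FTL_COLLECT_TYPE: mtime, files whose modification time changed during the job are collected, so an updated input file is transferred back along with the newly created outputs. A generic sketch of mtime-based detection (an illustration, not FTL's implementation) using a marker file:

```shell
#!/bin/sh
# Collect by modification time: anything modified after the marker
# is picked up, covering both new files and updated input files.
touch .job_start
sleep 1                        # ensure a distinct mtime on coarse filesystems
echo "result"  > output.0      # created during the job
echo "updated" >> input        # pre-existing file, updated in place
find . -type f -newer .job_start ! -name '.job_start'
```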
Transfer output file
[Figure: output.0 - output.7 and the updated input file from node 0, and output.8 - output.15 and the updated input file from node 1, are transferred to $MJS_CWD/$MJS_REQID on the Login Server as output.0.0 - output.7.0 and input.0, and output.8.8 - output.15.8 and input.8.]
1.4 Others
1.4.1 Execute job using temporary directory
Content of job execution
Execute the MPI execution module a.out as a 16-core parallel job. a.out requires a tmp directory under
the job execution directory of each rank.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file (none)
output file output One output file per rank
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#FTL_MAKE_DIR: $MJS_CWD/tmp
#BEFORE: a.out
mpirun ./a.out
#AFTER: output
Make directory
[Figure: the tmp directory is created under $MJS_CWD on node 0 (ranks 0 - 7) and on node 1 (ranks 8 - 15).]
Transfer input file
It is the same as 1.2.3 sample 3 (Transfer files of same file name avoiding overwriting).
Transfer output file
It is the same as 1.2.3 sample 3 (Transfer files of same file name avoiding overwriting).
1.4.2 Execute job using meta character
Content of job execution
Transfer the same input files to each rank. Execute the MPI execution module a.out as a 16-core
parallel job.
Item Value Remark
Job execution directory $MJS_CWD
Execution module a.out
Input file input.0 - input.15 The same input files are necessary on each rank
output file output Only on the MPI master node
Destination of output file $MJS_CWD
Job execution script
#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#<BEFORE>
# a.out, input*
#</BEFORE>
mpirun ./a.out
#AFTER: 0@$MJS_CWD:output
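The pattern input* in the #<BEFORE> directive is a meta character (glob) that matches every file whose name begins with input, so all sixteen files are transferred to every rank. Plain shell globbing expands the same way, as this illustrative snippet shows:

```shell
#!/bin/sh
# The glob input* expands to all files starting with "input",
# matching the meta character transfer in the #<BEFORE> directive.
for i in $(seq 0 15); do touch "input.${i}"; done
set -- input*                  # expand the glob into $1..$16
echo "$# files matched"        # prints: 16 files matched
```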
Transfer input file
[Figure: a.out and the same set of files input.0 - input.15 are copied from $MJS_CWD on the Login Server to $MJS_CWD on node 0 (ranks 0 - 7) and node 1 (ranks 8 - 15).]
Transfer output file
[Figure: output is copied from $MJS_CWD on node 0 (ranks 0 - 7) back to $MJS_CWD on the Login Server.]