Post on 26-Dec-2015
The BioBox Initiative: Bio-ClusterGrid
Gilbert Thomas
Associate Engineer
Sun APSTC – Asia Pacific Science & Technology Center
04 December 2003 ©
Gilbert Thomas, Associate Engineer, Sun APSTC 2
Agenda
• Introduction: Bio-ClusterGrid
• Solaris 9 Operating Environment
• Sun Grid Engine (SGE)
• Grid Engine Portal (GEP)
• Applications on Bio-ClusterGrid
• Installation of Bio-ClusterGrid
• Current and Future Developments
• Questions and Answers
Introduction: Bio-ClusterGrid
• Grid-enabled bioinformatics package
• Consists of 4 major components
– Solaris 9 Operating Environment (April 2003 version)
– Collection of 28 bioinformatics applications, pre-installed and pre-configured
– Sun Grid Engine
– Grid Engine Portal
Introduction: Bio-ClusterGrid
• Fast setup (2 ½ hours)
• Avoids the hassle of downloading, compiling and installing bioinformatics applications
• Applications optimized for SPARC
Solaris 9 Operating Environment
• Latest version of Sun Solaris at the time of writing
• Supports the GNOME 2.0 desktop environment
• Improved performance and security
• Easy patch administration using Patch Manager
GNOME 2.0 Desktop Environment
Sun Grid Engine
• Distributed resource management (DRM) software
• Provides load balancing and resource management
• Supports running parallel applications across a cluster
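Sun Grid Engine runs batch jobs submitted as shell scripts with qsub; scheduler options can be embedded in the script on lines beginning with "#$". The helper below is a minimal sketch that writes such a script for a single BLAST search. The function name, the job name, the "nr" database and the use of the legacy NCBI blastall program are assumptions for illustration, not details of the Bio-ClusterGrid configuration.

```shell
# Hypothetical helper: write a Sun Grid Engine job script that runs one
# BLAST search. Assumes the legacy NCBI blastall binary and a formatted
# "nr" protein database exist on every execution node (hypothetical setup).
write_blast_job() {
    query=$1   # FASTA query file on a path visible to all nodes
    out=$2     # path of the job script to create
    # Lines starting with "#$" are read by qsub as embedded options:
    #   -N sets the job name, -cwd runs the job in the submission
    #   directory, -S selects the shell that interprets the script.
    cat > "$out" <<EOF
#!/bin/sh
#\$ -N blast_job
#\$ -cwd
#\$ -S /bin/sh
blastall -p blastp -d nr -i $query -o $query.blast
EOF
}
```

For example, write_blast_job query.fa blast_job.sh followed by qsub blast_job.sh would queue the search, leaving Grid Engine to choose an execution host.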
Grid Engine Portal
• Integrated into Sun ONE Portal Server
• Provides a web interface to some of the applications running on Sun Grid Engine
• Remote access at any time from any computer with a Java-enabled browser
• For users who prefer not to use the command-line interface (CLI)
Grid Engine Portal
• Jobs are submitted through customised forms for each application
• Job results can be viewed online and/or downloaded to the local machine
• The user is emailed when a job completes
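The slides do not describe how GEP sends its completion email. From the command line, a comparable notification can be requested with qsub's -m option (the letter e selects mail at the end of the job) and -M option (the recipient address). The following is a minimal sketch; the function name, the DRY_RUN switch and the address are hypothetical.

```shell
# Hypothetical wrapper: submit a job script and ask Grid Engine to send
# mail to the given address when the job ends (-m e, -M address).
# If DRY_RUN is set, only print the command that would be run.
submit_with_mail() {
    script=$1
    address=$2
    if [ -n "$DRY_RUN" ]; then
        echo qsub -m e -M "$address" "$script"
    else
        qsub -m e -M "$address" "$script"
    fi
}
```

Running DRY_RUN=1 submit_with_mail blast_job.sh user@example.org prints the qsub command line without submitting anything, which makes it easy to check the options first.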
Grid Engine Portal
Submitting a BLAST Job Using GEP
BLAST Job Output on GEP
Applications on Bio-ClusterGrid
1. Homology & Similarity Search
• Definition
– Sequence similarity is observable; homology is a hypothesis based on that observation
• Applications
– BLAST
– FASTA
– GlimmerM
– Wise
2. Sequence Analysis
• Definition
– Use of bioinformatics methods to determine the biological function and structure of genes and the proteins they code for
• Applications
– ACT
– ClustalW
– EMBOSS
– HMMER
– IMAGE
– T-Coffee
3. Structural Prediction
• Definition
– Methods that determine the 2D/3D structure of proteins
• Applications
– Dowser
– FastDNAml
– LOOPP
– Mapmaker/QTL
– PAML
– PHYLIP
4. Molecular Imaging/Modeling
• Definition
– Tools that allow the user to predict the secondary structure of proteins from a given amino acid sequence
• Applications
– Artemis
– Cn3D
– GROMACS
– RasMol
– ReadSeq
– TribeMCL
– VMD
5. Development Tools
• BioJava
• BioPerl
• Biopython
6. Other Software
• Apache
• SQL
• GNU compilers
• Sun ONE compilers (trial licence)
• HPC ClusterTools (Sun’s implementation of MPI)
Bio-ClusterGrid Installation
1. Flash Archive Installation
2. Sun Grid Engine Installation
3. Grid Engine Portal Installation
4. Grid Installation for Cluster
1. Solaris 9 Flash Archive Installation
● A Flash archive contains the entire OS image of the original (master) machine.
● All applications and files on the master machine are replicated on the clone machines during installation.
● Installing a Flash archive is much faster than a standard Solaris OE installation.
● Installed using the Solaris 9 04/03 (or later) installation CD
● Can be installed from an FTP server, an HTTP server or a DVD
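A Flash archive is created once on the configured master machine with flarcreate and can then be installed on each clone, for example through a JumpStart profile whose archive_location keyword points at the server holding the archive. The sketch below only prints such a profile. The function name, archive name, server path, disk name and slice sizes are all hypothetical and would need to match the actual clone hardware.

```shell
# On the master machine, the archive could be created with, e.g.
# (hypothetical name and path; -c compresses, -x excludes a directory):
#   flarcreate -n bio-clustergrid -c -x /export/flash /export/flash/bio-clustergrid.flar
#
# Hypothetical helper: print a JumpStart profile that installs that archive.
make_flash_profile() {
    location=$1   # e.g. http://server/flash/bio-clustergrid.flar
    disk=$2       # boot disk of the clone, e.g. c0t0d0
    # Flash installs use explicit partitioning: 1024 MB of swap on
    # slice 1 and the remaining space for / on slice 0.
    cat <<EOF
install_type      flash_install
archive_location  $location
partitioning      explicit
filesys           ${disk}s1 1024 swap
filesys           ${disk}s0 free /
EOF
}
```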
2. Sun Grid Engine Installation
● Very fast: less than 5 minutes per host
● Run ./inst_sge -m -fast in the SGE directory
● Must be run as root
4. Cluster Grid Installation
● On every execution node, run “./inst_sge -x -auto” in the SGE directory
● Installation time: less than 5 minutes
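Because the execution-host installation has to be repeated on every node, a loop over a host list can save typing. This is a minimal sketch under several assumptions: SGE_ROOT names the NFS-shared Grid Engine directory, root can log in to each node with ssh, and the host names are placeholders. The installation command is the one given above; depending on the Grid Engine release, -auto may also expect a configuration file argument.

```shell
# Hypothetical helper: run the execution-host installation on each node
# named on the command line. Assumes SGE_ROOT is the shared SGE directory
# and that root can reach every node with ssh.
# If DRY_RUN is set, only print what would be run on each host.
install_exec_hosts() {
    for host in "$@"; do
        cmd="cd $SGE_ROOT && ./inst_sge -x -auto"
        if [ -n "$DRY_RUN" ]; then
            echo "$host: $cmd"
        else
            # Report failures but carry on with the remaining nodes.
            ssh "root@$host" "$cmd" || echo "install failed on $host" >&2
        fi
    done
}
```

For example, SGE_ROOT=/opt/sge install_exec_hosts node1 node2 node3 would install the three (hypothetical) nodes in turn.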
4. Grid Installation: Requirements
● Users of SGE must have a UNIX account on every execution node (e.g. by using NIS)
● Applications must be installed at the same path on all nodes (e.g. by using an NFS share)
● The Sun Grid Engine and Grid Engine Portal root directories must be NFS-shared
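These requirements can be checked on each node before jobs are submitted. The following is a minimal sketch with a hypothetical function name: it verifies that a user account resolves (through NIS or local files) and that each shared directory exists at the expected path. The user name and directory paths are assumptions for illustration.

```shell
# On the NFS server, a directory is typically shared by a line in
# /etc/dfs/dfstab such as (hypothetical path):
#   share -F nfs -o rw /opt/sge
#
# Hypothetical check, run on each execution node:
#   check_node USER DIR...
# Prints one line per missing item and returns non-zero if any is missing.
check_node() {
    user=$1; shift
    status=0
    # The account must resolve on this node (e.g. via NIS).
    if ! id "$user" >/dev/null 2>&1; then
        echo "missing account: $user"
        status=1
    fi
    # Every shared directory must be present at the same path.
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir"
            status=1
        fi
    done
    return $status
}
```

For example, check_node alice /opt/sge /opt/bio (hypothetical user and paths) returns 0 on a correctly configured node and lists anything that is missing otherwise.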
3. Grid Engine Portal Installation
● Three-step procedure:
– Installation of Sun ONE Portal Server
– Installation of the Gateway for secure access
– Installation of Grid Engine Portal
● Installation takes around 30-40 minutes
Current Developments
● Improvements to the GEP interface
● Making it easier and more comfortable for biologists to run their applications through GEP
● Biologists choose their application and run their job immediately
Future Developments
● Improvements to the GEP installation procedure
● Bio-Server
● Bio-Workstation