Linux 4 biology -...

Linux for Biology DEDAN GITHAE, BIOINFORMATICIAN BECA-ILRI HUB

Transcript of Linux 4 biology -...

Page 1: Linux 4 biology - hpc.ilri.cgiar.orghpc.ilri.cgiar.org/beca/training/AdvancedBFX2017/slides/Linux_4... · What is linux a family Ubuntu? Fedora? Mint? Debian? openSUSE? of free anyone

Linux for Biology

DEDAN GITHAE, BIOINFORMATICIAN

BECA-ILRI HUB

Page 2:

Importance of computers to biology

◦ Availability of vast research data shared online

◦ Automated analysis leading to generation of massive data

◦ Interaction with other research communities and shared databases

◦ Speed and efficiency in processing, storage and data mining

Page 3:

BIG Data: Volume, Variety, Velocity & Veracity

Volume:

◦ More content already generated and is available over open access

◦ More content being generated per run as a result of technology advancement

◦ Costs cheaper over time

Page 4:

Velocity:

◦ Technology making data generation faster and more efficient

Variety:

◦ Sequences, annotation, structures, image processing

Veracity:

◦ Some ambiguities, inconsistencies, incomplete data, model approximations

Page 5:

Other computational tasks: Analysis and interpretation

Biology activities:

◦ Prediction – functional and structural

◦ Pattern recognition: domains, homology

◦ Sequence alignments

◦ Statistical analysis

◦ Structural modelling

◦ Genetic diversity and interactions between organisms, between populations

Page 6:

Linux

Page 7:

What is Linux?

A family

◦ of free and open-source software

◦ operating system

◦ distributions built around the Linux kernel.

Page 8:

What is Linux?

A family: Ubuntu? Fedora? Mint? Debian? openSUSE?

◦ of free: anyone is freely licensed to use, copy, study, and change the software in any way

◦ and open-source software: the source code is openly shared so that people are encouraged to voluntarily improve the design of the software

◦ operating system: system software that manages computer hardware and software resources and provides common services for computer programs

◦ distributions built around the Linux kernel: the part of the operating system that mediates access to system resources, e.g. input/output requests from software, translating them into data-processing instructions for the central processing unit

Page 9:

Kernel

Page 10:

Some applications to biological tasks

Repetitive tasks – processing several sequences

Automating analysis processes – scripts/piping to programs

Text processing: regex; grep; sed

◦ extracting fields using cut/awk

◦ We'll see more of this in the tutorial
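As a rough illustration of the text-processing tools named on this slide, the sketch below runs grep, sed, cut and awk on a small FASTA file. The sequences, the IDs and the variable names (fasta, n_seqs, ids, lengths) are made up for the example and are not from the tutorial data. The awk step assumes each sequence sits on a single line.

```shell
#!/usr/bin/env bash
# Build a tiny FASTA file to practise on (made-up sequences, illustrative only).
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>seq1 Drosophila melanogaster
ATGCGTACGTTAG
>seq2 Drosophila simulans
ATGCGTTCGTTAGGA
>seq3 Anopheles gambiae
ATGAAATTTGGG
EOF

# grep: count the sequences (FASTA header lines start with ">").
n_seqs=$(grep -c '^>' "$fasta")
echo "sequences: $n_seqs"

# sed + cut: strip the leading ">" and keep the first space-separated field (the ID).
ids=$(grep '^>' "$fasta" | sed 's/^>//' | cut -d ' ' -f 1)
echo "$ids"

# awk: remember the ID from each header line, then print it with the
# length of the sequence line that follows (single-line sequences assumed).
lengths=$(awk '/^>/ {id = substr($1, 2); next} {print id, length($0)}' "$fasta")
echo "$lengths"

# Clean up the temporary file.
rm -f "$fasta"
```

The same commands can be chained with pipes or wrapped in a loop over many files, which is how the repetitive tasks above are usually automated.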

Page 11:

The ILRI High Performance Computing (HPC) Cluster

Page 12:

The ILRI High Performance Computing (HPC) Cluster

Users log in to HPC (the master)

To log in:

ssh [email protected]

then “jump” to the rest of the cluster (computing servers).

To do this, type

interactive

Page 13:

Software: To find out whether the software and version you need is installed, type

module avail

To use a piece of software, e.g. BLAST, type

module load blast

To see what software is ready for use (loaded), type

module list
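The list printed by module avail can be long, so it helps to filter it. The sketch below is one way to do that, under the assumption that module avail may write its listing to standard error (common for Environment Modules and Lmod, but worth checking on your system). The helper name find_module is hypothetical and not part of the cluster setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list installed modules whose name matches a pattern.
# Assumption: "module avail" may print to stderr, so merge stderr into
# stdout before filtering with a case-insensitive grep.
find_module() {
    module avail 2>&1 | grep -i -- "$1"
}

# Only call it where the module command exists (e.g. on the cluster).
if command -v module >/dev/null 2>&1; then
    find_module blast || echo "no module matching 'blast'"
else
    echo "module command not found; run this on the cluster"
fi
```

Once the exact name and version are known, they can be passed to module load as shown on the slide.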

Page 14:

SLURM: Simple Linux Utility for Resource Management

Interactive jobs have a time limit of 8 hours. If you are running a longer job, write a batch script to schedule it.

How do we write scripts?

Page 15:

Writing a Slurm script

◦ To see the available options, type

sbatch -u [ see man sbatch for a detailed explanation of usage ]

Page 16:

Example of a batch script

#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module
module load blast/2.6.0+

# run blast with 4 CPU threads (cores), matching the 4 requested above
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4

To run the script, type

sbatch [ scriptname.sbatch ]

Page 17:

Best practice; overview

Run the job on the computing node

interactive

Make a directory in the scratch space, and “go” there

mkdir -p /var/scratch/userX ; cd $_

Create the script

Run the script

sbatch [ scriptname.sbatch ]
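The steps above could be strung together roughly as follows. This is a sketch, not the cluster's official procedure. The file name blastn.sbatch and the variables scratch_root and workdir are illustrative, and the fallback to a temporary directory is only there so the example can be tried away from the cluster. The -num_threads 4 option is added on the assumption that BLAST should use the 4 cores requested with -n 4.

```shell
#!/usr/bin/env bash
# Illustrative names: scratch_root, workdir and blastn.sbatch are assumptions.
scratch_root=/var/scratch
# Fall back to a temporary directory when the scratch space is not writable
# (for example when trying this out away from the cluster).
[ -w "$scratch_root" ] || scratch_root=$(mktemp -d)

# Make a personal directory in the scratch space and "go" there.
workdir="$scratch_root/${USER:-userX}"
mkdir -p "$workdir" && cd "$workdir" || exit 1

# Create the script; the quoted 'EOF' stops the shell expanding anything now.
cat > blastn.sbatch <<'EOF'
#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module
module load blast/2.6.0+

# run blast with 4 CPU threads (cores)
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4
EOF

# Check the script for shell syntax errors before queueing it.
bash -n blastn.sbatch && echo "syntax OK"

# Submit only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch blastn.sbatch
else
    echo "sbatch not found; skipping submission"
fi
```

Writing the script from a here-document keeps the whole workflow in one place; editing the file in a text editor such as nano works just as well.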

Page 18:

Enjoy!