Making Use of NGS Data: From Reads to Trees and Annotations

João André Carriço, PhD
Microbiology Institute/Institute for Molecular Medicine
Faculty of Medicine, University of Lisbon
Portugal

http://im.fm.ul.pt
http://imm.fm.ul.pt
http://www.joaocarrico.info

WORKSHOP 24: NGS FOR MICROBIAL GENOMIC SURVEILLANCE AND MORE - ONE TECHNOLOGY FITS ALL

Conflicts of interest

Nothing to disclose

Disclaimer

This presentation is not intended to cover all available software or databases (we would need several weeks or months to do that).

I’ll present what I use or intend to use in the near future.

I gladly accept any suggestions to include in similar presentations in the future.

It is supposed to be interactive, so ask away during the presentation.

Summary

• What is in the reads FASTQ files
• Available Databases
  - Virulence Factors and AMR DBs
  - Sequence-based typing databases: Pubmlst.org / Enterobase
• High Throughput Sequencing data analysis (freeware)
  - Prokka
  - Roary
  - Nullarbor
  - Microreact.org
  - PHYLOViZ
• Commercial Solutions
  - Bionumerics 7.5
  - CLC Genomics Workbench (CLC Bio)
  - Ridom Seqsphere+

What is in the reads FASTQ files?

Isolate Genome*

Sequenced Reads

Slide Source: Nick Loman

Other isolates in the sequencing run

Contamination

* Chromosome + Plasmids + Phages
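As a minimal sketch of what a FASTQ file holds at the record level, the snippet below reads the usual four-line records (name, sequence, separator, quality string) and reports a mean quality per read. It assumes Phred+33 quality encoding and unwrapped records; the names `FastqRecord`, `read_fastq` and `mean_phred` and the two example reads are hypothetical. Parsing alone does not say whether a read comes from the isolate, another isolate in the run, or a contaminant.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of file (or a trailing blank line)
            return
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read, assuming the given ASCII offset."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(char) - offset for char in quality) / len(quality)


# Hypothetical two-read example, not real sequencing data.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n"
records = list(read_fastq(io.StringIO(EXAMPLE)))
for record in records:
    print(record.name, len(record.sequence), mean_phred(record.quality))
```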

Databases

Virulence Factor Databases

• VFDB (http://www.mgc.ac.cn/VFs/main.htm)
• Pathosystems Resource Integration Center (PATRIC) VF (https://www.patricbrc.org/)
• Victors (http://www.phidias.us/victors/)
• PHI-Base (http://www.phi-base.org/)
• MvirDB (http://mvirdb.llnl.gov/)

To know more: presentation in the session "Controversies in interpreting whole genome sequence data": http://eccmidlive.org/#resources/how-can-we-design-actionable-virulome-databases

Antibiotic Resistance Databases

• Comprehensive Antibiotic Resistance Database (CARD) (https://card.mcmaster.ca/)
• Repository of Antibiotic resistance Cassettes (RAC) (http://rac.aihi.mq.edu.au/rac/)
• Integrall: The integron database (http://integrall.bio.ua.pt/)

Sequence Based Typing: Pubmlst / BIGSdb

http://www.pubmlst.org

http://bigsdb.web.pasteur.fr/

Sequence Based Typing: Enterobase

slide by @happy_khan

Martin Sergeant, Mark Achtman, Nabil-Fareed Alikhan, Zhemin Zhou

Sequenced my strain…now what?

To know more : http://www.slideshare.net/nickloman/eccmid-2015-so-i-have-sequenced-my-genome-what-now

Reads (fastq files)

Contigs (fasta files)

Annotated contigs (gbk/gff files)

Roary: Pan Genome Analysis

Enterobase BIGSdb

Nullabor

PHYLOViZ: Tree + metadata visualization

Microreact.org: Tree + metadata + visualization

De novo assembler

Prokka

Genome annotation made easy, by Torsten Seemann (slides by Torsten)

Genome annotation: adding biological information to the sequence, by describing features

To know more: http://www.slideshare.net/torstenseemann/prokka-rapid-bacterial-genome-annotation-abphm-2013

Available at: https://github.com/tseemann/prokka

Roary

Pan genome analysis, by Andrew Page

Available at: https://sangerpathogens.github.io/Roary/

Core genome

Accessory genome

Pan-genome

Roary

Inputs: Annotated de novo assemblies (GFF files)
• Typically from the annotation pipeline

Outputs:
• Spreadsheet with presence and absence of genes
• Multi-FASTA alignment of core genes so you can build a tree without a reference
• Multi-FASTA alignments for each gene
• Plots for the open/closed genome, unique genes
• Integrates with Phandango so you can visualise all structural variation
• QC report from Kraken to help identify suspect samples

(Slide by Andrew Page)

Roary outputs

Core (n or n-1 strains)

Soft-Core (n-2 or n-3 strains)

Shell (8(?) to n-3 strains)

Cloud (<8(?) strains)

Core genome: Core + Soft-Core

Accessory genome: Shell + Cloud
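The bins on this slide can be sketched as a small classifier over a gene presence/absence table. This is an illustration only, not Roary's own code: the function names are hypothetical, the shell/cloud boundary of 8 strains carries the slide's own question mark and is therefore left as a parameter (`shell_min`), and because the slide lists n-3 under both Soft-Core and Shell, the sketch assigns n-3 to Soft-Core.

```python
from collections import Counter
from typing import Dict, Set


def classify_gene(n_present: int, n_strains: int, shell_min: int = 8) -> str:
    """Bin a gene by the number of strains carrying it (slide's bins)."""
    if not 0 < n_present <= n_strains:
        raise ValueError("n_present must be between 1 and n_strains")
    if n_present >= n_strains - 1:      # n or n-1 strains
        return "core"
    if n_present >= n_strains - 3:      # n-2 or n-3 strains
        return "soft-core"
    if n_present >= shell_min:          # assumed boundary, "8(?)" on the slide
        return "shell"
    return "cloud"


def summarise(presence: Dict[str, Set[str]], n_strains: int,
              shell_min: int = 8) -> Counter:
    """Count genes per bin from a gene -> set-of-strains mapping."""
    return Counter(
        classify_gene(len(strains), n_strains, shell_min)
        for strains in presence.values()
    )


# Hypothetical presence/absence data for 20 strains.
strains = [f"s{i}" for i in range(20)]
presence = {
    "geneA": set(strains),        # 20 strains
    "geneB": set(strains[:19]),   # 19 strains
    "geneC": set(strains[:17]),   # 17 strains
    "geneD": set(strains[:10]),   # 10 strains
    "geneE": set(strains[:2]),    # 2 strains
}
counts = summarise(presence, n_strains=len(strains))
core_genome = counts["core"] + counts["soft-core"]
accessory_genome = counts["shell"] + counts["cloud"]
print(counts, core_genome, accessory_genome)
```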

Roary outputs

iCANDY output of presence and absence of genes in accessory genome. S. Weltevreden & public S. enterica genomes

(Slide by Andrew Page)

Nullarbor

Complete pipeline from reads to reports, by Torsten Seemann

The objective is to automate analysis for everyday use in public health labs / research settings

Uses and distills the outputs of many software tools

Available at: https://github.com/tseemann/nullarbor

Nullarbor

Slide by Torsten Seemann

Nullarbor

From: https://github.com/tseemann/nullarbor

Some Nullarbor outputs in report

Slides by Torsten Seemann

PHYLOViZ
www.phyloviz.net

PHYLOViZ

Inputs:
- Tab separated txt (profiles)
- Fasta files
- Automatic database retrieval (MLST)

Outputs:
• goeBURST and goeBURST MST
• Link quality assessment
• High quality images

Can be easily applied to:
- MLST / cgMLST / wgMLST
- MLVA
- SNP data*
- Gene Presence/absence
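To make the idea of a tree over allelic profiles concrete, here is a minimal sketch that builds a plain minimum spanning tree from pairwise Hamming distances (the number of loci at which two profiles differ) using Kruskal's algorithm. It is not the goeBURST algorithm: goeBURST applies its own tie-break rules when several links have the same distance, whereas this sketch simply breaks ties by profile name. The profile names and allele numbers are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]


def hamming(a: Profile, b: Profile) -> int:
    """Number of loci at which two allelic profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles: Dict[str, Profile]) -> List[Tuple[str, str, int]]:
    """Kruskal's algorithm; ties are broken by profile name, not goeBURST rules."""
    edges = sorted(
        (hamming(profiles[u], profiles[v]), u, v)
        for u, v in combinations(sorted(profiles), 2)
    )
    parent = {name: name for name in profiles}

    def find(node: str) -> str:
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for distance, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # the link joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, distance))
    return tree


# Hypothetical 7-locus MLST profiles.
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 3, 2),
    "ST4": (5, 5, 5, 1, 1, 3, 2),
}
tree = minimum_spanning_tree(profiles)
for u, v, distance in tree:
    print(u, v, distance)
```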

PHYLOViZ 2.0

New features:
• Hierarchical clustering
• Neighbor-Joining
• Project Saving

PHYLOViZ Online Available at http://online.phyloviz.net

Web based version of PHYLOViZ

Allows users to create their own datasets, save them and share their data (privately or publicly)

REST API available

Scalable to thousands of nodes

Tree Analysis tools:
• Interactive distance matrix
• NLV graph
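The slides do not define the NLV graph. The sketch below assumes it stands for "N locus variant" and that the graph links every pair of profiles differing at no more than N loci, rather than keeping only the links of a spanning tree. The function name, the profile names and the allele numbers are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]


def nlv_edges(profiles: Dict[str, Profile], max_diff: int) -> List[Tuple[str, str, int]]:
    """All pairs of profiles that differ at no more than max_diff loci."""
    edges = []
    for u, v in combinations(sorted(profiles), 2):
        distance = sum(x != y for x, y in zip(profiles[u], profiles[v]))
        if distance <= max_diff:
            edges.append((u, v, distance))
    return edges


# Hypothetical 7-locus MLST profiles.
example_profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 3, 2),
    "ST4": (5, 5, 5, 1, 1, 3, 2),
}
single_locus_links = nlv_edges(example_profiles, max_diff=1)
double_locus_links = nlv_edges(example_profiles, max_diff=2)
print(single_locus_links)
print(double_locus_links)
```

Raising `max_diff` adds links on top of those already present, which is why such a graph can show alternative connections that a single tree leaves out.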

PHYLOViZ Online

Slide by @happy_khan

PHYLOViZ Online

NLV Graph

Tree cut-off

Full MST

microreact.org

Create Selections

Change tree options

microreact.org Available at http://microreact.org/

Presentation in the session "Harnessing whole genome sequence data for public health applications": Novel open access tools for WGS-based pathogen surveillance and the identification of high-risk clones

http://eccmidlive.org/#resources/novel-open-access-tools-for-wgs-based-pathogen-surveillance-and-the-identification-of-high-risk-clones

Meet The Experts (available on twitter by order of appearance)

Commercial solutions

• Ridom Seqsphere+: http://www.ridom.de/seqsphere/
• Applied Maths Bionumerics 7.6: http://www.applied-maths.com/bionumerics
• CLC Bio Genomics Workbench: http://www.clcbio.com/blog/clc-genomics-workbench-7-5/

Take home messages

• Huge variety of software and database solutions

• There is no single One-Size-Fits-All solution (job security for bioinformaticians)

• Different questions require different approaches

• Always question the results and data provenance

More references/presentations

ECCMID 2015 Meet-the-expert session on “What bioinformatic tools should I use for analysis of High Throughput Sequencing data for molecular diagnostics?”

Nick Loman: http://www.slideshare.net/nickloman/eccmid-2015-meettheexpert-bioinformatics-tools

João André Carriço: http://www.slideshare.net/joaoandrecarrico/eccmid-meet-theexpert2015

Acknowledgments

UMMI Members: Bruno Gonçalves, Mário Ramirez, José Melo-Cristino

INESC-ID: Alexandre Francisco, Cátia Vaz, Marta Nascimento

EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/): Mirko Rossi

FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/): Dag Harmsen (Univ. Muenster), Stefan Niemann (Research Center Borstel), Keith Jolley, James Bray and Martin Maiden (Univ. Oxford), Joerg Rothganger (RIDOM), Hannes Pouseele (Applied Maths)

Genome Canada IRIDA project, Integrated Rapid Infectious Disease Analysis (www.irida.ca): Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NLM, PHAC), Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC), Fiona Brinkman (SFU), William Hsiao (BCCDC)
