Annovar Variants Analysis
molsim.sci.univr.it/2014_bioinfo2/genomica/06_Annovar.pdf

Annovar

Variants Analysis
http://www.openbioinformatics.org/annovar/

Marin Vargas, Sergio Paul

December 2013

Variant Analysis for the Diagnosis of Genetic Disease

[Workflow diagram: Extraction -> DNA sequencing (genome or exome, Illumina HiSeq) -> FASTQ files -> variant calling against the genome reference (BWA + GATK) -> VCF files -> variant analysis (several software tools)]

Annovar description

• Annovar is a program for functional annotation of genetic variants from high-throughput sequencing data.

• It is an efficient tool for the functional annotation of genetic variants from diverse genomes (human, mouse, worm, fly, yeast, etc.).

[Diagram: ANNOVAR takes as input genetic variants (VCF format), annotated genomes (GFF3 format; UCSC, ENSEMBL; human, mouse, cow, etc.) and biological knowledge (predictors), and outputs the most likely causal variants and their corresponding candidate genes]

Annovar goal

• Variant reduction: through a stepwise procedure, it is possible to exclude variants that are unlikely to be disease-causal and so identify the putative genes involved in the disease.

• Filtering synonymous SNPs for further analysis.

• Different prediction algorithms use different information, so we use predictions from multiple algorithms.

• Querying predictions from different databases for different algorithms is both tedious and time-consuming.
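The stepwise reduction described above can be sketched as a chain of filters applied in order. This is only an illustration, not ANNOVAR's own implementation: the `Variant` fields, the step labels, the 1% frequency cutoff, the two-predictor requirement and the gene names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    effect: str          # e.g. "nonsynonymous", "synonymous", "frameshift"
    pop_freq: float      # allele frequency in a reference population
    damaging_calls: int  # number of predictors calling the variant damaging


# Each step is (label, predicate); a variant must pass every step to survive.
# The cutoffs below are illustrative assumptions, not recommended values.
STEPS = [
    ("protein-altering", lambda v: v.effect in {"nonsynonymous", "stopgain", "frameshift"}),
    ("rare", lambda v: v.pop_freq < 0.01),
    ("predicted damaging", lambda v: v.damaging_calls >= 2),
]


def reduce_variants(variants, steps=STEPS):
    """Apply the filters in order and record how many variants survive each one."""
    remaining = list(variants)
    counts = [("input", len(remaining))]
    for label, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((label, len(remaining)))
    return remaining, counts


# Made-up variants in hypothetical genes.
variants = [
    Variant("GENE_A", "nonsynonymous", 0.0005, 3),
    Variant("GENE_B", "synonymous", 0.0001, 0),
    Variant("GENE_C", "nonsynonymous", 0.20, 2),
    Variant("GENE_D", "frameshift", 0.002, 1),
]
survivors, counts = reduce_variants(variants)
candidate_genes = sorted({v.gene for v in survivors})
```

Recording the count after each step makes it easy to see which filter removes the most variants before the candidate gene list is read off the survivors.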

Annovar functionality

• The principal functionality consists of three types of functional annotation:

  • Gene based: identify whether a Single Nucleotide Variant (SNV), small insertion/deletion (indel) or Copy Number Variation (CNV) causes protein-coding changes.

  • Region based: identify variants in specific genomic regions.

  • Filter based: identify variants by filtering against diverse databases.

• Secondary functionality:

  • Retrieve the nucleotide sequence at any user-specified genomic positions in batch.

  • Identify a candidate gene list for Mendelian diseases from exome data.

  • Other utilities.

Gene based

• Given a list of SNVs and indels from a whole-genome sequencing experiment on a human subject, it is of interest to identify the genes that are disrupted.

• For intergenic variants, we want to know the two flanking genes and the distances between the variant and each of them.

• For exonic variants, we want to know the amino acid changes.
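The flanking-gene lookup for intergenic variants can be sketched as follows. This is a minimal illustration on a single chromosome, assuming 1-based inclusive coordinates and hypothetical gene names; it does not reproduce ANNOVAR's gene model handling and does not attempt the amino acid change computation for exonic variants.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def annotate_position(pos, genes):
    """Return ("genic", name) or ("intergenic", left, right) for one position.

    left and right are (gene name, distance) pairs, or None when there is
    no gene on that side of the position.
    """
    for g in genes:
        if g.start <= pos <= g.end:
            return ("genic", g.name)
    upstream = [g for g in genes if g.end < pos]
    downstream = [g for g in genes if g.start > pos]
    # Nearest gene on each side: largest end before pos, smallest start after it.
    left = max(upstream, key=lambda g: g.end, default=None)
    right = min(downstream, key=lambda g: g.start, default=None)
    left_info = (left.name, pos - left.end) if left else None
    right_info = (right.name, right.start - pos) if right else None
    return ("intergenic", left_info, right_info)


# Hypothetical genes on one chromosome.
genes = [Gene("GENE_A", 1000, 2000), Gene("GENE_B", 5000, 6000)]
inside = annotate_position(1500, genes)
between = annotate_position(3000, genes)
before_all = annotate_position(10, genes)
```

A linear scan is enough for a sketch; a genome-scale tool would normally index the genes by position rather than scanning them for every variant.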

Region based

• Identify variants at conserved genomic regions.

• Identify the subset of variants that either fall within the conserved regions (for SNPs and short indels) or overlap with these conserved regions (for large-scale CNVs).

• Use phastCons predictions to annotate variants that fall within conserved genomic regions.

• Use the TFBS (Transcription Factor Binding Site) database to annotate the respective region.

• Identify the cytogenetic band for genetic variants.

• Identify variants located in segmental duplications (SegDup).

• Identify previously reported structural variants in DGV (Database of Genomic Variants).

• Identify variants reported in previously published GWAS (genome-wide association studies).

• Identify variants in ENCODE annotated regions.

• Identify non-coding variants that disrupt enhancers, repressors and promoters.
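All of the region-based checks above reduce to an interval test: a SNP or short indel falls within a region, while a large CNV overlaps it. A minimal sketch, assuming closed 1-based intervals and made-up region names:

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when closed intervals [a_start, a_end] and [b_start, b_end] share a base."""
    return a_start <= b_end and b_start <= a_end


def annotate_regions(variant, regions):
    """Return the names of all regions overlapping a (chrom, start, end) variant."""
    chrom, start, end = variant
    return [name for (r_chrom, r_start, r_end, name) in regions
            if r_chrom == chrom and overlaps(start, end, r_start, r_end)]


# Illustrative regions; the names and coordinates are invented for the example.
regions = [
    ("chr1", 100, 200, "conserved_1"),
    ("chr1", 500, 900, "segdup_1"),
    ("chr2", 100, 200, "conserved_2"),
]
snv_hits = annotate_regions(("chr1", 150, 150), regions)   # single-base variant
cnv_hits = annotate_regions(("chr1", 180, 600), regions)   # large variant spanning two regions
miss = annotate_regions(("chr1", 300, 300), regions)
```

Treating an SNV as an interval of length one lets the same test cover both the "fall within" and the "overlap" cases.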

Filter based predictors 1

• Identify subsets of variants based on comparison to other variant databases, for example dbSNP or the 1000 Genomes Project.

• 1000 Genomes Project: started in January 2008, it is an international research effort to establish by far the most detailed catalogue of human genetic variation. Annovar uses the most recent version (April 2012).

• dbSNP: the Single Nucleotide Polymorphism Database is a free public archive of genetic variation within and across different species, developed and hosted by the NCBI.
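Filter-based annotation amounts to looking each variant up in a reference database and splitting the input accordingly. The sketch below assumes an in-memory lookup keyed by (chromosome, position, ref, alt) and an arbitrary 1% frequency cutoff; the function name, the keys and the frequencies are invented for illustration and are not taken from dbSNP or the 1000 Genomes Project.

```python
def split_by_frequency(variants, known_freqs, max_freq=0.01):
    """Split variants into (kept, dropped) using a lookup of known allele frequencies.

    Variants absent from the lookup are treated as novel and kept.
    """
    kept, dropped = [], []
    for key in variants:
        freq = known_freqs.get(key)
        if freq is not None and freq >= max_freq:
            dropped.append((key, freq))
        else:
            kept.append(key)
    return kept, dropped


# Keys are (chrom, pos, ref, alt); all values are made up for illustration.
known_freqs = {
    ("chr1", 1000, "A", "G"): 0.35,
    ("chr2", 2000, "C", "T"): 0.002,
}
variants = [
    ("chr1", 1000, "A", "G"),   # common in the reference database
    ("chr2", 2000, "C", "T"),   # known but rare
    ("chr3", 3000, "G", "A"),   # not in the database
]
kept, dropped = split_by_frequency(variants, known_freqs)
```

Keeping the dropped variants together with their frequencies makes the filtering step auditable afterwards.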

Filter based predictors 2

• dbNSFP, available in Annovar as LJB2 (after its authors Liu, Jian and Boerwinkle, version 2), is a database for functional prediction and annotation of all potential non-synonymous SNVs in the human genome.

• It compiles prediction scores, along with a conservation score, from several popular algorithms and other related information.

• dbNSFP therefore uses two types of prediction algorithms:

  • Protein variant functional prediction.

  • Variant conservation prediction.

Filter based predictors 3

• dbNSFP protein variant functional prediction:

  • SIFT: Sorting Intolerant From Tolerant, predicts whether an amino acid substitution is likely to affect protein function based on sequence homology and the physico-chemical similarity between the alternate amino acids.

  • PolyPhen2: prediction of functional effects of human nsSNPs.

  • LRT: Likelihood Ratio Test, identifies a subset of deleterious mutations that disrupt highly conserved amino acids within protein-coding sequences.

  • MutationTaster: rapid evaluation of the disease-causing potential of DNA sequence alterations.

  • MutationAssessor: predicts the functional impact of amino-acid substitutions in proteins.

  • FATHMM: Functional Analysis Through Hidden Markov Models.
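Because these predictors rely on different information, their calls are often combined. The sketch below shows one simple way to do so, a majority vote over whichever predictors returned a call. The voting rule, the 0.5 threshold and the example calls are assumptions for illustration only; they are not part of dbNSFP or Annovar.

```python
def damaging_votes(calls):
    """Count predictors calling a variant damaging, ignoring missing calls (None)."""
    available = {name: call for name, call in calls.items() if call is not None}
    votes = sum(1 for call in available.values() if call)
    return votes, len(available)


def is_consensus_damaging(calls, min_fraction=0.5):
    """True when more than min_fraction of the available predictors say damaging."""
    votes, total = damaging_votes(calls)
    return total > 0 and votes / total > min_fraction


# Hypothetical per-predictor calls for one variant (True = damaging, None = no call).
calls = {
    "SIFT": True,
    "PolyPhen2": True,
    "LRT": False,
    "MutationTaster": True,
    "MutationAssessor": None,
    "FATHMM": False,
}
votes, total = damaging_votes(calls)
consensus = is_consensus_damaging(calls)
```

Excluding missing calls from the denominator avoids penalising variants that a given predictor simply could not score.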

Filter based predictors 4

• dbNSFP variant conservation prediction:

  • PhyloP: assigns conservation p-values; scores reflect either conservation (positive scores) or accelerated evolution (negative scores).

  • GERP++: Genomic Evolutionary Rate Profiling, measures base conservation.

  • SiPhy: models the pattern of substitutions rather than just the rate, capturing biased substitutions (e.g. conserved lysine: AAA <-> AAG).

Filter based predictors 5

• ESP (Exome Sequencing Project) annotations

  • The ESP is an NHLBI-funded exome sequencing project aiming to identify genetic variants in exonic regions from over 6000 individuals, including healthy ones as well as subjects with different diseases.

• GERP++ (Genomic Evolutionary Rate Profiling) annotations

  • GERP identifies constrained elements in multiple alignments by quantifying substitution deficits.

• CG (Complete Genomics) frequency annotations

  • Each technical platform, such as Complete Genomics and Illumina HiSeq, may generate some platform-specific sequencing artifacts. Complete Genomics provides whole-genome data for a relatively small group of healthy subjects, but this data set can be quite useful to filter out technical artifacts for CG users.

• Population frequency ensemble annotations

  • The database popfreq_all integrates PopFreqMax, 1000G2012APR_ALL, 1000G2012APR_AFR, 1000G2012APR_AMR, 1000G2012APR_ASN, 1000G2012APR_EUR, ESP6500si_AA, ESP6500si_EA, CG46, NCI60, SNP137, COSMIC65, DISEASE.

• Generic mutation annotations

  • Annovar users have the flexibility to supply a custom-made annotation file and let Annovar perform filter-based annotation on it.
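As a small illustration of the population frequency ensemble, the sketch below computes a maximum frequency across populations. It assumes that PopFreqMax denotes the highest allele frequency observed in any of the listed populations, as the name suggests; the function name and the frequencies are invented for the example.

```python
def pop_freq_max(freqs):
    """Return the highest allele frequency among populations with data, or None."""
    observed = [f for f in freqs.values() if f is not None]
    return max(observed) if observed else None


# Made-up frequencies for one variant; None means the variant was not observed.
freqs = {
    "1000G2012APR_AFR": 0.001,
    "1000G2012APR_EUR": 0.04,
    "ESP6500si_AA": None,
    "CG46": 0.02,
}
highest = pop_freq_max(freqs)
is_rare_everywhere = highest is None or highest < 0.01
```

Filtering on the maximum rather than on a single population's frequency avoids keeping a variant that is rare in one population but common in another.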

Annovar result

• Two output files will be generated:

  • The first file contains annotation for all variants.

  • The second file contains the amino acid changes that result from the exonic variants.

• Annovar uses standardized nomenclature to annotate non-synonymous SNVs and indels on cDNA or on proteins.

Example: NOD2:NM_022162:exon4:p.R702W
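An annotation string like the example above can be split into its parts for downstream filtering. This is a minimal sketch that assumes the colon-separated layout gene:transcript:exon:change seen in the example. The handling of an extra `c.` (cDNA) field is a guess at how a cDNA-level change might appear, not something shown in the source, and `parse_annotation` is a hypothetical helper.

```python
import re


def parse_annotation(text):
    """Split a gene:transcript:exon:change annotation into a dictionary."""
    gene, transcript, exon, *changes = text.split(":")
    result = {"gene": gene, "transcript": transcript, "exon": exon}
    for change in changes:
        # Protein change with one-letter amino acid codes, e.g. p.R702W.
        match = re.fullmatch(r"p\.([A-Z*])(\d+)([A-Z*])", change)
        if match:
            result["protein"] = {
                "ref": match.group(1),
                "position": int(match.group(2)),
                "alt": match.group(3),
            }
        elif change.startswith("c."):
            # Assumed cDNA-level field; kept as raw text.
            result["cdna"] = change
    return result


parsed = parse_annotation("NOD2:NM_022162:exon4:p.R702W")
```

Here `p.R702W` reads as arginine (R) at protein position 702 replaced by tryptophan (W).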
