Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensembl, Biomart and IGV...

Genome browsing, Genomic data mining and Genome data visualization with Ensembl, Biomart and IGV Alex Sánchez August 2005


Course: Bioinformatics for Biomedical Research (2014). Session: 1.3- Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensembl, Biomart and IGV. Statistics and Bioinformatics Unit (UEB) & High Technology Unit (UAT) from Vall d'Hebron Research Institute (www.vhir.org), Barcelona.


Page 1: Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensembl, Biomart and IGV (UEB-UAT Bioinformatics Course - Session 1.3 - VHIR, Barcelona)

Genome browsing, Genomic data mining and Genome data visualization

with Ensembl, Biomart and IGV

Alex Sánchez

August 2005


What is Ensembl

• Ensembl is a joint scientific project between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute, which was launched in 1999 in response to the imminent completion of the Human Genome Project.

• More than one decade later, Ensembl's aim remains to provide a centralized resource for geneticists, molecular biologists and other researchers studying the genomes of our own species and other vertebrates and model organisms.

• Ensembl is one of several well known genome browsers for the retrieval of genomic information.


“Ensembl” is a genome browser

• Ensembl provides a genome browser that acts as a single point of access to annotated genomes for mainly vertebrate species.

• Information such as gene sequence, splice variants and further annotation can be retrieved at the genome, gene and protein level. This includes information on protein domains, genetic variation, homology, syntenic regions and regulatory elements.

• Coupled with analyses such as whole genome alignments and effects of sequence variation on protein, this powerful tool aims to describe a gene or genomic region in detail.


Basic Genome Annotation

• Genes

– Genomic location

– Gene model structures

  • Exons

  • Introns

  • UTRs

– Transcript(s)

  • Pseudogenes

  • Non-coding RNA

– Protein(s)

– Links to other sources of information
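A gene model stores exon coordinates, and introns are simply the gaps between consecutive exons of one transcript. The sketch below illustrates that relationship; the function name and the coordinates are invented for illustration and do not come from Ensembl.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons of one transcript."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:  # skip adjacent or overlapping exons
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical three-exon transcript (coordinates invented for illustration).
exons = [(100, 200), (301, 400), (551, 700)]
print(introns_from_exons(exons))  # [(201, 300), (401, 550)]
```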


Advanced Genome Annotation

• Cytogenetic bands

• Polymorphic markers

– Sequence Tagged Sites (STS)

• Genetic variation

– Single Nucleotide Polymorphisms (SNPs)

– Deletion-Insertion Polymorphisms (DIPs)

– Short Tandem Repeats (STRs)

• Repetitive sequences

• Expressed Sequence Tags (ESTs)

• cDNAs or mRNAs from related species

• Regions of sequence homology


Use Ensembl if you:

• have a gene of interest, and you would like to know if there are homologues in other species, or any sequence variations in the gene;

• would like to know what the sequence is for your gene of interest, and what the sequences of the splice variants (transcripts) are;

• want to explore the region around a gene of interest, and find neighbouring genes;

• want to find sequences that may be involved in gene regulation (open chromatin signatures, transcription factor binding sites, etc.);

• are interested in how conserved a gene or region is across species;

• want to know a selection of sequence variants that have been associated with a disease, for example, diabetes;

• have questions about a gene, variant, or chromosomal region;


Don’t use it if:

• you want to submit sequence files (see the course on ENA);

• you are looking for metabolic pathways (learn more about Reactome);

• your species of interest is not a chordate (see a sister project, Ensembl Genomes);


The Ensembl web site

Ensembl:

takes genomic sequence assemblies

human build 35, mouse, rat, mosquito…

adds annotation and links

automated process

presents all the data on a web site


How to search Ensembl

• Search www.ensembl.org using:

– a gene name (for example, BRCA2);

– a UniProt accession number (for example, P51587);

– a disease name (for example, coronary heart disease);

– a variation (for example, rs1223);

– a location - a genomic region (for example, rat X:100000..200000);

– a PDBe ID or a Gene Ontology (GO) term

• Most search results will take you to the appropriate Ensembl view through a results page.

• If you search using a location you will be directed straight to the location tab (this tab provides a view of a region of a genome).
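A location query such as `X:100000..200000` has three parts: chromosome, start and end. The following sketch splits such a string into those parts. It is an illustration only, not Ensembl's own parser; the `Region` type and `parse_region` function are hypothetical names, and accepting `-` as an alternative separator is an assumption.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chromosome: str
    start: int
    end: int


# Accepts "X:100000..200000" as on the slide; "-" as separator is an assumption.
_REGION = re.compile(r"^\s*(\w+)\s*:\s*([\d,]+)\s*(?:\.\.|-)\s*([\d,]+)\s*$")


def parse_region(text: str) -> Region:
    """Split a location string into chromosome, start and end."""
    match = _REGION.match(text)
    if match is None:
        raise ValueError(f"not a region: {text!r}")
    chrom, start, end = match.groups()
    start_i = int(start.replace(",", ""))
    end_i = int(end.replace(",", ""))
    if start_i > end_i:
        raise ValueError("start must not exceed end")
    return Region(chrom, start_i, end_i)


print(parse_region("X:100000..200000"))
```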


Example 1: Searching the BRCA2 gene

• Open the Ensembl home page at www.ensembl.org

• Choose your species of interest (Human) using the pull-down menu to the left of the search box.

• Type in your search term of interest into the search box. In our example we are using the gene name 'BRCA2'.

– You could also use a UniProtKB accession number, for example 'P51587'.

• Click 'Go' to obtain the search results

• You should see the BRCA2 gene at the top of the list.
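The same gene search can also be scripted. A minimal sketch, assuming the public Ensembl REST service at `rest.ensembl.org` and its `lookup/symbol` endpoint; the helper names are hypothetical, and the fields in the returned record may vary between releases. Nothing is fetched until `lookup_gene` is called.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

REST_SERVER = "https://rest.ensembl.org"  # assumed public REST server


def lookup_url(symbol: str, species: str = "homo_sapiens") -> str:
    """Build the URL for a gene-symbol lookup returning JSON."""
    path = f"/lookup/symbol/{quote(species)}/{quote(symbol)}"
    return f"{REST_SERVER}{path}?content-type=application/json"


def lookup_gene(symbol: str, species: str = "homo_sapiens") -> dict:
    """Fetch the gene record as a dict (requires network access)."""
    with urlopen(lookup_url(symbol, species), timeout=30) as response:
        return json.load(response)


print(lookup_url("BRCA2"))
```

Calling `lookup_gene("BRCA2")` should return a record with the gene's stable ID and genomic coordinates, comparable to what the results page shows.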


Searching the BRCA2 gene: Results


Searching the BRCA2 gene: Summary


Searching the BRCA2 gene: View


Searching sequences: BLAT/BLAST

• If you have a sequence, but you are not sure what the gene name or ID in Ensembl is, you can align it to the genome with BLAST or BLAT.

• BLAT (The BLAST-Like Alignment Tool) is fast, but it demands more exact matches. BLAST will allow lower-scoring hits, and allows more gaps in alignments. You'll get more hits with BLAST (but it may be slower)


Searching Sequences Example: The MTAP4 gene

• CTCCGCACTGCTCACTCCCGCGCAGTGAGGTTGGCACAGCCACCGCTCTG TGGCTCGCTTGGTTCCCTTAGTCCCGAGCGCTCGCCCACTGCAGATTCCTT TCCCGTGCAGACATGGCCT

• Click on the BLAST/BLAT link at the top of the page (circled in red in figure).

• Paste your sequence into the box.

• Check the options are correct. For example, we have selected Homo sapiens as the species to search against and the BLAT search tool because we're looking for an identical match.

• Click 'Run'
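Sequences copied from slides or documents often carry stray spaces and line breaks. The sketch below removes them and wraps the sequence as a single FASTA record before it is pasted into the search box; `to_fasta` is a hypothetical helper, and restricting the alphabet to A, C, G, T and N is an assumption made for this DNA example.

```python
import textwrap


def to_fasta(name: str, pasted: str, width: int = 60) -> str:
    """Strip whitespace from a pasted sequence and wrap it as one FASTA record."""
    sequence = "".join(pasted.split()).upper()
    unexpected = set(sequence) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return f">{name}\n" + "\n".join(textwrap.wrap(sequence, width)) + "\n"


# The query sequence from the slide, with the spaces left in.
query = """CTCCGCACTGCTCACTCCCGCGCAGTGAGGTTGGCACAGCCACCGCTCTG
TGGCTCGCTTGGTTCCCTTAGTCCCGAGCGCTCGCCCACTGCAGATTCCTT
TCCCGTGCAGACATGGCCT"""
print(to_fasta("query", query))
```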


Searching sequences: Results

• Alignment Locations vs. Karyotype. The alignment [A] shows all hits on the genome. The best hit is boxed. In this case, BLAT shows one hit.

• Alignment Locations vs. Query. The alignment [B] shows hits, or High Scoring Pairs (HSPs), as a red bar along the query sequence (the black and white bar below).

• Alignment Summary. The summary [C] shows a table of hits, with customisable columns. Links are provided from the table. The link 'A' shows an alignment of the query and target sequence. 'G' shows the hit on the genome. 'C' brings you to the location tab, where you can see the BLAT hit in context of genes in that region.


Regions, maps and markers

MarkerView

SNPView

GeneSNPView

ContigView

CytoView

SyntenyView

MultiContigView


Genes & gene products

GeneView

TransView

ExonView

ProteinView

FamilyView

DomainView

GOView

DiseaseView


Ensembl exercises

Type the name of your favorite gene (e.g. BRCA2) and explore all the sections of Ensembl for this gene.

• Does this gene have an ortholog in mouse?

• How many different transcripts are known for this gene?

• How many exons does the longest transcript have?

• Which functional annotations does this gene have? (hint: check the GO annotations)

• Can you find SNPs in this gene?


Data retrieval

BioMart

Data sets on ftp site

MySQL queries of databases

Perl API access to databases

Export View


ExportView


Data Mining in Ensembl with Biomart

August 2005

www.biomart.org/biomart/martview


Simple Text-based Search Engine


‘Mouse Gene’ Gives Us Results


A More Complex Query is Not as Useful


BioMart- Data mining

• BioMart is a search engine that can find multiple terms and put them into a table format.

• Such as: human gene (IDs), chromosome and base pair position

• No programming required!


General or Specific Data-Tables

• All the genes for one species

• Or… only genes on one specific region of a chromosome

• Or… genes on one region of a chromosome associated with a disease


BioMart Data Sets

• Ensembl genes

• Vega genes

• SNPs

• Markers

• Phenotypes

• Gene expression information

• Gene ontology

• Homology predictions

• Protein annotation


Web Interface

With BioMart, quickly extract gene-associated information from the Ensembl databases.


Information Flow

• Choose the species of interest (Dataset)

• Decide what you would like to know about the genes (Attributes)

(sequences, IDs, description…)

• Decide on a smaller geneset using Filters.

(enter IDs, choose a region …)
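The Dataset, Attributes and Filters flow maps onto a simple table operation: filters choose the rows, attributes choose the columns. The sketch below illustrates this on a handful of invented records; the field names and values are placeholders, not real Ensembl entries, and `run_query` is a hypothetical function.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]

# Invented records standing in for a dataset; not real Ensembl entries.
DATASET: List[Record] = [
    {"gene_id": "G1", "chromosome": "10", "biotype": "protein_coding", "symbol": "Aaa1"},
    {"gene_id": "G2", "chromosome": "10", "biotype": "lncRNA", "symbol": "Bbb2"},
    {"gene_id": "G3", "chromosome": "X", "biotype": "protein_coding", "symbol": "Ccc3"},
]


def run_query(dataset: Iterable[Record], filters: Dict[str, str],
              attributes: List[str]) -> List[List[str]]:
    """Keep rows matching every filter, then return only the requested columns."""
    rows = [r for r in dataset if all(r.get(k) == v for k, v in filters.items())]
    return [[r[a] for a in attributes] for r in rows]


table = run_query(DATASET, {"chromosome": "10", "biotype": "protein_coding"},
                  ["gene_id", "symbol"])
print(table)  # [['G1', 'Aaa1']]
```

With an empty filter dictionary every row is returned, which corresponds to asking for all genes of the species.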


Web Interface

Three main stages: Dataset, Attributes and Filters.

Choose the species of interest.

Choose what information to view.

Choose the gene set using what we know.


The First Step: Choose the Dataset

Homo sapiens genes are the default.


The Second Step: Attributes

Attributes are what we want to know about the genes.

Four output pages.


The SNP Attribute Page

Output variation information such as SNP reference ID and alleles.


Filters Allow Gene Selection

Choose the gene set by region, gene ID(s), protein/domain type.


Export Sequence or Tables

Genes and attributes are exported as sequence (Fasta format) or tables.


Query:

• For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI.

• In the query:

Attributes: what we want to know.

Filters: what we know


A Brief Example

Change dataset to mouse

Mus musculus


A Brief Example

Dataset has changed.


Attributes (Output Options)

Click Attributes.

Attributes allow us to choose what we wish to know.

IDs are found in the ‘Features’ page.

Click on ‘GENE’.


Default options selected: Ensembl Gene ID and Transcript ID

Attributes (Output Options)

Ensembl Gene ID is selected


Scroll down to select MGI symbol. Also select the accession number.

Attributes (Output Options)

‘Markersymbol ID’ will give us the MGI ID


‘Results’ give us Gene IDs for all mouse genes in the Ensembl database.

The Results Table


Select a Smaller Gene Set

Select ‘Filters’

Expand the REGION panel

Instead of all mouse genes, select protein coding genes on chromosome 10.


Select Genes on Chromosome 10

Select chromosome

10

Instead of all mouse genes, select protein coding genes on chromosome 10.


Select Protein Coding Genes

Filters are set to chromosome 10 and protein-coding genes. Genes must meet BOTH criteria to be in the result table.

Gene type: protein coding


Results (Preview)

This is a preview- if you are happy with the table, click ‘Go’.

For the full result table: Go


Full Result Table

Columns: Ensembl Gene ID | Transcript ID | MGI symbol | MGI Accession Number


Original Query:

• For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI.

• In the query:

Attributes: columns in the Result Table

Filters: what we know
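The mouse query can also be written as a BioMart XML query, the format used to query a mart server without the web interface. This is a sketch under stated assumptions: the dataset name `mmusculus_gene_ensembl`, the filter names `chromosome_name` and `biotype`, and the attribute names `ensembl_gene_id`, `ensembl_transcript_id`, `mgi_symbol` and `mgi_id` are taken to match the Ensembl mart and may differ between releases. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def biomart_query_xml(dataset: str, filters: Dict[str, str],
                      attributes: List[str]) -> str:
    """Serialize one dataset, its filters and its attributes as a BioMart query."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",       # tab-separated result table
        "header": "1",            # include column names
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():        # filters: what we know
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:                    # attributes: what we want to know
        ET.SubElement(ds, "Attribute", {"name": name})
    prolog = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prolog + ET.tostring(query, encoding="unicode")


xml_text = biomart_query_xml(
    "mmusculus_gene_ensembl",  # assumed dataset name for mouse genes
    {"chromosome_name": "10", "biotype": "protein_coding"},
    ["ensembl_gene_id", "ensembl_transcript_id", "mgi_symbol", "mgi_id"],
)
print(xml_text)
```

The resulting text is normally sent as the `query` parameter to a mart service URL; the exact address depends on the server being used.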


Other Export Options (Attributes)

• Sequences: UTRs, flanking sequences, cDNA and peptides, etc

• Gene IDs from Ensembl and external sources (MGI, Entrez, etc.)

• Microarray data

• Protein Functions/descriptions (Interpro, GO)

• Orthologous gene sets

• SNP/ Variation Data


Central Server

www.biomart.org


WormBase


HapMap

Population frequencies

Inter-population comparisons

Gene annotation


DictyBase


Uniprot, MSD


GRAMENE

Rice, Maize, Arabidopsis genomes…


Integrative Genomics Viewer (IGV)


IGV can use and display many file formats: http://www.broadinstitute.org/software/igv/FileFormats


IGV: file formats, e.g. BAM (the binary version of SAM, the Sequence Alignment/Map format)
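Because BAM is the binary form of SAM, looking at one SAM text line shows what each alignment record holds. A minimal sketch that reads the mandatory tab-separated fields of a single alignment line and decodes two FLAG bits; the read shown is invented, and `SamRecord` and `parse_sam_line` are hypothetical names rather than part of any library.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    seq: str     # read sequence

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not a header line starting with '@')."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5], f[9])


# Invented read, for illustration only.
line = "read1\t16\tchr13\t1000\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
rec = parse_sam_line(line)
print(rec.rname, rec.pos, rec.is_reverse)  # chr13 1000 True
```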


Ask your question, and then gather the data, the tools and hardware you need

• Data and Databases: you will take workshops, you will read papers, and you will go on-line: SeqAnswers & maybe the bioinformatics.ca Links Directory

• Tools: you will take workshops, you will read papers, and you will go on-line: SeqAnswers & maybe the bioinformatics.ca Links Directory

• Hardware: you need to decide?


What can you do with IGV?

Visualization of different genomic data types:

aligned sequence reads

mutations

copy number

RNA interference screens

gene expression

methylation and genomic annotations

List of supported data formats: http://www.broadinstitute.org/software/igv/FileFormats

For this example:

*.bam for the alignment file

*.gtf for the genome annotation data


Using IGV to visualize sequence alignment

and genomic annotations

Step 1: Choose the genome in the list (or import your own genome file)

Here we have selected hg18 because it was used for the alignment


Sample files source:

http://manuals.bioinformatics.ucr.edu/home/gui-ngs-analysis

and ftp://ftp.broad.mit.edu/pub/igv/INMEGEN2010/

Step 2: Import your alignment file

File->Load from File

You can also load a file from a URL, a DAS source or a data server


Step 2: Import your sequence alignment file

If you load a *.bam file, it must be sorted and indexed, and the index *.bai file must be in the same directory.

You can visualize several alignment files at the same time for the same species.
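Before loading a BAM file it can help to check that its index sits next to it. The sketch below looks for the two common index names, `sample.bam.bai` and `sample.bai`; both helper names are hypothetical, and treating both index names as acceptable is an assumption. The comment shows the usual samtools commands for producing a sorted, indexed file.

```python
import os
from typing import List, Optional

# A sorted, indexed BAM is typically produced with samtools, for example:
#   samtools sort -o sample.sorted.bam sample.bam
#   samtools index sample.sorted.bam


def index_candidates(bam_path: str) -> List[str]:
    """Return the index file names to look for beside a BAM file."""
    return [bam_path + ".bai", os.path.splitext(bam_path)[0] + ".bai"]


def find_bam_index(bam_path: str) -> Optional[str]:
    """Return the first existing index path, or None if the BAM is not indexed."""
    for candidate in index_candidates(bam_path):
        if os.path.isfile(candidate):
            return candidate
    return None


print(index_candidates("data/sample.bam"))
```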


Step 3: select the data to display

You can either:

select a chromosome

select the coordinates

search for a gene
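Steps 1 to 3 can also be written down as an IGV batch script, which IGV can run instead of the menu clicks. A minimal sketch, assuming the batch commands `new`, `genome`, `load` and `goto`; the file names and the locus are placeholders, and `igv_batch_script` is a hypothetical helper.

```python
from typing import List


def igv_batch_script(genome: str, files: List[str], locus: str) -> str:
    """Write the batch commands for: pick genome, load files, go to a locus."""
    lines = ["new", f"genome {genome}"]          # Step 1: choose the genome
    lines += [f"load {path}" for path in files]  # Step 2: load alignment/annotation files
    lines.append(f"goto {locus}")                # Step 3: coordinates or a gene name
    return "\n".join(lines) + "\n"


# Placeholder file names and locus, for illustration only.
print(igv_batch_script("hg18", ["sample.bam", "genes.gtf"], "chr1:1,000,000-1,001,000"))
```

The `goto` argument can be a coordinate range as above or a gene name such as BRCA2, mirroring the choices listed for Step 3.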


Step 4: visualize the read alignments on the sequence

You will not see the alignments if the region you are looking at is too large for IGV. Zoom in using the + sign (in red) or by double-clicking on the display area.


IGV window layout (figure labels): track names, genomic annotations (default: RefSeq), cytoband, genomic coordinates, data panel


White reads: low alignment score

Other colors: depend on the alignment color code selected (e.g. insert size, pair orientation, read strand)

Annotated exons Annotated introns

Coverage of reads on the sequence
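The coverage track counts how many reads overlap each position. The sketch below computes that depth from read intervals; it is a simplified illustration that treats every read as one ungapped interval, which is an assumption (real alignments can contain gaps and clipped bases), and `coverage` is a hypothetical function.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def coverage(reads: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Count how many reads cover each position (1-based, inclusive intervals)."""
    depth = Counter()
    for start, end in reads:
        for pos in range(start, end + 1):
            depth[pos] += 1
    return dict(depth)


# Three invented reads on a short region.
reads = [(1, 4), (3, 6), (4, 5)]
depth = coverage(reads)
print([depth.get(p, 0) for p in range(1, 8)])  # [1, 1, 2, 3, 2, 1, 0]
```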


Reference sequence (here hg18)

Two examples of variation compared to the reference sequence

Lighter color bases: low quality bases


Step 5.1: download genomic annotations file from UCSC table browser

Several ways of downloading gene annotation files can be used, for example directly from the source sequence databases

1) Go to http://genome.ucsc.edu and click on Tables


Select the genome (here hg18)

Select the gene annotations (here Ensembl)

Choose your file name and click on the “get output” button

Select the file format (here GTF)
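A GTF file is a plain-text table with nine tab-separated columns, the last of which holds key and value pairs such as `gene_id` and `transcript_id`. The sketch below parses one line to show that layout; the line is invented, `GtfFeature` and `parse_gtf_line` are hypothetical names, and the attribute splitting is deliberately simple (it would break on values that contain semicolons).

```python
from typing import Dict, NamedTuple


class GtfFeature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int   # 1-based, inclusive
    end: int
    strand: str
    attributes: Dict[str, str]


def parse_gtf_line(line: str) -> GtfFeature:
    """Parse one GTF data line into its columns and attribute dictionary."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GTF line has 9 tab-separated columns")
    attrs = {}
    for item in cols[8].split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attrs[key] = value.strip('"')
    return GtfFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      cols[6], attrs)


# Invented line, for illustration only.
line = 'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
feat = parse_gtf_line(line)
print(feat.feature, feat.start, feat.attributes["transcript_id"])  # exon 100 T1
```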


Step 5.2: load the genomic annotation file in IGV

Select File->Load from File and choose the GTF file you have downloaded.

You now have access to RefSeq and Ensembl gene annotations.

The more data and annotations you load, the more memory you need. You can select a higher memory threshold when you launch IGV if you need it.


In this example you can see a deletion (10 kb, from the IGV publication*)

Robinson et al., (2011) Nature Biotechnology 29: 24–26


You can also visualize copy number variation data (from the IGV publication*)

Robinson et al., (2011) Nature Biotechnology 29: 24–26


Following OpenHelix, UCSC, & SeqAnswers

• OpenHelix

– http://www.openhelix.com/

– Twitter: @openhelix

– Blog: http://blog.openhelix.com/

• UCSC

– http://genome.ucsc.edu/

– Twitter: @GenomeBrowser

– More tutorials: http://genome.ucsc.edu/training.html

• SEQanswers

– Forum for NGS technologies http://seqanswers.com/