2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s...
![Page 1: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/1.jpg)
Sequence alignment
Bioinformatics MTAT.03.239
22.02.2018
Priit Adler
![Page 3: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/3.jpg)
This lecture
• Reference genome
• Genomic variation
• Sequence alignment
• Mapping reads to a reference yourself!
![Page 4: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/4.jpg)
• How long is human DNA?
• How many “genes” do we have?
• Describe the “Central dogma of molecular biology”
![Page 5: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/5.jpg)
Biology milestones
http://imihumangenomproject.blogspot.com.ee/2012/12/genome-sequencing.html
![Page 6: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/6.jpg)
http://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-5-206
Estimate the number of genes in the human genome
![Page 7: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/7.jpg)
Genomic data
http://www.futuretimeline.net/blog/2014/01/16.htm#.VfsvUZ2qpBc
![Page 8: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/8.jpg)
Genomic data
http://www.ncbi.nlm.nih.gov/genbank/statistics
Growth of GenBank and WGS
![Page 9: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/9.jpg)
Analysis of sequences
• Sequence alignment
• Gene prediction
• Genome assembly
• Protein structure / domains
![Page 10: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/10.jpg)
Reference genome
A reference genome (also known as a reference assembly) is a digital nucleic acid sequence database, assembled by scientists as a representative example of a species' set of genes.
https://en.wikipedia.org/wiki/Reference_genome
![Page 11: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/11.jpg)
Genome
• Is the entirety of an organism’s hereditary information
• The genome includes both the genes and non-coding sequences of DNA/RNA
• In July 1995, Haemophilus influenzae became the first free-living organism to have its genome sequenced
• 1 830 140 base pairs of DNA in a single circular chromosome that contains 1740 protein-coding genes, 58 transfer RNA genes and 18 other RNA genes
![Page 13: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/13.jpg)
Genome sizes
![Page 14: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/14.jpg)
“Completely” sequenced genomes
![Page 15: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/15.jpg)
Human genome
![Page 16: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/16.jpg)
Full human genome: 3234.8 Mb
Tallinn - Jõgeva - Misso: 320 km
ATGCTCGTAC = 1 mm
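The analogy can be checked with a little arithmetic. This is a minimal sketch that assumes the slide's scale of 10 bases per millimetre; the function and variable names are made up for illustration.

```python
# Scale from the slide: 10 bases (ATGCTCGTAC) are written as 1 mm.
BASES_PER_MM = 10

# Human genome size from the slide: 3234.8 Mb (megabases).
genome_bases = 3234.8e6


def printed_length_km(n_bases, bases_per_mm=BASES_PER_MM):
    """Length in kilometres of n_bases written out at the given scale."""
    millimetres = n_bases / bases_per_mm
    return millimetres / 1e6  # 1 km = 1,000,000 mm


print(f"{printed_length_km(genome_bases):.1f} km")  # 323.5 km
```

At this scale the genome stretches for roughly 323 km, which is in line with the 320 km route quoted on the slide.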
![Page 17: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/17.jpg)
DNA
• Protein-coding genes cover only about 1.5% of the human genome
• Base-pair variation between two genomes is at most about 1%
• Structural variation accounts for more…
• What does the rest do?
![Page 18: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/18.jpg)
MCF7 (cancer model) genomic rearrangement
bioinformatics.oxfordjournals.org/content/19/suppl_2/ii162.full.pdf+html
![Page 19: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/19.jpg)
Genomic variation
• SNPs — single nucleotide polymorphisms
• Indels — insertions / deletions
• CNVs — copy number variations
• Genomic rearrangements
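To make the first two categories concrete, here is a small sketch that labels a variant from its reference and alternative alleles. `classify_variant` is a hypothetical helper written for this example, not part of any library; it deliberately ignores CNVs and rearrangements, which cannot be recognised from two short allele strings alone.

```python
def classify_variant(ref, alt):
    """Label a small variant given its reference and alternative alleles.

    Hypothetical helper for illustration: handles SNPs and indels only.
    """
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "multi-base substitution"  # same length, more than one base


print(classify_variant("A", "G"))    # SNP
print(classify_variant("A", "ATG"))  # insertion
print(classify_variant("ATG", "A"))  # deletion
```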
![Page 20: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/20.jpg)
Graph genome
https://www.sevenbridges.com/graph/
![Page 21: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/21.jpg)
DNA sequencing
• Read length
• Single reads
• Paired-end reads
https://biomedizin.unibas.ch/fileadmin/DKBW/redaktion/Group_Directories/Bioinformatics/IntroBioc2016/06_RNAseqRaw_html.html
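The difference between single and paired-end reads can be sketched on a toy fragment. This example assumes the common layout in which the second read of a pair comes from the opposite strand at the other end of the fragment; the function names and the fragment itself are illustrative, and `read_length` is assumed to be between 1 and the fragment length.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def single_read(fragment, read_length):
    """Single read: the first read_length bases of the fragment."""
    return fragment[:read_length]


def paired_end_reads(fragment, read_length):
    """Paired-end reads: one read from each end of the fragment.

    The second read is taken from the opposite strand, so it is the
    reverse complement of the fragment's last read_length bases.
    """
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2


fragment = "ATGCGGTAGGACGGCTAATGCCA"  # toy fragment, far shorter than a real one
print(single_read(fragment, 5))       # ATGCG
print(paired_end_reads(fragment, 5))  # ('ATGCG', 'TGGCA')
```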
![Page 22: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/22.jpg)
Questions
• Name sources of genetic variation
• Is the human genome complete?
• What is the typical sequencing read length?
![Page 23: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/23.jpg)
Gene expression
[Diagram: DNA, pre-RNA and mRNA strands, with 5’ and 3’ ends labelled]
![Page 24: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/24.jpg)
DNA vs RNA sequencing
[Diagram: two panels, DNA seq and RNA seq, each shown against the reference genome]
![Page 25: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/25.jpg)
DNA complementarity
3’ - ATGCGGTAGGACGGCTAATGCCA - 5’
5’ - TACGCCATCCTGCCGATTACGGT - 3’
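The pairing rule (A with T, G with C) is easy to express in code. The sketch below uses an illustrative `complement` function and reproduces the second strand shown above from the first.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the complementary strand, base by base, in the same order."""
    return "".join(COMPLEMENT[base] for base in seq)


strand = "ATGCGGTAGGACGGCTAATGCCA"
print(complement(strand))  # TACGCCATCCTGCCGATTACGGT
```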
![Page 26: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/26.jpg)
DNA reverse complementarity
3’ - ATGCGGTAGGACGGCTAATGCCA - 5’
TGGCATTAGCCGTCCTACCGCAT
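A read can originate from either strand, so alignment tools generally need to consider a read together with its reverse complement. A minimal sketch, with an illustrative function name, that reproduces the sequence above:

```python
def reverse_complement(seq):
    """Complement every base, then reverse the order."""
    table = str.maketrans("ACGT", "TGCA")
    return seq.translate(table)[::-1]


strand = "ATGCGGTAGGACGGCTAATGCCA"
print(reverse_complement(strand))  # TGGCATTAGCCGTCCTACCGCAT

# Applying the operation twice gives back the original sequence.
assert reverse_complement(reverse_complement(strand)) == strand
```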
![Page 27: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/27.jpg)
Alignment problem
Find the best-matching position in the reference genome for a sequencing read
![Page 28: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/28.jpg)
Alignment problem
• Exact matching
• Edit distance
• Sequence alignment
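The first two approaches can be sketched in a few lines. Exact matching only finds reads that occur verbatim in the reference; edit distance counts the minimum number of substitutions, insertions and deletions needed to turn one string into another. Both functions are illustrative (a naive scan and a textbook Levenshtein computation), not what production read mappers use.

```python
def exact_matches(read, reference):
    """Return every 0-based position where read occurs exactly in reference."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def edit_distance(a, b):
    """Levenshtein distance between a and b, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


reference = "ATGCGGTAGGACGGCTAATGCCA"
print(exact_matches("GG", reference))     # [4, 8, 12]
print(edit_distance("GGTAGG", "GGTTGG"))  # 1 (one substitution)
print(edit_distance("GGTAGG", "GGTGG"))   # 1 (one deletion)
```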
![Page 29: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/29.jpg)
Sequence alignment
dynamic programming
http://avatar.se/lectures/molbioinfo2001/dynprog/dynamic.html
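As a sketch of the dynamic programming idea, the following fills a score table for a global (Needleman-Wunsch style) alignment and traces back one optimal alignment. The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration and are not taken from the lecture.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for an end-to-end alignment."""

    def pair(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per base.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell is the best of: align two bases, gap in b, gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1])):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


print(global_alignment("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

The table has (len(a) + 1) × (len(b) + 1) cells, so time and memory both grow with the product of the two sequence lengths.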
![Page 30: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/30.jpg)
Sequence alignment
Global alignment
Local alignment
Fitting alignment (global - local alignment)
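The three variants can share one score table and differ only in how the borders are initialised and where the final score is read. The sketch below assumes simple illustrative scoring (match +1, mismatch -1, gap -1) and returns scores only; `alignment_score` and its `mode` parameter are names invented for this example.

```python
def alignment_score(a, b, mode="global", match=1, mismatch=-1, gap=-1):
    """Best alignment score of a against b.

    global:  all of a against all of b
    local:   some substring of a against some substring of b
    fitting: all of a (e.g. a read) against some substring of b
    """
    if mode not in ("global", "local", "fitting"):
        raise ValueError(f"unknown mode: {mode}")

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if mode != "local":          # all of a must be used, so gaps here cost
        for i in range(1, rows):
            score[i][0] = i * gap
    if mode == "global":         # skipping a prefix of b costs only in global
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            value = max(
                score[i - 1][j - 1] + step,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if mode == "local":  # a local alignment may restart anywhere
                value = max(value, 0)
            score[i][j] = value
            best = max(best, value)

    if mode == "global":
        return score[-1][-1]     # both sequences fully consumed
    if mode == "local":
        return best              # best cell anywhere in the table
    return max(score[-1])        # fitting: a consumed, b may stop anywhere


read = "GGTAGG"
reference = "ATGCGGTAGGACGGCTAATGCCA"
print(alignment_score(read, reference, "global"))   # -11
print(alignment_score(read, reference, "local"))    # 6
print(alignment_score(read, reference, "fitting"))  # 6
```

The global score is dragged down by the 17 gaps needed to stretch a 6-base read across a 23-base reference, which is why mapping a read to a reference is closer to the fitting case.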
![Page 31: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/31.jpg)
Rosalind glossary
Global alignment - http://rosalind.info/glossary/alignment/
Local alignment - http://rosalind.info/glossary/local-alignment/
Fitting alignment (global - local alignment) - http://rosalind.info/glossary/fitting-alignment/
![Page 32: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/32.jpg)
BLAST
![Page 34: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/34.jpg)
Practice session
docker run -ti --rm -v /path/to/your/course/catalog/:/home/jovyan/bioinf/:rw -p 8888:8888 jupyter/base-notebook
• --rm: container will be deleted after use
• /path/to/your/course/catalog/: where your data is
• /home/jovyan/bioinf/: where the notebook home is
• rw: read and write
• -p 8888:8888: open port to access the notebook
![Page 35: 2018 02 22 sequence alignment · 2018. 2. 26. · Genome • ︎Is the entirety of an organism’s hereditary information • ︎The genome includes both the genes and non-coding](https://reader034.fdocuments.us/reader034/viewer/2022051920/600d51ea6e8843382824ccaf/html5/thumbnails/35.jpg)
• Write down 3 things you least understood in today’s lecture