CS174 Bioinformatics
![Page 1: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/1.jpg)
CS174 Bioinformatics
Instructor: Xiaohui Xie
University of California, Irvine
![Page 2: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/2.jpg)
Today’s Goals
• Course information
• Challenges in bioinformatics/computational biology
• Brief intro to molecular biology
• Python tutorial
![Page 3: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/3.jpg)
Course Information
• Lecture: TT 3:30-4:50pm in DBH 1423
• TA: Lars Otten <[email protected]>
• Grading
– 40% Homework (4-5 problem sets)
– 25% Mid-term quiz (in class)
– 35% Final exam
• Office hours: TT after class
• Course Prerequisites:
– Basic programming skills (we will teach Python)
– Statistics, Calculus, basic knowledge of biology
![Page 4: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/4.jpg)
Course Goals
• Introduction to the growing field of
bioinformatics/computational biology
– Fundamental problems in computational biology
– Statistical, algorithmic and machine learning techniques
– Overall survey of the field
![Page 5: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/5.jpg)
References
• Recommended Textbooks:
– N.C. Jones and P.A. Pevzner. An Introduction to Bioinformatics Algorithms
– R. Durbin, S. Eddy, A. Krogh and G. Mitchison. Biological Sequence Analysis
• Course Website:
http://www.ics.uci.edu/~xhx/courses/CS174/
where lectures, references and problem sets can be found.
![Page 6: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/6.jpg)
Why bioinformatics?
Bioinformatics = Biology + Information
![Page 7: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/7.jpg)
AGATTTCGATTATCCTTATAGTTCATACATGCATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCAT
ACATGCATGCTTCAATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCATGCTTCAACTACTTAATAAATGATTGTATGATAA
TGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCTCCTTATCCTTATAGTTCA
TACATGCTTCAACTACTTAATAAATGATTGTATGATAATTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCA
TGCTTCAACTGAGATTTCGATTATCCTTATAGTTCATACATGCATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCC
TTATAGTTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCATGCTTCAACTACTTAATAAATGATTGTA
TGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCTCCTTATCCTTAT
AGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAA
TGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAAT
GTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCT
AGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTTCAATGTAAGAGATT
TCGATTATCCTTATAGTTCATATGCTTCAACTACTTAATAAATGATCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTA
TAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGAATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTT
TCAATGTAAGAGATTTCGATTATCTTATAGTTCATACACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTAT
AGTTCATACATGCATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTT
CAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCATGCTTCAA
CTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGTATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGA
TGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATACATATCCATATCTAATCTTACTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTA
GCCTAAAAAAACCTTCTCTTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGA
AGACTCTCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGCTTTTA
TGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTTAT
TTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTG
TATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGA
AGAAGTGATTGTACCTGAGTTCAATTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAA
ACCGGATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTTTGATATGCTTTG
CGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATCTTTAAGAGTCTTGAAGGCTGTGAAATTAATGA
CTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATGAACGAGTCTCATTCAGGTTGGTACGATAAACTTTACGAATGTTCTTGTCCAG
AGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATCATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAA
ATGGCAACATAGAAAAGGTAAAAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAAC
CAGCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTTTAGCATCACAAAAT
ACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACA
AACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGTATGATAATGATATGACTACCATTTTGTTATTGTA
CGTGGGGCAGTTGACGTCTTATCATATGTCAAAGAAAATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCA
ACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGC
GTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGT
TGCCAACTGACGAGATGCAGTTTCCTACGCATAATAAGAATAGGAGGGAATATGCAGGAGAACGCCAGACAATCTATCATTACATTTAAGCGGCTCTTCAAAAAGATT
GAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTA
TGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGTAATACGCTGAAAAACCTCAATACAGCTCATTCTGGAAGAAATAGTGTTTCTTGTACAACCAGGACTTGAAGC
CCGTCGAAAAAGAAAGGCGGGTTTGGGATTGGGTACGGTTTCGTTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCA
TCTAGAGCATCATTCGGTATTTTCTTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTA
TCATTAATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGC
CAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAA
CTTTAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACA
TGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATG
ATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAAATAAAGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCC
TTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTGTATGATTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATA
ATGTTTTCAATGTAAGATTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAAT
GATTCATACATGCTTCAACTACTGTAAATAATTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCATAGTTCA
TACATGCTTCAACTACTT
The human genome is about 400,000 times longer
than the sequence shown here.
![Page 8: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/8.jpg)
AGATTTCGATTATCCTTATAGTTCATACATGCATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATA
CATGCATGCTTCAATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCATGCTTCAACTACTTAATAAATGATTGTATGATAATG
TTTTCAATGTAAGAGATTTCGATTATCCTTATGATTGTATGATAATGTTTTCTCCTTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAA
TTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCATGCTTCAACTGAGATTTCGATTATCCTTATAGTTCATAC
ATGCATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTACTTAATAAATGATTGTATGATAATGTTTTCA
ATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATA
GTTCATACATGCATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCTCCTTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGAT
AATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATC
CTTATAGTTCATACATGCATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACAT
GCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAAT
GTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATATGCTTCAACTACTTAATAAATGAT
CAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGAATTTC
GATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACACATGCTTCAA
CTACTTAATAAATGCAGATGCTGTTGGACTTCATGTCCCCAACCTAGCTTGGTGCACAGCATTTATTGTATGAAGAGATTTCGATTATCCTTATAGTTCATACATGCAT
AGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAAT
GATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCATGCTTCAACTACTTAATAAATGATT
GTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGTATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTA
ATAATCATATTACATGGCATTACCACCATATACATATCCATATCTAATCTTACTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCT
TTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCCGTGCGTCC
TCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGCTTTTATGGTTATGAAGAGGAAAAATT
GGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGA
AGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCA
TAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAA
TTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGGATTTTGTTGCTAGATCGCCT
GGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGA
TTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATCTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAG
ACTTTTTCAAGCAATTTGGTGCCTTGATGAACGAGTCTCATTCAGGTTGGTACGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTT
GTCAAATGGATCATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAAAAGAAGCC
CTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCAGCATTGGGCAGCTGTCTATATGAATTAG
TCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTAT
AGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATA
TGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGAAAATTTGC
GAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAG
ATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTA
TTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAATAAGAATAGGAG
GGAATATGCAGGAGAACGCCAGACAATCTATCATTACATTTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAA
ATAATGTGGATTTGGAAAAAGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATA
CAGCTCATTCTGGAAGAAATAGTGTTTCTTGTACAACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGGATTGGGTACGGTTTCGTTGGTGCTTTTGTTGT
TTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATG
GCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGA
CGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGA
TTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCT
TGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCC
TTATAGTTCATACATGCTTCAACTACTTAATAATGCACTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAAATAAAGCTTCAACTACTTAATAAAT
GATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTGTATGATTTATA
GTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGATTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGA
TTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTCATACATGCTTCAACTACTGTAAATAATTAATAAATGATTGTATGATAATGTTTTCAATGTAAG
AGATTTCGATTATCCTTATAGTTCATACATGCATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATTAT
AGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAAATAAAATGTAAGAGATTTCGATTATCCTTATAGTTCATACEEEEEEECATGCGTTG
ACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTTGATTG
TATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTGTATGATTTATATCG
![Page 9: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/9.jpg)
Why bioinformatics?
• Lots of data
• Pattern finding, rule discovery
• Allowing analytic and predictive methodologies that
support and enhance lab work
• Informatics infrastructure (data storage, retrieval)
• Data visualization
• Life itself is a computer!
![Page 10: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/10.jpg)
Genome as a computer program

| Genome | Computer Program |
| --- | --- |
| Redundant | Precise |
| Gene | Data |
| Regulatory code | Method |
| Modularity | Class |
| Virus | Virus |
| Histone code | Encapsulation |
| Stochastic | Deterministic |
![Page 11: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/11.jpg)
![Page 12: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/12.jpg)
Four Aspects
• Biology
– What’s the underlying problem?
• Algorithm
– How to solve the problem efficiently?
• Learning
– How to model biological systems and learn from observed data?
• Statistics
– How to differentiate true phenomena from artifacts?
![Page 13: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/13.jpg)
Topics to be covered
• DNA/RNA/Protein sequence analysis
– Gene discovery
– Pattern finding (motif discovery, EM-algorithm)
– Sequence alignment (Smith-Waterman, BLAST)
– Models of sequences (HMM)
– RNA folding (Stochastic context-free grammar SCFG)
• Algorithms for large-scale data analysis
– Clustering algorithms (Hierarchical clustering, K-means)
– Inferring gene networks (Regression, Bayesian networks)
• Evolutionary models
– Phylogenetic trees
– Comparative Genomics
• Protein world (if time allows)
– Secondary & tertiary structure prediction
![Page 14: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/14.jpg)
Introduction to Molecular Biology and
Genomics
![Page 15: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/15.jpg)
![Page 16: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/16.jpg)
Different Life Forms Share a Common Genetic Framework
![Page 17: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/17.jpg)
![Page 18: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/18.jpg)
Deoxyribonucleic acid (DNA)
• can be thought of as the “blueprint” for an organism
• composed of small molecules called nucleotides
– four different nucleotides distinguished by the four bases:
adenine (A), cytosine (C), guanine (G) and thymine (T)
• is a polymer: large molecule consisting of similar units (nucleotides in this case)
• DNA is digital information
• a single strand of DNA can be thought of as a string composed of the four letters: A, C, G, T
AGCGGTTAAGGCTGATATGCGCTTTAA
TCGCCAATTCCGACTATACGCGAAATT
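The two strings above are complementary strands: each A pairs with T and each C pairs with G. A minimal Python sketch of that pairing follows. The function names `complement` and `reverse_complement` are illustrative choices, not part of the slides, and the input is assumed to contain only the uppercase letters A, C, G, T.

```python
# Watson-Crick base pairing: A <-> T, C <-> G.
# Assumption: input uses only uppercase A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return strand.translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Return the complement read in the opposite direction."""
    return complement(strand)[::-1]


top = "AGCGGTTAAGGCTGATATGCGCTTTAA"
print(complement(top))          # the second strand shown on the slide
print(reverse_complement(top))  # the same strand read in the opposite direction
```

Because the two strands of a double helix run in opposite directions, sequence tools usually report the second strand as the reverse complement rather than the position-by-position complement shown on the slide.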
![Page 19: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/19.jpg)
The Double Helix
DNA molecules usually consist of two strands arranged in the famous double helix
![Page 20: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/20.jpg)
Genomes
• The term genome refers to the complete complement of
DNA for a given species
• The human genome consists of 46 chromosomes
– Male: 22 pairs of autosomes + XY
– Female: 22 pairs of autosomes + XX
• Every cell (except sex cells and mature red blood cells)
contains the complete genome of an organism
![Page 21: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/21.jpg)
Human Genome (Male)
22 pairs of autosomes + sex chromosomes (XY)
![Page 22: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/22.jpg)
Human Genome (Female)
22 pairs of autosomes + sex chromosomes (XX)
![Page 23: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/23.jpg)
Human Chromosomes
Karyogram
![Page 24: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/24.jpg)
The Central Dogma
![Page 25: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/25.jpg)
RNA
• RNA is like DNA except:
– backbone is a little different
– usually single stranded
– the base uracil (U) is used in place of thymine (T)
• A strand of RNA can be thought of as a string
composed of the four letters: A, C, G, U
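At the string level, the DNA-to-RNA difference described above amounts to replacing T with U. The sketch below makes that concrete. It is a simplification: it assumes the DNA string is the coding strand, so the RNA has the same sequence with U in place of T, and the name `transcribe` is a hypothetical helper, not something defined in the course.

```python
def transcribe(dna: str) -> str:
    """Rewrite a DNA string over {A, C, G, T} as an RNA string over {A, C, G, U}.

    Assumption: `dna` is the coding strand, so only T -> U changes.
    """
    return dna.upper().replace("T", "U")


print(transcribe("AGCGGTTAAGG"))
```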
![Page 26: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/26.jpg)
The Genetic Code
64 codons (4 × 4 × 4 base triplets): 61 encode the 20 amino acids + 3 stop codons
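The codon count can be checked by enumerating all three-letter words over the four bases. The sketch below builds the standard genetic code from its usual compact encoding (bases in the order T, C, A, G, with `*` marking a stop) and uses it to translate a short DNA string. The names `CODON_TABLE` and `translate` are illustrative. The sketch also assumes the input is a coding-strand DNA string read from its first base, and it does not search for a start codon.

```python
from itertools import product

# Standard genetic code in compact form: codons enumerated with bases in
# the order T, C, A, G (first base varies slowest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon from position 0, stopping at a stop codon.

    Assumption: `dna` is uppercase A/C/G/T; trailing bases that do not
    fill a complete codon are ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODON_TABLE))                                  # number of codons
print(sum(aa == "*" for aa in CODON_TABLE.values()))     # number of stop codons
print(len(set(CODON_TABLE.values()) - {"*"}))            # number of amino acids
print(translate("ATGGCCTAA"))
```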
![Page 27: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/27.jpg)
Proteins
• Proteins are molecules composed of one or more
polypeptides
• A polypeptide is a polymer composed of amino
acids
• Cells build their proteins from 20 different amino
acids
• A polypeptide can be thought of as a string
composed from a 20-character alphabet
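Treating a polypeptide as a string makes simple checks easy to express. The sketch below uses the conventional one-letter codes for the 20 standard amino acids. The helper name `is_polypeptide` is hypothetical, and the check deliberately ignores ambiguity codes and non-standard residues.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACID_ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_polypeptide(sequence: str) -> bool:
    """Return True if the non-empty string uses only the 20 standard letters."""
    return bool(sequence) and set(sequence.upper()) <= AMINO_ACID_ALPHABET


print(len(AMINO_ACID_ALPHABET))
print(is_polypeptide("MKTAYIAK"))
print(is_polypeptide("MKB"))  # 'B' is not one of the 20 standard codes
```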
![Page 28: CS174 Bioinformatics](https://reader031.fdocuments.us/reader031/viewer/2022013012/61cf59c6fd523310567f7eb3/html5/thumbnails/28.jpg)
Readout from the genome