Summer 2005
Show Me the CpG Islands! (With Statistical Significance)
Alicia Laughton (Mathematics ‘06)
Jessica Minnier (Mathematics ‘07)
Guided by Yung-Pin Chen (Mathematics/Statistics)
Outline
• DNA Overview
• CpG Islands
• Methods
– Traditional Method
– Our Method
• Future plans
DNA
• Deoxyribonucleic acid
• Double helix
• Chain of nucleotide subunits
• Contains genetic information
Nucleotides
• Each is made up of a sugar, a phosphate, and a base
• Four bases
– Adenine (A)
– Cytosine (C)
– Guanine (G)
– Thymine (T)
• CpG represents a C directly followed by a G in the DNA sequence
Methylation
• Causes C to turn into T
• Accounts for the low occurrence of CpG dinucleotides in vertebrates
– Expected frequency is 6.25% if bases occur at random
– Actually 1% of total sequence (Bird 1986)
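The 6.25% expectation can be reproduced with one line of arithmetic, assuming (as an idealization, not a claim about real genomes) that the four bases are equally likely and that neighboring positions are independent:

$$P(\text{CpG}) = P(\text{C}) \cdot P(\text{G}) = \frac{1}{4} \cdot \frac{1}{4} = \frac{1}{16} = 6.25\%$$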
Sequence AL031723
• Human DNA sequence on chromosome 16
• 3 known CpG Islands
• Percentage of content:
– A - 22.7%
– C - 29.5%
– G - 28.3%
– T - 19.5%
– CpG - 3.1%
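As a rough check on these percentages, the sketch below compares the observed CpG fraction with the fraction expected if C and G occurred independently. It assumes that the 3.1% figure is the fraction of dinucleotide positions that are CpG; the variable names are illustrative only.

```python
# Base and CpG fractions quoted for sequence AL031723.
p_c = 0.295
p_g = 0.283
observed_cpg = 0.031  # assumed to be the fraction of positions where a CpG starts

# Expected CpG fraction if C and G occurred independently (an assumption).
expected_cpg = p_c * p_g

# Whole-sequence observed/expected ratio.
ratio = observed_cpg / expected_cpg

print(f"expected CpG fraction: {expected_cpg:.4f}")
print(f"observed/expected:     {ratio:.2f}")
```

Under these assumptions roughly 8.3% of positions would be CpG, against 3.1% observed, a whole-sequence ratio of about 0.37.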
CpG Islands
• “regions of DNA with a high G + C content and a high frequency of CpG dinucleotides relative to the bulk genome” -- Gardiner-Garden and Frommer (1987)
CpG islands & Genes
[Figure: positions of CpG islands (CpGi) relative to a gene: promoter CpG islands at the 5’ end, CpG islands in the gene body, and CpG islands at the 3’ end]
What is important about CpG Islands?
• Useful in identifying protein-coding regions (Yoon and Vaidyanathan, 2004)
– Associated with “housekeeping genes” and 40% of tissue-specific genes
• Aberrant methylation of CpG sites may cause silencing of tumor-suppressor genes (Deng, Zhou et al, 2002)
aggcccgcccgcctagatcacttgaggtcacccgttcactcagtggctgacagcatcccctaaatcagcccttcaccaattattgacagtgtgtcctcaaccaaaagtagtcctccctgctccctccctcccctgatgtaattacatctcttcccatctttatttattttttgagacggagtcttgctctgtcacccaggctggagtgtagtggtgcaatctcggctcactgcacctctgcctcccgggttcaagcgattttcctgcctcagcccccggagtagctgggattacaggtgcccgccaccacacccagctaatttttgtatttttagtagagacggggtttcgccatgttggccaggctggtctcgaactcctgacctcaggtgatccgcctgcctcagcctcccaaagtgctgggattacagacgtgagccactgcggctggcctctctccccgtctttaactgtagccctgtgaattctcatcagcctgggcctggactcagcaggccaaaaagttaccagcagagcccagcacatgtgaggaaagtcggagacgtggcggcgccggccggaggatccttcccaagaccctgggccgctgtggccccctagatcttgcaggttgccagggtgccaggccagggagggggcctttctgagattctcctcattctgacacaggagaggagggcactgacccagtcccaaggtcccgggggaatcagccgaccacagcccaggactgtcccacctgggcagagagcccattctgggtgcccagcccgggcaggcccaggcacccccagcagtgccccgggcagcacctgccagccaggtagtgcagggtgaggttgggcagggcagggcgtggtaggtcagctgagcaaacagctcggagggagagctggggagggctgggaactaggtcgatagaaacacagggactgtgttagggaggggatgccttgccagtcacgcccagccctgactcctgccctctgagggggcttcccccacccctgctgacagccccaggaccggcccctgccaggaggctgacctgccaggagtgaccgccccagacttgagcccttgggaggcaggttctgagtccccttttcctgctcagacccccagggaaacgcaggctgggccagaggcagctgcacagacccctgcagtggggtgctcggtggagagcgctggaggtgggagggaggatgtgtgaggcagcgggagagaatccaggcttcccccacaacacccaccatgagcggtgcagagtaggggtgggcggcacgggagccttcccaccccgcagaaccaggccctgggcagagctggcctacagacgataccggacaagtcctcctccgtcttggtgacagagggagctgggactccctccacccacccactgccacttcagaagcagccacagggagactgggaggggcaggggtgctggggatgagcgtggggctcagccctccctcttcccaccctggagggctgcctccttccagcccacctggaagggtggtgtcagtcccagagcccctgcactccccgccccacctcctgcagctggaacccgcgtgggagccgcacccagcgtcccagggacaaacacagaggccttgggtggtggcggtaccaaggtctgaggcctggcagctcaggggcacccccgtccctgagagaggtcaagaaggggaggcaccaccccccaccacgggacctcgctgacgatgcccatagagagaaaccaggccagtgctgggaggggaaagaccccaggcctcatgagaagtcactgcctgcttttcccctcggccaggaaggaagccccaggcccttccctcccgtctcgggcatactgaccccaggcaccaagcgagaccaggagcccacccctttcctttcccagatggcacaccagtgactctgaatatcggagcgcacccctgctccctgggaggcaggatatc
gtgccgctgctccctggggcgcacgataccctccccaggaaggcgccggtcagggcggacgggccagggtgctcaccggtaccaggcgaggccgcgctcgtagcacctgtcgaagaagtggggctcagagcccagcgcgcggacgtcggggtgcagccgcagaaactccagcagggcgcgcgtgccgcccttcttcacgccaacgatgagcgcttgcgggaagcgccggcggccgggaccgctggccaaaggcaggccgggtgctcccgggcggtggacggagctggacggctcggagggcgcgggggccggcgcgggggcgcgggcggccggcgggcagcggccggggagggcgcagaggcagtaggcgccgagcaccagggccacgagcagcatcggcgcgcgggacgcccgcagagcggccccttgcccggcccctgcgccctggccgcccccggccccgccgcccaggccgccgctacctgccatggggtcgcgccgctccaggcccgggagcgggggcagcaggcgggcgcgcatctcggcccgcgcgccgctcagtccgtgggtgcccggcttgtgctctgcgcccggcggtcccgcagcctgggagcgggcgcggggcgggaccgggggcggggtctggacgccctcccccctccccctcccccgcccactccgcctccgaggccactgcctgggctggacccgccggcagccgccaccacccgggcgcgactcgagctgccgggaccaccaggacgctcctgctccgagatcccaggccctggctcgcttgactccggcatcttcacctctgcgcggggaggatgcggcggcggtggccgttcgggacgcagggcagggacagggcggcgcgcgggcctcgggaccctctgtttgaagaccgatccccttccccccccaccccactccgggacgtgcgcggcaggtgcataggccaagccttggcctgcaggagcgggagcctcatcgccaggccaaggggacccaggaaaagcgtcgatccgggcactcggcctgccaagggagaaagaggccgggacagcaccctagtgtgcagagagggatcccagaacgtgtggggggagtctgcggccgggaatggcgtgcgcctcctcttcctgcctgctggagggaccagcaccaaaacaggaaagttcaccctgccaggccttctctccaaagagtcagagggagctccgtagggggatggggttcccggaccccctgccgtggaaggggagtgggaacacagacaggcggcaagggctttcgaggccccctcttgcacaaaccagctcagagatcggagatctttgggatcaattactttccctccccaggcatccgaagcctatcctagcccaggtgtggatgagggtgggagagacgggggaggagggagaggagcaggactggacccccgtgtgacaaacatctgacaagttgctctgaggactgcccccctccttgtggagcccacctcatctggtgtgcatttccctgcggctttcatccagccctgggcgaccctccctcctccatctcagcctccctcctcctgccccacacctcaggcctgggactcgcagatgccaaaagggcctggcagatgccaaagccagaaagtgcagggggactgcatcccccacaggagaccgggttcttccccactacatactcagaccccactccctgcacccactgctcttgcaaaccaggaactaaggggttcccctacccaccccgctccttgcctcctcttgcttttcttttgttttgtttgtttttgagacagagctgcactccagctgactcttgtcgcccaggctggagtgcagtggcacaatctcagctcactacaacctctgcctcccgggttcaagcgattctcctgcctcagcctcccaagtagctgggaatacaggcacccatcaccacgcctggctaatttttgtatttttagtagagatggg
gtttcaccatgttagtcaggctggtctcaaactcctgacctcaggtaatctgcccacctcagcctcccaaagggctgggattacaggcgtgagccactgtgccccaccctcctcttgcttttctaaaagatgatggtcaaagtacagcccccatttgcccccagacagggcacccttcccagatcgagaccttggggagtctgcgtgacccccacacctggcagacacaggtgcttcactagtgggggaacggctgagcatgtgctgagctcgggggcactagtgggctacagtccccaagtgggaggcccctcaagagcctggatgagctgactgacggtggagaggagggaaggagggcctatggccaaagtcaatccaggacccaactgccgaggccacaggaaggccgggtcaccgcctggaactaggtcggtcacagcccagtgggagccgtggcccggagactcaactgggggccctggttactctgctcgcctccccgcgtcggcacccagaacagagcttgcaggcactgggggcccagtccagggtctcaagagcagacaatgctgccttgcagttggggaaactgagacagggtgagaactttcagaggctcattgcaggctcctagcaggctgaaaggacggaggcacaggcacctaggagcacaccagccccacgtggccacggcccctcggagagcatgaggacacttgcaatgcggaagctcagcaggcccagctctactggctctgcaccgcccagtgaggggtcagcacagttggtccaagggacaataccagattaatgaggcagaagccacgggactgaccccttggaattctccacacccacactgtgcatccttaacccaaagcttctagcttggtagcccctcctaccctcctccctgcagcagggattagggatgcattctgacccctgcctgccgtcaggggagtgaggtctctccctggagcctgagctgaggatgcccaattcagccaggtgagccccgggatggactccatgtcccctagccaccacctgacttccccagcaccccacactggcaccagcccttcagatctcagaagcgagccaccctattctcacggagccccttcctgcctgccctccaaacccaagagtagttttagtacaaaaggcaaagttaacaaataggggtaggcgtcagggaaggaagaggatcagaggatcgggaacggagaaactggagcacctggagaagcgtctgggtcctgccacccccactgactccccaactggccttgggcagggtcctctctgcaggcgctgggtccaagcttggggatgagcagccaccagcgcgggctgcttcagctgaggctgccgcacccccacgtccatcctgggtagaggcaggacagccacagagccccatgcacggggctggactcaccctgggcactcacctaaaggcagtctcctcctttccaaagcccagactttctccggactcccaggaccaccaacaagggttcctgtgcgcagactcgggggtcttggggaggaaggacgctttctaggtggctgcctggaacctggaggcccctttctacagtacctggccagcggtcggtcacacctgagtgcccagagtgagcgggcggcagaggcatttctgacgctgccaggtaatcccacgggctggaaacgacctctgggctgggaagccaccgcctcccccagtcctgctgggtccctcagcagagagaacggaaccggggctttccccacagttttcaaagtttcagggaatcctagccaagtatcattccttcttccggagccgggaccccaggtcaagcctggggcccccacagggcggtcccaaccccactgcccggagcgcacccctgctccctgggaggcaggatatcgtgccgctgctccctggggcgcacgataccctccccaggaaggcgccggtcagggcggacgggccaggg
tgctcaccggtaccaggcgaggccgcgctcgtagcacctgtcgaagaagtggggctcagagcccagcgcgcggacgtcggggtgcagccgcagaaactccagcagggcgcgcgtgccgcccttcttcacgccaacgatgagcgcttgcgggaagcgccggcggccgggaccgctggcgtttccctcccaggggcccagtggtgaactgaattcaggcctgagacatactctgtctactaagtcaccccatctgcccagccttggtccacctggcactgcccagagacatcagtgatgcatttcggaagctggcaaagtggaccccactggagtacaaaggactcagggacccctgtgctggggaagagaaggagcccaggacctcccccaggggctgcctctgaggggcgtgagattcaggggcctctcgggtgggacctgcgggggccgctagacactgcgggaacttcacatccccaacgcccagcagcagcctgcagggaaggcaggggaggcgagccgggctcagagagggcgagcaacttgccccatccgaaggcaaaggtggtatgagacccgggtcctctctccacctctgccccagccttcctggccacagggctggcgccaggcaggcacggcacaggctcccggcagaggccacggtctcagccatccccacggtctcaggagtccccacggtctcagccgtccccacggtctgagtccccacggtctcagctgttcccacggtctcaggagtccccacaggttcagcagtccccacggtctcagccatccccacggtctcagccgtccccacagtctcagccatccccacggtctcagcagtccctactcaggacttgaaattccagcactggttccgtgatggctcctccagccccctgcccagcccagcatggtcatttccatctcctggcctttccgctgccgtctctctgctggatgctttatccttagtccccgctgagggcagaaggactttccaggaggaattgaccagaacgcagaacagcaggatgtggaatggactggggacagggagagagagatgcagggaccaggagtcggctcggagggttctcctggaagctgacccctccctccatcaggcactcggctgacggtggctacacacctcggggcgcccaggatggcagcactggggctgttcattcaccagtggatccccagcacctaacagagcctggcacgcagtggacattccattaatgtcgctcagtggaagggtatacgtgggaggagaggtcgggaaggctttctggaggtgacggccaggtgaagacgaggagaacagcattccaggccaaggaaccgtgtgggtgaaggctcagcagcagagagcccgggcagtagaggatggggtggagcttaaggccctgcgggaacaggggcggggcttagagtctggcctgaggctggtccagccccgcctcctcctcaggctcccaccaactctgagccaccagaccctcctttgtaaaatgaagacctcagtcatgactcgcatgagtctctgaagagtaacagctttattgtgatgtaattcacacaccactcaatccagccatttgtcgcatgcaaatcaatggttttcagtatattcatagtcgtgcaatcacaatcaattttagaacatttctatcaccccaaaaagaaatcctgtgtccattagcaatgacgccctcttctccccttcccacagcccctggcaaccacgaatctactttctgtctctatgggtttgcctattctggacatttcacaaaaagagaatcattgcttgaagccaggagttcaagaccaacctgggcaacaaagcgagaaccccgtctgtacaaaatattttaaatttagccaggcacagtggcgcacaccagtagtcccagcactttggaagtctgaggcaggaggttcacttgaggcggggaattcaaaaccagcctgggcaacat
agggagtaccagtctctacaaaaaatttcaaaatttgccaagcgtgatggtatgcacctatagtcctagcttactcaggaggctgaggtgggaggatcgcttgagcccaggagtacgaggctgcagtgagccatgatcataccactgcattccagcctgggcgacagagtgagagcccatctctaaaacagaaagaaagaaagaaagaaatatggccagtcacagtggctcatgcctgtaatcccagcattttgggaggccaaggcaggtggatcacttgaggtcaggagttcgagaccagcctggccaacatggtgaaaccctgtctctaccaaaaatacataaattagccaggtgtgggccaggcgccatggcttacacttgtaatcccagcactttgggaggccgaggtgggcagatcacctgaggttgagagttcgagaccagcctgaccaacatgaagaaaccctgtctctactaaaaatacaaaaaattagctgggtgtggtggtgcatgcctgtaatctcagctacttgggaggctgaggaaggagaatggcttgaacccgggaggcagaggttgtggtgagccgagatcgcgcgattgcactccagcctgggcaacaacagcaaaactccatctcaaataataataataataaattagccaggtgtggtggtgcacgcctgtagtcccagctactcgggaggctgaggcacaagaaacccttgaacccgggaggcagaggttgcagtgaagctgaaattgcaccattccactccagcctgggagacagagtgagacaccatctctaaaatgaaaaaaaaaaaagagaatcatacaatgttcgtccttttgtgtctgggtctcttactcagcatgttctccaggttcatcaacactgtggcatgtgccagtacctccttcctcttcctgactgagtaatactccatcgtatggatggaccaccttttgttgattccctcattcgttgatggacatctaggttgtttccactgcggggttcttagtaacggtattacagggaaccatagattaccaggtatt
How do you locate the CpG island in a DNA sequence?
agaattgcttgaaccgggaggcggaggttgcaatgagctgagatcacaccactgcactccagcatggtgacagagcaagactccatctcaaatcgagtaaaaaaaaaaaaatagctgggtgcggtggctcacgcctgtaatcccagcactttgggaggctgaggcgggtagatcacgaggtcaggagatcgtagccatcctggctaacacggtgaaaccccgtctctactaaaaatacaaaaagaaattagctgggcgtggtggtgggcgcctgtagtcccagctactcgggaggctgaggcaggagaatggcgtgaacccgggaggtggagcttgcagtgagtcgagatcacgccactgcactccagcctgggcgacagagcgagactcgatctcaaaaaaaaaaaaaaaaaaaaagtgcgacacgaggcacacagtcagtgcccagtggagttcgctgatatggttaccacatccctggggacagcgcctccaccctccaacctcgaggtttgtggaaaaatctgggtccaagctttatttcttaaatattcctctctgcccagcatgtgcacgcagcccgctctggccaggcgagcgggtgtcaatcaaggtgctgagcatccccagggtgccgctcagccccagccgaagtcctggcccgtcatctggtagaacctgcggttgaagggccggtagaactcctgcaggcgccggaccagggcctggggcacgcgtgggtgtggccggcccttggacttgcccaggcagcggggacggctgccgccctgggccttcttgaggcaggggaagcccttggtggcgttgaagtagaagtgcttgtccgtgacgacccgtttcaggcccaggaagtcctgcacgcggccgacctctccggccgggtcgctgaccagacgctccccgctgacgaacaggaagtgggacagggggaagtagcgcagccagtggtccaggtgctgggcgtacaggccgatgcggacggcgctccaggctgtgtccacggggcccaggccgtggcggaaggccagggcgcggaagctgggcaggcccggggtcttggagagcgtctgggcgtagtcggagatggcccgggtcacggggttccgcaccaccacgatcagcttcgtgtccggggacatggcgtggatgcggcggggggcctctcgcgtcacgaagtagctgggggtcttctccatggtgatctgcccatccagggttcggggcatcagactcctgcgggacgggtgcaaggagagggggcctgagcctccccagccctagaccggcccccaggggcccgggaccaaggcccccttatgcccgggaagcccaggcctccagggcgagcaagtcttcctccctgctcgggcccacccctgctagcgtgcgcggctgggcagcctggaacatggactgtgagggtgcccagcccggcacctgcctgcagcccggcctgttccgccggcctgccccgcctgctgctgcactgaggattagggtgacggtcgctggtcgggaggcccaaatgctcctcaccacccacatatcttccctgtgcaatccctgccgtcctcgcttccagagccagctccctcccaccggacccacactttcctggaactaggctgcccccagctcctttctcatcccagaccaagtaccccgaggcccgcccgcctagatcacttgaggtcacccgttcactcagtggctgacagcatcccctaaatcagcccttcaccaattattgacagtgtgtcctcaaccaaaagtagtcctccctgctccctccctcccctgatgtaattacatctcttcccatctttatttattttttg
Sliding a window along the sequence and recomputing both statistics at each position:
• Initial window: C+G Content 0.492, Observed/Expected 0.548
• Next window: C+G Content 0.501, Observed/Expected 0.568
• Next window: C+G Content 0.500, Observed/Expected 0.560
• 200 steps later: C+G Content 0.712, Observed/Expected 0.604
• 600 steps later: C+G Content 0.598, Observed/Expected 0.421
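Just a couple formulas…
$$\text{G+C content} = \frac{(\text{\# of C's}) + (\text{\# of G's})}{\text{length of window}}$$
$$\text{Obs/Exp ratio} = \frac{\text{Observed \# of CpGs}}{\text{Expected \# of CpGs}} = \frac{\text{\# of CpGs in window}}{\dfrac{(\text{\# of C's}) \times (\text{\# of G's})}{\text{length}}}$$
where the C and G counts and the length in the expected term are taken from the window.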
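The two window statistics (G+C content, and the observed CpG count divided by the count expected from the window's own C and G totals) can be sketched in a few lines of Python. The function names and the ten-base example window are made up for illustration.

```python
def gc_content(window: str) -> float:
    """Fraction of bases in the window that are C or G."""
    window = window.lower()
    return (window.count("c") + window.count("g")) / len(window)


def obs_exp_ratio(window: str) -> float:
    """Observed CpG count over the count expected from the window's C and G totals."""
    window = window.lower()
    observed = window.count("cg")  # "cg" cannot overlap itself, so this counts every CpG
    expected = window.count("c") * window.count("g") / len(window)
    return observed / expected if expected > 0 else 0.0


example = "acgcgtcgaa"  # made-up window: 3 C's, 3 G's, 3 CpGs, length 10
print(gc_content(example))     # 6 / 10
print(obs_exp_ratio(example))  # 3 / (3 * 3 / 10)
```

Returning 0.0 when the expected term is zero is a convenience choice here, so that windows without any C or G do not raise a division error.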
Traditional Methods
• Gardiner-Garden and Frommer (1987)
– Window size 100 bp and shift size 1 bp
– Criteria
    • At least 200 base pairs
    • G + C content greater than 50%
    • Expected portion of the Obs/Exp ratio calculated over the window
    • Obs/Exp ratio greater than 0.6
• Takai and Jones (2002)
– Window size 200 bp and shift size 1 bp
– Criteria
    • At least 500 base pairs
    • At least 7 CpG dinucleotides in 200 base pair sequence
    • G + C content greater than 55%
    • Obs/Exp ratio calculated in same fashion as above method
    • Obs/Exp ratio greater than 0.65
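A minimal sketch of a sliding-window scan with the Gardiner-Garden and Frommer thresholds as defaults follows. The slides do not spell out how passing windows are combined into islands, so the merging rule here (join overlapping passing windows, then drop regions shorter than the minimum length) is an assumption, and the function names and toy sequence are hypothetical.

```python
def gc_content(window):
    return (window.count("c") + window.count("g")) / len(window)


def obs_exp_ratio(window):
    expected = window.count("c") * window.count("g") / len(window)
    return window.count("cg") / expected if expected > 0 else 0.0


def find_candidate_islands(seq, window=100, shift=1,
                           min_gc=0.50, min_ratio=0.6, min_length=200):
    """Return (start, end) pairs, 0-based with end exclusive, covered by passing windows.

    Assumption: overlapping or adjacent passing windows are merged into one
    region, and merged regions shorter than min_length are dropped.
    """
    seq = seq.lower()
    regions = []
    for start in range(0, len(seq) - window + 1, shift):
        w = seq[start:start + window]
        if gc_content(w) > min_gc and obs_exp_ratio(w) > min_ratio:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1][1] = max(regions[-1][1], end)  # extend the current region
            else:
                regions.append([start, end])
    return [(s, e) for s, e in regions if e - s >= min_length]


# Toy input: a CpG-rich stretch flanked by AT-rich sequence (made up for illustration).
toy = "at" * 150 + "cg" * 150 + "at" * 150
print(find_candidate_islands(toy))
```

Calling the same function with `window=200, min_gc=0.55, min_ratio=0.65, min_length=500` approximates the Takai and Jones settings, although this sketch does not check their requirement of at least 7 CpG dinucleotides.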
The Traditional Method
[Figure: C+G content and Obs/Exp ratio plotted against base position for sequence AL031723]
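Our Method
• Modifying the traditional methods
– Window size 200 bp and shift size 1 bp
– Expected portion of the Obs/Exp ratio is based on the whole sequence
$$\text{Obs/Exp ratio} = \frac{\text{Observed \# of CpGs}}{\text{Expected \# of CpGs}} = \frac{\text{\# of CpGs in window}}{\dfrac{(\text{\# of C's}) \times (\text{\# of G's})}{\text{length}}}$$
where the C and G counts and the length in the expected term are taken from the entire sequence.
• And….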
• Cutoffs: greater than the 97th percentile of the values observed over the sequence
• Obs/Exp Ratio
– Mean: 0.0018
– Standard Deviation: 0.0014
– 97th percentile: 0.0058
• G+C Content
– Mean: 0.5815
– Standard Deviation: 0.0818
– 97th percentile: 0.7350
[Figure: histograms (number of observations) of G+C content and Obs/Exp ratio for sequence AL031723]
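The modified ratio and the percentile cutoff can be sketched together as below. The expected term is computed once from the whole sequence, which is why the per-window values come out on a much smaller scale than the traditional ratio when the sequence is long. The function names, the toy sequence, and the use of the standard library's "inclusive" quantile method are assumptions for illustration, not details given in the slides.

```python
import statistics


def modified_obs_exp(seq, window=200, shift=1):
    """Obs/Exp per window, with the expected term taken from the entire sequence."""
    seq = seq.lower()
    expected = seq.count("c") * seq.count("g") / len(seq)
    starts = range(0, len(seq) - window + 1, shift)
    if expected == 0:
        return [0.0 for _ in starts]
    return [seq[s:s + window].count("cg") / expected for s in starts]


def percentile_cutoff(values, pct=97):
    """Cut point below which roughly pct percent of the values fall."""
    return statistics.quantiles(values, n=100, method="inclusive")[pct - 1]


# Made-up sequence: a CpG-rich block inside AT-rich flanks, for illustration only.
toy = "at" * 1000 + "cg" * 100 + "at" * 1000
ratios = modified_obs_exp(toy)
cutoff = percentile_cutoff(ratios)
flagged = [i for i, r in enumerate(ratios) if r > cutoff]  # window start positions
print(len(ratios), cutoff, len(flagged))
```

The same `percentile_cutoff` helper could be applied to the per-window G+C content; whether a window must exceed both cutoffs at once is not stated here and would be a further design choice.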
The same windows, scored with the Kullback-Leibler divergence and our Obs/Exp ratio:
• Initial window: Kullback-Leibler 0.508, Our Obs/Exp 0.0029, C+G 0.492
• Next window: Kullback-Leibler 0.509, Our Obs/Exp 0.0030, C+G 0.501
• Next window: Kullback-Leibler 0.507, Our Obs/Exp 0.0029, C+G 0.500
• 200 steps later: Kullback-Leibler 0.520, Our Obs/Exp 0.0033, C+G 0.712
• 600 steps later: Kullback-Leibler 0.510, Our Obs/Exp 0.0030, C+G 0.598
Our Method
[Figure: Kullback-Leibler divergence (×12), Observed/Expected ratio (×160), and C+G content plotted against base position for sequence AL031723]
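The slides name the Kullback-Leibler divergence but do not say which two distributions are compared. Purely as an assumption for illustration, the sketch below compares a window's base frequencies with those of the whole sequence; the pseudocount, function names, and toy sequences are hypothetical, and the numbers it produces need not match the values reported here.

```python
import math
from collections import Counter

BASES = "acgt"


def base_frequencies(seq, pseudocount=1.0):
    """Relative frequencies of a, c, g, t, smoothed so that no frequency is zero."""
    counts = Counter(seq.lower())
    total = sum(counts[b] for b in BASES) + pseudocount * len(BASES)
    return {b: (counts[b] + pseudocount) / total for b in BASES}


def kl_divergence(p, q):
    """D(p || q) = sum over bases of p * log(p / q), in nats."""
    return sum(p[b] * math.log(p[b] / q[b]) for b in BASES)


whole = base_frequencies("at" * 300 + "cg" * 100)  # made-up "whole sequence"
window = base_frequencies("cg" * 100)              # made-up CpG-rich window
print(kl_divergence(window, whole))
```

The divergence is zero when the window has the same base composition as the reference and grows as the window departs from it, which is the property that makes it usable as a per-window score.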
Comparison of AL031723
Traditional Method
Possible CpG Islands
3878-4534
5849-6136
6541-6820
8479-8698
10745-11049
18435-19580
25131-26359
35182-35441
36245-36576
36827-37606
Actual CpG Islands
18928-19547
25201-26371
36997-37693
Comparison of AL031723
Our Method
Possible CpG Islands
19227-19435
25197-26147
36982-37420
Actual CpG Islands
18928-19547
25201-26371
36997-37693
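One way to put numbers on the two comparisons is to measure how much of each actual island the predictions cover, and how many predictions touch no actual island at all. This is a sketch that assumes the listed coordinates are inclusive base positions; the helper names are illustrative.

```python
actual = [(18928, 19547), (25201, 26371), (36997, 37693)]
traditional = [(3878, 4534), (5849, 6136), (6541, 6820), (8479, 8698),
               (10745, 11049), (18435, 19580), (25131, 26359), (35182, 35441),
               (36245, 36576), (36827, 37606)]
our_method = [(19227, 19435), (25197, 26147), (36982, 37420)]


def overlap(a, b):
    """Number of base positions shared by two inclusive (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def covered_fraction(island, predictions):
    """Fraction of an actual island covered by predictions (assumed not to overlap each other)."""
    length = island[1] - island[0] + 1
    return sum(overlap(island, p) for p in predictions) / length


def unmatched(predictions, islands):
    """Predictions that share no position with any actual island."""
    return [p for p in predictions if all(overlap(p, i) == 0 for i in islands)]


for island in actual:
    print(island, round(covered_fraction(island, our_method), 2))
print(len(unmatched(traditional, actual)), len(unmatched(our_method, actual)))
```

On these ranges, our method covers about 34%, 81%, and 61% of the three actual islands with no unmatched predictions, while seven of the ten traditional predictions overlap no actual island.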
Cons
• Traditional Method
– Criteria not stringent enough
– If the expected part of the Obs/Exp ratio is unusually high, then even a high CpG count may not bring the ratio above the cutoff
• Our Method
– Criteria sometimes too stringent
Future Plans
• CpG Islands
• Linkage Disequilibrium and SNPs
– Statistical analysis of the linkage disequilibrium coefficient
– Kullback-Leibler Divergence II