Kyle Tretina with a team led by Dr. Pattle P. Pun in collaboration with Mr. Ross Leung of CUHK


Analysis of the Positively Selected and Non-Positively Selected Non-Protein Coding Sequences of Chromosome 16


Introduction: Story of Evolutionary History

Bacteria < Fish < Primate < Human
Story: increasing organismal complexity as evolution proceeds
WHY?

But little Mouse, you are not alone,
In proving foresight may be in vain:
The best laid schemes of mice and men
Go often askew,
And leave us nothing but grief and pain,
For promised joy!
Robert Burns (1785)

Fellow mortal
Man's dominion
The mouse is blessed compared to man: it can only see the present. I look at the future and fear.

Genetics
Central Dogma: DNA -> RNA -> Protein

Complexity ~ Number of Genes?
Humans ~ 30,000
Flies ~ 14,000

G-Value Paradox

Complexity (K) ~ Gene Number (N)?
Relationship?
proportional: K ~ N
polynomial: K ~ N^a
exponential: K ~ a^N
factorial: K ~ N!

Jean-Michel Claverie: ON/OFF states

2^30,000 / 2^14,000 ≈ 3 x 10^4816
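The ratio on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming each gene behaves as a simple ON/OFF switch so that N genes give 2^N possible states; the function name `state_ratio_sci` is illustrative and does not come from the presentation. It works in logarithms because the numbers themselves are far too large for floating point.

```python
import math

def state_ratio_sci(n_a: int, n_b: int) -> tuple[float, int]:
    """Return (mantissa, exponent) of 2**n_a / 2**n_b in base-10 scientific notation."""
    # log10(2**n_a / 2**n_b) = (n_a - n_b) * log10(2)
    log10_ratio = (n_a - n_b) * math.log10(2)
    exponent = math.floor(log10_ratio)
    mantissa = 10 ** (log10_ratio - exponent)
    return mantissa, exponent

# Gene counts taken from the slide: humans ~30,000, flies ~14,000.
mantissa, exponent = state_ratio_sci(30_000, 14_000)
print(f"{mantissa:.1f} x 10^{exponent}")
```

Under the ON/OFF assumption, a difference of 16,000 genes corresponds to a difference of thousands of orders of magnitude in possible states.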

Goal
Determine the role of non-coding DNA in gene regulation by looking at the functions of non-coding SNPs that are positively selected or non-positively selected on chromosome 16

Definitions
SNP: single nucleotide polymorphism
Variable between populations
Importance likely due to stability of variation

Selection: the phenomenon whereby only the organisms best adapted to their environment tend to survive and produce progeny
Gene-selection algorithm and neutral selection theory (wrench)

Methods Overview
HapMap Database -> Selection Data, List of Chr16 SNPs
UCSC Genome Database Mirror -> SNP flanking sequence
TRANSFAC -> related transcription factor data for each SNP flanking sequence
PReMod -> confirm results

HapMap Phase I Data
HapMap Project: an international effort to identify and catalog genetic similarities and differences in human beings (Haplotype Maps); also includes:
Selection Data
List of Chr16 SNPs
~25,000 non-positively selected
~5,000 positively selected

UCSC Genome Browser
Genome.UCSC.edu: a website containing several reference sequences and tools for visual and computational analysis

Methods:
Enter each RSID (SNP identifier) from the list
Note intersecting sequences
Copy/paste sequences
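The "flanking sequence" gathered for each SNP can be illustrated with a small Python sketch. It assumes the reference sequence is already available as a plain string and that SNP positions are 1-based; the function name `flanking_sequence`, the `flank` width, and the toy sequence are all hypothetical and are not the team's actual procedure, which used the UCSC Genome Browser directly.

```python
def flanking_sequence(chrom_seq: str, snp_pos: int, flank: int = 25) -> str:
    """Return the bases around a SNP, with the SNP base itself in brackets.

    chrom_seq : reference sequence as a plain string (hypothetical input)
    snp_pos   : 1-based position of the SNP within chrom_seq
    flank     : number of bases to keep on each side (assumed value)
    """
    if not 1 <= snp_pos <= len(chrom_seq):
        raise ValueError("snp_pos lies outside the sequence")
    idx = snp_pos - 1  # convert to a 0-based index
    left = chrom_seq[max(0, idx - flank):idx]
    right = chrom_seq[idx + 1:idx + 1 + flank]
    return f"{left}[{chrom_seq[idx]}]{right}"

# Toy sequence for illustration only.
toy = "ACGTACGTACGTACGT"
print(flanking_sequence(toy, snp_pos=8, flank=3))  # ACG[T]ACG
```

Near the ends of the sequence the flank is simply shorter, which avoids padding the result with bases that do not exist.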

UCSC Genome Browser Mirror
Efficiency
~70 seq/hr for 1.5 yrs = ~1/3 of sequences gathered
2 hrs

Online Instructions, but Complicated Data Structure

Henry Ford: 1.1 million lines source code

Many thanks to Dr. Hayward (Wheaton College CS Faculty)
We have directions, but we need to alter the machine, and know WHERE to alter it -> Dr. Hayward

Sequences Collected
Graph 1. The distributions of the positively selected SNPs used in the study across human chromosome 16

Graph 2. The distributions of the non-positively selected SNPs used in the study across human chromosome 16

TRANSFAC
TRANSFAC: a relational database, available via the web as six flat files, containing data on transcription factors, DNA-binding sites, and target genes
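To give a feel for what looking up binding sites in a flanking sequence involves, here is a minimal Python sketch that scans a sequence for a consensus motif written in IUPAC nucleotide codes. It is not TRANSFAC's own search method or data format; the function name `find_sites` and the example motif and sequence are assumptions chosen purely for illustration.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(sequence: str, consensus: str) -> list[int]:
    """Return 0-based start positions where the consensus matches the sequence."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A lookahead lets overlapping matches be reported as well.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]

# Toy example: both the sequence and the motif are illustrative.
print(find_sites("ccTGACTCAggTGAGTCAtt", "TGASTCA"))  # [2, 11]
```

A real lookup would typically score matches against position weight matrices and scan both strands; the exact-consensus match here only shows the basic idea.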

Automation at CUHK

PReMod
PReMod: a new database of genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes.

Enter ranges for SNP sequences
Look for same pattern as TRANSFAC
Exploits the fact that many known CRMs are made of clusters of phylogenetically conserved and repeated transcription factor (TF) binding sites
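The clustering idea can be sketched in Python as a sliding window over predicted binding-site positions. This is only a toy illustration: it ignores phylogenetic conservation and scoring, and it is not PReMod's algorithm. The function name `find_clusters` and the `window` and `min_sites` defaults are assumed values.

```python
def find_clusters(site_positions: list[int], window: int = 100,
                  min_sites: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) spans where at least min_sites sites fall within window bases."""
    positions = sorted(site_positions)
    clusters: list[tuple[int, int]] = []
    left = 0
    for right, pos in enumerate(positions):
        # Advance the left edge until the window spans at most `window` bases.
        while pos - positions[left] > window:
            left += 1
        if right - left + 1 >= min_sites:
            span = (positions[left], pos)
            # Merge with the previous span when the two overlap.
            if clusters and span[0] <= clusters[-1][1]:
                clusters[-1] = (clusters[-1][0], pos)
            else:
                clusters.append(span)
    return clusters

# Hypothetical site positions along a sequence.
print(find_clusters([10, 40, 90, 500, 900, 930, 960, 990]))
# [(10, 90), (900, 990)]
```

The isolated site at position 500 is dropped because it has no neighbours within the window, which is the intuition behind treating clusters rather than single sites as candidate modules.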

Analysis
MySQL Tables

Programmed Scripts:
Word Patterns: i.e. keywords, recurring identifiers
Unique Entries
Progress Statistics
Overlap between N+ selected and + selected SNPs

Results

SNP Selection    RS Numbers    Sequences Gathered
Non-Positive     25,622        6,173 (24%)
Positive         4,750         4,750 (100%)

Table 1. A summary of the manual SNP flanking sequence gathering from the UCSC Genome Browser

Results

SNP Selection    Total     No Sites      Unique         Matches in Other Dataset    TRANSFAC Entries to Be Looked Up
Non-Positive     25,594    1,611 (6%)    3,218 (13%)    20,765 (81%)                82 (