Computational Identification of Drosophila microRNA Genes Journal Club 09/05/03 Jared Bischof.

Computational Identification of Drosophila microRNA Genes

What are microRNAs?

Evolutionarily conserved genomic sequences

~22 nt non-coding RNAs

Their precursors (pre-miRNAs) form extended stem-loop structures

They are presumed to have post-transcriptional regulatory activity

microRNA structure

Reference Set

24 Drosophila pre-miRNA sequences

These were analyzed in order to derive rules and parameters that could be used to describe microRNAs in anonymous genomic sequence

By examining D. melanogaster and D. pseudoobscura they determined that the reference set of pre-miRNAs was highly conserved

What does this mean?

Alignment of Drosophila melanogaster and Drosophila pseudoobscura

Program used: AVID (a global alignment tool)

Sequence included: intronic and intergenic sequence

Sequence omitted: exons, transposable elements, snRNA, snoRNA, tRNA and rRNA genes

Result: 51.3 out of 90.2 megabases of intronic and intergenic D. melanogaster sequence were aligned

Narrowing the Search

51.3 Mbases (Dm): 436,000 regions of ~100 bases each

Each region was analyzed with mfold, on both the forward strand and its reverse complement

436,000 regions x 2 = 872,000 mfolds

For the Dm regions scoring in the top 25%, the corresponding Dp regions were also analyzed with mfold
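
The bookkeeping on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, mfold itself is not called, and the slides do not say how the fold score is defined, so treating a higher score as better is an assumption.

```python
# Hypothetical helpers for illustration; not the authors' pipeline.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fold_inputs(regions):
    """Yield (region_id, strand, sequence) for both strands of each region.

    Each region therefore contributes two sequences to fold.
    """
    for region_id, seq in regions:
        yield region_id, "+", seq
        yield region_id, "-", reverse_complement(seq)


def top_fraction(scores, fraction=0.25):
    """Return the ids in the best-scoring fraction of a {id: score} dict.

    Assumes a higher score is better; flip the sort if the real score
    is an energy where lower is better.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Two folds per region, as on the slide.
n_folds = 436_000 * 2
```

Only the regions returned by something like `top_fraction` would then have their D. pseudoobscura counterparts folded, which is what keeps the second round of folding small.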

Narrowing the Search (cont.)

The scores from the mfold analysis of the corresponding Dm and Dp regions were averaged and ranked

21/24 members of the reference set were in the top 600 results from this scoring
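
A minimal sketch of the averaging and ranking step, and of the recovery check against the reference set, might look like the following. The dictionaries of per-region scores and the function names are assumptions made for illustration, and a higher score is again assumed to be better.

```python
def rank_by_average(dm_scores, dp_scores):
    """Average the Dm and Dp scores of each shared region and rank them.

    Returns a list of (region_id, mean_score), best first.
    """
    shared = dm_scores.keys() & dp_scores.keys()
    averaged = {rid: (dm_scores[rid] + dp_scores[rid]) / 2 for rid in shared}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


def recovered(ranked, reference_ids, top_n):
    """Count how many reference regions appear among the top_n ranked regions."""
    top = {rid for rid, _ in ranked[:top_n]}
    return len(top & set(reference_ids))
```

With the real data, `recovered(ranked, reference_ids, 600)` is the kind of check that would produce the 21/24 figure quoted above.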

Next, they filtered out regions that did not meet the following criteria:

  Perfectly conserved sequence >= 22 nt

  Located < 10 nt from the terminal loop
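
The conservation filter can be sketched as below, under several assumptions that the slides do not settle: the two criteria are read as applying to the same stretch of sequence, distance is counted in alignment columns, and the terminal loop coordinates (`loop_start`, `loop_end`, half-open) are hypothetical inputs that would come from the predicted fold.

```python
def identical_runs(dm_aligned, dp_aligned):
    """Yield (start, end) half-open column ranges where the two aligned
    sequences are identical and contain no gap characters."""
    start = None
    for i, (a, b) in enumerate(zip(dm_aligned, dp_aligned)):
        same = a == b and a != "-"
        if same and start is None:
            start = i
        elif not same and start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, min(len(dm_aligned), len(dp_aligned))


def passes_filter(dm_aligned, dp_aligned, loop_start, loop_end,
                  min_run=22, max_gap=10):
    """Return True if some perfectly conserved run of at least min_run
    columns lies fewer than max_gap columns from the terminal loop."""
    for start, end in identical_runs(dm_aligned, dp_aligned):
        if end - start < min_run:
            continue
        if end <= loop_start:
            gap = loop_start - end      # run is 5' of the loop
        elif start >= loop_end:
            gap = start - loop_end      # run is 3' of the loop
        else:
            gap = 0                     # run overlaps the loop
        if gap < max_gap:
            return True
    return False
```

Checking every conserved run, rather than only the longest one, avoids rejecting a region whose longest run is far from the loop while a shorter qualifying run sits next to it.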

Narrowing the Search (cont.)

Next, they evaluated some of the divergence patterns between the known pre-microRNAs in Dm and Dp.

Narrowing the Search (cont.)

Top Scoring Set

23/24 members of the Drosophila reference set fell into one of classes 1-3

Thus, candidates that did not fall into one of these classes were eliminated

This filtered the set down to the top 208 scoring candidates

18/24 (75%) of the reference set were in the top 124 candidates

Top Scoring Set (cont.)

42 novel candidates out of the top 208 candidates were found to be conserved in a third species (Anopheles, Apis, nematodes, or vertebrates)

Overview

Experimental Verification

They attempted to validate the predicted miRNAs by northern blot analysis

38 candidates were tested:

  27 candidates were conserved outside of Drosophila

  11 candidates were Drosophila-specific

Expression was observed for:

  20/27 (74%) of the conserved candidates

  4/11 (36%) of the Drosophila-specific candidates

25 novel verified microRNA genes

20 + 4 != 25, so what is the 25th? Its expression was not detected, but this candidate was orthologous to miR-137
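
The arithmetic on this slide is easy to check. The snippet below only restates the counts given above; the variable names are illustrative.

```python
# (expressed, tested) counts taken from the slide
results = {
    "conserved outside Drosophila": (20, 27),
    "Drosophila-specific": (4, 11),
}


def percent(expressed, tested):
    """Validation rate as a whole-number percentage."""
    return round(100 * expressed / tested)


rates = {group: percent(*counts) for group, counts in results.items()}
total_tested = sum(tested for _, tested in results.values())
total_expressed = sum(expressed for expressed, _ in results.values())

# The 25th gene was counted on the basis of orthology to miR-137,
# not on an observed northern blot signal.
total_reported = total_expressed + 1
```

This makes the gap explicit: 24 candidates were confirmed by expression, and the reported total of 25 includes one candidate supported by orthology alone.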

The End