Critique
Paper Critique
Comparative genomics beyond sequence-based alignments:
RNA structures in the ENCODE regions
Amer Talal
18/4/2011
Introduction
ENCODE (ENCyclopedia Of DNA Elements): a pilot project to identify the functional elements in genome sequences.
Non-coding RNAs (ncRNAs): a major challenge in these projects is annotating the large number of non-coding RNAs.
The steadily increasing number of the discovered ncRNAs has dramatically changed views on the roles and importance of ncRNAs.
ncRNAs are difficult to find by computational or experimental means.
Introduction
Finding ncRNAs computationally is difficult because one has to consider secondary structure as well as nucleotide sequence.
But structure can be detected more reliably from a set of related sequences, if available.
A recent approach is to align the sequences first, then infer RNA structure from the alignment.
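As a rough illustration of why a set of related sequences helps, the sketch below checks whether two alignment columns can pair in every sequence and whether the paired bases change between sequences (a compensatory change). The function name and the toy alignment are invented for illustration; this is not the scoring used by CMfinder, RNAz or EvoFold.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def column_pair_support(alignment, i, j):
    """Summarise how well alignment columns i and j support a base pair.

    Returns (fraction of sequences that can pair, number of distinct pair types).
    Two or more distinct pair types indicate a compensatory change.
    """
    observed = [(seq[i].upper(), seq[j].upper()) for seq in alignment]
    pairing = [p for p in observed if p in PAIRS]
    return len(pairing) / len(alignment), len(set(pairing))


# Toy alignment (invented): a 4-bp stem closed by a GAAA loop.
toy_alignment = [
    "GGCAGAAAUGCC",
    "GGUAGAAAUACC",
    "GACAGAAAUGUC",
]

for i, j in [(0, 11), (1, 10), (2, 9), (4, 7)]:
    print(i, j, column_pair_support(toy_alignment, i, j))
```

Columns 2 and 9 pair in every sequence while the bases themselves differ, which is the kind of signal a single sequence cannot provide.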
Introduction
This study describes the first large-scale search for structured ncRNAs in several vertebrate genomes using a local structural motif-finding algorithm, which identified several thousand novel candidate ncRNAs.
Materials and Methods
They used CMfinder, a structure-oriented RNA motif prediction tool, to search multiple alignments of the ENCODE regions from certain vertebrates.
The CMfinder scan was built as a complement to the RNAz/EvoFold scans of the ENCODE regions.
They obtained their candidates from the UCSC MULTIZ multiple alignment blocks, one block at a time (155 nt long on average).
Materials and Methods
A group of 11 high-scoring ncRNA candidates was chosen for experimental verification and tested by RT-PCR and Northern blotting.
10 were confirmed to be present as RNA transcripts in certain tissues.
The experimental verification shows evidence of significant differential expression across tissues.
Results
They found a large number of potential ncRNAs in the ENCODE regions.
They reported 6587 candidate regions with an estimated false-positive rate of 50%.
With their new candidates they increased the number of ncRNA candidates in the ENCODE regions by 32%.
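A quick back-of-the-envelope reading of these numbers, assuming the quoted 50% is the fraction of reported candidates expected to be false (that interpretation is an assumption made here, not something stated on the slide):

```python
candidates = 6587            # candidate regions reported in the study
false_positive_rate = 0.50   # estimated rate quoted above

expected_false = candidates * false_positive_rate
expected_true = candidates - expected_false
print(f"expected true candidates: roughly {expected_true:.0f}")
```

Under that assumption, about half of the list, on the order of 3,300 regions, would be expected to hold up.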
Discussion
To demonstrate the accuracy and possible benefits of structure-aware alignment, they examined the MULTIZ multiple alignment blocks identified by Wang et al. (2007) as having good matches to the Rfam model in all species in the same region of the alignment.
They reported that CMfinder's alignment of the region differs from the MULTIZ alignment in only 13% of the positions.
Discussion
CMfinder is also alignment-independent.
CMfinder ignores a sequence if it does not contain the motif, and the program still reports a high-scoring motif for the rest of the sequences.
Unlike RNAz and EvoFold, which remove individual sequences with >25% and >20% gaps, respectively, CMfinder does not remove such sequences.
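The gap-based filtering attributed to RNAz and EvoFold can be pictured with a minimal sketch like the one below. The helper names and the toy block are hypothetical, and the real tools may count gaps differently; only the two thresholds come from the slide.

```python
def gap_fraction(aligned_seq, gap_chars="-."):
    """Fraction of alignment columns in which this sequence has a gap."""
    if not aligned_seq:
        return 0.0
    return sum(ch in gap_chars for ch in aligned_seq) / len(aligned_seq)


def drop_gappy_sequences(block, max_gap_fraction):
    """Keep only sequences whose gap fraction does not exceed the threshold."""
    return {name: seq for name, seq in block.items()
            if gap_fraction(seq) <= max_gap_fraction}


# Toy block (invented): species name -> aligned sequence of 10 columns.
block = {
    "human": "ACGUACGUAC",    # no gaps
    "mouse": "ACG--CGUAC",    # 2 of 10 columns are gaps
    "chicken": "A---ACG-AC",  # 4 of 10 columns are gaps
}

rnaz_like = drop_gappy_sequences(block, 0.25)     # threshold quoted for RNAz
evofold_like = drop_gappy_sequences(block, 0.20)  # threshold quoted for EvoFold
print(sorted(rnaz_like), sorted(evofold_like))
```

A tool that skips this step, as CMfinder is described as doing, would keep the chicken sequence and let the motif model decide whether it contains the motif.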
Discussion
Although MULTIZ has most often been shown to be quite accurate in these challenging cases, and its alignments serve as a rational proof of cross-species conservation of each motif instance, several studies have occasionally revealed compelling evidence of misalignment.
Even small misalignments have adverse effects on drawing any biological inferences.
Two main misalignment categories:
"partial alignments" and "chimeric alignments"
"partial alignments"
Comprise 5.1% of the MULTIZ sequences.
What is aligned to the ncRNA includes a large gap, either within the same species or among species.
The aligned fragment by itself does not pass the threshold of certain tests for ncRNA family membership.
"chimeric alignments"
Comprise 5.4% of the MULTIZ sequences.
What is aligned to the ncRNA is not a contiguous sequence. Instead, it is composed of sequence fragments from different regions or even different chromosomes.
None of these fragments individually passes the threshold of certain tests for ncRNA family membership.
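One way to picture the two categories is to look at where the aligned pieces come from in the source genome. The sketch below is only an illustration: the `Fragment` type, the `max_join` and `min_coverage` thresholds, and the coverage-based reading of "large gap" are all assumptions made here, and the family-membership tests mentioned above are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def classify_aligned_sequence(fragments, ncrna_length,
                              max_join=50, min_coverage=0.7):
    """Label what one species contributes to the alignment of a known ncRNA.

    max_join and min_coverage are illustrative values, not taken from the study.
    """
    if not fragments:
        return "missing"
    if len({f.chrom for f in fragments}) > 1:
        return "chimeric"  # pieces from different chromosomes
    ordered = sorted(fragments, key=lambda f: f.start)
    if any(b.start - a.end > max_join for a, b in zip(ordered, ordered[1:])):
        return "chimeric"  # pieces from distant regions of one chromosome
    covered = sum(f.end - f.start for f in ordered)
    if covered / ncrna_length < min_coverage:
        return "partial"   # most of the ncRNA is aligned to a gap
    return "contiguous"


print(classify_aligned_sequence([Fragment("chr7", 100, 180)], 80))
print(classify_aligned_sequence([Fragment("chr7", 100, 130)], 80))
print(classify_aligned_sequence(
    [Fragment("chr7", 100, 140), Fragment("chr2", 5000, 5040)], 80))
```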
Structural approaches to distinguish ncRNAs
Using CMfinder and other structural programs to classify transcripts as ncRNAs is likely to lead to significant false-positive rates, since conserved secondary structures are also commonly found in mRNAs (especially 3' UTRs).
Functional ncRNAs may also contain secondary or tertiary structures with non-canonical base interactions that are not considered by structure prediction programs.
Machine Learning
CMfinder integrated motif features for scoring by machine-learning algorithms (Support Vector Machine).
However, these methods did not perform well because of:
heterogeneity of the features
limitations of the available training data
Suggestions
Limit the search to the most promising regions.
I suggest the CFTR region (syntenic, few duplications, higher-quality annotation and well conserved).
Use longer blocks of locally aligned sequences (>300 nt).
Thank You
Questions