PREditor Predictive RNA Editor for Plant Mitochondrial Genes Jeff Mower.

32
PREditor P redictive R NA Editor for Plant Mitochondrial Genes Jeff Mower

Transcript of PREditor Predictive RNA Editor for Plant Mitochondrial Genes Jeff Mower.

PREditorPredictive RNA Editor for Plant Mitochondrial Genes

Jeff Mower

What is RNA Editing?– A process that alters the RNA sequence

– Nt insertion, deletion, or conversion

– Does not include RNA maturation processes

RNA Editing in Plants– Occurs in mitochondria and chloroplasts

– C to U and U to C conversions

– Mechanism is not known

RNA Editing in Plants– In seed plants (conifers, flowering plants, etc.)

• Widespread in mitochondrion• Rare in chloroplast• Predominantly C to U

– In non-seed plants (mosses, ferns, etc.)• Frequent in mitochondrion and chloroplast• Both C to U and U to C are common

AUG UGAAGACGGUC CAAAAUCGU UCU UGCGGCGUA

M R N S V G C Q *

UGGC

AUG UGAAGACGGUC CAAAAUCGU UCU UGCGGCGUA

M R N S V G C Q *

UGGC

AUGAUG CAAAAUCGU UCU UGCGGCGUA

M V M R N S V G C Q *

GUC

Creation of new start codon

AG UGA UGGC

AUG UGAAGACGGUC CAAAAUCGU UCU UGCGGCGUA

M R N S V G C Q *

UGGC

AUGAUG CAAAAUUGU UUU UGCGGCGUA

M V M C N F V G C Q *

GUC

Alteration ofprotein sequence

Creation of new start codon

AG UGA UGGC

AUG UGAAGACGGUC CAAAAUCGU UCU UGCGGCGUA

M R N S V G C Q *

UGGC

AUGAUG CAAAAUUGU UUU UGCGGUGUA

M V M C N F V G C Q *

GUC

Alteration ofprotein sequence

Creation of new start codon

AG UGA UGGC

No effect onprotein sequence

AUG UGAAGACGGUC CAAAAUCGU UCU UGCGGCGUA

M R N S V G C Q *

UGGC

AUG UGAUGGCAUG UAAAAUUGU UUU UGCGGUGUA

M V M C N F V G C *

GUC

Alteration ofprotein sequence

Creation of new stop codon

Creation of new start codon

AG

No effect onprotein sequence

Identifying Edit Sites1. Determine experimentally

• Need to isolate and reverse transcribe RNA• Need multiple reads (editing is not always complete)

Identifying Edit Sites1. Determine experimentally

• Need to isolate and reverse transcribe RNA• Need multiple reads (editing is not always complete)

2. Predict based on sequence context• Upstream and downstream regions are important• Unambiguous motifs have not been identified

Identifying Edit Sites1. Determine experimentally

• Need to isolate and reverse transcribe RNA• Need multiple reads (editing is not always complete)

2. Predict based on sequence context• Upstream and downstream regions are important• Unambiguous motifs have not been identified

3. Predict based on protein conservation• Proteins are more conserved after editing • Editing tends to “correct” amino acid sequences

PREditor Methodology – Aligned sequence database (ASD) Construction

• 363 DNA sequences RNA editing information is known Organisms do not perform RNA editing

• Proteins were translated using the editing information

• Homologous proteins were aligned 42 different alignments All known mt proteins are covered

PREditor Methodology – Input Sequence Manipulation

• Accept a protein-coding DNA sequence as input• Translate input sequence• Align translation to homologous proteins in ASD

PREditor Methodology – The Underlying Principle

• ASD sequences translated from edited RNA• Input sequence translated from unedited DNA• “Where can RNA editing in the input sequence increase

conservation to the ASD sequences?”

ASD1 A T L G

ASD2 E M L G

ASD3 A M L G

ASD1 A T L G

ASD2 E M L G

ASD3 A M L G

Input E (GAA) T (ACG) P (CCU) G (GGC)

ASD1 A T L G

ASD2 E M L G

ASD3 A M L G

Input E (GAA) T (ACG) P (CCU) G (GGC)

Unedited E (GAA) 0.33

Edited N/A N/A

ASD1 A T L G

ASD2 E M L G

ASD3 A M L G

Input E (GAA) T (ACG) P (CCU) G (GGC)

Unedited E (GAA) 0.33 T (ACG) 0.33

Edited N/A N/A M (AUG) 0.67

ASD1 A T L G

ASD2 E M L G

ASD3 A M L G

Input E (GAA) T (ACG) P (CCU) G (GGC)

Unedited E (GAA) 0.33 T (ACG) 0.33 P (CCU) 0.0

Edited N/A N/A M (AUG) 0.67 S (UCU) 0.0

L (CUU) 1.0

F (UUU) 0.0

ASD1 A T L G

ASD2 E M L G

ASD3 A M L G

Input E (GAA) T (ACG) P (CCU) G (GGC)

Unedited E (GAA) 0.33 T (ACG) 0.33 P (CCU) 0.0 G (GGC) 1.0

Edited N/A N/A M (AUG) 0.67 S (UCU) 0.0 G (GGU) 1.0

L (CUU) 1.0

F (UUU) 0.0

Performance Analysis – Remove one protein sequence from the database

– Use the unedited DNA sequence as input

– Calculate statistics • Accuracy = (TP + TN) / (TP + FP + TN + FN)• Sensitivity = TP / (TP + FN)• Specificity = TN / (TN + FP)

– Repeat for each sequence

Performance Analysis– Total # C’s = 58,982

– True edited sites = 3,548 (6.0%)• TP = 2,922• FN = 626

– True non-edited sites = 55,434• TN = 54,829• FP = 605

Performance Analysis – Sensitivity = 82.4%

• Proportion of true edited sites that were predicted correctly• Increases to 94.6% if you ignore missed silent edited sites

– Specificity = 98.9%• Proportion of true non-edited sites that were predicted correctly

– Accuracy = 97.9%• Proportion of all sites that were predicted correctly• Increases to 98.7% if you ignore missed silent edited sites

Limitations– Cannot predict editing at silent sites

• 458 of 626 FN are at silent sites

Limitations– Cannot predict editing at silent sites

• 458 of 626 FN are at silent sites

– Not a major problem in practice• Silent editing sites do not affect the protein sequence • Many silent sites are only occasionally edited• Only 13% of editing sites are silent (expect ~38%)

Limitations– Aligned sequence database is skewed

Origin of

RNA editing

Angiosperms 266 (74%)

Gymnosperms 2 (1%)

Ferns 0

Horsetails 0

Hornworts 0

Mosses 0

Liverworts 32 (9%)

Charophytes 59 (16%)

Chlorophytes ―

Limitations– The skewed database effect

ASD1 Z

ASD2 Z

ASD3 Z

ASD4 Z

ASD4 X

Input X or Z?

Ongoing Work– Reducing the skewed database effect

• Weighted sequences and phylogenetics

ASD1 Z

ASD2 Z

ASD3 Z

ASD4 Z

ASD4 X

Input X!

Ongoing Work – Making the online resource more appealing

– Making the online resource more user-friendly

Future Directions – Increase diversity in the ASD

– Analyze sequence context using the ASD

– Apply methodology to editing in chloroplasts

– Apply methodology to U to C editing

Thanks – Jeff Palmer

– Sun Kim

– Danny Rice and other members of the Palmer lab