Bioinformatics Topics Not Covered in this Course BMI 730


Page 1: Bioinformatics Topics Not Covered in this Course  BMI 730

Bioinformatics Topics Not Covered in this Course

BMI 730 Kun Huang

Department of Biomedical Informatics
Ohio State University

Page 2: Bioinformatics Topics Not Covered in this Course  BMI 730

Non-coding RNA
• MicroRNA Related Bioinformatics Issues
  - MicroRNA prediction and recognition
  - Secondary structure prediction
  - Target prediction

Microbial Related Bioinformatics
• Metagenomics

Other Omics

Other Informatics

Page 3: Bioinformatics Topics Not Covered in this Course  BMI 730

Non-coding RNA
• Non-coding DNA
  • Junk DNA
  • Pseudogenes
  • Retrotransposons - Human Endogenous Retroviruses (HERVs)
  • C-value enigma (e.g., the Amoeba dubia genome has more than 670 billion bases; the pufferfish genome is about 1/10 the size of the human genome)
• Findings from ENCODE: nearly the entire genome is transcribed

Page 4: Bioinformatics Topics Not Covered in this Course  BMI 730

Non-coding RNA (ncRNA)
• Any RNA molecule that is not translated into a protein.
• Also called sRNA, npcRNA, nmRNA, snmRNA, fRNA
• Includes tRNA, rRNA, snoRNA, microRNA (miRNA), siRNA, piRNA, long ncRNA (e.g., Xist), shRNA
• Note the difference between siRNA and miRNA

Page 5: Bioinformatics Topics Not Covered in this Course  BMI 730

Non-coding RNA (ncRNA)
• RNA-induced silencing complex (RISC)
• RNA-induced transcriptional silencing (RITS)

Page 6: Bioinformatics Topics Not Covered in this Course  BMI 730

MicroRNA (miRNA)
• Another level of regulation

Page 7: Bioinformatics Topics Not Covered in this Course  BMI 730

MicroRNA (miRNA)

[Figure, panels a-c. Labels: E2F1, E2F2, E2F3, Myc, E2F; mir-17-92 cluster members 17-5p, 17-3p, 18a, 19a, 20a, 19b, 92-1.]

Reviewed by: Coller et al. (2008), PLoS Genet 3(8): e146
Figures from Dr. Baltz Agula

Page 8: Bioinformatics Topics Not Covered in this Course  BMI 730

Non-coding RNA

MicroRNA Related Bioinformatics Issues
• Secondary structure prediction
• MicroRNA prediction and recognition
• Target prediction

Databases

Page 9: Bioinformatics Topics Not Covered in this Course  BMI 730

Secondary structure prediction
• Applications
  • RNA folding dynamics
  • ncRNA discovery
  • Microarray probe validation/comparison

Wang et al. Genome Biology 2004 5:R65  

Page 10: Bioinformatics Topics Not Covered in this Course  BMI 730

Secondary structure prediction
- Physics-based models
  - Minimizing free energy / dynamic programming / other optimization schemes
  - Parameters come from empirical studies of RNA structural energetics (e.g., nearest-neighbor interactions in stacking base pairs, measured using synthesized oligonucleotides)
  - Parameters are limited by what the experimental procedures can measure
  - Scoring models are used
  - Most ignore the sequence dependence of hairpin, bulge, internal, and multi-branch loop energies
  - Multi-branch loop energies rely on ad hoc scores
  - Still top performance
  - Mfold, ViennaRNA, PKnots, RDfold, etc.
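The dynamic-programming idea behind these methods can be sketched with the classic Nussinov recurrence. This is a deliberately simplified stand-in: it maximizes the number of base pairs instead of minimizing a nearest-neighbor free energy, so it is not how Mfold or ViennaRNA score structures. The function name and the `min_loop` parameter are illustrative assumptions.

```python
def nussinov(seq, min_loop=3):
    """Return (max_pairs, dot_bracket) for an RNA sequence.

    Simplified model: every allowed pair scores 1. Real tools replace this
    score with nearest-neighbor free-energy parameters.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in pairs:
                    right = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + dp[i + 1][k - 1] + right)
            dp[i][j] = best
    # Traceback: recover one optimal structure in dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in pairs:
                right = dp[k + 1][j] if k + 1 <= j else 0
                if dp[i][j] == 1 + dp[i + 1][k - 1] + right:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k + 1 <= j:
                        stack.append((k + 1, j))
                    break
    return dp[0][n - 1], "".join(structure)


print(nussinov("GCAAAGC"))  # (2, '((...))')
```

The cubic-time table fill is the same shape as the energy-minimization recurrences; only the scoring function differs.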

Page 11: Bioinformatics Topics Not Covered in this Course  BMI 730

Secondary structure prediction
- Probabilistic approach
  - Stochastic context-free grammars (SCFG), e.g., QRNA
    - Specify grammar rules that induce a joint probability distribution over possible RNA structures and sequences
    - Parameters are easily learned without experiments
    - Parameters may not have physical meanings
    - Performance is inferior to physics-based methods
  - Extensions: conditional log-linear model (CLLM), e.g., CONTRAfold
    - Integrates the learning procedure with energy-based scoring systems
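To make "grammar rules that induce a joint probability distribution" concrete, here is a minimal sketch using a toy grammar S -> ε | x S | x S y S (end, unpaired base, base pair). The grammar, all probabilities, and the function names are hypothetical and are not taken from QRNA or CONTRAfold. Because a dot-bracket structure fixes the derivation, the joint probability is just the product of the rule and emission probabilities used.

```python
import math

# Hypothetical toy parameters, chosen only for illustration.
RULE_P = {"end": 0.2, "unpaired": 0.5, "pair": 0.3}
UNPAIRED_P = {b: 0.25 for b in "ACGU"}
PAIR_P = {"GC": 0.25, "CG": 0.25, "AU": 0.2, "UA": 0.2, "GU": 0.05, "UG": 0.05}


def match_brackets(structure):
    """Map each '(' index to the index of its matching ')'."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            partner[stack.pop()] = i
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def joint_log_prob(seq, structure):
    """log P(seq, structure) under the toy grammar above."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    partner = match_brackets(structure)

    def score(i, j):
        # Log-probability of deriving seq[i:j] from the nonterminal S.
        logp = 0.0
        while i < j:
            if structure[i] == ".":  # rule S -> x S
                logp += math.log(RULE_P["unpaired"] * UNPAIRED_P[seq[i]])
                i += 1
            else:  # rule S -> x S y S
                k = partner[i]
                p = PAIR_P.get(seq[i] + seq[k], 0.0)
                if p == 0.0:
                    return float("-inf")  # pair not allowed by this grammar
                logp += math.log(RULE_P["pair"] * p) + score(i + 1, k)
                i = k + 1
        return logp + math.log(RULE_P["end"])  # rule S -> ε

    return score(0, len(seq))
```

In a real SCFG the parameters would be estimated from annotated structures, and prediction would search over all structures (e.g., with the CYK algorithm) rather than score a given one.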

Page 12: Bioinformatics Topics Not Covered in this Course  BMI 730

Secondary structure prediction

[Figure panels: CONTRAfold, PKnotRG]

Page 13: Bioinformatics Topics Not Covered in this Course  BMI 730

Secondary structure prediction
- Comparative approach
  - Single-sequence prediction methods (physics-based, SCFG) have difficulty searching all configurations
  - Structures that have been conserved by evolution are far more likely to be the functional form
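One common way to look for evolutionarily conserved structure is covariation: two alignment columns that mutate together while preserving complementarity are likely to be base-paired. The sketch below computes the mutual information between two columns of a multiple alignment. It is offered as an illustration of the comparative idea, not as the method of any particular tool; the function name and the toy alignment are assumptions.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (bits) between columns i and j.

    alignment: list of equal-length aligned sequences.
    """
    n = len(alignment)
    if n == 0:
        raise ValueError("empty alignment")
    count_i = Counter(s[i] for s in alignment)
    count_j = Counter(s[j] for s in alignment)
    count_ij = Counter((s[i], s[j]) for s in alignment)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 4 covary (G-C, C-G, A-U, U-A); column 1 is invariant.
toy = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
print(column_mutual_information(toy, 0, 4))
print(column_mutual_information(toy, 0, 1))
```

High mutual information flags candidate pairs; a fully conserved column carries no covariation signal even if it is paired, which is one reason comparative methods are usually combined with energy or grammar scores.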

Page 14: Bioinformatics Topics Not Covered in this Course  BMI 730

MicroRNA prediction and discovery
- Experimental approach - cloning
- MicroRNA array (OSU microarray facility)
- Massive sequencing
  - Select segments in the range of 20-25 nt
  - Using Solexa/SOLiD sequencer
  - Map to genome
  - Enrichment analysis / peak calling
  - Experimental validation
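The first computational steps of the sequencing route (size selection, then mapping) can be sketched as follows. This is a toy illustration under strong assumptions: reads are plain strings, mapping is exact-match on the forward strand only, and both function names are hypothetical. Real pipelines use a dedicated short-read aligner and handle adapters, mismatches, and both strands.

```python
from collections import Counter


def select_small_rna_reads(reads, min_len=20, max_len=25):
    """Keep reads with length in [min_len, max_len]; collapse duplicates to counts."""
    return Counter(r for r in reads if min_len <= len(r) <= max_len)


def map_exact(read_counts, genome):
    """Toy exact-match mapping: genome start position -> total read count."""
    hits = {}
    for read, count in read_counts.items():
        pos = genome.find(read)
        while pos != -1:
            hits[pos] = hits.get(pos, 0) + count
            pos = genome.find(read, pos + 1)
    return hits
```

Positions with high counts are the candidates that would go on to enrichment analysis or peak calling, and then to experimental validation.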

Page 15: Bioinformatics Topics Not Covered in this Course  BMI 730

MicroRNA prediction and discovery
- Bioinformatics / machine learning approach

Wang et al. Genome Biology 2004 5:R65  

Page 16: Bioinformatics Topics Not Covered in this Course  BMI 730

MicroRNA prediction and discovery
- Bioinformatics / machine learning approach
  - Using evolutionary information

Nam, J.-W. et al. Nucl. Acids Res. 2005 33:3570-3581; doi:10.1093/nar/gki668

Page 17: Bioinformatics Topics Not Covered in this Course  BMI 730

MicroRNA prediction and discovery
- Bioinformatics / machine learning approach
  - Support vector machine / need features
• Features:
  • Sequence features
    • Nucleotide frequency counts
    • Total G/C content
  • Folding features
    • Pairing propensity
    • Minimum free energy (MFE)
  • Topological features
    • Packing ratio
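A few of the features listed above can be computed directly from a candidate hairpin, as in the sketch below. It assumes the secondary structure (dot-bracket string) and MFE come from an external folding tool. "Pairing propensity" is approximated here as the fraction of paired bases, which is an assumption rather than a definition from the slides, and the packing ratio is omitted because the slides do not define it.

```python
def hairpin_features(seq, structure, mfe):
    """Build a feature dictionary for one candidate precursor.

    seq: RNA or DNA sequence; structure: dot-bracket string of equal length;
    mfe: minimum free energy reported by a folding tool (kcal/mol).
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0 or n != len(structure):
        raise ValueError("sequence and structure must be non-empty and equal length")
    # Sequence features: nucleotide frequencies and total G/C content.
    features = {f"freq_{b}": seq.count(b) / n for b in "ACGU"}
    features["gc_content"] = (seq.count("G") + seq.count("C")) / n
    # Folding features: fraction of paired bases and length-normalized MFE.
    features["paired_fraction"] = sum(ch in "()" for ch in structure) / n
    features["mfe_per_nt"] = mfe / n
    return features
```

Feature vectors built this way for known precursors and for background hairpins would then be used to train a classifier such as an SVM.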

Page 18: Bioinformatics Topics Not Covered in this Course  BMI 730

MicroRNA target prediction
- Experimental / bioinformatics approach
  - BLAST can identify thousands of potential targets. How to pin down the real ones?

Page 19: Bioinformatics Topics Not Covered in this Course  BMI 730

MicroRNA target prediction
- Computational / bioinformatics approach
  - Mutually exclusive transcription pattern between a miRNA and its targets
  - Microarray screening
  - Existence of a complementary sequence
  - Context score (features)
  - Machine learning approaches (e.g., SVM, regression, etc.)

David P. Bartel. MicroRNAs: Target Recognition and Regulatory Functions. Cell, Volume 136, Issue 2, 215-233, 23 January 2009.
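The "complementary sequence" criterion is often applied to the miRNA seed (nucleotides 2-8). The sketch below scans a 3' UTR for perfect seed matches. It covers only that one criterion, ignores context scores and conservation, and the function name and default seed boundaries are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_sites(mirna, utr, seed_start=2, seed_end=8):
    """Return 0-based UTR positions of perfect matches to the miRNA seed.

    mirna, utr: sequences written 5'->3' (T is treated as U).
    seed_start, seed_end: 1-based inclusive seed boundaries on the miRNA.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[seed_start - 1:seed_end]
    # The target site is the reverse complement of the seed.
    site = seed.translate(COMPLEMENT)[::-1]
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(site, pos + 1)
    return positions


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAAA" + "CUACCUC" + "GGGG" + "CUACCUC"
print(seed_sites(let7a, utr))  # [4, 15]
```

Seed matches alone produce many false positives, which is why the other criteria on this slide are layered on top.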


Page 21: Bioinformatics Topics Not Covered in this Course  BMI 730

Databases
• MicroRNA.org: http://www.microrna.org/microrna/getMirnaForm.do
• MirBase: http://microrna.sanger.ac.uk
• …

Target prediction
• MIRDB
• TargetScan (http://targetscan.org)
• PicTar (http://pictar.bio.nyu.edu)
• miRanda (part of Sanger database)
• MirTarget
• …

Software
• List at http://en.wikipedia.org/wiki/List_of_RNA_structure_prediction_software


Page 23: Bioinformatics Topics Not Covered in this Course  BMI 730

Metagenomics
• Study of genetic material recovered directly from environmental samples
• A community of species, e.g., microbes from the stomach of a cow
• Challenges:
  - Who is there? How many?
  - 16S rRNA: universal primers, highly conserved, used for profiling
      forward: AGA GTT TGA TCC TGG CTC AG
      reverse: ACG GCT ACC TTG TTA CGA CTT
  - Next generation sequencing: more genes (chicken-and-egg)
  - Community metabolism: identify metabolic pathways within the community
  - New challenges: comparative study
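As a small illustration of how the 16S primer pair above delimits the profiled region, the sketch below extracts an in-silico amplicon from a template sequence. It assumes exact primer matches on the given strand. Real primer binding tolerates mismatches and degenerate bases, and the function names are hypothetical.

```python
FORWARD = "AGAGTTTGATCCTGGCTCAG"   # forward primer from the slide, spaces removed
REVERSE = "ACGGCTACCTTGTTACGACTT"  # reverse primer from the slide, spaces removed

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def in_silico_amplicon(template, forward=FORWARD, reverse=REVERSE):
    """Return the template region bounded by both primers (inclusive), or None.

    The forward primer is searched as written; the reverse primer binds the
    opposite strand, so its reverse complement is searched on the template.
    """
    template = template.upper()
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]
```

Amplicons recovered this way from many organisms can be compared to answer the "who is there, and how many" questions.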
