RNA-seq experiences and plans LUMC

Post on 23-Feb-2016


Peter A.C. ’t Hoen, Human Genetics, LUMC

Pipelines

• PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data

Bioinformatics 28:479-86 (2012)

• eMiR: pipeline for mapping, 5p-3p resolution and annotation of miRNAs

BMC Genomics 11:716 (2010)

PASSion

PASSion: performance on simulated data

eMiR


eMiR results LUMC samples

[Figure: sequences per sample for samples 1-48 (axis range 0 to 3.5E+07), split into not truncated, truncated not aligned, and truncated and aligned sequences]

eMiR results LUMC samples (2)

[Figure: percentages (0% to 100%) by annotation class: Mt_tRNA_pseudogene, miRNA, miRNA_pseudogene, misc_RNA, misc_RNA_pseudogene, rRNA, rRNA_pseudogene, scRNA_pseudogene, snRNA, snRNA_pseudogene, snoRNA, snoRNA_pseudogene, tRNA_pseudogene, other]

Other studies: Methods

• Tag-based: one read per transcript
  • DeepSAGE: most 3’ CATG
  • DeepCAGE: 5’-end

• RNA-Seq: multiple reads per transcript
  • Whole mRNA sequencing after fragmentation
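To make the tag-based idea concrete, here is a minimal Python sketch that pulls out the tag starting at the most 3’ CATG of a transcript sequence. The function name `most_3prime_tag` and the default of 17 bases downstream of CATG are illustrative assumptions, not values taken from the slides.

```python
from typing import Optional

ANCHOR = "CATG"  # anchoring site named on the slide


def most_3prime_tag(transcript: str, tag_len: int = 17) -> Optional[str]:
    """Return CATG plus up to `tag_len` following bases at the 3'-most CATG.

    `tag_len` is an assumed, illustrative value. Returns None when the
    transcript contains no CATG; the tag is shorter than expected when the
    site lies close to the 3' end of the sequence.
    """
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)  # rfind gives the last (most 3') occurrence
    if pos == -1:
        return None
    return seq[pos:pos + len(ANCHOR) + tag_len]
```

Because only the last CATG is used, every transcript contributes at most one tag position, which is what "one read per transcript" refers to.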

DeepSAGE – sample preparation

PCR enrichment and gel purification (~85bp)

Example gene: Gapd

[Figure; values shown: 14542 and 12555]

Example gene: alternative polyadenylation

[Figure; values shown: 97 and 99]

Expression profiling in a human cohort

• 105 subjects with GWAS and phenotype data
• RNA isolated from total blood
• Expression profiling by deep-SAGE
• 95 passed all QC

Analysis pipeline

• Trimming / addition of nucleotides
• Genome alignment (Bowtie)
• UCSC genome browser .wiggle files for visualization

• Annotation (ENSEMBL/Biomart)
• Reads summed per gene
• OR tagwise analysis
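The last two steps (reads summed per gene, or tagwise analysis) can be sketched as follows. This is a minimal illustration, assuming each aligned read has already been reduced to a tag identifier and that a tag-to-gene lookup is available from the annotation step; the names `tagwise_counts`, `genewise_counts` and the identifiers in the usage example are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def tagwise_counts(read_tags: Iterable[Hashable]) -> Counter:
    """Count reads per tag (one entry per distinct tag position)."""
    return Counter(read_tags)


def genewise_counts(tag_counts: Dict[Hashable, int],
                    tag_to_gene: Dict[Hashable, str]) -> Counter:
    """Sum tag counts per gene; tags without annotation are skipped."""
    gene_counts: Counter = Counter()
    for tag, n in tag_counts.items():
        gene = tag_to_gene.get(tag)
        if gene is not None:
            gene_counts[gene] += n
    return gene_counts


# Usage with made-up identifiers:
reads = ["tag1", "tag1", "tag2", "tag3", "tag3", "tag3", "tag4"]
annotation = {"tag1": "geneA", "tag2": "geneA", "tag3": "geneB"}
per_tag = tagwise_counts(reads)
per_gene = genewise_counts(per_tag, annotation)
```

Keeping the tagwise table alongside the genewise one preserves information on alternative tags within a gene, which the per-gene sum discards.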

SNPs: sample swaps detected

Gender-specific gene expression

[Figure: normalized expression of Y-chr genes and normalized XIST expression, for male and female samples]
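A check of this kind can be sketched in a few lines of Python. The sketch assumes that each sample has a normalized XIST value and a summary value for Y-chr genes, and it uses an arbitrary threshold of 1.0; the function names, the threshold, and the "ambiguous" label are illustrative assumptions rather than the procedure used in the study.

```python
from typing import Dict, Tuple


def infer_sex(xist: float, y_chr: float, threshold: float = 1.0) -> str:
    """Infer sex from normalized XIST and Y-chr expression.

    `threshold` is a hypothetical cut-off; in practice it would be chosen
    from the observed distribution of the two values.
    """
    high_xist = xist > threshold
    high_y = y_chr > threshold
    if high_xist and not high_y:
        return "female"
    if high_y and not high_xist:
        return "male"
    return "ambiguous"  # both high or both low: needs manual inspection


def flag_mismatches(samples: Dict[str, Tuple[str, float, float]]) -> Dict[str, str]:
    """Return {sample_id: inferred_sex} where it differs from the recorded sex.

    `samples` maps a sample id to (recorded_sex, xist, y_chr).
    """
    flagged = {}
    for sample_id, (recorded, xist, y_chr) in samples.items():
        inferred = infer_sex(xist, y_chr)
        if inferred != recorded:
            flagged[sample_id] = inferred
    return flagged
```

Samples flagged this way are candidates for a swap or another problem and would still need confirmation from independent data such as the SNP calls.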

Contaminated samples detected

Genes associated with BMI

• Differential expression analysis

1. In edgeR (designed for count data)

2. In limma (designed for microarray data; voom: mean-variance model)

• Gender as confounder
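The role of gender as a confounder can be illustrated with a small pure-Python sketch. It is not edgeR or limma: it fits an ordinary least-squares slope of log expression on BMI after removing the gender effect from both variables, which gives the same BMI coefficient as a linear model with intercept, gender and BMI. It computes no P-values and uses no mean-variance weights. The log-CPM transform with a 0.5 offset follows the form used by voom; all function names are assumptions made for this example.

```python
import math
from statistics import mean
from typing import List, Sequence


def log_cpm(count: float, library_size: float) -> float:
    """log2 counts per million with a small offset to avoid log(0)."""
    return math.log2((count + 0.5) / (library_size + 1.0) * 1e6)


def residualize_by_group(values: Sequence[float],
                         groups: Sequence[str]) -> List[float]:
    """Subtract each group's mean, removing the intercept and group effect."""
    group_means = {
        g: mean(v for v, vg in zip(values, groups) if vg == g)
        for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


def bmi_slope_adjusted_for_gender(log_expr: Sequence[float],
                                  bmi: Sequence[float],
                                  gender: Sequence[str]) -> float:
    """Least-squares slope of expression on BMI, adjusted for gender."""
    y = residualize_by_group(log_expr, gender)
    x = residualize_by_group(bmi, gender)
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("BMI does not vary within gender groups")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx
```

Without the adjustment, a gene that differs between men and women could appear associated with BMI simply because BMI also differs between the two groups.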

Limma and edgeR reasonably consistent

[Figure: -log10 P-values from limma and edgeR; highly expressed genes in red]

Genes associated with BMI (N=9, FDR 0.05)
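The slide does not say which FDR procedure was used. As an illustration only, the following sketch assumes the Benjamini-Hochberg method and shows how per-gene P-values would be turned into adjusted values that can be compared with the 0.05 cut-off.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues: Sequence[float], fdr: float = 0.05) -> List[int]:
    """Indices of tests whose adjusted P-value is at or below `fdr`."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q <= fdr]
```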

Allele specific expression detected for some genes

Helicos single molecule sequencing

Example polyA profiling on Helicos

Eleonora de Klerk

Oculopharyngeal muscular dystrophy: General switch to shorter 3’-UTRs

Eleonora de Klerk

Example RNA-Seq (Helicos)

[Figure: genes shown: ADAMTS8, ADAMTS15, NOV]

Peter Henneman

Analysis of pre-mRNA processing

Irina Pulyakhina

[Diagram: splicing of pre-mRNA, via intermediate mRNA, into mature mRNA]

Pre-mRNA analysis tools

• map to both exon-exon junctions and introns;
• prioritize intronic alignments;
• report multiple alignments;
• deal with both low and high coverage;
• deal with indels and mismatches;
• find novel exons and splice sites;
• look for both canonical and non-canonical splice sites
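The distinction in the last requirement can be shown with a short sketch. It assumes the intron sequence is given on the transcribed strand and, as a simplification, treats only GT...AG introns as canonical; some tools additionally treat GC...AG and AT...AC as a separate semi-canonical class. The function name is hypothetical.

```python
CANONICAL_DONOR = "GT"     # first two intronic bases (5' splice site)
CANONICAL_ACCEPTOR = "AG"  # last two intronic bases (3' splice site)


def splice_site_type(intron_seq: str) -> str:
    """Classify an intron as 'canonical' (GT...AG) or 'non-canonical'.

    `intron_seq` is the full intron on the transcribed strand.
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to hold donor and acceptor sites")
    donor, acceptor = seq[:2], seq[-2:]
    if donor == CANONICAL_DONOR and acceptor == CANONICAL_ACCEPTOR:
        return "canonical"
    return "non-canonical"
```

An aligner restricted to GT...AG junctions would never report introns of the second class, which is why the requirement is listed separately.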

GSNAP

[Figure: GSNAP alignment example; sequences shown: TTTTTTTGT and TTTTTTTGT...GTATCGA]

TopHat

Difference between TopHat and GSNAP results:

[Figure: GSNAP alignment and TopHat alignment compared; sequences shown: TTTTTTTGT and TTTTTTTGT...TTTTTTTGC...GTATCGA]

Normal (standard) insert size

[Figure: read-pair classes int/int, int/ex, int/e-i, int/e-e, ex/ex, ex/e-i, ex/e-e, e-e/e-e, e-i/e-e, e-i/e-i, grouped as pre-splicing, intermediate (pre+post) and post-splicing]
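One way to assign a read pair to these groups is sketched below. The mapping is an assumption made for illustration, since the slide text does not spell out the rules: mates labelled "int" or "e-i" are taken as evidence of unspliced RNA, mates labelled "e-e" as evidence of spliced RNA, and "ex" mates as uninformative on their own.

```python
UNSPLICED_EVIDENCE = {"int", "e-i"}  # assumed: intronic or exon-intron mates
SPLICED_EVIDENCE = {"e-e"}           # assumed: exon-exon junction mates
VALID_LABELS = UNSPLICED_EVIDENCE | SPLICED_EVIDENCE | {"ex"}


def classify_pair(mate1: str, mate2: str) -> str:
    """Classify a read pair from the labels of its two mates."""
    mates = {mate1, mate2}
    unknown = mates - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown mate label(s): {sorted(unknown)}")
    has_unspliced = bool(mates & UNSPLICED_EVIDENCE)
    has_spliced = bool(mates & SPLICED_EVIDENCE)
    if has_unspliced and has_spliced:
        return "intermediate"   # pre- and post-splicing evidence in one pair
    if has_unspliced:
        return "pre-splicing"
    if has_spliced:
        return "post-splicing"
    return "uninformative"      # ex/ex: compatible with either state
```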

Extremely large insert size

[Figure: the same read-pair classes (int/int, int/ex, int/e-i, int/e-e, ex/ex, ex/e-i, ex/e-e, e-e/e-e, e-i/e-e, e-i/e-i), with their grouping marked "?"]

Insert size cut-off

Plans for GEUVADIS

• Transcription of repeat sequences such as Macro Satellite Repeats
  • Study effect on local and global gene expression
  • Study heterogeneity of transcripts expressed from repeats

FSHD: disease mechanism

Lemmers et al. Science 329:1650-3 (2010)

[Figure: D4Z4 repeat (3.3 kb units) at 4q35. Labels shown: 11-100 units; D4Z4 contraction; 4qA; 4qB; PAS; DUX4; poly(A) tails; FSHD; No FSHD]

Acknowledgements

Shoaib Amini, Irina Pulyakhina, Eleonora de Klerk, Henk Buermans, Yanju Zhang, Kai Ye, Jeroen Laros, Johan den Dunnen, Gertjan van Ommen

Rick Jansen, Jeroen van Zanten, Gerard van Grootheest, Brenda Penninx, Jan Smit

Joukejan Hottenga, Gonneke Willemsen, Dorret Boomsma, Eco de Geus

NTR