Introduction to the Tsinghua University ENCODE Journal Club
Monica C. Sleumer (苏漠), 2012-09-24
Tsinghua ENCODE Journal Club Objectives
• Read and discuss all 31 ENCODE papers
• Discuss the 13 “Threads” in the ENCODE explorer
• Discuss the overall meaning of the ENCODE project
  – Media reactions
• Understand how to apply ENCODE findings to our own research
• Generate a long-term repository for our findings on our journal club website: bioinfo.au.tsinghua.edu.cn/encode/
Human Genome
• 3,101,804,739 base pairs
• 22 autosomes plus X and Y
• 21,224 protein-coding genes
• 15,952 non-coding RNA (ncRNA) genes
• 3–8% of bases are under selection
  – From comparative genomic studies
• Question: What is the genome doing?
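To put the "3–8% of bases under selection" figure in absolute terms, here is a small worked calculation. It only multiplies the genome size quoted above by the two percentages, and assumes that 3,101,804,739 bp total is the right denominator.

```python
# Worked arithmetic: convert the "3-8% of bases under selection" range
# into absolute base-pair counts, using the genome size quoted on the slide.
GENOME_BP = 3_101_804_739  # total base pairs, as listed above


def bases_for_fraction(fraction: float, genome_bp: int = GENOME_BP) -> int:
    """Return the number of bases corresponding to a fraction of the genome, rounded."""
    return round(fraction * genome_bp)


low = bases_for_fraction(0.03)
high = bases_for_fraction(0.08)
print(f"3% of the genome = {low:,} bp (~{low / 1e6:.0f} Mb)")
print(f"8% of the genome = {high:,} bp (~{high / 1e6:.0f} Mb)")
```

So the constrained fraction corresponds to roughly 93 to 248 million base pairs.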
ENCODE Project Objectives
• Find all functional elements: regions that are
  – Bound by specific proteins
  – Transcribed
  – Marked by histone modifications
  – Marked by DNA methylation
• Use this information to annotate functional regions
  – Genes (coding and non-coding)
  – Promoters
  – Enhancers
  – Specific transcription factor binding sites
  – Silencers
  – Insulators
  – Chromatin states
• Cross-reference data from other studies
  – Comparative genomics
  – 1000 Genomes Project
  – Genome-wide association studies (GWAS)
Each cell type shows a different combination of these features.
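As a rough illustration of how a combination of marks can be read as a region label, and why the same region can get different labels in different cell types, here is a toy rule-based classifier. The rules are simplified textbook heuristics (H3K4me3 at promoters, H3K4me1 with H3K27ac at active enhancers, H3K36me3 over transcribed gene bodies, H3K27me3 or H3K9me3 in repressed chromatin). They are not ENCODE's method: ENCODE derived its chromatin states with unsupervised segmentation (ChromHMM and Segway). The cell-type names and mark sets in the example are hypothetical.

```python
from typing import Collection


def label_region(marks: Collection[str]) -> str:
    """Coarse, rule-of-thumb label from the histone marks present in one region.

    Toy heuristics for illustration only; not ENCODE's ChromHMM/Segway states.
    Rules are checked in order, so the first matching rule wins.
    """
    if "H3K4me3" in marks:
        return "promoter-like"
    if "H3K4me1" in marks and "H3K27ac" in marks:
        return "active enhancer-like"
    if "H3K4me1" in marks:
        return "weak/primed enhancer-like"
    if "H3K36me3" in marks:
        return "transcribed"
    if "H3K27me3" in marks or "H3K9me3" in marks:
        return "repressed"
    return "no marks detected"


# Hypothetical marks observed at the same region in two unnamed cell types
same_region = {
    "cell_type_A": {"H3K4me1", "H3K27ac"},
    "cell_type_B": {"H3K27me3"},
}
for cell_type, marks in same_region.items():
    print(cell_type, label_region(marks))
```

Here the same stretch of DNA is labelled "active enhancer-like" in one cell type and "repressed" in the other, purely because the marks differ.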
ENCODE projects
• ENCODE pilot project: 1% of the genome, 2003-2007
• modENCODE: Drosophila and C. elegans
• Mouse ENCODE: in progress?
• ENCODE main project: 2007-2012
  – 1649 dataset-generating experiments
  – 147 cell types
  – 235 antibodies and assay protocols
  – 450 authors
  – 32 institutes
• 31 publications, 2012-09-06
  – 6 in Nature (all discussed on 2012-09-19)
  – 18 in Genome Research
  – 6 in Genome Biology (one of these discussed today)
  – 1 in BMC Genetics
  www.nature.com/encode/category/research-papers
Materials
• 147 human cell types, in 3 priority tiers
• Tier 1 cell lines: top priority for all experiments
• Tier 2 cell lines: to be done after Tier 1 (see the Tier 2 table)
• Tier 3: any other cell types
Tier 1 Cell Lines
Name | Description | Lineage | Tissue | Karyotype
GM12878 | lymphoblastoid B-lymphocytes (Epstein-Barr virus transformed), 1000 Genomes Project sample | mesoderm | blood | normal
H1-hESC | embryonic stem cells | inner cell mass | embryonic stem cell | normal
K562 | leukemia, 53-year-old female with chronic myelogenous leukemia | mesoderm | blood | cancer
Tier 2 Cell Lines
Name | Description | Lineage | Tissue | Karyotype
A549 | lung carcinoma epithelium, 58-year-old Caucasian male | endoderm | epithelium | cancer
CD20+ | donor B cells: RO01778 and RO01794 | mesoderm | blood | normal
CD20+_RO01778 | B cells, Caucasian | mesoderm | blood | normal
CD20+_RO01794 | B cells, African American | mesoderm | blood | normal
H1-neurons | neurons derived from H1 embryonic stem cells | ectoderm | neurons | normal
HeLa-S3 | cervical carcinoma | ectoderm | cervix | cancer
HepG2 | hepatocellular carcinoma | endoderm | liver | cancer
HUVEC | umbilical vein endothelial cells | mesoderm | blood vessel | normal
IMR90 | fetal lung fibroblasts | endoderm | lung | normal
LHCN-M2 | skeletal myoblasts from pectoralis major muscle, 41-year-old Caucasian | mesoderm | skeletal muscle myoblast |
MCF-7 | mammary gland, adenocarcinoma | ectoderm | breast | cancer
Monocytes-CD14+ | CD14+ monocytes, leukapheresis from RO 01746 and RO 01826 | mesoderm | monocytes | normal
SK-N-SH | neuroblastoma, 4-year-old | ectoderm | brain | cancer
Source: http://encodeproject.org/ENCODE/cellTypes.html
Methods
Method | Principle
RNA-seq | Different RNA fractions -> sequencing
CAGE | 5'-capped RNA -> sequencing
RNA-PET | 5' cap end and poly(A) tail end of each RNA -> paired-end sequencing
ChIP-seq | Chromatin immunoprecipitation of a DNA-binding protein -> sequencing
DNase-seq | Cut exposed (accessible) DNA with DNase I -> sequencing
FAIRE-seq | Nucleosome-depleted DNA -> sequencing
RRBS | Bisulphite treatment converts unmethylated C to U -> sequencing
3C, 5C, ChIA-PET | Chromatin interactions -> sequencing
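The RRBS principle can be sketched in a few lines: after bisulphite treatment and PCR, an unmethylated C (converted to U) is read as T, while a methylated C stays C, so comparing an aligned read with the reference gives a methylation call per cytosine. A minimal sketch on a single top-strand read, assuming the read is already aligned base-for-base to the reference. The "reduced representation" step (restriction digestion, e.g. with MspI, to enrich CpG-rich fragments), the opposite strand, and sequencing errors are not modelled; the sequence and methylated positions are made up.

```python
from typing import Dict, Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Simulate bisulphite treatment plus PCR on the top strand.

    Unmethylated C is read as T; C positions listed in `methylated` are protected.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference: str, read: str) -> Dict[int, bool]:
    """For each reference C, True if the aligned read still shows C (methylated),
    False if it shows T (unmethylated); other bases give no call."""
    calls: Dict[int, bool] = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C":
            if read_base == "C":
                calls[i] = True
            elif read_base == "T":
                calls[i] = False
    return calls


reference = "ACGTTCGACCGA"  # hypothetical sequence; Cs at positions 1, 5, 8, 9
read = bisulfite_convert(reference, methylated={1, 9})
print(read)                               # ACGTTTGATCGA
print(call_methylation(reference, read))  # {1: True, 5: False, 8: False, 9: True}
```

Real RRBS analysis aggregates many reads per cytosine and reports a methylation fraction rather than a single yes/no call.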
2012-09-19 meeting: Wu Dingming; Ma Xiaopeng; Guo Weilong and He Chao; Li Yanjian
• Every method ends in DNA or RNA sequencing, so each result can be traced back to a genomic location
• Findings vary between cell types
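Because every assay ends in sequencing, aligned reads can be summarised as per-base read depth along the genome, which is the kind of signal drawn as a genome-browser track. A minimal sketch of that pileup step using a difference array, assuming the reads are already aligned and given as hypothetical 0-based, half-open (start, end) intervals on one chromosome; production pipelines use dedicated aligners and track-building tools instead.

```python
from itertools import accumulate
from typing import List, Tuple


def coverage(reads: List[Tuple[int, int]], region_length: int) -> List[int]:
    """Per-base read depth over [0, region_length) from aligned reads.

    Each read is a half-open interval (start, end). The difference array gets
    +1 at each start and -1 at each end; a running sum then yields the depth
    at every position in O(reads + length) time.
    """
    diff = [0] * (region_length + 1)
    for start, end in reads:
        start, end = max(start, 0), min(end, region_length)  # clip to the region
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    return list(accumulate(diff[:region_length]))


# Hypothetical reads already aligned to a 10-bp window
reads = [(0, 4), (2, 6), (3, 8), (7, 12)]
print(coverage(reads, 10))  # [1, 1, 2, 3, 2, 2, 1, 2, 1, 1]
```

The difference array avoids touching every base of every read, which matters when millions of reads are piled up.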
Primary Findings
• 80.4% of the human genome is at least one of the following:
  – Bound by a transcription factor
  – Transcribed
  – Associated with a modified histone
• 99% is within 1.7 kb of at least one of the biochemical events measured by ENCODE
• 95% is within 8 kb of a DNA–protein interaction or DNase I footprint
• 7 chromatin states, including:
  – 399,124 enhancer-like regions
  – 70,292 promoter-like regions
• Transcription, chromatin marks, and TF binding are correlated
• Many SNPs fall within functional regions
  – Disease-associated SNPs in non-coding regions tend to be in functional elements
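Figures such as "80.4% of the genome" and "99% within 1.7 kb" are, at heart, interval arithmetic: pool the regions from all assays, merge overlaps, and measure the covered length, optionally after extending each region by a flank. A toy sketch on one hypothetical 1,000-bp "genome" with invented intervals; it is not ENCODE's analysis, which worked per chromosome on genome-wide annotations.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fraction_within(intervals: List[Interval], genome_length: int, flank: int = 0) -> float:
    """Fraction of [0, genome_length) inside, or within `flank` bp of, any interval."""
    padded = [(max(s - flank, 0), min(e + flank, genome_length)) for s, e in intervals]
    covered = sum(e - s for s, e in merge(padded))
    return covered / genome_length


# Hypothetical events pooled from several assays
tf_binding = [(100, 150), (600, 650)]
transcribed = [(120, 400)]
histone_marks = [(380, 420), (900, 950)]
events = tf_binding + transcribed + histone_marks

print(fraction_within(events, 1000))            # 0.42: covered by at least one event
print(fraction_within(events, 1000, flank=50))  # 0.72: within 50 bp of an event
```

Merging before summing is what keeps overlapping events from different assays from being counted twice.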
Applications
• Browse the data as genome tracks in the UCSC Genome Browser
• Look up a gene or pathway of interest
• Find out what the region around a mutation is doing, for mutations from
  – Cancer sequencing
  – Genome-wide association studies
• Compare with your own cancer data (e.g. RNA-seq)
• Comparative genomic analysis
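Looking up what a variant overlaps is usually done in the UCSC Genome Browser or with interval tools, but the underlying idea is a binary search over sorted, non-overlapping segments, such as a genome segmentation in which each base gets one label. A minimal sketch, assuming a hypothetical in-memory segmentation of one chromosome region; the coordinates, labels, and variant positions are invented, not real ENCODE annotations.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical segmentation: sorted, non-overlapping, 0-based half-open
# segments (start, end, label). Invented for illustration only.
SEGMENTS: List[Tuple[int, int, str]] = [
    (0, 1_000, "repressed"),
    (1_000, 1_500, "promoter-like"),
    (1_500, 4_000, "transcribed"),
    (4_000, 4_600, "enhancer-like"),
    (4_600, 9_000, "repressed"),
]
STARTS = [start for start, _, _ in SEGMENTS]


def annotate(position: int) -> Optional[str]:
    """Return the label of the segment containing `position`, or None if uncovered."""
    i = bisect_right(STARTS, position) - 1  # last segment starting at or before position
    if i >= 0:
        start, end, label = SEGMENTS[i]
        if start <= position < end:
            return label
    return None


# Hypothetical non-coding variants, e.g. from a GWAS hit list or tumour sequencing
for snp in (1_200, 4_321, 8_999, 12_000):
    print(snp, annotate(snp))
```

This only works because the segments do not overlap; for overlapping annotations (for example, peaks from many transcription factors) an interval tree or a sorted sweep is the usual choice.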
Online Resources
• Interactive app on the Nature ENCODE main page: www.nature.com/encode/
• Journal club website: bioinfo.au.tsinghua.edu.cn/encode/
Next ENCODE Journal Club Meeting
Suggested meeting day: Thursday, 2012-10-11
Speakers: LIANG Zhengyu (to be confirmed); one more volunteer speaker needed