Introduction to the Tsinghua University ENCODE Journal Club
Monica C. Sleumer (苏漠), 2012-09-24
Tsinghua ENCODE Journal Club Objectives
• Read and discuss all 31 ENCODE papers
• Discuss the 13 “Threads” in the ENCODE explorer
• Discuss the overall meaning of the ENCODE project
  – Media reactions
• Understand how to apply ENCODE findings to our own research
• Generate a long-term repository for our findings on our journal club website: bioinfo.au.tsinghua.edu.cn/encode/
Human Genome
• 3,101,804,739 base pairs
• 22 chromosomes plus X and Y
• 21,224 protein-coding genes
• 15,952 ncRNA genes
• 3–8% of bases are under selection
  – From comparative genomic studies
• Question: What is the genome doing?
ENCODE Project Objectives
• Find all functional elements
  – Bound by specific proteins
  – Transcribed
  – Histone modifications
  – DNA methylation
• Use this information to annotate functional regions
  – Genes (coding and non-coding)
  – Promoters
  – Enhancers
  – Specific transcription factor binding sites
  – Silencers
  – Insulators
  – Chromatin states
• Cross-reference data from other studies
  – Comparative genomics
  – 1000 Genomes Project
  – Genome-wide association studies (GWAS)
Different combination in each cell type
ENCODE projects
• ENCODE pilot project: 1% of the genome, 2003-2007
• modENCODE: Drosophila and C. elegans
• Mouse ENCODE in progress?
• ENCODE main project, 2007-2012
  – 1649 dataset-generating experiments
  – 147 cell types
  – 235 antibodies and assay protocols
  – 450 authors
  – 32 institutes
• 31 publications on 2012-09-06
  – 6 in Nature – all discussed on 2012-09-19
  – 18 in Genome Research
  – 6 in Genome Biology – one of these discussed today
  – 1 in BMC Genetics
www.nature.com/encode/category/research-papers
Materials
• 147 types of human cell lines, 3 priority levels
• Tier 1 cell lines: top priority for all experiments
• Tier 2 cell lines to be done after Tier 1 (next slide)
• Tier 3: any other cell lines

Tier 1 Cell Lines
Name | Description | Lineage | Tissue | Karyotype
GM12878 | B-lymphocyte, lymphoblastoid, Epstein-Barr Virus, 1000 Genomes Project | mesoderm | blood | normal
H1-hESC | embryonic stem cells | inner cell mass | embryonic stem cell | normal
K562 | leukemia, 53-year-old female with chronic myelogenous leukemia | mesoderm | blood | cancer
Tier 2 Cell Lines
Name | Description | Lineage | Tissue | Karyotype
A549 | lung carcinoma epithelium, 58-year-old caucasian male | endoderm | epithelium | cancer
CD20+ | donor B cells: RO01778 and RO01794 | mesoderm | blood | normal
CD20+_RO01778 | B cells, caucasian | mesoderm | blood | normal
CD20+_RO01794 | B cells, African American | mesoderm | blood | normal
H1-neurons | neurons derived from H1 embryonic stem cells | ectoderm | neurons | normal
HeLa-S3 | cervical carcinoma | ectoderm | cervix | cancer
HepG2 | hepatocellular carcinoma | endoderm | liver | cancer
HUVEC | umbilical vein endothelial cells | mesoderm | blood vessel | normal
IMR90 | fetal lung fibroblasts | endoderm | lung | normal
LHCN-M2 | skeletal myoblasts from pectoralis major muscle, 41-year-old caucasian | mesoderm | skeletal muscle myoblast |
MCF-7 | mammary gland, adenocarcinoma | ectoderm | breast | cancer
Monocytes-CD14+ | Monocytes-CD14+, leukapheresis from RO 01746 and RO 01826 | mesoderm | monocytes | normal
SK-N-SH | neuroblastoma, 4-year-old | ectoderm | brain | cancer
http://encodeproject.org/ENCODE/cellTypes.html
Methods
RNA-Seq: Different fractions of RNA -> sequencing
CAGE: 5’ capped RNA sequencing
RNA-PET: Sequencing 5’ cap plus poly-A tail
ChIP-seq: Chromatin immunoprecipitation of a DNA binding protein -> sequencing
DNase-seq: Cut exposed DNA with DNase I -> sequencing
FAIRE-seq: Nucleosome-depleted DNA -> sequencing
RRBS: Bisulphite treatment: unmethylated C->U -> sequencing
3C, 5C, ChIA-PET: Chromatin interactions -> sequencing
Presenter labels on this slide: Wu Dingming (2012-09-19); Ma Xiaopeng (2012-09-19); Guo Weilong and He Chao (2012-09-19); Li Yanjian (2012-09-19)
• All methods (DNA or RNA sequencing) can be traced back to a genomic location
• Findings vary between cell types
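The RRBS entry above relies on bisulphite treatment converting unmethylated C to U, which is then read as T after amplification and sequencing, while methylated C stays C. A minimal Python sketch of that idea follows. The function names, the toy sequence, and the methylated position are all made up for illustration; this is not an ENCODE pipeline, and it ignores strand handling and restriction digestion.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulphite treatment plus sequencing on one strand.

    Unmethylated C -> U (read as T); methylated C is protected and stays C.
    `methylated_positions` holds 0-based indices (an assumption of this sketch).
    """
    methylated = set(methylated_positions)
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference, converted_read):
    """Return 0-based positions where a reference C is still read as C."""
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference.upper(), converted_read))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGTCCGA"  # toy sequence, not real data
read = bisulfite_convert(reference, methylated_positions=[1])
print(read)                               # ACGTTTGA
print(call_methylation(reference, read))  # [1]
```

Comparing the converted read against the reference is what lets the read be traced back to a genomic location and a methylation call at the same time.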
Primary Findings
• 80.4% of the human genome is doing at least one of the following:
  – Bound by a transcription factor
  – Transcribed
  – Modified histone
• 99% is within 1.7 kb of at least one of the biochemical events
• 95% within 8 kb of a DNA–protein interaction or DNase I footprint
• 7 chromatin states:
  – 399,124 enhancer-like regions
  – 70,292 promoter-like regions
• Correlation between transcription, chromatin marks, and TF binding
• Functional regions contain lots of SNPs
  – Disease-associated SNPs in non-coding regions tend to be in functional elements
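A figure like "80.4% of the genome is doing at least one of the following" is a coverage fraction: take the union of all annotated intervals and divide by the genome length. The Python sketch below shows that arithmetic on made-up intervals. The half-open coordinate convention, the toy genome length of 1000, and the function names are assumptions of this example, not how ENCODE computed its number.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_fraction(intervals, genome_length):
    """Fraction of the genome covered by at least one interval."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / genome_length


# Toy tracks on a 1000 bp "genome" (hypothetical values, not ENCODE data).
tf_bound = [(100, 200), (150, 300)]
transcribed = [(250, 400)]
histone_marked = [(700, 800)]

all_events = tf_bound + transcribed + histone_marked
print(merge_intervals(all_events))         # [(100, 400), (700, 800)]
print(covered_fraction(all_events, 1000))  # 0.4
```

Merging first matters because a base covered by several assays must be counted only once.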
Applications
• Visible as genome tracks in the UCSC Genome Browser
• Look up a gene or pathway of interest
• Look up a mutation from
  – Cancer sequencing
  – Genome-wide association studies
  – Find out what that part of the genome is doing
• Compare with your cancer data (RNA-seq)
• Comparative genome analysis
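Finding out what the part of the genome around a mutation is doing amounts to asking which annotated intervals contain that position. A minimal Python sketch is below. The `Element` record, the `elements_at` helper, and every coordinate are hypothetical and chosen only for illustration; real ENCODE annotations would be loaded from downloaded track files instead.

```python
from collections import namedtuple

# Hypothetical annotation record; coordinates are 0-based, half-open [start, end).
Element = namedtuple("Element", ["chrom", "start", "end", "kind"])

# Made-up example annotations, NOT real ENCODE coordinates.
ANNOTATIONS = [
    Element("chr1", 1000, 1500, "promoter-like"),
    Element("chr1", 1400, 2600, "transcribed"),
    Element("chr1", 5000, 5400, "enhancer-like"),
    Element("chr2", 1000, 1200, "TF binding site"),
]


def elements_at(annotations, chrom, pos):
    """Return the kinds of all annotated elements that contain the position."""
    return [
        e.kind
        for e in annotations
        if e.chrom == chrom and e.start <= pos < e.end
    ]


print(elements_at(ANNOTATIONS, "chr1", 1450))  # ['promoter-like', 'transcribed']
print(elements_at(ANNOTATIONS, "chr1", 3000))  # []
```

A linear scan is enough for a handful of variants; for genome-scale annotation sets an interval index would be the usual choice.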
Online Resources
• Interactive app on Nature ENCODE main page: www.nature.com/encode/
• Journal club website: bioinfo.au.tsinghua.edu.cn/encode/