
Transcription factor binding sites and gene regulatory network

Victor Jin

Department of Biomedical Informatics

The Ohio State University

Transcription in higher eukaryotes

Gene Expression

1. Chromatin structure

2. Initiation of transcription

3. Processing of the transcript

4. Transport to the cytoplasm

5. mRNA translation

6. mRNA stability

7. Protein activity and stability

Transcriptional Regulation

[Figure: nuclear membrane; binding site/motif CCG__CCG; genome-wide mRNA transcript data (e.g. microarrays)]

Learning problems:

• Understand which regulators control which target genes

• Discover motifs representing regulatory elements

Some common approaches

• Cluster-first motif discovery
  – Cluster genes by expression profile, annotation, … to find potentially coregulated genes
  – Find overrepresented motifs in promoter sequences of similar genes (algorithms: MEME, Consensus, Gibbs sampler, AlignACE, …)

(Spellman et al. 1998)
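A minimal sketch of the "find overrepresented motifs" step, assuming a motif is an exact string and that a promoter counts as a hit when it contains at least one forward-strand copy. Enrichment is scored here with a one-sided hypergeometric test, which is one common choice and not the scoring used by MEME, AlignACE or the other programs listed; the function names and toy promoters are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    favourable = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(k, top + 1)
    )
    return favourable / comb(population, draws)


def motif_enrichment(motif: str, cluster: list[str], all_promoters: list[str]) -> float:
    """One-sided p-value that more cluster promoters contain `motif` than chance predicts.

    Assumes `cluster` is a subset of `all_promoters`; each promoter is counted
    once (presence/absence of an exact, forward-strand match).
    """
    motif = motif.upper()
    hits_all = sum(motif in seq.upper() for seq in all_promoters)
    hits_cluster = sum(motif in seq.upper() for seq in cluster)
    return hypergeom_sf(hits_cluster, len(all_promoters), hits_all, len(cluster))


if __name__ == "__main__":
    # Toy data: only the first three promoters (the "cluster") carry the motif.
    promoters = ["AACCGTACCGTT", "CCGTACCGAA", "TTCCGTACCG",
                 "AAAA", "CCCC", "GGGG", "TTTT", "ACGT", "TGCA", "GATC"]
    print(motif_enrichment("CCGTACCG", promoters[:3], promoters))
```

In the toy run every motif-bearing promoter falls in the cluster, so the p-value is 1/120; a gapped motif such as CCG__CCG would need a pattern match instead of the exact substring test.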

Training data – Features

[Figure: promoter sequence and regulator expression are combined into a feature vector]
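To make the figure concrete, here is a minimal sketch of one way to turn a promoter sequence and regulator expression into a feature vector for a single (gene, experiment) example. The encoding is an assumption for illustration, in the spirit of the (regulator, TF-occupancy) pairs used later with motifs in place of occupancy: each feature is a (motif, regulator, direction) triple that fires when the motif occurs in the promoter and the regulator is up (+1) or down (-1) in that experiment. `feature_vector` and the toy inputs are hypothetical.

```python
from itertools import product


def feature_vector(promoter: str, regulator_states: dict[str, int],
                   motifs: list[str]) -> dict[tuple[str, str, int], int]:
    """Binary (motif, regulator, direction) features for one gene in one experiment.

    regulator_states maps each regulator to +1 (up), -1 (down) or 0 (unchanged).
    A feature is 1 only if the motif is in the promoter AND the regulator
    moved in that direction.
    """
    seq = promoter.upper()
    features = {}
    for motif, (regulator, state), direction in product(
            motifs, regulator_states.items(), (1, -1)):
        present = motif.upper() in seq
        features[(motif, regulator, direction)] = int(present and state == direction)
    return features


if __name__ == "__main__":
    fv = feature_vector("AACCGTT", {"R1": 1, "R2": -1}, ["CCG", "TATA"])
    for key, value in sorted(fv.items()):
        print(key, value)
```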

What is PWM?

Transcription factor binding sites (TFBSs) are usually slightly variable in their sequences.

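A position weight matrix (PWM) captures how likely you are to see each base at each position of the motif. Strictly, the per-position probabilities form a normalized frequency matrix, and the PWM stores log-odds scores derived from them, as described below.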

Pos   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
A    18   8   5   4   1  29   7   7   7   0   1  39   1   1   6
C     8   3   3   9  33   4  21  15  14   0   0   1  43  39  18
G    13  31  34   9   8  10  11  15  19   4  44   3   0   1   6
T     7   4   4  24   4   3   7   9   6  42   1   3   2   5  16
Con   N   G   G   T   C   A   N   N   N   T   G   A   C   C   N

Count matrix for the ERE (estrogen response element), from which its PWM is built; Con = consensus

1. acggcagggTGACCc
2. aGGGCAtcgTGACCc
3. cGGTCGccaGGACCt
4. tGGTCAggcTGGTCt
5. aGGTGGcccTGACCc
6. cTGTCCctcTGACCc
7. aGGCTAcgaTGACGt
...
41. cagggagtgTGACCc
42. gagcatgggTGACCa
43. aGGTCAtaacgattt
44. gGAACAgttTGACCc
45. cGGTGAcctTGACCc
46. gGGGCAaagTGAC


Given N sequence fragments of fixed length, one can assemble a position frequency matrix (PFM) that records the number of times each nucleotide appears at each position. A normalized PFM, in which each column adds up to one, is a matrix of probabilities for observing each nucleotide at each position.

Position frequency matrix (PFM)

(also known as raw count matrix)
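Building a PFM is a per-column count. The sketch below counts each nucleotide at each position of equal-length aligned sites and then normalizes every column to sum to one; as toy input it uses sites 1 to 6 of the ERE list above, and the function names are hypothetical.

```python
BASES = "ACGT"


def position_frequency_matrix(sites: list[str]) -> dict[str, list[int]]:
    """Count how often each nucleotide occurs at each position of aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = {base: [0] * length for base in BASES}
    for site in sites:
        for i, base in enumerate(site.upper()):  # upper/lower case counted alike
            if base in pfm:  # ignore ambiguous letters such as N
                pfm[base][i] += 1
    return pfm


def normalize(pfm: dict[str, list[int]]) -> dict[str, list[float]]:
    """Divide each column by its total so that every column sums to one."""
    length = len(pfm[BASES[0]])
    totals = [sum(pfm[base][i] for base in BASES) for i in range(length)]
    return {base: [pfm[base][i] / totals[i] for i in range(length)] for base in BASES}


if __name__ == "__main__":
    sites = ["acggcagggTGACCc", "aGGGCAtcgTGACCc", "cGGTCGccaGGACCt",
             "tGGTCAggcTGGTCt", "aGGTGGcccTGACCc", "cTGTCCctcTGACCc"]
    pfm = position_frequency_matrix(sites)
    for base in BASES:
        print(base, pfm[base])
```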

The PFM is usually converted to log scale for efficient computational analysis. To eliminate zero counts before the log conversion (log 0 is undefined) and to correct for small samples of binding sites, a sampling correction known as pseudocounts is added to each cell of the PFM.

Position weight matrix (PWM) (also known as position-specific scoring matrix)

Position Weight Matrix for ERE

Converting a PFM into a PWM

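For each matrix element, compute:

$$w(b,i) = \log_2 \frac{\left(f_{b,i} + \sqrt{N}\,p(b)\right) / \left(N + \sqrt{N}\right)}{p(b)}$$

f_{b,i} – raw count (PFM matrix element) of nucleotide b in column i

N – number of sequences used to create the PFM (= column sum)

√N – pseudocounts (correction for small sample size), shared among the nucleotides in proportion to p(b)

p(b) – background frequency of nucleotide b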

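ERE position frequency matrix (raw counts; each column sums to N = 46):

Pos   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
A    18   8   5   4   1  29   7   7   7   0   1  39   1   1   6
C     8   3   3   9  33   4  21  15  14   0   0   1  43  39  18
G    13  31  34   9   8  10  11  15  19   4  44   3   0   1   6
T     7   4   4  24   4   3   7   9   6  42   1   3   2   5  16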

The resulting ERE PWM (log2 scores) is tabulated below.
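The conversion can be sketched in a few lines of Python. This is a minimal sketch assuming a uniform background, p(b) = 0.25, which is consistent with the tabulated ERE scores (for example, A at position 1: log2[((18 + 0.25·√46)/(46 + √46))/0.25] ≈ 0.58). With these counts it reproduces the tabulated scores at positions 1 to 14; the tabulated column 15 is not reproduced. `pfm_to_pwm` is a hypothetical name.

```python
import math
from typing import Optional

BASES = "ACGT"

# ERE counts from the PFM above (rows A, C, G, T; positions 1-15).
ERE_PFM = {
    "A": [18, 8, 5, 4, 1, 29, 7, 7, 7, 0, 1, 39, 1, 1, 6],
    "C": [8, 3, 3, 9, 33, 4, 21, 15, 14, 0, 0, 1, 43, 39, 18],
    "G": [13, 31, 34, 9, 8, 10, 11, 15, 19, 4, 44, 3, 0, 1, 6],
    "T": [7, 4, 4, 24, 4, 3, 7, 9, 6, 42, 1, 3, 2, 5, 16],
}


def pfm_to_pwm(pfm: dict[str, list[int]],
               background: Optional[dict[str, float]] = None) -> dict[str, list[float]]:
    """Log2-odds PWM: w(b,i) = log2(((f + sqrt(N) p(b)) / (N + sqrt(N))) / p(b))."""
    if background is None:
        background = {base: 0.25 for base in BASES}  # assumed uniform background
    length = len(pfm[BASES[0]])
    pwm = {base: [] for base in BASES}
    for i in range(length):
        n = sum(pfm[base][i] for base in BASES)  # N = column sum
        root_n = math.sqrt(n)                    # total pseudocount sqrt(N)
        for base in BASES:
            prob = (pfm[base][i] + root_n * background[base]) / (n + root_n)
            pwm[base].append(math.log2(prob / background[base]))
    return pwm


if __name__ == "__main__":
    pwm = pfm_to_pwm(ERE_PFM)
    for base in BASES:
        print(base, " ".join(f"{w:5.2f}" for w in pwm[base]))
```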

Scoring putative EREs by scanning the promoter with PWM

ERE PWM (log2 scores):

Pos      1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
A     0.58 -0.44 -0.98 -1.21 -2.29  1.22 -0.60 -0.60 -0.60 -2.96 -2.29  1.62 -2.29 -2.29 -0.72
C    -0.44 -1.49 -1.49 -0.30  1.39 -1.21  0.78  0.34  0.25 -2.96 -2.96 -2.29  1.76  1.62  0.46
G     0.16  1.31  1.44 -0.30 -0.44 -0.17 -0.06  0.34  0.65 -1.21  1.79 -1.49 -2.96 -2.29 -0.64
T    -0.60 -1.21 -1.21  0.96 -1.21 -1.49 -0.60 -0.30 -0.78  1.73 -2.29 -1.49 -1.84 -0.98  0.23
Site     G     G     G     T     C     A     G     C     A     T     G     G     C     C     A

Absolute score of the site (sum of the PWM scores of its bases; the example sums positions 1–14) = 11.57

Maximum and minimum score at each position, and their sums (positions 1–14):

Pos      1     2     3     4     5     6     7     8     9    10    11    12    13    14      Sum
Max   0.58  1.31  1.44  0.96  1.39  1.22  0.78  0.34  0.65  1.73  1.79  1.62  1.76  1.62   17.20
Min  -0.60 -1.49 -1.49 -1.21 -2.29 -1.49 -0.60 -0.60 -0.78 -2.96 -2.96 -2.29 -2.96 -2.29  -24.02

$$\text{relative score} = \frac{\text{absolute score} - \text{minimum score}}{\text{maximum score} - \text{minimum score}} = \frac{11.57 - (-24.02)}{17.20 - (-24.02)} \approx 0.86$$
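The scoring and scanning steps can be sketched as follows. To mirror the example, the matrix holds only positions 1 to 14 of the tabulated ERE PWM, the positions the example sums. Because the code adds the rounded table entries, it gives 17.19 and -24.01 for the maximum and minimum instead of 17.20 and -24.02, but the relative score still rounds to 0.86. The 0.85 cut-off, the forward-strand-only scan and the function names are assumptions for illustration.

```python
BASES = "ACGT"

# Positions 1-14 of the ERE PWM tabulated above (log2 scores).
ERE_PWM_14 = {
    "A": [0.58, -0.44, -0.98, -1.21, -2.29, 1.22, -0.60,
          -0.60, -0.60, -2.96, -2.29, 1.62, -2.29, -2.29],
    "C": [-0.44, -1.49, -1.49, -0.30, 1.39, -1.21, 0.78,
          0.34, 0.25, -2.96, -2.96, -2.29, 1.76, 1.62],
    "G": [0.16, 1.31, 1.44, -0.30, -0.44, -0.17, -0.06,
          0.34, 0.65, -1.21, 1.79, -1.49, -2.96, -2.29],
    "T": [-0.60, -1.21, -1.21, 0.96, -1.21, -1.49, -0.60,
          -0.30, -0.78, 1.73, -2.29, -1.49, -1.84, -0.98],
}


def absolute_score(site: str, pwm: dict[str, list[float]]) -> float:
    """Sum of the PWM entries for the site's base at each position."""
    return sum(pwm[base][i] for i, base in enumerate(site.upper()))


def score_range(pwm: dict[str, list[float]]) -> tuple[float, float]:
    """(minimum, maximum) achievable score: worst/best base in every column."""
    columns = list(zip(*(pwm[base] for base in BASES)))
    return sum(min(col) for col in columns), sum(max(col) for col in columns)


def relative_score(site: str, pwm: dict[str, list[float]]) -> float:
    """(absolute - minimum) / (maximum - minimum), between 0 and 1."""
    low, high = score_range(pwm)
    return (absolute_score(site, pwm) - low) / (high - low)


def scan(promoter: str, pwm: dict[str, list[float]],
         threshold: float = 0.85) -> list[tuple[int, str, float]]:
    """Slide the PWM along the forward strand; keep windows at or above threshold.

    The 0.85 default is a hypothetical cut-off, not a recommended value.
    """
    width = len(pwm[BASES[0]])
    low, high = score_range(pwm)
    seq = promoter.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) <= set(BASES):  # skip windows containing N or other letters
            rel = (absolute_score(window, pwm) - low) / (high - low)
            if rel >= threshold:
                hits.append((start, window, rel))
    return hits


if __name__ == "__main__":
    site = "GGGTCAGCATGGCC"  # positions 1-14 of the example site
    print(round(absolute_score(site, ERE_PWM_14), 2))  # 11.57
    print(round(relative_score(site, ERE_PWM_14), 2))  # 0.86
    print(scan("TTTT" + site + "AAAA", ERE_PWM_14))
```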

Yeast ESR: Biological Validation

[Figure: STRE element; universal stress repressor motif]

Previous work: “Structure learning”

• Graphical models (and other methods)
  – Learn the structure of a “regulatory network”, “regulatory modules”, etc.
  – Fit an interpretable model to training data
  – Model a small number of genes or clusters of genes
  – Many computational and statistical challenges; often used for qualitative hypotheses rather than prediction

(Segal et al. 2003, 2004)

(Pe’er et al. 2001)

Signaling networks in a cell

• Regulator-motif associations in nodes can have different meanings: direct binding, indirect effect, or co-occurrence

• Other data are needed to confirm a binding relationship between regulator and target (e.g. ChIP-chip)

• Still, statistically significant regulator-target relationships can be determined from the regulation program

Network inference

Example: oxygen sensing and regulatory network

Binding data for regulatory networks

• ChIP-chip: genome-wide protein-DNA binding data, i.e. which promoters are bound by a TF?

• Investigate the regulatory network model using ChIP-chip data in place of motifs (no motif discovery)
  – Features: (regulator, TF-occupancy) pairs

[Figure: labels TF, P1, P2]

Inferring regulatory networks from the combination of expression data and binding data

[Figure: network nodes FOS, MYC, CEBP, XBP1, IVNS1ABP, CHAF1B, C14ORF43, TXNIP, PAWR, ZNF394, RUVBL1, RFC1, ZNF500, TTF2, RAB18, ZKSCAN1, HDAC1, ZBTB41, THRAP1, VPS72, TLE3, BHLHB2, ZNF239, HIF1A, HEY2]

An extended ER regulatory network in MCF7 cells
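A minimal sketch of combining the two data types, assuming one TF, a ChIP-chip binding p-value per promoter and an expression log-ratio per gene: a TF → gene edge is kept when the promoter is bound and the gene responds, and its sign follows the direction of the expression change. The cut-offs, toy gene names and function name are hypothetical, and published network-inference methods are usually more elaborate.

```python
def infer_edges(tf: str, chip_pvalues: dict[str, float],
                log_ratios: dict[str, float],
                p_cutoff: float = 1e-3,
                lr_cutoff: float = 1.0) -> list[tuple[str, str, str]]:
    """Return sorted (tf, gene, 'up'/'down') edges for genes that are bound and responsive."""
    edges = []
    for gene, p_value in chip_pvalues.items():
        ratio = log_ratios.get(gene)
        if ratio is None:  # no expression measurement for this gene
            continue
        if p_value <= p_cutoff and abs(ratio) >= lr_cutoff:
            edges.append((tf, gene, "up" if ratio > 0 else "down"))
    return sorted(edges)


if __name__ == "__main__":
    chip = {"g1": 1e-5, "g2": 1e-5, "g3": 0.2, "g4": 1e-4}   # toy binding p-values
    expr = {"g1": 2.0, "g2": -0.2, "g3": 3.0, "g4": -1.5}    # toy log2 ratios
    print(infer_edges("TF1", chip, expr))
```

Here g2 is bound but barely changes and g3 changes but is not bound, so only g1 and g4 become targets.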

Signaling molecules – Networks

• Find all signaling molecules (SMs) that associate as regulators with a particular TF’s ChIP occupancy in ADT features

• e.g. hypothesis: the Glc7 phosphatase complex interacts with Hsf1 in regulating Hsf1 targets (interaction supported in the literature)

[Figure: Hsf1; Gac1, Gip1, Sds22; Glc7 phosphatase complex; legend: TF, SM, mRNA]
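Once a model has been trained on (regulator, TF-occupancy) features, collecting the signaling molecules attached to one TF is a filter-and-rank step. In this minimal sketch each feature is assumed to be a (regulator, occupied TF, weight) tuple, and SMs are ranked by summed absolute weight; the tuple layout, ranking rule and toy names (tf_x, sm_a, and so on) are hypothetical and do not describe the actual ADT output format.

```python
from collections import defaultdict


def sms_for_tf(features: list[tuple[str, str, float]], tf: str,
               signaling_molecules: set[str]) -> list[tuple[str, float]]:
    """Rank signaling molecules that appear as regulators paired with `tf` occupancy."""
    weight_by_sm = defaultdict(float)
    for regulator, occupied_tf, weight in features:
        if occupied_tf == tf and regulator in signaling_molecules:
            weight_by_sm[regulator] += abs(weight)  # sign ignored for ranking
    return sorted(weight_by_sm.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    features = [("sm_a", "tf_x", 0.4), ("sm_b", "tf_x", -0.3),
                ("sm_a", "tf_x", 0.1), ("sm_a", "tf_y", 0.5),
                ("reg_c", "tf_x", 0.2)]
    print(sms_for_tf(features, "tf_x", {"sm_a", "sm_b", "sm_c"}))
```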

Input Data
• FASTA file
• Contact Info
• Control data (optional)

Ab initio Motif Discovery Programs
• MEME
• MaMf
• Weeder

Statistical Methods
• Fisher test
• Bootstrap re-sampling

STAMP Matching
• Known or novel motifs

Results
• SeqLogo
• PWM
• P-value

http://motif.bmi.ohio-state.edu/ChIPMotifs/

http://motif.bmi-ohio-state.edu/HRTBLDb
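The pipeline lists bootstrap re-sampling among its statistical methods. The sketch below shows one generic resampling test, assuming motif presence is an exact string match: control sequences are drawn with replacement, as many as there are ChIP sequences, and the p-value is the fraction of resamples in which at least as many sequences contain the motif as in the ChIP set. It is not necessarily the procedure the pipeline above implements; the names and defaults are hypothetical.

```python
import random


def count_with_motif(seqs: list[str], motif: str) -> int:
    """Number of sequences containing at least one exact copy of `motif`."""
    motif = motif.upper()
    return sum(motif in seq.upper() for seq in seqs)


def bootstrap_pvalue(motif: str, chip_seqs: list[str], control_seqs: list[str],
                     n_boot: int = 999, seed: int = 0) -> float:
    """Resampling p-value for motif overrepresentation in ChIP vs control sequences."""
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = count_with_motif(chip_seqs, motif)
    as_extreme = 0
    for _ in range(n_boot):
        sample = rng.choices(control_seqs, k=len(chip_seqs))  # with replacement
        if count_with_motif(sample, motif) >= observed:
            as_extreme += 1
    # The +1 terms count the observed data as one resample, so p is never 0.
    return (as_extreme + 1) / (n_boot + 1)


if __name__ == "__main__":
    chip = ["aTGACCa", "ccTGACCg", "TGACCtt", "gggTGACC", "aaaa"]
    control = ["aaaa", "cccc", "gTGACCa", "tttt", "gggg", "acgt"]
    print(bootstrap_pvalue("TGACC", chip, control))
```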

Software Demo

• W-ChIPMotifs

• HRTargetDB
