PPARgamma in adipocyte differentiation - a ChIP-Seq case study · ChIP-Seq workflow To obtain a...

PPARgamma in adipocyte differentiation - a ChIP-Seq case study

Example analysis using Genomatix technologies to study ChIP-Seq data on PPARgamma.

Intention and extent

This case study shows an example of an analysis workflow suitable for ChIP-Seq data. It is intended to show options and approaches. This study will cover topics such as:

• peak finding and analysis for known transcription factor binding sites,

• definition of de novo binding site matrices from cluster sequences,

• identification and analysis of potential target genes including associated pathways,

• promoter analysis and identification of a common regulatory framework in a gene subset, and a subsequent scan of all annotated promoters for matches to this framework,

• positional correlations for different data sets,

• data visualization.

Data source

This study is based on data from a publication on PPARgamma, a key regulator of adipocyte differentiation. Using ChIP-Seq, Nielsen et al. (Genes Dev. 2008; 22(21): 2953–2967, PMID: 18981474) followed the changes in the genome-wide profiles of PPARgamma, RXR and PolII binding sites during adipocyte differentiation over 6 days.

For demonstration purposes, we focus on the changes in PPARgamma binding sites between day 0 and day 6, analyze these and extract associated genes and pathways. For both time points, 3 replicates of the PPARgamma ChIP are available. For correlations, data sets for RXR and PolII from the same publication are included.

Workflow overview

Figure 1: workflow for this case study

© Genomatix 2012

Mapping

The first step in NGS data analysis is the alignment (also called "mapping") of the raw sequences against reference sequences such as genomes or transcriptomes. Mapping on the Genomatix Mining Station (GMS) is performed in two steps: first, all potential mapping positions of the reads are identified via short unique sequence stretches (anchors); then a whole-read alignment is performed to find the best match.

Sequence type detection and the calculation of nucleotide statistics are performed automatically on the GMS during data upload and quality control. The statistics include the number of reads, the GC content and the nucleotide distribution over the read length.

Using the graphical user interface (GUI) on the GMS, several mappings can be started at the same time. Figure 2 shows the setup screen for the PPARgamma samples from day 0. The 32 nt raw sequences were mapped against the mouse genome library (NCBI_build37), allowing one point mutation in the first mapping step (deep) and requiring at least 92% alignment quality for the whole read. The alignment results are reported both for uniquely mapping reads and for reads with up to 50 hits (multiple hits), in bigBED and BAM format. These files can be converted to BED and SAM format during result export.

Figure 2: Settings for genomic mapping of day 0 PPARgamma-ChIP data.

After the mapping is complete, the results can be accessed from the interface and the mapping statistics are shown. In total, 7 and 6 million reads were mapped uniquely for day 0 and day 6, respectively (Figure 3). Only these uniquely mapped reads were used for further analysis on the Genomatix Genome Analyzer (GGA).

Figure 3: Mapping statistics for PPARg day 0 (sample 2): unique hits - reads mapping only once in the genome; multiple hits - reads mapping between 2 and 50 times in the genome; ambiguous hits - reads mapping more than 50 times in the genome; insufficient quality hits - reads that could not be mapped with the required alignment quality; ignored hits - reads for which no anchor seed could be found.
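The five categories in the Figure 3 caption can be expressed as a small decision function. This is a minimal sketch, not the GMS implementation: it assumes that, for each read, the mapper reports whether an anchor seed was found and how many genomic positions passed the alignment-quality cutoff (the inputs `anchor_found` and `n_hits` are hypothetical names).

```python
from enum import Enum


class HitClass(Enum):
    """Mapping categories from the Figure 3 caption."""
    UNIQUE = "unique"
    MULTIPLE = "multiple"
    AMBIGUOUS = "ambiguous"
    INSUFFICIENT_QUALITY = "insufficient quality"
    IGNORED = "ignored"


MAX_MULTI_HITS = 50  # reads with more hits than this are "ambiguous"


def classify_read(anchor_found: bool, n_hits: int) -> HitClass:
    """Assign one read to a mapping category.

    anchor_found: whether any anchor seed was found (hypothetical input).
    n_hits: genomic positions passing the alignment-quality cutoff.
    """
    if not anchor_found:
        return HitClass.IGNORED
    if n_hits == 0:
        return HitClass.INSUFFICIENT_QUALITY
    if n_hits == 1:
        return HitClass.UNIQUE
    if n_hits <= MAX_MULTI_HITS:
        return HitClass.MULTIPLE
    return HitClass.AMBIGUOUS


def tally(reads):
    """Count reads per category; reads is an iterable of (anchor_found, n_hits)."""
    counts = {c: 0 for c in HitClass}
    for anchor_found, n_hits in reads:
        counts[classify_read(anchor_found, n_hits)] += 1
    return counts
```

In this case study, only reads falling into the UNIQUE category would be passed on to clustering.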

Downstream analysis

The downstream analysis was performed on the Genomatix Genome Analyzer (GGA), which provides a user-friendly interface to the whole Genomatix Software Suite and the NGS data analysis module. Data generated on the GMS are directly accessible from the GGA.

Data import

The data were imported via the file upload page, which can be accessed from all tasks (use the ‘Add BED files ...’ button) and allows direct upload from the GMS, mounted storage devices or local computers. All BED or bigBED files uploaded for the active project are then displayed in the project management and are available for further analysis.

ChIP-Seq workflow

To obtain a first overview of the data we recommend the use of the ChIP-Seq workflow which can be found in the ‘NGS Analysis’ menu of the navigation bar on top of the page. The workflow comprises the following steps:

• peak finding (clustering) using a choice of three algorithms (NGSAnalyzer, MACS, SICER) for samples with or without replicates and controls, followed by statistical evaluation using DESeq, edgeR or the Audic & Claverie approach,

• read and cluster classification for overlap with genomic features such as exons, introns, promoters and intergenic regions,

• analysis of TF binding sites for overrepresentation in the peak sequences,

• extraction of the sequences underlying the peaks (from the reference genome),

• de novo motif definition to define a new binding site or confirm a known one.

All these tasks can be set up in one go (Figures 5 to 7).

For this example, the replicates for PPARgamma day 6 were selected as the experiment and the replicates from day 0 as the control. PPARgamma is not expected to be expressed at day 0, so these samples can be considered background.

Figure 5: ChIP-Seq workflow setup: All BED files uploaded within the active project are available for analysis and can be selected as treatment or control samples.

For clustering, the default settings were used (NGSAnalyzer with a 100 bp window size and automatic calculation of the read density threshold based on a Poisson distribution).
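As an illustration of how a Poisson-based read density threshold can work, the sketch below treats reads as falling uniformly across the genome, computes the expected count per 100 bp window and returns the smallest count whose upper-tail probability drops below a cutoff. This is an assumption-laden sketch rather than the NGSAnalyzer algorithm; the significance cutoff `alpha` and the genome size passed in are hypothetical parameters.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def density_threshold(total_reads: int, genome_size: int,
                      window: int = 100, alpha: float = 1e-5) -> int:
    """Smallest read count per window whose Poisson tail probability is < alpha."""
    lam = total_reads * window / genome_size  # expected reads per window
    k = 0
    while poisson_sf(k, lam) >= alpha:
        k += 1
    return k


print(density_threshold(6_000_000, 2_700_000_000))
```

The call above uses the roughly 6 million unique day 6 reads and an assumed effective genome size of 2.7 Gb; windows reaching the returned count would be candidate cluster windows.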

Only clusters present in at least 2 replicates (65%) with an overlap of at least 100 bp were considered. The remaining clusters were evaluated statistically with edgeR (the default).
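The replicate criterion can be sketched as follows: with 3 replicates, a 65% threshold rounds up to 2 replicates, and a cluster counts as supported when enough replicates contain a cluster overlapping it by at least 100 bp. Supported clusters are then merged into consensus regions. This is a hypothetical reimplementation for one chromosome with half-open (start, end) coordinates; how the GGA merges replicate clusters is not described in the text.

```python
import math


def overlap(a, b):
    """Overlap length (bp) of half-open intervals a and b, each (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def supported_clusters(replicates, min_fraction=0.65, min_overlap=100):
    """replicates: list (one entry per replicate) of lists of (start, end) clusters."""
    min_reps = math.ceil(min_fraction * len(replicates))  # 3 replicates -> 2
    kept = []
    for i, rep in enumerate(replicates):
        for c in rep:
            # the cluster's own replicate counts once, plus each other replicate
            # that has at least one sufficiently overlapping cluster
            support = 1 + sum(
                any(overlap(c, d) >= min_overlap for d in other)
                for j, other in enumerate(replicates) if j != i
            )
            if support >= min_reps:
                kept.append(c)
    # merge overlapping or adjacent supported clusters into consensus regions
    merged = []
    for start, end in sorted(kept):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]
```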

Further options, like ‘Cluster Classification and Statistics’, ‘Extraction of Sequences for all Clusters’, ‘Transcription Factor Binding Site Overrepresentation’, and ‘Definition of new Binding Sites in Clusters’ are selected by default.

Figure 6: ChIP-Seq workflow setup: Selection of peak finding algorithm and parameter setup for replicate treatment and statistical analysis.

As a last step, the analysis was named and submitted.

Figure 7: Naming and submitting the analysis.

After completion of the analysis, the results can be accessed through the link provided in the notification email or via the ‘Project Management’ under ‘Project & Accounts’ in the navigation bar.

The result page lists the parameters and programs used and the results of the subtasks selected. All results can be downloaded or saved in the ‘Project Management’.

The clustering results

In this example, more than 10,000 clusters were called in the single samples, but only 8,291 were detected in at least two PPARgamma day 6 ChIP replicates. Of these, 7,747 clusters show a statistically significant enrichment compared with the day 0 controls. This number is comparable to the results of Nielsen et al., who report about 7,000 PPARgamma-enriched regions.

11.6% of these are located in promoter regions, which corresponds to a 4.5-fold enrichment.
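Fold enrichment compares the observed fraction of clusters in a feature with the fraction expected by chance. Assuming the expected fraction is the share of the genome covered by promoters (the baseline used by the workflow is not stated here), the reported numbers imply a background of about 2.6%. The function names below are illustrative.

```python
def fold_enrichment(n_in_feature, n_total, feature_bp, genome_bp):
    """Observed fraction of clusters in a feature / fraction expected by chance."""
    observed = n_in_feature / n_total
    expected = feature_bp / genome_bp  # assumed baseline: genomic coverage of the feature
    return observed / expected


def implied_background(observed_fraction, fold):
    """Back-calculate the expected fraction from a reported fold enrichment."""
    return observed_fraction / fold


# 11.6% of clusters in promoters at a 4.5-fold enrichment
print(round(implied_background(0.116, 4.5), 4))  # 0.0258, i.e. about 2.6%
```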

All BED files containing the positional information for the different cluster categories can be downloaded or saved in the ‘Project Management’ for further, more detailed analyses. For this example, it is sufficient to save the BED file for the significantly enriched regions in the ‘Project Management’ (PPARg_day6_vs_day0_enriched_regions.bed).

Figure 8: ChIP-Seq workflow results: the clustering result overview shows that 8,291 PPARgamma peaks are found in at least 2 samples at day 6 but not at day 0. All detailed results can be downloaded.

Transcription Factor Binding Site Overrepresentation in clusters

The analysis of predicted transcription factor binding sites in the cluster regions shows a clear enrichment for the V$PERO binding site family, which comprises the PPAR/RXR heterodimer binding sites (DR1 elements). TF binding site families combine binding sites of transcription factors with similar matrices and biology and thereby avoid unnecessarily large and confusing output. The top score of V$PERO shows that the ChIP enrichment was successful (Figure 9).

Also among the top scoring families is V$RXRF, which contains binding sites for other RXR heterodimers.

Figure 9: ChIP-Seq workflow results: overrepresentation analysis for transcription factor binding sites. The top-ranking family V$PERO contains the PPARgamma/RXR heterodimer binding sites (DR1 elements). The links behind the family abbreviations provide comprehensive information on the members and the generation of the matrix family.

Finding new binding sites in clusters: de novo motif definition

The last part of the workflow, the de novo binding site definition, yields the IUPAC consensus motif NNAGSNSAGNN, with S standing for C or G. The workflow uses fixed parameters and is optimized for compact binding sites; it therefore picks up only one conserved half-site of the PPARg/RXR binding site. To improve the results, the analysis can be rerun with refined parameters using the task ‘CoreSearch’ (see below), accessible under ‘Pattern Definition’ in the navigation bar. For this purpose, it is recommended to save the sequences of the top 1,000 regions and/or all clusters.
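To check which cluster sequences contain the workflow motif, the IUPAC consensus can be translated into a regular expression and scanned on both strands. This is a minimal sketch based only on the standard IUPAC nucleotide codes; it is not part of the Genomatix tools and ignores matrix scores entirely.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus: str) -> str:
    return "".join(IUPAC[c] for c in consensus.upper())


def revcomp(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_matches(consensus: str, seq: str):
    """Return sorted (start, strand) of matches on both strands, overlaps allowed."""
    pattern = re.compile("(?=" + iupac_to_regex(consensus) + ")")  # lookahead keeps overlaps
    seq = seq.upper()
    n = len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # reverse-strand hits converted back to forward-strand start coordinates
    hits += [(len(seq) - m.start() - n, "-") for m in pattern.finditer(revcomp(seq))]
    return sorted(hits)
```

Counting the sequences with at least one hit gives the fraction of clusters carrying this half-site motif.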

Extended TF- binding site analysis

Overrepresentation of TF families is covered as part of the workflow. The same analysis can be performed for individual matrices or for TF modules with one fixed partner using the ‘Overrepresented TF binding sites’ task under ‘NGS Analysis’. For this analysis, the previously saved BED file (PPARg_day6_vs_day0_enriched_regions.bed) containing the positions of the significant regions can be used.

The top scoring individual matrix is V$PPAR_RXR.0.1, which describes the PPAR/RXR heterodimer binding sites (DR1), with matches in more than 50% of the input sequences (Figure 10).
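Overrepresentation can be framed as asking whether the number of input sequences with at least one match exceeds what a background rate predicts. The sketch below uses a one-sided binomial test as a stand-in; the background rate (for example, the match frequency in random genomic sequences) is an assumed input, and the statistic used by the Genomatix task may differ.

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)  # underflows harmlessly to 0.0 for tiny terms
    return min(total, 1.0)


def overrepresentation(n_with_match: int, n_sequences: int, background_rate: float):
    """Return (fold over background, one-sided binomial p-value)."""
    observed = n_with_match / n_sequences
    return observed / background_rate, binom_sf(n_with_match, n_sequences, background_rate)
```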

Figure 10: Overrepresentation analysis for individual matrices within the enriched peak regions yields V$PPAR_RXR binding sites as top scoring.

The ‘Module overrepresentation’ subtask, which searches for combinations of other binding sites with V$PERO (i.e. potential interaction partners) within a distance of 50 bp, returns frequent combinations of V$PERO with V$NF1F, V$NR2F and the well-known partner V$RXR, but also with V$CEBP. These results are in line with the original publication, where the authors report a high overlap between PPARg, RXR and C/EBP binding sites.

Figure 11: Analysis of transcription factor combinations with V$PERO at distances between 10 and 50 bp shows an overrepresentation of V$NF1F binding sites. The underlying distances are displayed in a graph behind the ‘list’ link (see Figure 12, left). The distance score can be used as an indicator of a preferred distance between two transcription factor binding sites.

Support for a functional interaction between the PPAR/RXR site-binding protein and one or more V$NF1F family members comes from the distance relation of the binding sites (Figure 12, left). A quick check for literature cocitations in GePS revealed that PPARgamma can inhibit NF-I binding (Figure 12, right).

Figure 12: Left: the observed distances between the V$PERO and V$NF1F sites show a preference for about 15 bp, hinting at a functional interaction. Right: cocitation analysis for PPARgamma and RXRalpha with members of the V$NF1F binding site family (human).
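A preferred spacing such as the roughly 15 bp seen for V$PERO and V$NF1F can be spotted by tabulating the distances between paired sites across all sequences. This sketch simply counts absolute distances between site positions within the 10 to 50 bp window; it assumes site positions are already available per sequence and does not reproduce the Genomatix distance score.

```python
from collections import Counter


def distance_histogram(sequences, min_dist=10, max_dist=50):
    """Count distances (bp) between anchor and partner sites within each sequence.

    sequences: iterable of (anchor_positions, partner_positions), e.g. site
    centers of V$PERO and V$NF1F matches in each cluster sequence.
    """
    counts = Counter()
    for anchors, partners in sequences:
        for a in anchors:
            for b in partners:
                d = abs(b - a)
                if min_dist <= d <= max_dist:
                    counts[d] += 1
    return counts


hist = distance_histogram([([100], [115, 160]), ([40], [25]), ([200], [230])])
print(hist.most_common(1))  # [(15, 2)]
```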

Refined de novo motif definition

With the background knowledge that PPARgamma binds the direct repeat of AGGTCA, the motif definition task can be rerun on the sequences of the top 1,000 clusters with a 9 bp alignment core (instead of the 7 bp used in the workflow) and a relaxed sequence constraint (at least 50% of the sequences must contain the motif, instead of 75%). With these parameters, the program returns a matrix with the consensus “N NGGNCA G AGGNN”, which resembles the DR1 element and the matrix presented in the publication. Figure 13 shows the nucleotide distribution matrix and the sequence logo.
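A nucleotide distribution matrix like the one in Figure 13 is a column-wise count over aligned sites, and a simple consensus can be read off it by taking the most frequent base per column and writing N where no base is frequent enough. This is a minimal sketch with a hypothetical `min_fraction` cutoff; CoreSearch's own alignment and consensus rules are not reproduced here.

```python
from collections import Counter

BASES = "ACGT"


def frequency_matrix(sites):
    """Column-wise nucleotide counts for equal-length, aligned, uppercase sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    return [Counter(s[i] for s in sites) for i in range(length)]


def consensus(matrix, n_sites, min_fraction=0.5):
    """Most frequent base per column, or N if its frequency is below min_fraction."""
    out = []
    for col in matrix:
        base, count = max(((b, col[b]) for b in BASES), key=lambda x: x[1])
        out.append(base if count / n_sites >= min_fraction else "N")
    return "".join(out)


sites = ["AGGTCA", "AGGTCA", "AGTTCA", "TGGACA"]
m = frequency_matrix(sites)
print(consensus(m, len(sites)))  # AGGTCA
```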

Figure 13: Nucleotide distribution matrix and sequence logo for de novo binding site generated from the top 1,000 cluster sequences.

Biological classification of neighboring genes

The aim of most ChIP-Seq experiments is to identify potential target genes, which can then be associated with pathways to explore the underlying mechanisms. Although long-distance regulation occurs, proximal effects play an important role in gene regulation. Genes located in proximity to the binding sites can be identified either by correlating primary transcripts with the enriched regions (using GenomeInspector) or by annotating these regions for overlap with promoters or nearby genes (using ‘Annotation and statistics’ under ‘NGS Analysis’, Figure 14).

Figure 14: Setup screen for ‘General annotation and statistics’, used to identify regions overlapping with various genomic features including genes and promoters, and also to identify genes located up- and downstream of the enriched regions.

After submission, the regions are annotated for overlap with loci, exons, introns, promoters, transcription start sites, intergenic regions, microRNAs and repeats, and also for the nearest neighboring genes up- and downstream of each region on both the sense and antisense strands. Summary statistics are displayed, and the results can be downloaded completely or filtered for one or more of the categories. The results can be browsed (Figure 15), and the GeneIDs of all genes overlapping with the input regions or with their promoters can be extracted (Figure 16).
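The two kinds of annotation used later (promoter overlap and nearest neighboring genes) can be sketched for a single chromosome as follows. The promoter window of 500 bp upstream and 100 bp downstream of the TSS is a hypothetical choice, not the Genomatix promoter definition, and neighbors are reported by genomic coordinate (lower or higher) rather than by gene orientation.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    tss: int     # transcription start site coordinate
    strand: str  # "+" or "-"


def promoter(gene, upstream=500, downstream=100):
    """Hypothetical half-open promoter window around the TSS, respecting strand."""
    if gene.strand == "+":
        return gene.tss - upstream, gene.tss + downstream
    return gene.tss - downstream, gene.tss + upstream


def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]


def annotate(region, genes):
    """Return (IDs of genes whose promoter overlaps region,
    nearest gene with TSS below the region, nearest gene with TSS above it)."""
    in_promoter = [g.gene_id for g in genes if overlaps(region, promoter(g))]
    genes_sorted = sorted(genes, key=lambda g: g.tss)
    tss = [g.tss for g in genes_sorted]
    left_i = bisect.bisect_left(tss, region[0]) - 1  # last TSS < region start
    right_i = bisect.bisect_left(tss, region[1])     # first TSS >= region end
    left = genes_sorted[left_i].gene_id if left_i >= 0 else None
    right = genes_sorted[right_i].gene_id if right_i < len(tss) else None
    return in_promoter, left, right
```

Genes whose TSS lies inside the region are reported through promoter overlap rather than as neighbors in this sketch.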

Figure 15: ‘Annotation and Statistics’ result page: neighboring genes and overlapping features are listed for each region, and links to further gene information and to the GenomeBrowser for visualization are provided.

Figure 16: ‘Annotation and Statistics’ result page: regions can be filtered by overlap, and the GeneIDs of nearby genes can be extracted.

For this example, the GeneIDs of genes whose promoters overlapped with PPARgamma-enriched regions were downloaded as a text file. To analyze the corresponding genes, the GeneIDs can then be transferred to the Genomatix Pathway System by simple copy and paste or by uploading the saved file.

Pathway analysis with GePS

The Genomatix Pathway System uses information from public sources combined with proprietary databases to characterize gene lists based on statistical analysis of literature, pathways and GO- and MeSH-terms. Pathways and networks can be generated and superimposed with user data. GePS can be accessed from the navigation bar under ‘Genomes & Data’.

Figure 17: Genomatix Pathway System (GePS) overview screen showing the different entry options.

To analyze the genes with PPARgamma binding sites in the promoter region, the file containing the geneIDs was uploaded and the organism was selected (Figure 18). Alternatively, the geneIDs could have been pasted into the setup screen.

Figure 18: Genomatix Pathway System setup screen. GeneIDs or symbols can be entered via copy and paste or file upload. Available annotation types are listed. These will be used for classification and can be used as data filter for the analyzed genes.

The first result GePS delivers is a characterization of the gene list based on pathways, Gene Ontology, MeSH terms and Genomatix proprietary annotation. The overrepresentation of biological terms associated with genes from the input list is calculated, and the terms are listed in the left panel together with the respective p-values.
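Term overrepresentation of this kind is commonly assessed with a hypergeometric test: given N background genes of which K carry a term, how likely is it to see at least k annotated genes in a list of n? The sketch below computes that tail exactly with integer binomial coefficients; whether GePS uses exactly this statistic is an assumption.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for the number X of term-annotated genes in a list of n genes
    drawn without replacement from N background genes, K of which carry the term."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until the final division
```

For example, `hypergeom_sf(40, 20000, 300, 500)` would give the tail probability for a hypothetical term carried by 300 of 20,000 background genes and by 40 of 500 list genes.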

Canonical pathways are only available for human, but for other organisms the genes can be mapped to their human orthologs before the analysis. Here, literature-based pathways (from Genomatix Literature Mining) were considered; they show the PPARgamma and PPARalpha pathways as top scorers. The top-ranking processes and diseases are related to metabolism. The tissue filter shows peroxisomes and adipocytes and even the cell line used in the experiment (3T3-L1). Reassuringly, PPARgamma is the most frequently cocited transcription factor for the genes analyzed, indicating an enrichment for potential PPARgamma targets. The results fit well with PPARgamma being a key player in lipid metabolism.

The results can be used as filters for networks or to construct new ones. The network below was generated by clicking on the top-ranking pathway ‘Peroxisome proliferative activated …’. It shows PPARgamma as the central transcription factor together with known target genes such as Lpl. Dotted connection lines indicate automatically retrieved literature cocitations, while solid lines indicate expert-curated annotation. The latter show, for example, that Lpl and Scd1 are activated and Adipoq is inhibited by PPARgamma. Ucp2 and Rxra are greyed out because these two genes do not fulfill the additional filter applied, ‘lipid metabolic process’ under ‘Biological Processes’ (Figure 20).

Comprehensive information about genes and connections can be retrieved by double-clicking on the gene symbol or the line, respectively (Figure 21).

Figure 19: Gene classification results for genes with PPARgamma binding in the promoter, based on Genomatix Literature Mining, GO and MeSH terms.

Figure 20: Network generated for genes assigned to the literature pathway ‘lipid Peroxisome proliferative activated receptor alpha’ and filtered for additional assignment to the biological process GO term ‘lipid metabolic process’, based on literature cocitations. Genes in yellow boxes fulfill both criteria; genes in grey boxes are not assigned to the GO term ‘lipid metabolic process’. Solid and dotted lines represent expert-curated and literature-retrieved interactions, respectively. Arrows indicate direct activation, diamonds modulation, and line/circle inhibition.

Figure 21: Additional information that can be browsed in the Genomatix Pathway System upon double click on the gene or connection of interest.

Identification of common regulatory elements in promoters

Transcription factors often act synergistically to achieve and coordinate cell type specific gene expression. These functional combinations are often conserved in terms of organization, distance, and orientation of the individual elements forming so-called modules or frameworks.

The GePS network (Figure 20) shows that PPARgamma activates Lpl (lipoprotein lipase), Ucp2 (uncoupling protein 2) and Scd1 (stearoyl-CoA desaturase 1), all of which are expressed in adipocytes. To investigate whether these three genes share regulatory elements, their promoters were extracted and searched for common frameworks.

Promoter sequence extraction

The promoters for all alternative transcripts were extracted from the Eldorado database using ‘Gene2Promoter’ under ‘Genomes & Data’ (Figure 22). Mus musculus was selected as organism and the three gene symbols were entered into the keyword search section.

Figure 22: Gene2Promoter input page.

The summary at the top of the result page lists a total of 36 transcripts and 14 promoters for the three input genes, which are shown in the table below (Figure 23).

Figure 23: Interactive Gene2Promoter result page listing all alternative transcripts and promoters for selected genes. Additional information such as conservation and CAGE tag support are provided together with links for more comprehensive information and visualization.

10 of the 14 promoters belong to relevant transcripts (2 each for Lpl and Scd1, 6 for Ucp2). Only these were selected for further analysis with FrameWorker.

Figure 24: Interactive Gene2Promoter result page: promoters can be selected and tested for the presence of transcription factor binding sites; the corresponding sequences can be extracted and directly analyzed in several subtasks.

Identification of common regulatory elements

The low number of sequences allowed an exhaustive analysis in FrameWorker, meaning that all combinations of one promoter from each of the three genes were tested separately, resulting in 24 combinations. The analysis was run with default parameters, except that the maximum distance variance was increased to 20. One of the 24 combinations returned a framework consisting of three transcription factor binding sites, V$RXRF, V$KLFS and V$EGRF, with distances of roughly 80 and 100 bp between the single sites (Figure 25). The model does not contain a PPARgamma site, but members of the three families, while not directly linked to adipocytes, are associated with lipid homeostasis, glucose transport, and response to glucose and insulin stimuli, respectively.
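The count of 24 follows directly from the promoter numbers given above (2 promoters each for Lpl and Scd1, 6 for Ucp2): one promoter is taken from each gene, so there are 2 × 2 × 6 combinations. The promoter IDs below are placeholders.

```python
from itertools import product

# Promoter counts per gene as given in the text; the IDs are placeholders.
promoters = {
    "Lpl": ["Lpl_p1", "Lpl_p2"],
    "Scd1": ["Scd1_p1", "Scd1_p2"],
    "Ucp2": [f"Ucp2_p{i}" for i in range(1, 7)],
}

# Each combination takes one promoter per gene
combinations = list(product(*promoters.values()))
print(len(combinations))  # 24
```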

Figure 25: FrameWorker result: Transcription factor combination (framework) common to promoters from the three input genes (Lpl, Ucp2 and Scd1) consisting of three transcription factor binding site families with defined distance and orientation. The framework was saved and all mouse promoters were subsequently scanned for matches.

The model was saved and subsequently used for a ModelInspector analysis.

Identification of genes sharing the identified model and overlay with meta-data

ModelInspector is a program that scans sequences for the presence of predefined TF combinations, called frameworks or modules. For this example, all mouse promoters of annotated genes were scanned for the V$RXRF-V$KLFS-V$EGRF framework, returning 271 matches in promoters of 199 genes. The included GO term analysis showed ‘metabolic process’ as the top category, with 115 associated genes and a very low p-value, indicating that the module can enrich for genes associated with metabolism.
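Scanning for a framework amounts to finding ordered site triples whose spacings match the model within a tolerance, then counting matches and the genes that carry them. The sketch below assumes the spacings of roughly 80 and 100 bp from Figure 25 and a hypothetical tolerance of 20 bp, ignores strand orientation, and works on precomputed site positions; it illustrates the idea and is not the ModelInspector algorithm.

```python
from itertools import product


def find_framework(sites, order, distances, tolerance):
    """Return position tuples of ordered sites matching the framework.

    sites: dict family -> list of site positions in one promoter.
    order: families in framework order, e.g. ["V$RXRF", "V$KLFS", "V$EGRF"].
    distances: expected gap between consecutive elements (bp).
    tolerance: allowed deviation from each expected gap (bp).
    Strand orientation is ignored in this sketch.
    """
    matches = []
    for combo in product(*(sites.get(f, []) for f in order)):
        gaps = [b - a for a, b in zip(combo, combo[1:])]
        if all(abs(g - d) <= tolerance for g, d in zip(gaps, distances)):
            matches.append(combo)
    return matches


def summarize(promoters, order, distances, tolerance):
    """promoters: dict (gene_id, promoter_id) -> sites. Returns (n_matches, n_genes)."""
    n_matches, genes = 0, set()
    for (gene_id, _promoter_id), sites in promoters.items():
        hits = find_framework(sites, order, distances, tolerance)
        n_matches += len(hits)
        if hits:
            genes.add(gene_id)
    return n_matches, len(genes)
```

Because one gene can have several matching promoters, the number of matches can exceed the number of genes, as with the 271 matches in 199 genes above.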

The 199 GeneIDs were extracted and imported into GePS. Figure 26 shows the network that was generated by starting with PPARgamma and using the option to extend networks by frequently cocited genes. The dots on both sides of the gene boxes visualize the ChIP-Seq enrichments (in promoter regions), which were imported as metadata. The absence of PolII clusters in promoters can indicate reduced transcription, but it can also reflect a very short initiation time that does not lead to detectable enrichment.

Figure 26: Network generated from genes fulfilling two criteria: a) being identified in the ModelInspector run as harboring the V$RXRF-V$KLFS-V$EGRF framework in at least one promoter and b) being cocited with PPARgamma in PubMed abstracts. The dots beside the gene boxes indicate the presence of PPARgamma, RXR or PolII clusters called in the data from Nielsen et al. (2008).

Correlation between different data sets

PPARgamma binds to peroxisome proliferator response elements as a heterodimer with the retinoid X receptor (RXR), and RXR binding sites were found to be overrepresented in the TF analysis (see above). It is therefore of interest to analyze the overlap between PPARgamma and RXR binding sites. The RXR ChIP data are derived from the same publication and were processed similarly to the PPARgamma set.

Positional correlations between genomic elements and/or user data can be performed in the task ‘GenomeInspector’ which can be accessed from ‘NGS Analysis’ in the navigation bar. Using the PPARgamma set as an anchor and calculating the distance distribution profile for the RXR data set results in the curve shown in Figure 27. Regions contributing to the correlation can be extracted from both sets and used for further analysis (e.g. annotation and pathway analysis or framework analysis).
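The distance distribution profile can be approximated by placing each PPARgamma region center at 0 and histogramming the offsets of nearby RXR region centers. This is a minimal single-chromosome sketch with hypothetical window and bin sizes; GenomeInspector's actual profile computation may differ.

```python
import bisect
from collections import Counter


def center(region):
    start, end = region
    return (start + end) // 2


def distance_profile(anchors, targets, window=1000, bin_size=100):
    """Histogram of target-center offsets relative to anchor centers (anchor at 0).

    Keys are bin starts, e.g. an offset of -150 falls into bin -200.
    """
    target_centers = sorted(center(t) for t in targets)
    profile = Counter()
    for a in anchors:
        c = center(a)
        lo = bisect.bisect_left(target_centers, c - window)
        hi = bisect.bisect_right(target_centers, c + window)
        for t in target_centers[lo:hi]:
            profile[(t - c) // bin_size * bin_size] += 1
    return profile
```

A strong peak in the bins around 0, as in Figure 27, indicates that the two sets of regions largely overlap.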

Figure 27: Positional correlation of PPARgamma enriched regions (aligned with their middle at 0) with the RXR enriched regions generated in GenomeInspector. The graph shows a clear overlap between the two data sets. Regions contributing to the correlation can be extracted.

Data visualization

In the genome browser, the data can be visualized in their genomic context, overlaid with general annotation, proprietary data from Genomatix or other ChIP-Seq or RNA-Seq data sets. This allows the integration of different data sets and a quick assessment of the state at the locus of interest. Figure 28 shows the Scd1 locus (located on the antisense strand) with PPARgamma, RXR and PolII raw reads and the positions of the called clusters. The graph shows only background for the PPARgamma data at day 0, but at day 6 a strong enrichment at the 5′ promoter and at several upstream and downstream regions, indicating potential enhancer regions. The RXR data show a similar picture. At day 0, PolII is found at the potential enhancer regions and the promoter. After adipocyte differentiation, at day 6, PolII is no longer enriched at the promoter and enhancers but spreads over the whole gene body, reflecting Scd1 transcription after PPARgamma expression.

Figure 28: Visualization of the Scd1 locus in the genome browser. Alternative transcripts are shown in black. Single reads are shown for day 0 and day 6 for PPARgamma (blue), RXR (red) and PolII (green). For day 6, these are overlaid with the called clusters in the same but lighter color.

Summary

Based on the data published by Nielsen et al. (2008), we showed a comprehensive ChIP-Seq analysis pipeline, from mapping down to pathway analysis.

The raw reads were mapped to the mouse genome, and the unique alignments were clustered to identify regions of enriched read density indicating PPARgamma, RXR or PolII binding, respectively. The 7,747 regions identified in the PPARgamma data set showed a strong overrepresentation of in silico predicted PPARgamma binding sites, indicating a successful ChIP experiment. Further analysis showed frequent co-occurrence with V$NF1F binding sites at a distance of about 15 bp and with CEBP binding sites, the latter in agreement with the publication. De novo motif definition extracted the “N NGGNCA G AGGNN” consensus sequence, which resembles parts of the DR1 element, the known PPARgamma/RXR heterodimer binding site.

To identify potential PPARgamma targets, genes up- and downstream of the enriched regions were determined. Genes with PPARgamma binding within their promoter were extracted and analyzed with the Genomatix Pathway System. Overrepresented pathways, GO and MeSH terms indicated PPAR pathways and general metabolic processes. The TF most frequently cocited with these genes is PPARgamma, again confirming the experiment. In the network generated from the top-scoring pathway ‘Peroxisome proliferative activated …’, expert-curated annotation shows direct activation of three genes (Lpl, Scd1, Ucp2) by PPARgamma. The 10 relevant promoters of these three genes were exhaustively analyzed for common regulatory motifs. A V$RXRF-V$KLFS-V$EGRF framework was detected and used to scan all mouse promoters. This scan yielded 271 matches in promoters of 199 genes. GO term analysis for these genes revealed an association with ‘metabolic processes’. Furthermore, the overlap between the PPARgamma and RXR enrichment was determined. Finally, the data sets were visualized in their genomic context.

For more information on Genomatix solutions and services, please visit:

http://www.genomatix.com

Visit http://www.youtube.com/user/GenomatixWebcasts for tutorials and demo videos.

Find us on facebook at:

http://www.facebook.com/genomatix

Contact USA

Genomatix Software Inc., 3025 Boardwalk, Suite 160, Ann Arbor, MI 48108, USA

phone +1 877 436 6628, email [email protected]

Contact Germany

Genomatix Software GmbH, Bayerstr. 85a, 80335 Munich, Germany

phone +49 89 599766 0, email [email protected]

http://www.genomatix.com