Day 3 Sessions 16-21 JCF.pdf
Many Sample Size and Power Calculators Exist On-Line
http://homepage.divms.uiowa.edu/~rlenth/Power/
Day 3, Session 16: Questions and follow-up
James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science
Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
Day 3, Session 17: Visualization III: Networks
James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science
Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
Creixell et al. (2015) Nature Methods 12:615
Pathways vs. Networks
Pathway:
• Small scale
• Well-studied
• Known linear relationship
• Easily visualized and interpreted
Network:
• Large scale
• Integration of multiple studies
• Hard to visualize and interpret
• Contain novel information not covered in pathways
Both aggregate molecular events across multiple genes. Fewer hypotheses are tested, so a true signal is more likely to pass the statistical detection threshold.
Fleet2016
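The link between the detection threshold and the number of hypotheses tested can be shown with a little arithmetic. This is a minimal sketch, assuming a Bonferroni correction (the slide does not name a correction method) and hypothetical counts of 20,000 genes and 300 pathways.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05
N_GENES = 20_000    # hypothetical: one test per gene
N_PATHWAYS = 300    # hypothetical: one test per pathway or network

gene_cutoff = bonferroni_threshold(ALPHA, N_GENES)
pathway_cutoff = bonferroni_threshold(ALPHA, N_PATHWAYS)

print(f"gene-level cutoff:    {gene_cutoff:.2e}")
print(f"pathway-level cutoff: {pathway_cutoff:.2e}")
```

Testing a few hundred aggregated units instead of thousands of individual genes gives a far less stringent per-test cutoff.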
Patterns of Regulation in Genomic Data: Guilt by Association
• Human primary fibroblast cultures
• Serum starvation and refeeding
• 9600 transcripts, spotted cDNA array
• Hierarchical clustering
* Genes in common cluster = common molecular regulation?
Iyer et al. (1999) Science 283:83
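The guilt-by-association idea can be illustrated with a toy clustering run. This is a minimal sketch, assuming average-linkage agglomerative clustering on a 1 - Pearson r distance; the gene names and log2 ratios are hypothetical, and the metric and linkage used in the cited study are not stated on the slide.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_genes(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    genes = list(profiles)
    dist = {frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
            for a, b in combinations(genes, 2)}
    clusters = [[g] for g in genes]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical log2 expression ratios over four time points
profiles = {
    "geneA": [0.0, 1.0, 2.0, 3.0],
    "geneB": [0.1, 1.2, 1.9, 3.2],
    "geneC": [3.0, 2.0, 1.0, 0.0],
    "geneD": [2.8, 2.1, 0.9, 0.2],
}
clusters = cluster_genes(profiles, n_clusters=2)
print(clusters)
```

Genes whose profiles rise together land in one cluster and genes that fall together land in the other, which is the pattern that prompts the "common regulation?" question.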
Creixell et al. (2015) Nature Methods 12:615
• Gene set enrichment
• Subnetwork construction and clustering
• Network-based modeling
Simple, but discards known biological network information
Networks Integrate Information
Dong and Han (2008) Cell Res 18:224
http://cytoscape.org/
An open source software tool for integrating, visualizing, and analyzing data in the context of networks.
It does not build the primary network from your dataset.
Data Format: *.txt or Excel (1st worksheet only)
Valid expression value type    Expected values
Ratio                          (0, +INF)
Fold Change                    (-INF, -1) (1, +INF)
Log Ratio                      (-INF, +INF)
p-value                        (0, 1)
FDR, q-value                   (0, 100)
Intensity                      (0, +INF)
RPKM/FPKM                      (0, +INF)
(INF = infinity)
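The expected ranges listed above can be checked before a file is uploaded. This is a minimal sketch, assuming the intervals are open exactly as written on the slide; whether IPA accepts boundary values is not stated, and the function names are hypothetical.

```python
import math

# Expected ranges copied from the slide; each type maps to one or more open intervals
RANGES = {
    "Ratio": [(0.0, math.inf)],
    "Fold Change": [(-math.inf, -1.0), (1.0, math.inf)],
    "Log Ratio": [(-math.inf, math.inf)],
    "p-value": [(0.0, 1.0)],
    "FDR, q-value": [(0.0, 100.0)],
    "Intensity": [(0.0, math.inf)],
    "RPKM/FPKM": [(0.0, math.inf)],
}


def is_valid(value_type, value):
    """True if value falls strictly inside one of the intervals for value_type."""
    if math.isnan(value):
        return False
    return any(low < value < high for low, high in RANGES[value_type])


def invalid_rows(value_type, values):
    """Zero-based positions of values outside the expected range."""
    return [i for i, v in enumerate(values) if not is_valid(value_type, v)]


print(invalid_rows("Fold Change", [2.4, -1.8, 0.5]))
print(invalid_rows("p-value", [0.01, 1.5, 0.2]))
```

A fold change between -1 and 1 usually means a ratio was supplied under the wrong value type, so this kind of check catches mislabeled columns early.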
Header row: Identifier, G1 value, G1 FC, G1 FDR, G2 value, G2 FC, G2 FDR
• Up to 20 observations per treatment/group (IPA can average)
• Statistics are done prior to IPA
Ingenuity Network Analysis
Inputs: Expression Dataset and IPA Knowledge Base
• Network Generation: Network (DEG, Genes in Network)
• Network Scoring: Scored Genes (probabilistic fit of DEG to networks)
• Associated Functions: DEG involved in a biological function
DEG = differentially expressed genes
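The slide describes network scoring only as a probabilistic fit of DEG to networks. One common way to quantify such a fit is a right-tailed hypergeometric (Fisher's exact) test reported as -log10(p). The sketch below assumes that approach and uses hypothetical counts; it is not presented as IPA's actual implementation.

```python
from math import comb, log10


def hypergeom_right_tail(k, n_network, n_deg, n_total):
    """P(X >= k): probability of at least k DEG among n_network genes
    drawn at random from n_total genes, of which n_deg are DEG."""
    upper = min(n_network, n_deg)
    denom = comb(n_total, n_network)
    num = sum(comb(n_deg, i) * comb(n_total - n_deg, n_network - i)
              for i in range(k, upper + 1))
    return num / denom


def network_score(k, n_network, n_deg, n_total):
    """Assumed scoring convention: negative log10 of the right-tail p-value."""
    return -log10(hypergeom_right_tail(k, n_network, n_deg, n_total))


# Hypothetical example: 10 of 35 network genes are DEG,
# with 500 DEG among 20,000 genes in the reference set
score = network_score(k=10, n_network=35, n_deg=500, n_total=20_000)
print(f"network score: {score:.1f}")
```

A higher score means the overlap between the DEG list and the network is less likely to have arisen by chance.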
Wednesday
BREAK #1
Day 3, Session 18: Flexible time for reinforcement
James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science
Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
Day 3, Session 19: Tour of Data Center and Conte Cluster
Patrick Finnegan, Hardware Engineer, Purdue University
Day 3, Session 20: Introduction to NGS
James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science
Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
Technology:
• Genomics: WGS, WES
• Transcriptomics: RNA-Seq
• Epigenomics: Bisulfite-Seq, ChIP-Seq
Data analysis:
• Genomics: Indels, SNP, CNV, Structural
• Transcriptomics: DGE, Fusion, Splicing, Editing
• Epigenomics: Methyl DNA, Histones, TF binding
Integration and interpretation:
• Functional effect of mutation
• Network + pathway analysis
• Integrative analysis
Discovery and Application
Modified from Shyr D, Liu Q. (2013) Biol Proced Online 15:4
NGS Sequencing Pipeline
• Library preparation: Input, Fragment, Add adapter, Fragment library
• Library amplification
• Parallel sequencing: Reads (Read 1, Read 2)
Voelkerding et al. (2010) J Mol Diagn 12:539-51
How Much Sequencing is Enough?
[Figure: read length vs. # reads per reaction]
Target            Coverage    # reads
DNA variation     10-30X      N/A
ChIP-seq          100X        N/A
RNA-seq (DEG)     N/A         20 million
RNA-seq (rare)    N/A         100+ million
Sims et al. (2014) Nat Rev Genet 15:121
https://genohub.com/next-generation-sequencing-guide/
http://apps.bioconnector.virginia.edu/covcalc/
Stephen Turner, PhD; Director, University of Virginia Bioinformatics Core
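The arithmetic behind a coverage calculator can be sketched in a few lines. This assumes the standard relation coverage = (number of reads x read length) / genome size; the 3 Gb genome size and 150 bp read length are illustrative values, not figures from the slides.

```python
import math


def coverage(n_reads, read_length, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest whole number of reads that reaches the target average depth."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 3_000_000_000   # illustrative: roughly a human-sized genome, in bp
READ_LENGTH = 150             # illustrative read length, in bp

n = reads_needed(30, READ_LENGTH, GENOME_SIZE)
print(f"reads for 30X: {n:,}")
print(f"depth check:   {coverage(n, READ_LENGTH, GENOME_SIZE):.1f}X")
```

For paired-end runs, count each mate as a read or double the read length per fragment, but not both.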
Understanding RNA-seq
Modified from Wang et al. (2009) Nat Rev Genet 10:57
Correlation between RNA-seq and Microarray Analysis
• Analysis from two different S. cerevisiae papers using the same growth conditions [plot: tiling array (log2) vs. RNA-seq (log2)]
• Zhao et al. (2014) PLOS One, activated T cells [plot: array (log2) vs. RNA-seq (log2)]
https://bioinfomagician.wordpress.com/2014/01/28/rna-seq-vs-microarray-what-is-the-take/
RNA-seq vs. Microarray: Which is “better”?
Issue                     Microarray                        RNA-seq
Reproducibility           High                              High
Dynamic Range             Modest                            Wide
Sensitivity               Low/Medium                        High
Accuracy                  High                              High (but better for FC)
Cost                      Low                               High
Complexity of analysis    Low                               High
Species                   Limited to available platforms    Any species possible
Wednesday
BREAK #2
Day 3, Session 21: Visualization IV: IGV
James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science
Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
Data for Visualization Lives Here……
Types of Files Commonly Used
File type       Description
SAM             Tab-delimited text file of sequence alignment data (i.e. primary read data)
BAM             Binary version of the SAM file
bedGraph        Display of continuously valued data (e.g. transcriptome)
Wiggle (Wig)    Display of continuously valued data (e.g. transcriptome)
bigWig          Displays dense continuous data from Wig or bedGraph files for faster viewing
BED             Tiled data file that defines a feature track
TDF             Binary tiled data file that has been preprocessed for faster displays in IGV (e.g. for ChIP- and RNA-seq data)
narrowPeak      Called peaks of signal enrichment based on pooled, normalized data
http://www.broadinstitute.org/igv/FileFormats
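To show what "tab-delimited text file of sequence alignment data" means in practice, here is a minimal sketch that parses the 11 mandatory columns of one SAM alignment line. The example read is made up, optional tag fields are ignored, and real work would normally go through a dedicated library rather than a hand-written parser.

```python
from dataclasses import dataclass


@dataclass
class SamRecord:
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


# Made-up example: an 8 bp read mapped to the reverse strand of chr1
example = "read1\t16\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
rec = parse_sam_line(example)
print(rec.rname, rec.pos, "reverse" if rec.is_reverse else "forward")
```

A BAM file holds the same records in compressed binary form, which is why viewers such as IGV prefer it for large datasets.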
IGV Displays Various File Types
Mouse Large Intestine DNase-Seq Data from ENCODE
[Figure tracks: chromosomal location, BroadPeaks, NarrowPeaks, bigWig, BAM, RefSeq genes]
Visualizing RNA-seq Data
[Figure tracks: Cell line, Tissue, ?]