RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as...

75
RNA-seq Differential analysis

Transcript of RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as...

Page 1: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

RNA-seqDifferential analysis

Page 2: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

DESeq2

Page 3: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

DESeq2

http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

Page 4: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Input data

Page 5: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Why un-normalized counts?

• As input, the DESeq2 package expects count data as obtained, e.g., from RNA-seq or another high-throughput sequencing experiment, in the form of a matrix of integer values. The value in the 𝑖𝑖-th row and the 𝑗𝑗-th column of the matrix tells how many reads can be assigned to gene 𝑖𝑖 in sample 𝑗𝑗.

• The values in the matrix should be un-normalized counts or estimated counts of sequencing reads (for single-end RNA-seq) or fragments (for paired-end RNA-seq).

• It is important to provide count matrices as input for DESeq2’s statistical model (Love, Huber, and Anders 2014) to hold, as only the count values allow assessing the measurement precision correctly.

Page 6: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Sample annotation

Page 7: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

htseq-count input

• You can use the function DESeqDataSetFromHTSeqCount if you have used htseq-count from the HTSeq python package.

Page 8: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

htseq-count input

Page 9: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Pre-filtering

• There are two reasons which make pre-filtering useful: by removing rows in which there are no reads or nearly no reads, we reduce the memory size of the dds data object and we increase the speed of the transformation and testing functions within DESeq2. Here we perform a minimal pre-filtering to remove rows that have only 0 or 1 read.

Page 10: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Factor levels

• A factor is a type of vector for which the elements are categorical values.

Page 11: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Factor levels

Page 12: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Factor levels: reordering

Page 13: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Factor levels: relabeling

Page 14: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Factor levels

• By default, R will choose a reference level for factors based on alphabetical order. Then, if you never tell the DESeq2 functions which level you want to compare against (e.g. which level represents the control group), the comparisons will be based on the alphabetical order of the levels.

• Setting the factor levels can be done in two ways, either using factor, or using relevel, just specifying the reference level.

Page 15: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Factor levels

Page 16: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Differential expression analysis

Page 17: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

DESeq()

• The standard differential expression analysis steps are wrapped into a single function, DESeq.

Page 18: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

results()

• Results tables are generated using the function results, which extracts a results table with log2 fold changes, p values and adjusted p values.

• With no additional arguments to results, the log2 fold change and Wald test p value will be for the last variable in the design formula, and if this is a factor, the comparison will be the last level of this variable over the first level.

• However, the order of the variables of the design do not matter so long as the user specifies the comparison using the name or contrast arguments of results.

Page 19: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

results()

• Details about the comparison are printed to the console, above the results table. The text, condition ZIKA vs MOCK, tells you that the estimates are of the logarithmic fold change log2(ZIKA/MOCK).

Page 20: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Ordering results

• We can order our results table by the smallest adjusted p value:

Page 21: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

More info on results

• Information about which variables and tests were used can be found by calling the function mcols on the results object.

Page 22: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Note on p-values set to NA

• Some values in the results table can be set to NA for one of the following reasons:

• If within a row, all samples have zero counts, the baseMean column will be

zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.

• If a row contains a sample with an extreme count outlier then the p value and adjusted p value will be set to NA. These outlier counts are detected by Cook’s distance.

• If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p value will be set to NA.

Page 23: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

How many adjusted p-values?

• How many adjusted p-values were less than 0.1?

Page 24: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

summary()

• We can summarize some basic tallies using the summary function.

Page 25: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

summary()

• Note that the results function automatically performs independent filtering based on the mean of normalized counts for each gene, optimizing the number of genes which will have an adjusted p value below a given FDR cutoff, alpha.

• By default the argument alpha is set to 0.1. If the adjusted p value cutoff will be a value other than 0.1, alpha should be set to that value.

Page 26: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

summary()

Page 27: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Exploring results

Page 28: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

MA-plot

• In DESeq2, the function plotMA shows the log2 fold changes attributable to a given variable over the mean of normalized counts for all the samples.

• Points will be colored red if the adjusted p value is less than alpha. Points which fall out of the window are plotted as open triangles pointing either up or down.

Page 29: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

MA-plot

Page 30: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

MA-plot

• It is also useful to visualize the MA-plot for the shrunken log2 fold changes, which remove the noise associated with log2 fold changes from low count genes without requiring arbitrary filtering thresholds.

• Here we provide the dds object and the number of the coefficient we want to moderate.

Page 31: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

MA-plot

Page 32: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

MA-plot

Page 33: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

plotCounts()

• It can also be useful to examine the counts of reads for a single gene across the groups. A simple function for making this plot is plotCounts, which normalizes counts by sequencing depth and adds a pseudocount of 1/2 to allow for log scale plotting. The counts are grouped by the variables in intgroup, where more than one variable can be specified.

• Here we specify the gene which had the smallest p value from the results table created above. You can select the gene to plot by rowname or by numeric index.

Page 34: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

plotCounts()

Page 35: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Exporting results to CSV files

• A plain-text file of the results can be exported using the base R functions write.csv.

Page 36: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Design of experiments

Page 37: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Statistical design of experiments

The process of planning the experiment so that appropriate data will be collected and analyzed by statistical methods, resulting in valid and objective conclusions.

Page 38: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Explanatory and response variables

𝑋𝑋 𝑌𝑌

- Response variables- Explanatory variables- Factors

Page 39: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Factors

𝑋𝑋 𝑌𝑌

𝑍𝑍

Response variablesTreatment factor or design factor

Levels: 𝑋𝑋 = 𝑥𝑥Treatment combination or treatment: a particular combination of factor levels(e.g. 𝑥𝑥1, 𝑥𝑥2 if there are two treatment factors)

- Noise factor- Blocking factor

Page 40: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Three basic principles of experimental design• Randomization

• Replication

• Blocking

Page 41: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Randomization

By randomization we mean that both the assignment of treatments to units and the order in which the individual runs of the experiments are to be performed are randomly determined.

A completely randomized design is an experimental design in which treatments are assigned to all units by randomization.

Page 42: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Replication

By replication we mean an independent repeat run of each treatment combination.

Page 43: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Experimental units

• An entity receiving an independent application of a treatment is called an experimental unit.

• An experimental run is the process of applying a particular treatment combination to an experimental unit and recording its response.

• A replicate is an independent run carried out on a different experimental unit under the same conditions.

Page 44: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Example: Two pots

Experimental unit: plant on the potNo replication

Page 45: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Example: Randomized

Experimental unit: plant on the pot4 replicates for each treatment

Page 46: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Blocking

Blocking is an experimental design strategy used to reduce or eliminate the variability transmitted from nuisance factors, which may influence the response variable but in which we are not directly interested.

Blocking is the grouping of experimental units that have similar properties. Within each block, treatments are randomly assigned to experimental units. The resulting design is called a randomized block design. This design enables more precise estimates of the treatment effects because comparisons between treatments are made among homogeneous experimental units in each block.

Page 47: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Blocking

𝑋𝑋 𝑌𝑌

𝑍𝑍

Page 48: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Blocking example

Blocking removes the variation in response among chambers, allowing more precise estimates and more powerful tests of the treatment effects.

Page 49: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Blinding

The process of concealing information from participants and researchers about which of them receive which treatments is called blinding.

Single-blind experiment: participants area unaware of the treatment they have been assigned. It prevents participants from responding differently according to their knowledge of their treatment.

Double-blind experiment: researchers administering the treatments and measuring the response are also unaware of which subjects are receiving which treatment.

Page 50: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Factorial design

Many experiments in biology investigate more than one treatment factor, because:

1. answering two questions from a single experiment rather than just one makes more efficient use of time, supplies, and other costs

2. the factors might interact.

Page 51: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Factorial design

An experiment having a factorial design investigates all treatment combinations of two or more treatment factors. A factorial design can measure interactions between factors.

An interaction between two (or more) explanatory variables means that the effect of one variable on the response depends on the state of the other variable.

Page 52: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Factorial design

𝑋𝑋1 𝑌𝑌

𝑋𝑋2

Page 53: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

A unified model: general linear model

𝐸𝐸[𝑦𝑦] = 𝛽𝛽0 + 𝛽𝛽1𝑥𝑥1 + ⋯+ 𝛽𝛽𝑝𝑝−1𝑥𝑥𝑝𝑝−1

Page 54: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Basic linear models

Model formula Model Design

𝑦𝑦~𝑥𝑥 Linear regression Dose-response

𝑦𝑦~t One-way ANOVA Completely randomized

𝑦𝑦~t + b Two-way ANOVA Randomized block

𝑦𝑦~t1 + t2 + t1t2 Two-way, fixed-effect ANOVA

Factorial design

𝑦𝑦~𝑡𝑡 + 𝑥𝑥 ANCOVA Observation study with one known noise factor

𝑦𝑦~𝑥𝑥1 + 𝑥𝑥2 + 𝑥𝑥1𝑥𝑥2 Multiple linear regression

Dose-response

𝑥𝑥: numerical, t: categorical treatment factor, b: categorical blocking factor

Page 55: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Randomized complete block design

How does fish abundance affects the abundance and diversity of prey species?

Page 56: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Design

30 fish 90 fish

Control Low High

3𝑚𝑚 × 3𝑚𝑚

Page 57: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Data: Zooplankton diversity in three fish abundance treatments

1 2 3 4 5

Control 4.1 3.2 3.0 2.3 2.5

Low 2.2 2.4 1.5 1.3 2.6

High 1.3 2.0 1.0 1.0 1.6

Page 58: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Model: 𝑦𝑦~t + b

𝑦𝑦𝑖𝑖 = 𝛽𝛽0 + 𝛽𝛽1𝑡𝑡𝑖𝑖 + 𝛽𝛽2bi + 𝜖𝜖𝑖𝑖

H0: Mean zooplankton diversity is the same in every abundance treatment

𝑦𝑦~b

H1: Mean zooplankton diversity is not the same in every abundance treatment

𝑦𝑦~t + b

Page 59: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Fitting the model to data

Page 60: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Adjusting for a known confounding factor

Page 61: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Adjusting for a known confounding factorMole rats are the only known mammals with distinct social castes. - A single queen and a small number of males are the only reproducing individuals in a colony.

- Workers gather food, defend the colony, care for the young, and maintain the burrows.

- Two worker castes in the Damaraland mole rat: - “Frequent workers”: do almost all of the work in the colony - “Infrequent workers”: do little work except on rare occasions after rains

Page 62: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Adjusting for a known confounding factorTo assess the physiological differences between the two types of workers, researchers compared daily energy expenditures of wild mole rats during a dry season.

Known noise factor: Energy expenditure appears to vary with body mass in both groups, but infrequent workers are heavier than frequent workers

Research question: How different is mean daily energy expenditure between the two groups when adjusted for differences in body mass?

Page 63: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Data

Page 64: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Data

Page 65: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Model: 𝑦𝑦~𝑡𝑡 + 𝑥𝑥H0: Castes do not differ in energy expenditure

𝑦𝑦~𝑥𝑥

H1: Castes differ in energy expenditure

𝑦𝑦~𝑡𝑡 + 𝑥𝑥

Page 66: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Fitting the model to data

Page 67: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Multi-factor designs

Page 68: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Multiple factors

• Experiments with more than one factor influencing the counts can be analyzed using design formula that include the additional variables.

• In fact, DESeq2 can analyze any possible experimental design that can be expressed with fixed effects terms (multiple factors, designs with interactions, designs with continuous variables, splines, and so on are all possible).

• By adding variables to the design, one can control for additional variation in the counts. For example, if the condition samples are balanced across experimental batches, by including the batch factor to the design, one can increase the sensitivity for finding differences due to condition. There are multiple ways to analyze experiments when the additional variables are of interest and not just controlling factors.

Page 69: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Including type

Page 70: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Accounting for type

• We can account for the different types of sequencing, and get a clearer picture of the differences attributable to the treatment.

• As condition is the variable of interest, we put it at the end of the formula.

• Thus the results function will by default pull the condition results unless contrast or name arguments are specified. Then we can re-run DESeq.

Page 71: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Accounting for type

Page 72: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Accounting for type

Page 73: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Accounting for type

Page 74: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Accounting for type

• It is also possible to retrieve the log2 fold changes, p values and adjusted p values of the type variable. The contrast argument of the function results takes a character vector of length three: the name of the variable, the name of the factor level for the numerator of the log2 ratio, and the name of the factor level for the denominator.

Page 75: RNA-seq - DGIST · 2017. 7. 14. · • As input, the DESeq2 package expects count data as obtained, e.g., from RNA -seq or another high- throughput sequencing experiment, in the

Accounting for type