Package ‘seedy’

November 6, 2015

Type Package

Title Simulation of Evolutionary and Epidemiological Dynamics

Version 1.3

Date 2015-11-06

Author Colin Worby

Maintainer Colin Worby <[email protected]>

Description Suite of functions for the simulation, visualisation and analysis of bacterial evolution within- and between-host.

License GPL-3

NeedsCompilation no

Repository CRAN

Date/Publication 2015-11-06 23:58:19

R topics documented:

seedy-package
ancestors
deepseq
deepseqmat
diversity.range
estcoaltime
expsnps
flat
gd
hump
librtoDNA
meansnps
networkmat
outbreak
plotdistmat
plotdiversity
plotnetwork
plotobservedsnps
plotoutbreak
plotsnpfreq
sharedvariants
simfixoutbreak
simulateoutbreak
simulatepopulation
transmission
transroutes
withinhost

Index

seedy-package Simulation of Evolutionary and Epidemiological Dynamics

Description

A package of functions to simulate, visualize and assess epidemiological and pathogen genomic sample data collected during an outbreak.

Details

Package: seedy
Type: Package
Version: 1.3
Date: 2015-11-06
License: GPL-3

Author(s)

Colin Worby ([email protected])

Examples

# Load within host data
data(withinhost)

# Calculate genetic distance matrix
Gmat <- gd(withinhost$obs.strain, withinhost$libr, withinhost$nuc,
           withinhost$librstrains)

# Set colors
colvec <- rainbow(1200)[1:1000] # Color palette
coltext <- rep("black", length(colvec)) # Corresponding text colors
coltext[680:970] <- "white" # White text for darker background colours

# Plot distance matrix
plotdistmat(Gmat, colvec, coltext, pos="bottomleft", labels=NULL, numbers=TRUE)

# Load outbreak data
data(outbreak)
sampledata <- outbreak$sampledata
epidata <- outbreak$epidata

# Calculate distance matrix for observed samples
distmat <- gd(sampledata[,3], outbreak$libr, outbreak$nuc, outbreak$librstrains)

# Now pick colors for sampled isolates
refnode <- 1 # Compare distance to which isolate?
colv <- NULL # Vector of colors for samples
maxD <- max(distmat[,refnode])

for (i in 1:nrow(sampledata)) {
  colv <- c(colv,
            colvec[floor((length(colvec)-1)*(distmat[refnode,i])/maxD)+1])
}

plotoutbreak(epidata, sampledata, col=colv, stack=TRUE, arr.len=0.1,
             blockheight=0.5, hspace=500, label.pos="left", block.col="grey",
             jitter=0.004, xlab="Time", pch=1)

ancestors Vector of infection ancestors

Description

Provides the chain of infection leading to a specified individual.

Usage

ancestors(x, ID, sources)

Arguments

x Individual ID.

ID List of person IDs.

sources List of infection sources, corresponding to ID.

Details

First element will be the ID x, last element will be zero.
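The chain described here can be traced with a short loop. The following is a minimal base-R sketch of the idea, not the package's implementation: trace_ancestors is a hypothetical helper, and it assumes that a source of 0 marks an importation and that every other source appears in ID.

```r
# Hypothetical helper illustrating the output format described for ancestors().
# Assumes sources use 0 for importations and every other source is in ID.
trace_ancestors <- function(x, ID, sources) {
  chain <- x
  current <- x
  while (current != 0) {
    current <- sources[match(current, ID)]  # who infected `current`?
    chain <- c(chain, current)
  }
  chain
}

ID <- 1:4
sources <- c(0, 1, 2, 2)  # toy transmission tree, invented for illustration
trace_ancestors(4, ID, sources)
```

The result starts at the requested individual and ends at zero, matching the convention stated in Details.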


Examples

data(outbreak)
ancestors(9, outbreak$epidata[,1], outbreak$epidata[,4])

deepseq Deep-sequenced bacterial samples

Description

Simulated deep-sequence samples within a host. Equilibrium population 10000, mutation rate 0.001 per site per generation, genome length 100kb, sampled every 1000 generations until 25000.

Usage

deepseq

Format

Deep-sequenced samples were taken at 50 time points.

Examples

data(deepseq)
D <- plotdiversity(deepseq, sample.times=(1:50)*1000, xlab="Time",
                   ylab="Expected pairwise SNPs")

deepseqmat Generate deep sequence observations

Description

Generates a matrix of polymorphism frequencies

Usage

deepseqmat(X)

Arguments

X Simulated complete genomic sampling data from the simulateoutbreak or simulatepopulation functions.


Details

The argument full=TRUE must be passed to simulateoutbreak or simulatepopulation in order for complete genomic data to be generated for each sample. Data of this format can be passed to this function.

Value

Returns a matrix in which each row represents a polymorphic site and each column a sample. Frequencies of the mutant type (relative to the initial strain) are reported.

Examples

data(deepseq)
W <- deepseqmat(deepseq)

diversity.range Range of genetic diversity over time

Description

Generates multiple populations stochastically from an identical source, and measures the resulting diversity over time in each.

Usage

diversity.range(m.rate, runtime, equi.pop, iterations = 10, n.points = 100,
    genomelength = 1e+05, bottle.times=0, bottle.size=1, feedback = 1000,
    makeplot = TRUE, area = TRUE, colline = "blue", colarea = rgb(0, 0, 1, 0.4),
    ref.strain = NULL, init.freq = 1, libr=NULL, nuc=NULL, ...)

Arguments

m.rate Mutation rate (per genome per generation).

runtime Number of bacterial generations over which to simulate.

equi.pop Equilibrium effective population size.

iterations Number of populations to simulate.

n.points Number of equidistant points to sample diversity during runtime.

genomelength Genome length.

bottle.times Vector of population bottleneck times.

bottle.size Size of population bottleneck.

feedback Number of generations between each simulation report.

makeplot Should resulting diversity be plotted?

area Should 95% central quantile of genetic diversity be shaded? If FALSE, then individual diversity trajectories are plotted.


colline Colour of lines (if makeplot=TRUE).

colarea Colour of shaded area (if makeplot=TRUE and area=TRUE).

ref.strain Reference strain, if required.

init.freq Initial frequency of strains in starting population (if libr and nuc specified)

libr Library of initial sequences.

nuc Nucleotides at polymorphic sites, corresponding to libr.

... Additional arguments to be passed to plot.

Details

Provides an empirical estimate of the expected genetic diversity (pairwise SNP distance) over time, with associated uncertainty. The initial population can be specified using the libr, nuc and init.freq arguments; otherwise the population is grown from a single genotype. Resolution can be improved by increasing n.points, and accuracy by increasing iterations (at the expense of computation time).
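As a rough illustration of how the returned matrix can be summarised, the sketch below computes a central 95% band and a mean trajectory across iterations. It uses a stand-in matrix of arbitrary random numbers in place of real output, so only the shape (iterations by n.points) reflects the function; the names K, band and avg are chosen here for illustration.

```r
# Stand-in for the iterations-by-n.points matrix returned by diversity.range();
# the values are arbitrary random numbers, not simulated diversity.
set.seed(1)
K <- matrix(rexp(10 * 100), nrow = 10, ncol = 100)

# Central 95% band and mean across iterations at each of the n.points times
band <- apply(K, 2, quantile, probs = c(0.025, 0.975))
avg  <- colMeans(K)

dim(band)  # 2 rows (lower, upper) by n.points columns
```

With only a handful of iterations the band is a coarse estimate, which is one reason to increase iterations when accuracy matters.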

Value

An iterations by n.points matrix with diversity over time for each simulation.

See Also

plotdiversity

Examples

iterations <- 10
K <- diversity.range(m.rate=0.0005, runtime=1000, equi.pop=1000,
                     iterations=iterations, n.points=100, genomelength=100000, feedback=100,
                     makeplot=TRUE, area=TRUE, colline="blue", colarea=rgb(0,0,1,0.4))

estcoaltime Estimate expected time to coalescence for sampled lineages in bottlenecked population

Description

Estimates the expected time to coalescence for two randomly sampled lineages at a particular time, given the past population dynamics.

Usage

estcoaltime(bottlesize, popsize, bottletimes, obstime)


Arguments

bottlesize Effective size of population bottlenecks.

popsize Effective population size between bottlenecks.

bottletimes Vector of bottleneck times.

obstime Time of observation.

Details

Expected coalescent time is calculated under the assumption that population remains constant at popsize, but drops to bottlesize for a single generation at bottleneck times. The probability of coalescence in a particular generation is the reciprocal of the population size at the time.
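The calculation described here can be sketched in base R by stepping backwards from the observation time and accumulating the probability that the two lineages have not yet coalesced. This is an illustrative reconstruction under stated assumptions, not the package's code: expected_coal_time and its horizon argument are hypothetical, and the population is assumed to stay at popsize before time zero, which may differ from how estcoaltime treats the start of the simulation.

```r
# Hypothetical helper: expected generations back to coalescence for two lineages.
# Assumption: before time 0 the population simply stays at popsize.
expected_coal_time <- function(bottlesize, popsize, bottletimes, obstime,
                               horizon = obstime + 20 * popsize) {
  # Calendar time of the generation g steps before obstime, g = 1..horizon
  times <- obstime - seq_len(horizon)
  N <- rep(popsize, horizon)
  N[times %in% bottletimes] <- bottlesize
  # surv[g] = P(no coalescence in the first g generations back)
  surv <- cumprod(1 - 1 / N)
  # E[T] = sum over g >= 1 of P(T >= g), truncated at the horizon
  1 + sum(surv[-horizon])
}

expected_coal_time(bottlesize = 10, popsize = 10000,
                   bottletimes = c(1000, 2000, 3000, 4000), obstime = 5000)
```

Without bottlenecks the sum reduces to the mean of a geometric distribution, popsize generations, which gives a simple sanity check on the sketch.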

Value

Returns the expected number of generations since coalescence.

Examples

estcoaltime(bottlesize=10, popsize=10000, bottletimes=c(1000,2000,3000,4000),
            obstime=5000)

expsnps Distribution of genetic distances

Description

For any pair of epidemiologically linked individuals, returns the distribution of genetic distance separating randomly drawn samples.

Usage

expsnps(x, m.rate, c.rate, tau)

Arguments

x Vector of (non-negative integer) quantiles.

m.rate Mutation rate (per genome per generation).

c.rate Rate of coalescence prior to lineage divergence (constant).

tau Total time from lineage divergence to observations.


Details

A pair of genomes sampled during an outbreak will have an epidemiological and an evolutionary relationship with each other. 'Lineage divergence' is defined to be the time at which the lineages ceased to exist within the same host (or, the latest possible time of coalescence). Mutations may arise in two distinct time periods: (a) the time between coalescence and lineage divergence, and (b) the time between lineage divergence and observation. The latter follows a Poisson distribution with mean equal to the mutation rate multiplied by the total time between lineage divergence and observations (two branches). The former is not Poisson distributed, since the time to coalescence is typically unknown (but follows an exponential distribution with a constant effective population size). It follows that, with a constant effective population size (and therefore coalescent rate), the number of mutations follows a geometric distribution. Therefore, the total number of SNPs between two samples is a geometric-Poisson mixture distribution, which this function returns.
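The mixture described here can be written as a convolution of a geometric and a Poisson probability mass function. The sketch below is one way to do that in base R and is not claimed to reproduce the values returned by expsnps: geom_pois_pmf is a hypothetical helper, and the geometric success probability c.rate / (c.rate + 2 * m.rate) assumes mutations accumulate on both branches before divergence, which is an assumption about the parameterisation.

```r
# Hypothetical helper: pmf of (geometric + Poisson) by direct convolution.
# Assumptions: pre-divergence mutations ~ Geometric(p) with
# p = c.rate / (c.rate + 2 * m.rate); post-divergence ~ Poisson(m.rate * tau),
# where tau is the total branch time from divergence to the observations.
geom_pois_pmf <- function(x, m.rate, c.rate, tau) {
  p <- c.rate / (c.rate + 2 * m.rate)
  sapply(x, function(k) {
    j <- 0:k  # mutations attributed to the pre-divergence period
    sum(dgeom(j, prob = p) * dpois(k - j, lambda = m.rate * tau))
  })
}

probs <- geom_pois_pmf(0:100, m.rate = 0.003, c.rate = 1/2000, tau = 5000)
plot(0:100, probs, type = "h", xlab = "SNPs", ylab = "Probability")
```

Summing the result over a wide enough range of distances should give a value close to one, which is a quick check that the convolution is a proper distribution.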

Value

Returns the probability density for given genetic distances.

See Also

estcoaltime

Examples

expsnps(3, m.rate=0.003, c.rate=1/10000, tau=1000)

plot(expsnps(0:100, m.rate=0.003, c.rate=1/2000, tau=5000), type="h")

flat Equilibrium population growth model

Description

Provides the expected pathogen population at any point during infection

Usage

flat(x, span, eq.size)

Arguments

x Time after infection.

span Total duration of infection.

eq.size Expected equilibrium population size.

Value

Returns expected population size at x.


Examples

flat(1:200, span=200, eq.size=1000)

gd Genetic distance matrix

Description

Given a set of genomic samples, returns a pairwise genetic distance matrix.

Usage

gd(samp, libr, nuc, key)

Arguments

samp Vector of sample IDs.

libr Library object from simulation functions. A list in which each entry represents a unique genotype, and is a vector of mutated nucleotide positions relative to the reference sequence.

nuc Nucleotide database from simulation functions. A list (corresponding to libr) in which each entry represents a unique genotype, and is a vector of mutated nucleotides relative to the reference sequence.

key Vector of sample IDs in the order they appear in the libr and nuc objects.

Details

Each element of samp represents one row and column in the genetic distance matrix.

Value

Returns a symmetric genetic distance matrix with rows and columns corresponding to the samp vector.
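To make the libr and nuc representation concrete, the sketch below counts SNP differences between genotypes stored as mutated positions and nucleotides. It is an illustration in base R on an invented toy library, not the package's gd implementation; pair_dist is a hypothetical helper, and it assumes a recorded mutation always differs from the reference base.

```r
# Hypothetical toy library: mutated positions and nucleotides (1..4) per genotype
libr <- list(integer(0), c(5, 10), c(10, 20))
nuc  <- list(integer(0), c(2, 3), c(4, 1))

pair_dist <- function(a, b) {
  shared <- intersect(libr[[a]], libr[[b]])
  # Sites mutated in only one genotype differ from the other, which carries
  # the reference base there (assuming mutations always change the base)
  only_one <- length(libr[[a]]) + length(libr[[b]]) - 2 * length(shared)
  # Sites mutated in both differ only if the nucleotides disagree
  both <- sum(nuc[[a]][match(shared, libr[[a]])] !=
              nuc[[b]][match(shared, libr[[b]])])
  only_one + both
}

n <- length(libr)
D <- matrix(0, n, n)
for (i in seq_len(n)) for (j in seq_len(n)) D[i, j] <- pair_dist(i, j)
D
```

Storing only the differences from the reference keeps each genotype small, and the distance needs just the union of mutated positions rather than full-length sequences.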

Examples

data(withinhost)
gd(withinhost$obs.strain, withinhost$libr, withinhost$nuc, withinhost$librstrains)


hump Sinusoidal population growth model

Description

Provides the expected pathogen population at any point during infection

Usage

hump(x, span, max.size)

Arguments

x Time after infection.

span Total duration of infection.

max.size Expected maximum population size, attained at midpoint of infection.

Value

Returns expected population size at x.

Examples

hump(1:200, span=200, max.size=1000)

librtoDNA Convert simulation objects to DNA sequences or Nexus/Fasta files.

Description

Creates a character string or matrix of nucleotides (C, A, G, T), output to vector, matrix or Nexus file.

Usage

librtoDNA(sampleID, libr, nuc, ref.strain, key, sampletime=NULL,
    strings = FALSE, filename = NULL, format = "nexus")


Arguments

sampleID Vector of sample IDs to output.

libr Library object from simulation functions. A list in which each entry represents a unique genotype, and is a vector of mutated nucleotide positions relative to the reference sequence.

nuc Nucleotide database from simulation functions. A list (corresponding to libr) in which each entry represents a unique genotype, and is a vector of mutated nucleotides relative to the reference sequence.

ref.strain Reference strain to which the libr and nuc objects are compared (string of integers in 1,...,4).

key Vector of sample IDs corresponding to the order of libr.

sampletime Vector of sample times. If specified, incorporates sample times into genome name in Nexus file.

strings If TRUE, returns a character vector, each element containing one genotype. Otherwise, returns a (number of genotypes)x(length of genome) character matrix.

filename File to which sequence data should be written. Output format is Nexus. Not written out if NULL.

format File format to be exported (if filename!=NULL). Options are "nexus" and "fasta".

Value

A character vector or matrix, depending on strings.

Examples

data(withinhost)
G <- librtoDNA(sampleID=withinhost$obs.strain, libr=withinhost$libr, nuc=withinhost$nuc,
               ref.strain=withinhost$ref.strain, key=withinhost$librstrains, strings=TRUE)

meansnps Mean diversity within a single population

Description

Calculates the expected pairwise genetic distance (number of SNPs) between two randomly sampled genomes in a population.

Usage

meansnps(strain.log, freq.log, libr, nuc, key)


Arguments

strain.log Vector of unique genotype IDs in population.

freq.log Vector of counts of genotypes in population, corresponding to strain.log.

libr Library object from simulation functions. A list in which each entry represents a unique genotype, and is a vector of mutated nucleotide positions relative to the reference sequence.

nuc Nucleotide database from simulation functions. A list (corresponding to libr) in which each entry represents a unique genotype, and is a vector of mutated nucleotides relative to the reference sequence.

key Vector of genotype IDs corresponding to the libr and nuc objects.
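The quantity computed from these arguments can be illustrated with genotype counts and a genotype distance matrix. The sketch below is a base-R illustration, not the package's code: expected_pairwise is a hypothetical helper, the distance matrix D is invented, and the two genomes are assumed to be drawn without replacement, which may not match the convention used by meansnps.

```r
# Hypothetical helper: mean pairwise distance between two genomes drawn
# without replacement from a population described by genotype counts.
expected_pairwise <- function(D, counts) {
  N <- sum(counts)
  # Number of ordered pairs of distinct genomes with genotypes (i, j)
  pairs <- outer(counts, counts)
  diag(pairs) <- counts * (counts - 1)
  sum(pairs * D) / (N * (N - 1))
}

D <- matrix(c(0, 2,
              2, 0), nrow = 2, byrow = TRUE)  # toy genotype distances
expected_pairwise(D, counts = c(3, 1))
```

Weighting by counts is what distinguishes this from a plain average of the distance matrix: a rare, divergent genotype contributes little to the expected distance.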

Examples

data(withinhost)
meansnps(withinhost$obs.strain, rep(1, 10), withinhost$libr, withinhost$nuc,
         withinhost$librstrains)

networkmat Create adjacency matrix

Description

For a given set of infection routes, returns an adjacency matrix.

Usage

networkmat(ID, sources)

Arguments

ID Vector of infective IDs.

sources Vector of infection sources corresponding to ID.

Value

Returns a matrix with the [i,j]th entry equal to 1 if the ith infective in ID infected the jth infective in ID.
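The structure of the returned matrix can be reproduced on a toy transmission tree in one line of base R. This is an illustrative sketch, not the package's implementation; the ID and sources values are invented, and a source of 0 is assumed to mark an importation.

```r
ID <- c(1, 2, 3, 4)
sources <- c(0, 1, 1, 3)  # 0 marks an importation (assumed convention)

# Entry [i, j] is 1 when ID[i] is the recorded source of ID[j]
adj <- outer(ID, sources, "==") * 1
dimnames(adj) <- list(ID, ID)
adj
```

Each column has at most one non-zero entry because every individual has a single source, and importations give an all-zero column.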

See Also

ancestors

Examples

data(outbreak)
networkmat(outbreak$epidata[,1], outbreak$epidata[,4])


outbreak Outbreak data

Description

Simulated outbreak data and genomic samples. An SIR outbreak was simulated with an infective individual entering a population of thirty susceptibles. Ten genomes were sampled from each infected individual every 500 generations.

Usage

outbreak

Format

Consists of epidemiological data and genomic sample data.

Examples

data(outbreak)
sampledata <- outbreak$sampledata
epidata <- outbreak$epidata

distmat <- gd(sampledata[,3], outbreak$libr, outbreak$nuc, outbreak$librstrains)

# Now pick colors for sampled isolates
colvec <- rainbow(1200)[1:1000] # Color palette
refnode <- 1 # Compare distance to which isolate?
colv <- NULL # Vector of colors for samples
maxD <- max(distmat[,refnode])

for (i in 1:nrow(sampledata)) {
  colv <- c(colv,
            colvec[floor((length(colvec)-1)*(distmat[refnode,i])/maxD)+1])
}

plotoutbreak(epidata, sampledata, col=colv, stack=TRUE, arr.len=0.1,
             blockheight=0.5, hspace=500, label.pos="left", block.col="grey",
             jitter=0.004, xlab="Time", pch=1)

plotdistmat Plot genetic distance matrix

Description

Provides a graphical representation of the pairwise genetic distance matrix for a collection of genomes.


Usage

plotdistmat(distmat, colvec, coltext, pos = "topleft", labels = NULL,
    numbers = TRUE, ...)

Arguments

distmat Symmetrical pairwise distance matrix, the [i,j]th entry corresponding to the genetic distance between genomes i and j.

colvec Vector of colors to represent increasing genetic distance.

coltext Vector of colors for numerals representing genetic distance on plot (if numbers=TRUE).

pos Position of the matrix in plot. Allowed values are "topleft", "topright", "bottomleft" and "bottomright".

labels Axis labels for genomes (by default, marked 1,...,n).

numbers Should the genetic distance be recorded in each cell?

... Additional arguments to be passed to plot.

Details

Plots the upper (or lower) diagonal genetic distance matrix, with each entry colored according to genetic distance.

See Also

gd

Examples

data(withinhost)
Gmat <- gd(withinhost$obs.strain, withinhost$libr, withinhost$nuc,
           withinhost$librstrains)

colvec <- rainbow(1200)[1:1000] # Color palette
coltext <- rep("black", length(colvec)) # Corresponding text colors
coltext[680:970] <- "white" # White text for darker background colours

plotdistmat(Gmat, colvec, coltext, pos="bottomleft", labels=NULL, numbers=TRUE)


plotdiversity Plot genetic diversity over time

Description

Calculates and plots the genetic diversity (mean pairwise number of SNPs between samples) in deep-sequenced samples.

Usage

plotdiversity(X, sample.times, makeplot = TRUE, filter=FALSE, ...)

Arguments

X Simulated deep sequence data over time.

sample.times Vector of sample collection times.

makeplot Should diversity samples be plotted?

filter Should a subset of the sequence data be plotted? That is, should only the sequences at sample.times be plotted, out of a larger sample set?

... Additional arguments to be passed to plot (if makeplot=TRUE).

Value

Returns a vector of diversity values for each sample time.

Examples

data(deepseq)
D <- plotdiversity(deepseq, sample.times=(1:50)*1000, xlab="Time",
                   ylab="Expected pairwise SNPs")

plotnetwork Plot weighted transmission network

Description

For a given weighted adjacency matrix, plots a directed network illustrating transmission probabilities.

Usage

plotnetwork(probmat, labels, arrlen = 0.15, scale = 1, ...)


Arguments

probmat Transmission probability matrix. The [i,j]th element represents the probability that person i infected person j.

labels Labels representing row and columns in the probability matrix.

arrlen Length of arrow heads (see arrows).

scale Scaling for colors according to probability values.

... Additional arguments to be passed to plot.

Examples

data(outbreak)
K <- networkmat(outbreak$epidata[,1], outbreak$epidata[,4])
plotnetwork(K, labels=outbreak$epidata[,1])

plotobservedsnps Plot expected frequency of polymorphic sites in a model deep sequencing project

Description

Given an average depth of coverage and per base sequencing error rate, estimates the read frequency of the most common polymorphic sites.

Usage

plotobservedsnps(data, timepoint=1, coverage=50, error=0.001, iterations=100, maxsnp=50,
    legend=TRUE, ylim=c(0,1.5*coverage), ...)

Arguments

data Full sequence data generated from the simulatepopulation function.

timepoint Which sampling time should be used.

coverage Coverage depth.

error Sequencing error rate per base.

iterations Number of iterations to generate confidence bounds.

maxsnp Number of polymorphic sites to show.

legend Should a legend be plotted?

ylim Bounds of y axis.

... Additional arguments to be passed to plot.


Details

Plots the expected number of reads containing the most frequent polymorphisms, with a 95% confidence interval, derived from repeated random draws. Additionally shows the frequency of false positive observations arising through sequencing error.

Value

Returns an iterations x maxsnp matrix. Each column represents a polymorphic site, and each row one iteration. Each entry provides the number of reads containing the polymorphism in a given iteration. Columns are ordered by frequency.

Author(s)

T. D. Read ([email protected])

Examples

data(deepseq)

# At the 25th time point
plotobservedsnps(data=deepseq, timepoint=25, xaxt="n", xlab="Ranked polymorphic sites",
                 ylab="Reads", yaxs="i", las=1)
# At the 50th time point
plotobservedsnps(data=deepseq, timepoint=50, xaxt="n", xlab="Ranked polymorphic sites",
                 ylab="Reads", yaxs="i", las=1)

plotoutbreak Plot outbreak

Description

Provides a graphical representation of simulated outbreak and sampled genomes.

Usage

plotoutbreak(epidata, sampledata, col = "red", stack = TRUE, arr.len = 0.1,
    blockheight = 0.5, hspace = max(epidata[,3])/20, labels = NULL, label.pos = "left",
    block.col = "grey", jitter = 0, pch = 1, ...)

Arguments

epidata Simulated epidemiological data - matrix consisting of person IDs, infection and recovery times, and infection source.

sampledata Simulated genomic sample data - matrix of person IDs, sample times and genome ID. Genomic sample data simulated with full=TRUE cannot be used.


col A vector of colors to represent increasing genetic distance.

stack Should infectious periods be organized to minimize plot height? Alternatively,each period occupies one row.

arr.len Arrow length. See arrows.

blockheight The height of each bar representing infectious periods. Takes a value between 0 and 1.

hspace Minimum horizontal space between two infectious period bars (in units of time).

labels Labels for each infectious episode. First column of epidata by default.

label.pos Position of labels. Accepted values are "centre", "left" and "right".

block.col Background color for each infectious period bar.

jitter Amount of jitter to be applied to genome sample points (as a proportion of plot dimensions).

pch Point type for genome samples (see par).

... Additional arguments to be passed to plot.

Details

Graphical representation of transmission dynamics and sampled genomes. For multiple genome samples per time point, set jitter>0.

Examples

data(outbreak)
sampledata <- outbreak$sampledata
epidata <- outbreak$epidata

distmat <- gd(sampledata[,3], outbreak$libr, outbreak$nuc, outbreak$librstrains)

# Now pick colors for sampled isolates
colvec <- rainbow(1200)[1:1000] # Color palette
refnode <- 1 # Compare distance to which isolate?
colv <- NULL # Vector of colors for samples
maxD <- max(distmat[,refnode])

for (i in 1:nrow(sampledata)) {
  colv <- c(colv,
            colvec[floor((length(colvec)-1)*(distmat[refnode,i])/maxD)+1])
}

plotoutbreak(epidata, sampledata, col=colv, stack=TRUE, arr.len=0.1,
             blockheight=0.5, hspace=500, label.pos="left", block.col="grey",
             jitter=0.004, xlab="Time", pch=1)


plotsnpfreq Plot frequency of polymorphic sites

Description

Returns the frequency of polymorphic sites above a probability level.

Usage

plotsnpfreq(data, timepoint=1, type="S", ...)

Arguments

data Full sequence data generated from the simulatepopulation function.

timepoint Which sampling time should be used.

type Type of plot desired. See plot.

... Additional arguments to be passed to plot.

Details

Plots frequency of polymorphic sites above each probability level.

Value

Returns a matrix with points used for the plot.

Examples

data(deepseq)

# At the 25th time point
plotsnpfreq(data=deepseq, timepoint=25, xlab="Mutant frequency", ylim=c(0,25),
            ylab="No. sites", yaxs="i", xaxs="i", las=1, bty="l", col="red", lwd=2)
# At the 50th time point
plotsnpfreq(data=deepseq, timepoint=50, xlab="Mutant frequency", ylim=c(0,25),
            ylab="No. sites", yaxs="i", xaxs="i", las=1, bty="l", col="red", lwd=2)


sharedvariants Shared variants matrix

Description

Generates a matrix of shared variant counts between each sample pair.

Usage

sharedvariants(deepseqmat)

Arguments

deepseqmat Deep sequence matrix generated by the deepseqmat function.

Value

Returns a matrix in which each entry details the number of shared intermediate frequency variants between each pair of samples.
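The counting behind such a matrix can be sketched with an indicator matrix and a cross-product. The example below works on an invented frequency matrix in the layout described for deepseqmat (rows are polymorphic sites, columns are samples) and is not the package's implementation; in particular, treating "intermediate" as strictly between 0 and 1 is an assumption, since the thresholds used by sharedvariants are not stated here.

```r
# Toy frequency matrix: rows are polymorphic sites, columns are samples.
# Values are invented for illustration.
W <- matrix(c(0.2, 0.0, 0.5,
              0.9, 0.4, 0.0,
              1.0, 0.3, 0.3), nrow = 3, byrow = TRUE)

# Assumption: a variant is "intermediate" when 0 < frequency < 1
present <- (W > 0 & W < 1) * 1

# Entry [i, j] counts sites at intermediate frequency in both samples i and j
shared <- crossprod(present)
shared
```

The diagonal gives the number of intermediate-frequency sites within each sample, which is a useful upper bound on the off-diagonal counts.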

Examples

data(deepseq)
W <- deepseqmat(deepseq)
Y <- sharedvariants(W)

simfixoutbreak Simulate evolutionary dynamics on a given transmission tree

Description

Simulate within-host evolutionary dynamics on top of an existing transmission tree and generate genomic samples.

Usage

simfixoutbreak(ID, inf.times, rec.times, inf.source, mut.rate, equi.pop=10000, shape=flat,
    inoc.size=1, imp.var=25, samples.per.time=1, samp.schedule="random",
    samp.freq=500, full=FALSE, feedback=500, glen=100000,
    ref.strain=NULL, ...)


Arguments

ID Vector of unique IDs.

inf.times Vector of (integer) infection times.

rec.times Vector of (integer) removal times.

inf.source Vector of infection sources. The ith entry corresponds to the ID of the source of infection. For importations, the source should be 0.

mut.rate Mutation rate (per genome per generation).

equi.pop Equilibrium effective population size of pathogens within-host.

shape Function describing the population growth dynamics. See Details.

inoc.size Size of pathogen inoculum at transmission.

imp.var The expected number of mutations separating unconnected importations.

samples.per.time Number of samples taken at sampling times.

samp.schedule How should sampling be conducted? Accepted values are: "calendar" - samples are taken from all current infectives every samp.freq generations; "individual" - samples are taken from each infective at intervals of samp.freq after infection; "random" - samples are taken at one time between infection and removal for each infective.

full Should ‘full’ genomic sampling be returned? That is, should a vector of genotypes and their respective frequencies be stored from each individual’s sampling times?

samp.freq Number of generations between each sampling time (see samp.schedule).

feedback Number of generations between simulation updates returned to R interface.

glen Length of genome.

ref.strain Initial sequence. By default, a random sequence of length glen.

... Additional arguments to be passed to the shape function.
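The three samp.schedule options can be pictured with a small base-R sketch. The helper sampling_times below is hypothetical and not part of seedy; it only mirrors the verbal descriptions above. The treatment of interval endpoints, the calendar grid starting at samp.freq, and rec.time being later than inf.time are all assumptions.

```r
# Hypothetical helper: sampling times for one infective under each schedule
sampling_times <- function(inf.time, rec.time, samp.freq,
                           schedule = c("calendar", "individual", "random")) {
  schedule <- match.arg(schedule)
  if (schedule == "calendar") {
    # Fixed calendar grid (assumed to start at samp.freq);
    # keep the grid points that fall within the infectious period
    grid <- seq(samp.freq, by = samp.freq,
                length.out = ceiling(rec.time / samp.freq))
    grid[grid >= inf.time & grid <= rec.time]
  } else if (schedule == "individual") {
    # Grid measured from the individual's own infection time
    n <- floor((rec.time - inf.time) / samp.freq)
    inf.time + seq_len(n) * samp.freq
  } else {
    # One uniformly chosen integer time between infection and removal
    inf.time + sample.int(rec.time - inf.time, 1)
  }
}

cal <- sampling_times(120, 1300, samp.freq = 500, schedule = "calendar")
ind <- sampling_times(120, 1300, samp.freq = 500, schedule = "individual")
rnd <- sampling_times(120, 1300, samp.freq = 500, schedule = "random")
```

With these inputs the calendar schedule samples on the shared grid, whereas the individual schedule shifts the grid by the infection time.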

Details

Population growth dynamics are defined by the function called by 'shape'. This function returns the expected population size at each time step, given the total simulation time. By default, the population is expected to grow exponentially until reaching an equilibrium level, specified by equi.pop (flat). Alternatively, the population can follow a sinusoidal growth curve, peaking at runtime/2 (hump). User-defined functions should be of the form function(time,span,equi.pop,...), where span is equal to the duration of infection in this setting.
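A user-defined growth function with this signature might look like the following sketch. The name logistic_shape and its growth argument are made up for illustration, and whether time is passed as a scalar or as a vector is an assumption (the function is written to work with both).

```r
# Hypothetical shape function: logistic growth towards equi.pop.
# 'growth' is an illustrative extra argument that would arrive via '...'.
logistic_shape <- function(time, span, equi.pop, growth = 0.05, ...) {
  # Starts at a population of 1 at time 0 and saturates at equi.pop;
  # 'span' is accepted for compatibility but not used by this curve
  equi.pop / (1 + (equi.pop - 1) * exp(-growth * time))
}

# Expected population size at a few time points
sizes <- logistic_shape(time = c(0, 100, 1000), span = 1000, equi.pop = 10000)
sizes
```

Such a function would presumably be supplied as shape=logistic_shape, with growth passed through the ... argument. The curve returns non-integer values; whether the simulation needs whole numbers is not stated here.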

Value

Returns a list of outbreak data:

epidata A matrix of epidemiological data with columns: person ID, infection time, removal time, source of infection.

sampledata A matrix of genome samples with columns: person ID, sampling time, genome ID.

libr A list with an entry for each unique genotype observed. Each entry is a vector of mutation positions relative to the reference genome.

nuc A list with an entry for each unique genotype observed. Each entry is a vector of nucleotide types (integer between 1 and 4).

librstrains A vector of unique genotype IDs corresponding to the libr object.

endtime End time of the outbreak.
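To make the libr and nuc representation concrete, the sketch below counts the sites at which two genotypes stored this way differ. It is a hand-rolled illustration rather than the package's gd function; the helper name snp_distance is hypothetical, and it assumes that the nucleotide recorded at a mutated position always differs from the reference.

```r
# Two toy genotypes: mutated positions and the nucleotide (1-4) at each
libr <- list(c(10, 250), c(10, 250, 4000))
nuc  <- list(c(3, 1),    c(2, 1, 4))

# Hypothetical helper: count sites at which two genotypes differ
snp_distance <- function(pos1, nuc1, pos2, nuc2) {
  shared <- intersect(pos1, pos2)
  # Sites mutated in only one genotype differ, since the other matches the reference
  only_one <- length(setdiff(pos1, pos2)) + length(setdiff(pos2, pos1))
  # Sites mutated in both differ only if the nucleotides disagree
  both_diff <- sum(nuc1[match(shared, pos1)] != nuc2[match(shared, pos2)])
  only_one + both_diff
}

d12 <- snp_distance(libr[[1]], nuc[[1]], libr[[2]], nuc[[2]])
d12
```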

Examples

# Simulate a transmission chain
inf.times <- (0:20)*100
rec.times <- inf.times + 100 + rpois(21,50)
inf.source <- 0:20
inf.source[c(3,11)] <- 0 # Two importations
mut.rate <- 0.001

# Now simulate evolutionary dynamics and samples on top of this tree
W <- simfixoutbreak(ID=1:21, inf.times, rec.times, inf.source, mut.rate,
                    equi.pop=1000, shape=flat, inoc.size=10, imp.var=25,
                    samples.per.time=5, samp.schedule="random",
                    samp.freq=500, full=FALSE, feedback=100, glen=100000,
                    ref.strain=NULL)

sampledata <- W$sampledata
epidata <- W$epidata

# Calculate distance matrix for observed samples
distmat <- gd(sampledata[,3], W$libr, W$nuc, W$librstrains)

# Now pick colors for sampled isolates
colvec <- rainbow(1200)[1:1000] # Color palette
refnode <- 1 # Compare distance to which isolate?
colv <- NULL # Vector of colors for samples
maxD <- max(distmat[,refnode])

for (i in 1:nrow(sampledata)) {
  colv <- c(colv,
            colvec[floor((length(colvec)-1)*(distmat[refnode,i])/maxD)+1])
}

plotoutbreak(epidata, sampledata, col=colv, stack=TRUE, arr.len=0.1,
             blockheight=0.5, hspace=60, label.pos="left", block.col="grey",
             jitter=0.004, xlab="Time", pch=1)
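The colour-assignment loop in the example maps each distance from the reference isolate onto a palette index. An equivalent vectorised form is sketched here on a made-up distance matrix, so that it runs without the simulation; the matrix values are illustrative only.

```r
# Toy symmetric distance matrix standing in for the output of gd()
distmat <- matrix(c(0, 2, 5,
                    2, 0, 4,
                    5, 4, 0), nrow = 3, byrow = TRUE)

colvec  <- rainbow(1200)[1:1000]   # Color palette
refnode <- 1                       # Compare distance to which isolate?
maxD    <- max(distmat[, refnode])

# Same index formula as the loop, applied to the whole row at once
idx  <- floor((length(colvec) - 1) * distmat[refnode, ] / maxD) + 1
colv <- colvec[idx]
```

Both forms divide by maxD, so they assume at least one sample differs from the reference isolate.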

simulateoutbreak Simulate transmission and evolutionary dynamics

Description

Simulate within-host evolutionary dynamics on top of an SIR transmission process and generate genomic samples.

Usage

simulateoutbreak(init.sus, inf.rate, rem.rate, mut.rate, nmat = NULL,
    equi.pop = 10000, shape=flat, init.inf = 1, inoc.size = 1,
    samples.per.time = 1, samp.schedule = "random",
    samp.freq = 500, full=FALSE, mincases = 1,
    feedback = 500, glen = 1e+05, ref.strain = NULL, ...)

Arguments

init.sus Initial number of susceptible individuals.

inf.rate SIR rate of infection.

rem.rate SIR rate of removal.

mut.rate Mutation rate (per genome per generation).

nmat Connectivity matrix. Entry [i,j] gives the relative rate at which person i may contact person j.

equi.pop Equilibrium effective population size of pathogens within-host.

shape Function describing the population growth dynamics. See Details.

init.inf Initial number of infected individuals.

inoc.size Size of pathogen inoculum at transmission.

samples.per.time Number of samples taken at sampling times.

samp.schedule How should sampling be conducted? Accepted values are: "calendar" - samples are taken from all current infectives every samp.freq generations; "individual" - samples are taken from each infective at intervals of samp.freq after infection; "random" - samples are taken at one time between infection and removal for each infective.

samp.freq Number of generations between each sampling time (see samp.schedule).

full Should ‘full’ genomic sampling be returned? That is, should a vector of genotypes and their respective frequencies be stored from each individual’s sampling times?

mincases Minimum final size of outbreak to output. If final size is less than this value, another outbreak is simulated.

feedback Number of generations between simulation updates returned to R interface.

glen Length of genome.

ref.strain Initial sequence. By default, a random sequence of length glen.

... Additional arguments to be passed to the shape function.
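As an illustration of the nmat argument, the sketch below builds a connectivity matrix for two hypothetical wards, with within-ward contact four times as likely as between-ward contact. The ward structure, the relative rates, the zero diagonal and the matrix covering every individual in the population are choices made for this example, not requirements stated above.

```r
# Six individuals split across two hypothetical wards
ward <- c(1, 1, 1, 2, 2, 2)

# Relative contact rate: 1 within a ward, 0.25 between wards
nmat <- ifelse(outer(ward, ward, "=="), 1, 0.25)
diag(nmat) <- 0   # no self-contact (an assumption for this sketch)

nmat
```

Because the entries are relative rates, only their ratios matter in this construction.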

Details

Population growth dynamics are defined by the function called by 'shape'. This function returns the expected population size at each time step, given the total simulation time. By default, the population is expected to grow exponentially until reaching an equilibrium level, specified by equi.pop (flat). Alternatively, the population can follow a sinusoidal growth curve, peaking at runtime/2 (hump). User-defined functions should be of the form function(time,span,equi.pop,...), where span is equal to the duration of infection in this setting.

Value

Returns a list of outbreak data:

epidata A matrix of epidemiological data with columns: person ID, infection time, removal time, source of infection.

sampledata A matrix of genome samples with columns: person ID, sampling time, genome ID.

libr A list with an entry for each unique genotype observed. Each entry is a vector of mutation positions relative to the reference genome.

nuc A list with an entry for each unique genotype observed. Each entry is a vector of nucleotide types (integer between 1 and 4).

librstrains A vector of unique genotype IDs corresponding to the libr object.

endtime End time of the outbreak.

Examples

W <- simulateoutbreak(init.sus=10, inf.rate=0.002, rem.rate=0.001, mut.rate=0.0001,
                      equi.pop=2000, inoc.size=1, samples.per.time=10,
                      samp.schedule="calendar", samp.freq=500, mincases=3)

sampledata <- W$sampledata
epidata <- W$epidata

# Calculate distance matrix for observed samples
distmat <- gd(sampledata[,3], W$libr, W$nuc, W$librstrains)

# Now pick colors for sampled isolates
colvec <- rainbow(1200)[1:1000] # Color palette
refnode <- 1 # Compare distance to which isolate?
colv <- NULL # Vector of colors for samples
maxD <- max(distmat[,refnode])

for (i in 1:nrow(sampledata)) {
  colv <- c(colv,
            colvec[floor((length(colvec)-1)*(distmat[refnode,i])/maxD)+1])
}

plotoutbreak(epidata, sampledata, col=colv, stack=TRUE, arr.len=0.1,
             blockheight=0.5, hspace=500, label.pos="left", block.col="grey",
             jitter=0.004, xlab="Time", pch=1)

simulatepopulation Simulate a pathogen population

Description

Simulates a pathogen population undergoing a Wright-Fisher discrete-time evolutionary process.

Usage

simulatepopulation(m.rate, runtime, equi.pop, sample.times, n.samples=1,
    genomelength=100000, shape=flat, bottle.times=0, bottle.size=1, full=FALSE,
    feedback=1000, init.freq=1, libr=NULL, nuc=NULL, ref.strain=NULL, ...)

Arguments

m.rate Mutation rate (per sequence per generation).

runtime Number of generations for simulation to run.

equi.pop Equilibrium effective population size of pathogens within-host.

sample.times Vector of times at which to sample population.

n.samples Number of samples to take at each sampling point (if full=FALSE).

genomelength Length of genome.

shape Function describing the population growth dynamics. See Details.

bottle.times Vector of population bottleneck times.

bottle.size Size of population bottleneck (if bottle.times!=0).

full Should complete samples (all genotypes and their frequencies) be returned?

feedback Intervals between R feedback on simulation progress.

init.freq Vector of initial frequencies of genotypes, if initial population should be specified. By default, the population grows from a single, randomly generated genotype.

libr Initial list of genotypes, if initial population should be specified. Must have same length as init.freq.

nuc Initial list of mutations, if initial population should be specified. Must have same length as init.freq.

ref.strain Reference strain. By default, this is randomly generated.

... Additional arguments to be passed to the shape function.

Details

Simulation of a bacterial population. Population is by default initially clonal, and initiated by a single genotype. Population is prevented from extinction at all times. Population can be passed through repeated bottlenecks using the arguments bottle.times and bottle.size. Population growth dynamics are defined by the function called by 'shape'. This function returns the expected population size at each time step, given the total simulation time. By default, the population is expected to grow exponentially until reaching an equilibrium level, specified by equi.pop (flat). Alternatively, the population can follow a sinusoidal growth curve, peaking at runtime/2 (hump). User-defined functions should be of the form function(time,span,equi.pop,...), where span is equal to runtime in this setting.
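The discrete-time process described here can be pictured with a minimal Wright-Fisher sketch in base R. This is not the package's implementation: mutation is omitted, genotypes are reduced to plain counts, and the function name wf_generation is hypothetical. It only illustrates resampling each generation to a target population size and the effect of a bottleneck.

```r
set.seed(1)

# Hypothetical helper: one Wright-Fisher generation without mutation.
# 'counts' holds the number of individuals of each genotype.
wf_generation <- function(counts, next.size) {
  # Each offspring picks a parent at random, i.e. multinomial resampling
  as.vector(rmultinom(1, size = next.size, prob = counts / sum(counts)))
}

counts <- c(600, 300, 100)   # starting counts for three genotypes
for (t in 1:50) {
  # Bottleneck of size 1 at generation 25, otherwise a constant size of 1000
  size <- if (t == 25) 1 else 1000
  counts <- wf_generation(counts, size)
}
counts
```

After a bottleneck of size 1 the population is clonal again, so only one genotype has a non-zero count at the end of this run.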

Value

Returns a list of sampling data:

libr List of unique genotypes observed. Each entry is a vector of mutant loci relative to the reference strain.

nuc List of mutation types corresponding to libr. Each entry is a vector of nucleotides mutated from the reference strain, corresponding to entries in libr.

librstrains Vector of unique genotype IDs corresponding to the libr object.

obs.strain If full=TRUE, list of observed genotypes, each entry corresponding to the sample.times. If full=FALSE, a vector of genotype IDs, corresponding to the returned vector obs.time.

obs.freq List of observed genotype frequencies returned if full=TRUE.

obs.time Vector of observation times returned if full=FALSE.

ref.strain Reference strain used.
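When full=FALSE, obs.strain and obs.time are parallel vectors, so standard base-R tabulation applies. The sketch below uses made-up values in place of real output; the genotype IDs and sampling times are illustrative only.

```r
# Stand-ins for the obs.time and obs.strain vectors (two samples per time point)
obs.time   <- rep((1:3) * 2000, each = 2)
obs.strain <- c(1, 1, 1, 4, 4, 7)

# Number of distinct genotypes observed at each sampling time
n.geno <- tapply(obs.strain, obs.time, function(x) length(unique(x)))
n.geno

# Genotype counts by sampling time
tab <- table(time = obs.time, genotype = obs.strain)
tab
```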

Examples

# Generate 5 genome samples at 5 time points
X <- simulatepopulation(m.rate=0.0005, runtime=10000, equi.pop=2000,
                        sample.times=(1:5)*2000, n.samples=5,
                        genomelength=10000, full=FALSE)

# Generate complete observations at 5 time points
X <- simulatepopulation(m.rate=0.0005, runtime=10000, equi.pop=2000,
                        sample.times=(1:5)*2000, genomelength=10000,
                        bottle.times=5000, bottle.size=1, full=TRUE)

transmission Outbreak data

Description

Simulated outbreak data and genomic samples. An SIR outbreak was simulated with an infective individual entering a population of thirty susceptibles. A single genome was isolated from each individual during their infectious period.

Usage

transmission

Format

Consists of epidemiological data and genomic sample data.

Examples

data(transmission)
W <- transmission

ID <- unique(W$sampledata[,1])
GD <- gd(W$sampledata[,3], W$libr, W$nuc, W$librstrains)

sample.times <- W$sampledata[,2]
inf.times <- numeric(length(ID))
rec.times <- numeric(length(ID))
truesource <- numeric(length(ID))
for (i in 1:length(ID)) {
  inf.times[i] <- W$epidata[which(W$epidata[,1]==ID[i]),2]
  rec.times[i] <- W$epidata[which(W$epidata[,1]==ID[i]),3]
  truesource[i] <- W$epidata[which(W$epidata[,1]==ID[i]),4]
}

colvec <- rainbow(1200)[1:1000] # Color palette
refnode <- 1 # Compare distance to which isolate?
colv <- NULL # Vector of colors for samples
maxD <- max(GD[,refnode])
for (i in 1:length(ID)) {
  colv <- c(colv, colvec[floor((length(colvec)-1)*(GD[refnode,i])/maxD)+1])
}

plotoutbreak(W$epidata, W$sampledata, col=colv, pch=16)
K <- transroutes(ID=ID, GD=GD, sample.times=sample.times, inf.times=inf.times,
                 rec.times=rec.times, mut.rate=0.01, eq.size=5000,
                 bottle.size=1, p.level=0.95, summary=TRUE)

truesource
K$maxpostsource

transroutes Assessment of transmission routes using theoretical SNP distribution

Description

Calculates likelihood and posterior probability of each potential transmission route using a geometric-Poisson approximation of SNP distance.

Usage

transroutes(ID, GD, sample.times, inf.times, rec.times=NULL, mut.rate, eq.size,
    bottle.size=1, p.level=0.95, geninterval=NULL, summary=TRUE)

Arguments

ID Vector of individual IDs.

GD Matrix of genetic distances, with the [i,j]-th entry corresponding to the genetic distance between samples from the ith and jth entry of ID.

sample.times Vector of genome sampling times (same length as ID).

inf.times Vector of infection times (same length as ID).

rec.times Vector of recovery times (same length as ID if specified).

mut.rate Mutation rate.

eq.size Equilibrium within-host effective population size.

bottle.size Size of population bottleneck.

p.level Probability level at which to reject potential transmission routes.

geninterval Generation interval (if bottle.size>1).

summary Should a summary for each ID be printed to screen?

Details

Calculates the likelihood and posterior probability (given a flat prior) for each potential transmission route, as well as indicating which transmission routes would be rejected at a given probability level. If recovery times are not specified, each individual is assumed to be infectious for the duration of the outbreak.
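Under a flat prior, posterior probabilities follow from the likelihoods by normalising over the candidate sources of each individual. The sketch below shows this for a made-up likelihood matrix. The numbers are illustrative, the zero diagonal (no self-infection) is an assumption, and the code is not extracted from transroutes.

```r
# Toy likelihoods: entry [i,j] is the likelihood that i infected j
lik <- matrix(c(0.00, 0.30, 0.10,
                0.20, 0.00, 0.60,
                0.05, 0.10, 0.00), nrow = 3, byrow = TRUE)

# Flat prior: normalise each column so the candidate sources of j sum to 1
posterior <- sweep(lik, 2, colSums(lik), "/")

# Most probable source for each individual (row index of the column maximum)
maxpostsource <- apply(posterior, 2, which.max)
maxpostsource
```

Normalising by column matches the [i,j] convention used for the likelihood and posterior matrices, where i is the source and j the recipient.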

Value

Returns calculated values:

maxpostsource Vector of maximum posterior probability transmission sources corresponding to ID.

likelihood A matrix of likelihood values from the geometric-Poisson distribution. The [i,j]th entry provides the likelihood that the i-th individual infected the j-th individual.

posterior A matrix of posterior transmission probabilities. The [i,j]th entry provides the posterior probability that the i-th individual infected the j-th individual.

closestsource A list of the individuals carrying the most genetically similar genotype. Each entry corresponds to ID.

reject Matrix indicating whether a transmission route would be rejected at the specified probability level. The [i,j]th entry is equal to 1 if the route from i to j is rejected at this level.

See Also

expsnps

Examples

data(transmission)
W <- transmission
ID <- unique(W$sampledata[,1])
GD <- gd(W$sampledata[,3], W$libr, W$nuc, W$librstrains)

sample.times <- W$sampledata[,2]
inf.times <- numeric(length(ID))
rec.times <- numeric(length(ID))
truesource <- numeric(length(ID))
for (i in 1:length(ID)) {
  inf.times[i] <- W$epidata[which(W$epidata[,1]==ID[i]),2]
  rec.times[i] <- W$epidata[which(W$epidata[,1]==ID[i]),3]
  truesource[i] <- W$epidata[which(W$epidata[,1]==ID[i]),4]
}

K <- transroutes(ID=ID, GD=GD, sample.times=sample.times, inf.times=inf.times,
                 rec.times=rec.times, mut.rate=0.01, eq.size=5000,
                 bottle.size=1, p.level=0.95, summary=TRUE)

truesource
K$maxpostsource
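The example ends by printing the true sources and the maximum-posterior sources. One way the comparison could be summarised is sketched below, using made-up vectors in place of truesource and K$maxpostsource so that it runs on its own; the values, including the use of 0 for the index case, are illustrative assumptions.

```r
# Stand-ins for truesource and K$maxpostsource (illustrative values only)
truesource    <- c(0, 1, 1, 2, 3)
maxpostsource <- c(0, 1, 2, 2, 3)

# Proportion of individuals whose inferred source matches the true one
accuracy <- mean(truesource == maxpostsource)
accuracy

# Cross-tabulate true against inferred sources
confusion <- table(true = truesource, inferred = maxpostsource)
confusion
```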

withinhost Within-host genomic data

Description

Simulated within-host bacterial genomic samples.

Usage

withinhost

Format

Two bacterial samples were taken every 2000 generations up to time 10000.

Examples

data(withinhost)
Gmat <- gd(withinhost$obs.strain, withinhost$libr, withinhost$nuc,
           withinhost$librstrains)

colvec <- rainbow(1200)[1:1000] # Color palette
coltext <- rep("black", length(colvec)) # Corresponding text colors
coltext[680:970] <- "white" # White text for darker background colours

plotdistmat(Gmat, colvec, coltext, pos="bottomleft", labels=NULL, numbers=TRUE)

Index

ancestors
arrows

deepseq
deepseqmat
diversity.range

estcoaltime
expsnps

flat

gd

hump

librtoDNA

meansnps

networkmat

outbreak

par
plot
plotdistmat
plotdiversity
plotnetwork
plotobservedsnps
plotoutbreak
plotsnpfreq

seedy (seedy-package)
seedy-package
sharedvariants
simfixoutbreak
simulateoutbreak
simulatepopulation

transmission
transroutes

withinhost