Edinburgh Research Explorer Reassortment between Influenza B Lineages and the Emergence of a Coadapted PB1-PB2-HA Gene Complex Citation for published version: Dudas, G, Bedford, T, Lycett, S & Rambaut, A 2015, 'Reassortment between Influenza B Lineages and the Emergence of a Coadapted PB1-PB2-HA Gene Complex', Molecular Biology and Evolution, vol. 32, no. 1, pp. 162-172. https://doi.org/10.1093/molbev/msu287 Digital Object Identifier (DOI): 10.1093/molbev/msu287 Link: Link to publication record in Edinburgh Research Explorer Document Version: Publisher's PDF, also known as Version of record Published In: Molecular Biology and Evolution Publisher Rights Statement: © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] General rights Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Take down policy The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim. Download date: 13. Mar. 2021


Article

Reassortment between Influenza B Lineages and the Emergence of a Coadapted PB1–PB2–HA Gene Complex

Gytis Dudas,1 Trevor Bedford,2 Samantha Lycett,1,3 and Andrew Rambaut1,4,5

1 Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
2 Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA
3 Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow, United Kingdom
4 Fogarty International Center, National Institutes of Health, Bethesda, MD
5 Centre for Immunology, Infection and Evolution at the University of Edinburgh, Edinburgh, United Kingdom

Corresponding author: E-mail: g.dudas@sms.ed.ac.uk.

Associate editor: Robin Bush

Abstract

Influenza B viruses make a considerable contribution to morbidity attributed to seasonal influenza. Currently circulating influenza B isolates are known to belong to two antigenically distinct lineages referred to as B/Victoria and B/Yamagata. Frequent exchange of genomic segments of these two lineages has been noted in the past, but the observed patterns of reassortment have not been formalized in detail. We investigate interlineage reassortments by comparing phylogenetic trees across genomic segments. Our analyses indicate that of the eight segments of influenza B viruses, only segments coding for polymerase basic 1 and 2 (PB1 and PB2) and hemagglutinin (HA) proteins have maintained separate Victoria and Yamagata lineages and that currently circulating strains possess PB1, PB2, and HA segments derived entirely from one or the other lineage; other segments have repeatedly reassorted between lineages, thereby reducing genetic diversity. We argue that this difference between segments is due to selection against reassortant viruses with mixed-lineage PB1, PB2, and HA segments. Given sufficient time and continued recruitment to the reassortment-isolated PB1–PB2–HA gene complex, we expect influenza B viruses to eventually undergo sympatric speciation.

Key words: influenza, reassortment, evolution, phylogenetics, speciation.

Introduction

Seasonal influenza causes between 250,000 and 500,000 deaths annually and comprises lineages from three virus types (A, B, and C) cocirculating in humans, of which influenza A is considered to cause the majority of seasonal morbidity and mortality (World Health Organization 2009). Occasionally, influenza B viruses become the predominant circulating virus in some locations; for example, in the 2012/2013 European season, as many as 53% of influenza sentinel surveillance samples tested positive for influenza B (Broberg et al. 2013).

Like other members of Orthomyxoviridae, influenza B viruses have segmented genomes, which allow viruses coinfecting the same cell to exchange segments, a process known as reassortment. Influenza A viruses are widely considered to be a major threat to human health worldwide due to their ability to cause pandemics in humans through reassortment of circulating human strains with nonhuman influenza A strains. Although influenza B viruses have been observed to infect seals (Osterhaus et al. 2000; Bodewes et al. 2013) through a reverse zoonosis, they are thought to primarily infect humans and are thus unlikely to exhibit pandemics due to the absence of an animal reservoir from which to acquire antigenic novelty. Both influenza A and B evolve antigenically through time in a process known as antigenic drift, in which mutations to the hemagglutinin (HA) protein allow viruses to escape existing human immunity and persist in the human population, leading to recurrent seasonal epidemics (Burnet 1955; Hay et al. 2001; Bedford et al. 2014).

Currently circulating influenza B viruses comprise two distinct lineages, Victoria and Yamagata (referred to as Vic and Yam, respectively), named after strains B/Victoria/2/87 and B/Yamagata/16/88, that are thought to have genetically diverged in HA around 1983 (Rota et al. 1990). These two lineages now possess antigenically distinct HA surface glycoproteins (Kanegae et al. 1990; Rota et al. 1990; Nerome et al. 1998; Nakagawa et al. 2002; Ansaldi et al. 2003), allowing them to cocirculate in the human population. Phylogenetic analysis of evolutionary rate, selective pressures, and reassortment history of influenza B has shown extensive and often complicated patterns of reassortment between all segments of influenza B viruses, both between and within the Vic and Yam lineages (Chen and Holmes 2008).

Here, we extend previous methods to reveal an intriguing pattern of reassortment in influenza B. In our approach, membership to either the Vic or Yam lineage in one segment is used to label the individual isolates in the tree of the other segments. By modeling the transition between labels on a phylogenetic tree, reassortment events, which result in the replacement of one segment's lineage by another, show up as label changes along a branch (fig. 1). We use this method to reconstruct major reassortment events and quantify reassortment dynamics over time in a data set of 452 influenza B genomes and conduct secondary analyses in a data set of 1,603 influenza B genomes.

Mol. Biol. Evol. doi:10.1093/molbev/msu287. Advance Access publication October 15, 2014.

We show that, despite extensive reassortment, three of the eight segments (two segments coding for components of the influenza B virus polymerase, PB1 and PB2, and the surface glycoprotein HA) still survive as distinct Vic and Yam lineages, which appear to be codependent to the point where virions which do not contain PB1, PB2, or HA segments derived entirely from either the Vic or the Yam lineage have rarely been isolated and only circulate as transient lineages once isolated. In other segments (PA, NP, NA, MP, and NS), a single lineage has introgressed into the opposing background and replaced the previous lineage. All currently circulating influenza B viruses have PA, NP, NA, and MP segments derived from Yam lineage and NS segments derived from Vic lineage.

Results

Analysis of Reassortment Patterns across Vic and Yam Lineages

The differentiation into Vic and Yam lineages can be seen in all segments (fig. 2) and is followed by interlineage reassortment events. In the phylogenetic trees of the PA, NP, NA, MP, and NS segments, either the Vic or Yam lineage has become the "trunk" of the tree, with present-day viruses deriving entirely from the Vic or Yam lineage (yellow vs. purple bars in fig. 2) following reassortment. However, the Vic and Yam lineages of PB1, PB2, and HA segments continue to cocirculate to this day. Periodic loss of diversity in PA, NP, NA, MP, and NS segments is consistent with introgression of one lineage into the other in those segments, whereas maintenance of parallel Vic and Yam lineages results in continually increasing diversity in segments PB1, PB2, and HA (fig. 3). The PB1, PB2, and HA segments from present-day viruses maintain a common ancestor in approximately 1983 and thus accumulate genetic diversity since the split of those segments into Vic and Yam lineages, whereas other segments often lose diversity, with ancestors to present-day viruses appearing between approximately 1991 and approximately 1999.

By measuring mean pairwise diversity between branches in each tree that were assigned either a Vic or Yam label in other segments, we look for reductions in between-lineage diversity, which indicate that an interlineage reassortment event has taken place (fig. 4). This method gives a quantitative measure of reassortment-induced loss of diversity between Vic and Yam lineages in two trees, although care should be taken when interpreting the statistic, as it does not correspond to any real time of most recent common ancestors (TMRCAs) in the tree but can be interpreted as mean coalescence date between Vic and Yam lineages of PB1, PB2, and HA segments in all other trees. We focus only on PB1, PB2, and HA lineage labels, as all other segments eventually become completely derived from either the Vic or the Yam lineage. Losses of diversity (represented by more recent mean pairwise TMRCAs between Vic and Yam labels) in figure 4 indicate that every segment has reassorted with respect to the Vic and Yam lineages of PB1, PB2, and HA segments. However, we also see that the labels for these three segments show reciprocal preservation of diversity after 1997. This suggests that after 1997, no reassortment events have taken place between Vic and Yam lineages of PB1, PB2, and HA segments and their lineage labels only "meet" at the root. We do see reduced diversity between Vic and Yam labels of PB1, PB2, and HA segments in a time period close to the initial split of Vic and Yam lineages (1986–1996). These reductions in diversity represent small clades with reassortant PB1–PB2–HA constellations, which go extinct by 1997 (see fig. 2). We also observe that the assignment of these three segment labels to branches of other segment trees is very similar and often identical after 1997. This suggests that PB1, PB2, and HA lineage labels switch simultaneously in all trees after 1997.
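The between-lineage statistic described above can be illustrated with a minimal sketch. It is not the analysis pipeline used here: it works on tips rather than branches, on a single fixed tree rather than a posterior sample, and the toy tree, node names, and dates are all hypothetical. It only shows why a reassortment event pulls the mean Vic-to-Yam TMRCA toward the present.

```python
from itertools import product

def ancestors(node, parent):
    """Path from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def mrca_date(a, b, parent, date):
    """Calendar date of the most recent common ancestor of tips a and b."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):  # first shared node walking up from b
        if node in seen:
            return date[node]
    raise ValueError("tips are not in the same tree")

def mean_vic_yam_tmrca(labels, parent, date):
    """Mean MRCA date over all (Vic tip, Yam tip) pairs."""
    vic = [tip for tip, lab in labels.items() if lab == "V"]
    yam = [tip for tip, lab in labels.items() if lab == "Y"]
    dates = [mrca_date(v, y, parent, date) for v, y in product(vic, yam)]
    return sum(dates) / len(dates)

# Hypothetical segment tree: root (1983) splits into clades n1 (1990) and n2 (1992).
parent = {"a": "n1", "b": "n1", "c": "n2", "d": "n2", "n1": "root", "n2": "root"}
date = {"root": 1983.0, "n1": 1990.0, "n2": 1992.0}

# Labels transferred from another segment: concordant vs. one reassortant tip (b).
concordant = {"a": "V", "b": "V", "c": "Y", "d": "Y"}
reassorted = {"a": "V", "b": "Y", "c": "Y", "d": "Y"}

mean_concordant = mean_vic_yam_tmrca(concordant, parent, date)
mean_reassorted = mean_vic_yam_tmrca(reassorted, parent, date)
print(mean_concordant, mean_reassorted)
```

In the concordant case every Vic and Yam pair coalesces at the root, whereas the reassortant tip contributes one pair that coalesces in 1990, so the mean moves to a more recent date.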

We show the ratio of Vic to Yam sequences in our primary and secondary data sets in different influenza seasons in figure 5, which is based on which lineage each sequence was assigned to (see Materials and Methods). It is evident that losses of diversity in the PA, NP, NA, MP, and NS segments are related to either the Vic (NS) or the Yam (PA, NP, NA, and MP) lineage replacing the other lineage in the influenza B virus population. Similarly, the lack of reassortment between Vic and Yam lineages and maintenance of diversity of PB1, PB2, and HA can be seen where the two lineages have been sequenced at a ratio close to 50% over long periods of time (fig. 5). On a year-to-year basis, however, the ratios for Vic and Yam sequences of PB1, PB2, and HA can fluctuate dramatically, consistent with one lineage predominating within a given season, in agreement with surveillance data (Reed et al. 2012).

FIG. 1. Schematic analysis of reassortment patterns. (A) We begin by assigning sequences falling on either side of a specified bifurcation within each segment tree to different lineages, in this case, the Vic and Yam bifurcation that occurred in the early 1980s. (B) We then transfer lineage labels from one tree to the same tips in another tree. Transitions between labels along this second tree thus indicate reassortment events that combine lineages falling on different sides of the Vic/Yam bifurcation in the first tree. (C) A reassortment graph depiction shows that tip number 6 is determined to be a reassortant based on (B).
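The label-transfer idea in figure 1 can be sketched with a small example. Note the substitution: the sketch counts label changes with Fitch parsimony, which is a simpler stand-in for modeling label transitions on time-calibrated trees, and the tree shape, tip names, and labels are hypothetical. A tree whose tips carry concordant labels needs exactly one change (the Vic/Yam bifurcation itself); each additional change is a candidate reassortment event.

```python
def count_label_changes(tree, tip_labels):
    """Minimum number of label changes on a rooted binary tree (Fitch parsimony).

    `tree` is a nested 2-tuple of tip names; `tip_labels` maps tip name -> label.
    """
    def fitch(node):
        if isinstance(node, str):              # tip: state is its transferred label
            return {tip_labels[node]}, 0
        left, right = node
        left_set, left_changes = fitch(left)
        right_set, right_changes = fitch(right)
        shared = left_set & right_set
        if shared:                             # children agree: no change at this node
            return shared, left_changes + right_changes
        # children disagree: one label change is required below this node
        return left_set | right_set, left_changes + right_changes + 1

    return fitch(tree)[1]

# Hypothetical tree of a second segment; tip t6 groups with Vic-labelled tips here.
second_tree = ((("t1", "t2"), ("t3", "t6")), (("t4", "t5"), "t7"))

# Labels assigned from the Vic/Yam bifurcation in a first segment tree.
labels_with_reassortant = {"t1": "V", "t2": "V", "t3": "V",
                           "t4": "Y", "t5": "Y", "t6": "Y", "t7": "Y"}
labels_concordant = dict(labels_with_reassortant, t6="V")

changes_reassortant = count_label_changes(second_tree, labels_with_reassortant)
changes_concordant = count_label_changes(second_tree, labels_concordant)
print(changes_concordant, changes_reassortant)
```

The extra change in the first label set falls on the branch leading to t6, mirroring the role of tip number 6 in the schematic.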


FIG. 2. Maximum clade credibility trees of all eight genome segments of influenza B viruses isolated since 1980. Trees are colored based on inferred PB1–PB2–HA lineage. Vertical bars indicate the original Vic and Yam lineages within each segment. Each tree is the summarized output of a single analysis comprised of 9,000 trees sampled from the posterior distribution of trees.

FIG. 3. Oldest TMRCA of all surviving branches over time. PA, NP, NA, MP, and NS segments of influenza B viruses show periodic increases in TMRCAs of all surviving branches (indicative of diversity loss), suggesting lineage turnover. PB1, PB2, and HA segments, on the other hand, maintain the diversity dating back to the initial split of Vic and Yam lineages. Each point is the mean TMRCA of all surviving lineages existing at each time slice through the tree, and vertical lines indicating uncertainty are 95% highest posterior densities.


We reconstructed reassortment events that were detected by using lineage labels. Figure 6 focuses only on interlineage reassortments that have occurred after 1990. We identify five major (in terms of persistence) reassortant genome constellations (given in order PB1–PB2–PA–HA–NP–NA–MP–NS, with prime [′] indicating independently acquired segments) circulating between 1992 and 2011 (fig. 6):

B/Alaska/12/1996-like (Y–Y–Y–Y–Y–Y–Y–V)
B/Nanchang/2/1997-like (V–V–Y–V–Y–V–Y–V)
B/Iowa/03/2002-like (V–V–Y′–V–Y–Y–Y′–V′)
B/California/NHRC0001/2006-like (V–V–Y–V–Y′–Y–Y′–V′)
B/Brisbane/33/2008-like (V–V–Y–V–Y′–Y–Y–V)

FIG. 4. Mean pairwise TMRCA between Vic and Yam branches under PB1, PB2, and HA label sets. PB1, PB2, and HA segment labels indicate that these segments show reciprocal preservation of diversity, which dates back to the split of Vic and Yam lineages. All other segments show increasingly more recent TMRCAs between branches labeled as Vic and Yam in PB1, PB2, and HA label sets. All vertical lines indicating uncertainty are 95% highest posterior densities.

In a previous study, B/Alaska/12/1996-like, B/Nanchang/2/1997-like, and B/Iowa/03/2002-like constellations were observed (Chen and Holmes 2008), but sequences from B/California/NHRC0001/2006-like and B/Brisbane/33/2008-like constellations were not available at the time. In their study, Chen and Holmes (2008) also recovered the coassortment pattern of PB1, PB2, and HA lineages but did not remark upon it. Of these five constellations, four (B/Nanchang/2/1997-like, B/Iowa/03/2002-like, B/California/NHRC0001/2006-like, and B/Brisbane/33/2008-like) are derived from introgression of Yam lineage segments into Vic lineage PB1–PB2–HA background, with only one (B/Alaska/12/1996-like) resulting from introgression of Vic lineage NS segment into an entirely Yam-derived background. All five interlineage reassortment events described here are marked by the preservation of either entirely Vic- or Yam-derived PB1–PB2–HA segments. Figure 6 also shows that reassorting segments appear to evolve with a considerable degree of autonomy. For example, the NP lineage that entered a largely Vic lineage-derived genome and gave rise to the B/Nanchang/2/1997-like isolates continued circulating until 2010, even though the other segments it coassorted with in 1995–1996 (PA and MP) went extinct following the next round of reassortment that led to the rise of B/Iowa/03/2002-like genome constellations. A more extreme example is the NS segment, where a Vic sublineage was reassorted into an entirely Yam background (B/Alaska/12/1996-like) in 1994–1995, then reassorted back into a mostly Vic background some 5 years later (B/Iowa/03/2002-like), where it has replaced the "original" Vic sublineage (see fig. 6).

FIG. 5. Ratio of Vic and Yam sequences in the data set. The ratio of Vic (yellow) to Yam (purple) sequences in each segment from the primary data set over time. Black lines indicate where this ratio lies in the larger secondary data set. Numbers at the top of the figure show the total number of genomes available for each influenza season in the primary data set comprised of 452 genomes, from which the ratio was calculated, whereas the numbers in brackets correspond to numbers of sequences in the larger secondary genomes data set. Numbers at the bottom are influenza seasons from the 1987/1988 (87/88) season to the 2011/2012 season. Yam lineage of PA, NP, NA, and MP segments and Vic lineage of the NS segment eventually become fixed (in the population genetics sense of the word) in the influenza B population. PB1, PB2, and HA segments maintain separate Vic and Yam lineages.

We observe that in all five successful interlineage reassortment events shown in figure 6, none break up the PB1–PB2–HA complex. This is an unlikely outcome: the probability of not breaking up PB1–PB2–HA across five reassortment events is $P = \left(\frac{2 \times 2^{5} - 2}{2^{8} - 2}\right)^{5} = 0.0009$, where reassortment events are considered to sample from the Vic and Yam lineages at random for each of the eight segments. If we correct for multiple testing with the assumption that coassortment of any three segments is of interest, we find that the probability of not breaking up an arbitrary set of three segments across five reassortment events is $P = \binom{8}{3}\left(\frac{2 \times 2^{5} - 2}{2^{8} - 2}\right)^{5} = 0.0485$.
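The two probabilities can be checked with a few lines of arithmetic. The sketch below assumes one reading of the counting argument: each segment is drawn from Vic or Yam at random, and the two nonreassortant genomes (all Vic and all Yam) are excluded from both the numerator and the denominator. The variable names are illustrative only.

```python
from math import comb

n_segments = 8
complex_size = 3                          # PB1, PB2, and HA
free_segments = n_segments - complex_size

# Reassortant genomes with a pure-lineage complex: 2 lineages for the complex
# times 2^5 for the remaining segments, minus the 2 nonreassortant genomes.
favourable = 2 * 2**free_segments - 2
# All reassortant genomes: 2^8 minus the same 2 nonreassortant genomes.
possible = 2**n_segments - 2

p_single_event = favourable / possible
p_five_events = p_single_event**5

# Correction for having been interested in any set of three segments.
p_any_triplet = comb(n_segments, complex_size) * p_five_events

print(round(p_five_events, 4), round(p_any_triplet, 4))
```

Rounded to four decimal places, these values agree with the figures quoted in the text.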

Although the vast majority of influenza B isolates possess either Vic or Yam lineage-derived PB1–PB2–HA complexes, on rare occasions mixed-lineage PB1–PB2–HA constellations emerge. Figure 7 shows the sum of branch lengths which were labeled as having entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. Due to lack of reassortment between Vic and Yam lineages of PB1, PB2, and HA (fig. 4) since 1997, all segments have spent significantly longer periods of evolutionary time with either entirely Vic-derived or entirely Yam-derived than with mixed-lineage PB1, PB2, and HA constellations (fig. 7). We have identified three instances of mixed-lineage PB1–PB2–HA reassortants from the primary data set, with the following PB1–PB2–HA constellations: VVY (B/Bangkok/163/1990-like, 13 sequences, isolated 1990–January 5, 1995), VYV (B/Nanchang/630/1994-like, two sequences, isolated 1994–1996), and VYY (B/New York/24/1993-like, two sequences, isolated January 8, 1993–1994). We detected two new reassortant lineages when investigating the larger secondary data set: B/Waikato/6/2005-like viruses with PB1–PB2–HA constellation YYV (17 sequences, isolated May 9–October 12 in 2005) and B/Malaysia/1829782/2007 with PB1–PB2–HA constellation YVY (one sequence, isolated August 2, 2007).

Analysis of Reassortment Properties

We attempted to quantify the temporal discordance between lineages reassorting into new genomic constellations. If one were able to recover an influenza "species tree," including admixture/reassortment events, it would be possible to estimate the reassortment or recombination "distance," which is the time between a split in the species tree in the past and a reassortment event (see supplementary fig. S17, Supplementary Material online). Although we do not find evidence of differences in total number of reassortments between segments (see supplementary fig. S4, Supplementary Material online), we find support for a reassortment "distance" effect, in which a pair of tips on one segment has a different TMRCA from the same pair of tips on a different segment. The summary statistic we use that reflects this difference in TMRCAs, ΔTMRCA, is most sensitive when only one of the two trees being compared loses diversity through reassortment and the other acts like a proxy for the "species tree." We normalize our ΔTMRCA comparisons to arrive at δTMRCA, which accounts for uncertainty in tree topology (see Materials and Methods). Figure 8 shows δTMRCA values for all pairs of trees. Most segment pairs show very low values for this statistic, with δTMRCA ≈ 0.1, indicating that ΔTMRCA measurements between replicate posterior samples from the same segment are up to ten times smaller than ΔTMRCA values between posterior samples from different segments. PB1, PB2, and HA trees, on the other hand, exhibit δTMRCA values that are much higher. This shows that TMRCA differences between trees of PB1, PB2, and HA segments are, though noisy, occasionally very similar to uncertainty in tip-to-tip TMRCAs between replicate analyses of these segments.

FIG. 6. Schematic plot of reconstructed reassortments between Vic and Yam lineage segments of influenza B virus. Lineages that coassort in genomes are represented by eight parallel lines, with lineages that derive from the original Vic clade colored yellow/brown and lineages that derive from the original Yam clade colored lilac/purple. Interlineage reassortment events are indicated by lines entering a different genome. The angle of incoming lineages represents uncertainty in the timing of the event (mean date of the reassortant node and its parent node). Lineage extinction dates are not shown accurately.

FIG. 7. Amount of evolutionary time each segment has spent under different PB1–PB2–HA constellations. All segments have spent significantly more of their history with entirely Vic- or entirely Yam-derived PB1–PB2–HA complexes. All vertical lines indicating uncertainty are 95% highest posterior densities.

Discussion

Linkage between PB1, PB2, and HA Gene Segments

In this article, we show that the PB1, PB2, and HA segments of influenza B viruses are the only ones that have continuously maintained separate Vic and Yam lineages, whereas other segments have fixed either Vic or Yam lineages (figs. 2, 5, and 6). Evidence suggests that this is a result of prolonged lack of reassortment between Vic and Yam lineages in PB1, PB2, and HA (fig. 4), which possess coassorting sequences detectable as high linkage disequilibrium (LD) (supplementary fig. S1, Supplementary Material online). The vast majority of the sampled evolutionary history of each segment of influenza B viruses since the split of Vic and Yam lineages has been spent in association with either completely Vic or completely Yam lineage-derived PB1–PB2–HA complexes (fig. 7), suggesting that having "pure" lineage PB1–PB2–HA complexes is important for whole-genome fitness. We propose that this pattern of coassortment is due to the action of selection and not simply biased or rare reassortment.

The origin of the strong genetic linkage between PB1, PB2, and HA segments remains unclear. We believe that there are two alternative but similar explanations for the origins of the strong genetic linkage between these segments: mutation-driven coevolution (Presgraves 2010) and Dobzhansky–Muller (DM) incompatibility (Dobzhansky 1937; Muller 1942). Mutation-driven coevolution (Presgraves 2010) has been suggested to be the cause of hybrid dysfunction in Saccharomyces hybrids (Lee et al. 2008) and evolves as a byproduct of adaptation. If one or the other influenza B lineage has undergone adaptation, we might expect these changes to be beneficial in its native background and incompatible with a foreign background. DM incompatibility operates in a similar way, but the main difference from the scenario described earlier is that the incompatible alleles are neutral or nearly neutral in their native background and become deleterious or lethal when combined with nonnative backgrounds. Emergence of DM incompatibility is aided by geographic isolation. Interestingly, the Vic lineage of HA was restricted to eastern Asia between 1992 and 2000 (Nerome et al. 1998; Shaw et al. 2002), offering ample time for the budding Vic lineage to accumulate alleles causing reassortment incompatibility. However, without more genomic data from the past, it is difficult to estimate to what extent influenza B virus population structure contributed to the development of the current segment linkage.

Potential Mechanisms for Reassortment Incompatibility

Unfortunately, the limited amount of genomic data available for the early years of the Vic–Yam split precludes any attempt to answer whether selection or drift has led to the current linkage of PB1, PB2, and HA segments. Although the origins of the linkage between these three segments might be difficult to explain, we can speculate on the nature of reassortment incompatibility. For example, it is intuitive why this might be the case for PB1 and PB2: both proteins interact directly as part of the RNA-dependent RNA polymerase heterotrimer. Indeed, we observe that PB1–PB2 reassortants are the rarest and least persistent among mixed-lineage PB1–PB2–HA strains and have not been isolated in great numbers. In fact, most reassortants breaking the PB1–PB2–HA complex apart have occurred in the past, close to the split of Vic and Yam lineages, and have become very rare since.

There is some evidence that the linkage between PB1 and HA might not be a phenomenon restricted to influenza B viruses. It has been established that, at least for the 1957 and the 1968 influenza pandemics caused by A/H2N2 and A/H3N2 subtypes, respectively, the viruses responsible were reassortants possessing PB1 and HA segments derived from avian influenza A viruses (Kawaoka et al. 1989). In addition, outdated techniques for producing vaccine seed strains through selection for HA–NA reassortants often yielded PB1–HA–NA reassortants as a side effect (Bergeron et al. 2010; Fulvini et al. 2011). Recent experiments have found that the presence or absence of a "foreign" PB1 segment can have dramatic effects on HA concentration on the surface of virions and total virion production (Cobbin et al. 2013). However, there have been reassortant influenza A viruses circulating for prolonged periods of time in humans that did have disparate PB1 and HA segments, for example, H1N2 outbreaks in 2001 (Gregory et al. 2002) and H1N1/09 in 2009 (Smith et al. 2009).

FIG. 8. δTMRCA statistics for different segment pairs. PB1, PB2, and HA trees exhibit reciprocally highly similar TMRCAs, unlike most other pairwise comparisons. All vertical lines indicating uncertainty are 95% highest posterior densities.

We believe that the association between PB1, PB2, and HA segments should be relatively straightforward to explore in the lab. Reverse genetics systems have been developed for influenza B viruses (Hoffmann et al. 2002), which would allow the creation of artificial reassortants. Based on the frequency and persistence times of different reassortant classes we have observed, we expect a hierarchy of reassortant fitness, starting with PB1+PB2+HA reassortants, which should be the most fit, followed by PB1+2/HA, then PB1+HA/PB2, and finally PB2+HA/PB1 reassortants with the lowest fitness. We believe that this is the most direct approach to unraveling the mechanism responsible for the linkage within the PB1–PB2–HA complex.

Will Influenza B Viruses Speciate?

We suggest that the preservation of two PB1–PB2–HA complex lineages is similar to genomic speciation islands, where small numbers of genes resist being homogenized through gene flow (Turner et al. 2005). In this context, we see three potential paths of evolution for influenza B viruses. If more segments get recruited to the PB1–PB2–HA complex, the process could continue until "speciation" occurs, in which none of the segments is able to reassort across the Vic–Yam lineage boundary. Alternatively, the influenza B genome could continue to be homogenized through gene flow with the exception of PB1, PB2, and HA segments, or one of the two PB1–PB2–HA complexes could go extinct, marking the return of single-strain dynamics in the influenza B virus population. The eventual fate of influenza B viruses will likely be determined by the combined effects of reassortment frequency and the strength of epistatic interactions between segments.

Materials and Methods

We compiled a primary data set of 452 complete influenza B genomes from GISAID (Bogner et al. 2006) dating from 1984 to 2012. The longest protein-coding region of each segment was extracted and used for all further analyses. We thus assume that homologous recombination has not taken place and that the evolutionary history of the whole segment can be inferred from the longest coding sequence in the segment. To date, there has been little evidence of homologous recombination in influenza viruses (Chare et al. 2003; Boni et al. 2008; Han et al. 2010). The segments of each strain were assigned to either the Vic or the Yam lineage by making maximum-likelihood trees of each segment using PhyML (Guindon and Gascuel 2003) and identifying whether the isolate was more closely related to B/Victoria/2/87 or B/Yamagata/16/88 sequences in that segment. The exception was the NS segment: as B/Victoria/2/87 was a reassortant and possessed a Yam lineage NS (Lindstrom et al. 1999), B/Czechoslovakia/69/1990 was considered as being representative of the Vic lineage for the NS segment. Every segment in each genome thus received either a Vic or a Yam lineage designation; for example, the strain B/Victoria/2/87 received V–V–V–V–V–V–V–Y, as its NS segment is derived from the Yam lineage and the rest of the genome is Vic.
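The lineage designation described above can be illustrated with a small sketch. This is not the authors' code: the function names `constellation` and `is_mixed` are hypothetical, the segment order is the PB1–PB2–PA–HA–NP–NA–MP–NS order used for genome constellations in this article, and a plain hyphen is assumed as the separator.

```python
# Segment order used for genome constellations in the article.
SEGMENTS = ("PB1", "PB2", "PA", "HA", "NP", "NA", "MP", "NS")


def constellation(lineages):
    """Join per-segment lineage calls ('V' or 'Y') into one string.

    `lineages` maps segment name -> 'V' (Vic) or 'Y' (Yam).
    """
    missing = [seg for seg in SEGMENTS if seg not in lineages]
    if missing:
        raise ValueError(f"missing segments: {missing}")
    invalid = {seg: lin for seg, lin in lineages.items() if lin not in ("V", "Y")}
    if invalid:
        raise ValueError(f"lineage must be 'V' or 'Y': {invalid}")
    return "-".join(lineages[seg] for seg in SEGMENTS)


def is_mixed(lineages, segments=("PB1", "PB2", "HA")):
    """True if the chosen segments do not all come from the same lineage."""
    return len({lineages[seg] for seg in segments}) > 1


# B/Victoria/2/87: every segment is Vic except NS, which is Yam.
b_victoria_2_87 = dict.fromkeys(SEGMENTS, "V")
b_victoria_2_87["NS"] = "Y"

print(constellation(b_victoria_2_87))  # V-V-V-V-V-V-V-Y
print(is_mixed(b_victoria_2_87))       # False: PB1, PB2 and HA are all Vic
```

Keeping the per-segment calls in a mapping, rather than only as a string, makes it straightforward to ask questions about subsets of segments such as PB1, PB2, and HA.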

We also collated a secondary data set from all complete influenza B virus genomes available on GenBank as of May 5, 2014. After removing isolates that had considerable portions of any sequence missing, were isolated prior to 1980, or were suspected of having a contaminant sequence in any segment, we were left with 1,603 sequences. These sequences only became available after all primary analyses were performed, are mainly from Australia, New Zealand, and the United States, and are too numerous to analyze in BEAST (Drummond et al. 2012). PhyML (Guindon and Gascuel 2003) was used to produce phylogenies of each segment, and the lineage of each isolate was determined based on grouping with either B/Victoria/2/87 or B/Yamagata/16/88 sequences, as described above. By associating strains with the lineage identity of each of their segments, we reconstructed the most parsimonious interlineage reassortment history for the secondary data set. The secondary data set was used to check how representative the primary data set was, to estimate LD, and to broadly confirm our results. All analyses pertain to the primary data set unless stated otherwise.

Temporally calibrated phylogenies were recovered for each segment in the primary data set using Markov chain Monte Carlo (MCMC) methods in the BEAST software package (Drummond et al. 2012). We modeled the substitution process using the Hasegawa–Kishino–Yano model of nucleotide substitution (Hasegawa et al. 1985), with separate transition models for each of the three codon partitions, and additionally estimated realized synonymous and nonsynonymous substitution counts (O'Brien et al. 2009). We used a flexible Bayesian skyride demographic model (Minin et al. 2008). We accounted for incomplete sampling dates for 94 sequences (of which 93 had only year and 1 had only year and month of isolation), whereby tip date is estimated as a latent variable in the MCMC integration. A relaxed molecular clock was used, where branch rates are drawn from a lognormal distribution (Drummond et al. 2006). We ran three independent MCMC chains, each with 200 million states, sampled every 20,000 steps, and discarded the first 10% of the MCMC states as burn-in. After assessing convergence of all three MCMC chains by visual inspection using Tracer (Rambaut et al. 2009), we combined samples across chains to give a total of 27,000 samples from the posterior distribution of trees.
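As a quick arithmetic check of the sampling scheme just described, the following sketch recomputes the number of retained posterior samples from the stated chain length, sampling interval, burn-in, and number of chains. The variable names are illustrative only, and the sample logged at state 0 is ignored for simplicity.

```python
# Values stated in the text; variable names are illustrative.
CHAIN_LENGTH = 200_000_000   # MCMC states per chain
SAMPLE_EVERY = 20_000        # sampling interval in states
BURN_IN_PERCENT = 10         # percentage of states discarded as burn-in
N_CHAINS = 3                 # independent replicate chains

samples_per_chain = CHAIN_LENGTH // SAMPLE_EVERY
burn_in_samples = samples_per_chain * BURN_IN_PERCENT // 100
retained_per_chain = samples_per_chain - burn_in_samples
total_samples = retained_per_chain * N_CHAINS

print(samples_per_chain)   # 10000
print(retained_per_chain)  # 9000
print(total_samples)       # 27000
```

The per-chain figure of 9,000 retained trees and the combined figure of 27,000 both follow directly from these settings.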

Every sequence was assigned seven discrete traits in BEAUti, corresponding to the lineages of all other segments with which a strain was isolated; for example, the PB1 tree had PB2, PA, HA, NP, NA, MP, and NS as traits and V or Y as values for each trait. We inferred the ancestral state of lineages in each segment by modeling transitions between these discrete states using an asymmetric transition matrix (Lemey et al. 2009) with Bayesian stochastic search variable selection to estimate significant rates. Because the posterior set of trees for a single segment has branches labeled with the inferred lineage in the remaining seven segments, we can detect interlineage reassortments between pairs of segments by observing state transitions, that is, Yam to Vic or Vic to Yam (fig. 1). In addition, by reconstructing the ancestral state of all other genomic segments jointly, we can infer coreassortment events when more than one trait transition occurs on the same node in a tree. Interphylogeny labeling approaches have been extensively used in the past to investigate reticulate evolution in influenza A viruses and HIV (Lycett et al. 2012; Ward et al. 2013; Lu et al. 2014).
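The idea of reading reassortments off label changes can be sketched on a single, already-labeled tree. This is a simplified illustration under stated assumptions, not the BEAST analysis itself: the tree is a hypothetical toy given as a child-to-parent mapping, each node carries one fixed lineage label per trait (no posterior uncertainty), and `label_transitions` is a made-up helper name.

```python
def label_transitions(parent, labels):
    """Find branches on which one or more trait labels change.

    parent: maps node -> parent node (None for the root).
    labels: maps node -> {trait: 'V' or 'Y'}.
    Returns {child: [(trait, parent_label, child_label), ...]}.
    """
    events = {}
    for child, par in parent.items():
        if par is None:  # the root has no incoming branch
            continue
        changes = [
            (trait, labels[par][trait], labels[child][trait])
            for trait in sorted(labels[child])
            if labels[par][trait] != labels[child][trait]
        ]
        if changes:
            events[child] = changes
    return events


# Toy PB1 tree: root -> (n1, n2); n1 -> (t1, t2); n2 -> (t3, t4).
parent = {"root": None, "n1": "root", "n2": "root",
          "t1": "n1", "t2": "n1", "t3": "n2", "t4": "n2"}
# Hypothetical lineage labels of two other segments on each PB1 tree node.
labels = {
    "root": {"PB2": "V", "HA": "V"},
    "n1": {"PB2": "V", "HA": "V"},
    "n2": {"PB2": "Y", "HA": "Y"},
    "t1": {"PB2": "V", "HA": "V"},
    "t2": {"PB2": "V", "HA": "Y"},
    "t3": {"PB2": "Y", "HA": "Y"},
    "t4": {"PB2": "Y", "HA": "Y"},
}

events = label_transitions(parent, labels)
# Branches where more than one trait switches at once (coreassortment candidates).
coreassortments = {node: ch for node, ch in events.items() if len(ch) > 1}
print(events)
print(coreassortments)
```

In the toy tree, the branch leading to `t2` carries a single HA label change, whereas the branch leading to `n2` carries simultaneous PB2 and HA changes and is therefore flagged as a coreassortment candidate.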

Measures of Diversity

We inferred the diversity of each segment from its phylogenetic tree by estimating the date of the most recent common ancestor of all branches at yearly time points, which places an upper bound on the maximum amount of diversity existing at each time point. A version of this lineage turnover metric has previously been used to investigate the tempo and strength of selection in influenza A viruses during seasonal circulation (Bedford et al. 2011). In addition, we calculated mean pairwise TMRCA between branches labeled as Vic and Yam for PB1, PB2, and HA traits. This gave us a measure of how much a particular segment reassorts with respect to Vic and Yam lineages of PB1, PB2, and HA segments. If Vic and Yam lineages of PB1, PB2, and HA segments were to be considered as being separate populations, this measure would be equivalent to "between population" diversity.

We also calculated the total amount of sampled evolutionary time spent by each segment with entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. We do this by summing the branch lengths in each tree under three different lineage combinations of the PB1, PB2, and HA segments: PB1–PB2–HA derived entirely from the Yam lineage, PB1–PB2–HA derived entirely from the Vic lineage, and PB1–PB2–HA derived from a mixture of the two lineages. This gives a measure of how successful, over long periods of time, each particular PB1–PB2–HA constellation has been.
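The branch-length summation can be sketched as follows. The input format is an assumption made for illustration: each branch is a tuple of its length (in years) and the PB1, PB2, and HA lineage labels assigned to it. The function name `time_by_constellation` is hypothetical.

```python
def time_by_constellation(branches):
    """Sum branch lengths by PB1-PB2-HA lineage combination.

    branches: iterable of (length, pb1, pb2, ha), labels being 'V' or 'Y'.
    Returns total time under entirely Vic, entirely Yam, and mixed labels.
    """
    totals = {"Vic": 0.0, "Yam": 0.0, "mixed": 0.0}
    for length, pb1, pb2, ha in branches:
        states = {pb1, pb2, ha}
        if states == {"V"}:
            totals["Vic"] += length
        elif states == {"Y"}:
            totals["Yam"] += length
        else:
            totals["mixed"] += length
    return totals


# Hypothetical branches from one segment tree.
toy_branches = [
    (1.5, "V", "V", "V"),
    (2.0, "Y", "Y", "Y"),
    (0.5, "V", "Y", "V"),
    (3.0, "V", "V", "V"),
]
print(time_by_constellation(toy_branches))
# {'Vic': 4.5, 'Yam': 2.0, 'mixed': 0.5}
```

Applied to every tree in a posterior sample, the same summation would yield a distribution of time spent under each constellation rather than a single value.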

Tree-to-Tree Similarities

We express the normalized distance $\delta_{\mathrm{TMRCA}}$ between trees belonging to two segments A and B for a particular posterior sample i following

$$\delta_{\mathrm{TMRCA}}(A_i, B_i) = \frac{\bar{\delta}_{\mathrm{TMRCA}}(A_i, A'_i) + \bar{\delta}_{\mathrm{TMRCA}}(B_i, B'_i)}{2\,\bar{\delta}_{\mathrm{TMRCA}}(A_i, B_i)}, \tag{1}$$

where $\bar{\delta}_{\mathrm{TMRCA}}(A_i, B_i) = \frac{1}{n}\sum_{j=1}^{n} g(A_{ij}, B_{ij})$ and n is the total number of pairwise comparisons available between sets of tips. Thus, $g(A_{ij}, B_{ij})$ is the absolute difference in TMRCA of a pair of tips j, where the pair is drawn from the ith posterior sample of tree A and the ith posterior sample of tree B. Additionally, $\bar{\delta}_{\mathrm{TMRCA}}(A_i, A'_i)$ is calculated from the ith posterior sample of tree A and the ith posterior sample of an independent analysis of tree A (which we refer to as A′), which is used in the normalization procedure to control for variability in tree topology stability over the course of the MCMC chain (see supplementary figs. S6 and S7, Supplementary Material online). We had three replicate analyses of each segment, and in order to calculate $\bar{\delta}_{\mathrm{TMRCA}}(A_i, A'_i)$ we used analyses numbered 1, 2, and 3 as A and analyses numbered 2, 3, and 1 as A′, in that order. We subsampled our combined posterior distribution of trees to give a total of 2,700 trees on which to analyze $\delta_{\mathrm{TMRCA}}$.
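Equation (1) can be sketched in a few lines, assuming that the pairwise TMRCAs of each tree have already been extracted into a mapping from tip pairs to TMRCA dates. That input format and the function names `mean_abs_tmrca_diff` and `normalized_delta_tmrca` are assumptions made for illustration; extracting TMRCAs from actual tree files is not shown.

```python
def mean_abs_tmrca_diff(tree_a, tree_b):
    """Mean absolute difference in TMRCA over tip pairs shared by two trees.

    tree_a, tree_b: map (tip1, tip2) -> TMRCA of that tip pair.
    """
    shared = sorted(set(tree_a) & set(tree_b))
    if not shared:
        raise ValueError("trees share no tip pairs")
    return sum(abs(tree_a[p] - tree_b[p]) for p in shared) / len(shared)


def normalized_delta_tmrca(a, a_rep, b, b_rep):
    """Normalized distance of equation (1) for one posterior sample.

    a, b: trees of segments A and B; a_rep, b_rep: the same posterior
    sample index taken from independent replicate analyses (A', B').
    """
    between = mean_abs_tmrca_diff(a, b)
    if between == 0:
        raise ZeroDivisionError("trees A and B have identical TMRCAs")
    within = mean_abs_tmrca_diff(a, a_rep) + mean_abs_tmrca_diff(b, b_rep)
    return within / (2 * between)


# Hypothetical TMRCA dates (in calendar years) for three tip pairs.
a = {("x", "y"): 2000.0, ("x", "z"): 1990.0, ("y", "z"): 1990.0}
a_rep = {("x", "y"): 2001.5, ("x", "z"): 1990.0, ("y", "z"): 1990.0}
b = {("x", "y"): 1995.0, ("x", "z"): 1985.0, ("y", "z"): 1985.0}
b_rep = {("x", "y"): 1995.0, ("x", "z"): 1986.5, ("y", "z"): 1985.0}

print(normalized_delta_tmrca(a, a_rep, b, b_rep))
```

In this toy case the replicate analyses differ by 0.5 years on average while trees A and B differ by 5 years, so the statistic is about 0.1; values near 1 would indicate that two segment trees differ no more than replicate analyses of the same segment do.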

Calculating the normalized $\delta_{\mathrm{TMRCA}}(A_i, B_i)$ for each MCMC state provides us with a posterior distribution of this statistic, allowing specific hypotheses regarding similarities between the trees of different segments to be tested. Our approach exploits the branch scaling used by BEAST (Drummond et al. 2012), as the trees are scaled in absolute time and insensitive to variation in nucleotide substitution rates between segments, allowing for direct comparisons between TMRCAs in different trees. In the absence of reassortment, we expect the tree of every segment to recapitulate the "virus tree," a concept analogous to "species trees" in population genetics. Our method operates under the assumption that the segment trees capture this "virus tree" of influenza B viruses quite well. It is not an unreasonable assumption, given the seasonal bottlenecks influenza viruses experience. This makes it almost certain that influenza viruses circulating at any given time point are derived from a single genome that existed in the recent past. The $\bar{\delta}_{\mathrm{TMRCA}}$ statistic essentially quantifies the temporal distance between admixture events and nodes in the virus tree (see supplementary fig. S17, Supplementary Material online). We normalize $\bar{\delta}_{\mathrm{TMRCA}}$ values to get $\delta_{\mathrm{TMRCA}}$, a measure which quantifies the extent to which the similarity of two independent trees resembles phylogenetic noise. The $\delta_{\mathrm{TMRCA}}$ statistic is an extension of patristic distance methods and has previously been used to tackle a wide variety of problems, such as phylogenetic distance in predicting viral titer in Drosophila infected with viruses from closely related species (Longdon et al. 2011) and to assess temporal incongruence in a phylogenetic tree of amphibian species induced by using different calibrations (Ruane et al. 2011).

LD across the Influenza B Genome

We used the secondary GenBank data set with 1,603 complete genome sequences to estimate LD between amino acid loci across the longest proteins encoded by each segment of the influenza B virus genome. To quantify LD, we adapt the $\chi^2_{df}$ statistic from Hedrick and Thomson (1986):

$$\chi^2_{df} = \frac{\chi^2}{N\,(k-1)(m-1)}, \tag{2}$$

where $\chi^2$ is calculated from a classical contingency table, N is the number of haplotypes, and $(k-1)(m-1)$ are the degrees of freedom. This statistic is equal to the widely used $r^2$ LD statistic at biallelic loci but also quantifies LD when there are more than two alleles per locus (Zhao et al. 2005). LD was estimated only at loci where each nucleotide or amino acid allele was present in at least two isolates. We ignored gaps in the alignment and did not consider them as polymorphisms. In all cases, we used a minor allele frequency cutoff of 1%. We also calculated another LD statistic, $D'$ (Lewontin 1964), as $D'_{ij} = D_{ij}/D^{\max}_{ij}$, where $D_{ij} = p(A_iB_j) - p(A_i)\,p(B_j)$ and

$$D^{\max}_{ij} = \begin{cases} \min\left[p(A_i)\,p(B_j),\ (1-p(A_i))(1-p(B_j))\right] & \text{when } D_{ij} < 0, \\ \min\left[(1-p(A_i))\,p(B_j),\ p(A_i)(1-p(B_j))\right] & \text{when } D_{ij} \geq 0, \end{cases} \tag{3}$$

where $p(A_i)$ is the frequency of allele $A_i$ at locus A, $p(B_j)$ is the frequency of allele $B_j$ at locus B, and $p(A_iB_j)$ is the frequency of haplotype $A_iB_j$. $D'$ is inflated when some haplotypes are not observed, for example, when the minor allele frequency is low. We find that $D'$ is almost uniformly high across the influenza B virus genome and close to 1.0 for almost any pair of polymorphic loci. This is because most amino acid alleles in the population exist transiently, meaning that they do not get a chance to reassort, and we only observe them within the backgrounds of more persistent alleles, which $D'$ quantifies as complete LD. We think that metrics related to $r^2$, such as $\chi^2_{df}$, perform much better on temporal data such as ours in finding persistent associations between alleles and are easier to interpret.
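Both LD statistics can be sketched directly from equations (2) and (3). This is a minimal illustration, not the authors' scripts: haplotypes are assumed to be given as a list of (allele at locus A, allele at locus B) tuples, k and m are taken to be the numbers of alleles observed at the two loci, and the function names `chi2_df` and `d_prime` are hypothetical.

```python
from collections import Counter


def chi2_df(haplotypes):
    """Equation (2): chi-square over N * (k - 1) * (m - 1).

    haplotypes: list of (allele_at_A, allele_at_B) tuples.
    """
    n = len(haplotypes)
    a_counts = Counter(a for a, _ in haplotypes)
    b_counts = Counter(b for _, b in haplotypes)
    ab_counts = Counter(haplotypes)
    k, m = len(a_counts), len(b_counts)
    if k < 2 or m < 2:
        raise ValueError("both loci must be polymorphic")
    chi2 = 0.0
    for a, n_a in a_counts.items():
        for b, n_b in b_counts.items():
            expected = n_a * n_b / n
            observed = ab_counts.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return chi2 / (n * (k - 1) * (m - 1))


def d_prime(haplotypes, a, b):
    """Equation (3): D' for allele a at locus A and allele b at locus B."""
    n = len(haplotypes)
    p_a = sum(1 for x, _ in haplotypes if x == a) / n
    p_b = sum(1 for _, y in haplotypes if y == b) / n
    p_ab = sum(1 for h in haplotypes if h == (a, b)) / n
    d = p_ab - p_a * p_b
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min((1 - p_a) * p_b, p_a * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when an allele is absent or fixed")
    return d / d_max


# Hypothetical data: the rare allele 'T' is only ever seen with 'C'.
rare = [("A", "G")] * 8 + [("A", "C")] + [("T", "C")]
print(d_prime(rare, "T", "C"), chi2_df(rare))
```

In this toy data set the rare allele is observed on a single background, so D′ is approximately 1 even though the chi-square-based statistic stays well below 1, which mirrors the inflation of D′ at low minor allele frequencies noted above.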

Data Availability

Python scripts used to process trees and sequences are available at https://github.com/evogytis/fluB/tree/master/scripts (last accessed October 21, 2014). Output files from scripts, lineage designations, MCC trees, acknowledgment tables, accession numbers, and redacted XML files (per GISAID Data Access Agreement) are publicly available at https://github.com/evogytis/fluB/tree/master/data (last accessed October 21, 2014).

Supplementary Material

Supplementary material and figures S1–S17 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Darren Obbard and Paul Wikramaratna for helpful discussions and anonymous reviewers for comments and suggestions. This study was supported by a Natural Environment Research Council studentship D76739X to G.D. and a Newton International Fellowship from the Royal Society to T.B. The research leading to these results has received funding from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 278433-PREDEMICS and ERC Grant Agreement No. 260864. A.R. and S.L. acknowledge the support of the Wellcome Trust (Grant No. 092807).

References

Ansaldi F, D'Agaro P, de Florentiis D, Puzelli S, Lin YP, Gregory V, Bennett M, Donatelli I, Gasparini R, Crovari P, et al. 2003. Molecular characterization of influenza B viruses circulating in northern Italy during the 2001–2002 epidemic season. J Med Virol. 70:463–469.

Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol. 11:220.

Bedford T, Suchard MA, Lemey P, Dudas G, Gregory V, Hay AJ, McCauley JW, Russell CA, Smith DJ, Rambaut A. 2014. Integrating influenza antigenic dynamics with molecular evolution. eLife 3:e01914.

Bergeron C, Valette M, Lina B, Ottmann M. 2010. Genetic content of influenza H3N2 vaccine seeds. PLoS Curr. 2:RRN1165.

Bodewes R, Morick D, de Mutsert G, Osinga N, Bestebroer T, van der Vliet S, Smits SL, Kuiken T, Rimmelzwaan GF, Fouchier RA, et al. 2013. Recurring influenza B virus infections in seals. Emerg Infect Dis. 19:511–512.

Bogner P, Capua I, Lipman DJ, Cox NJ, et al. 2006. A global initiative on sharing avian flu data. Nature 442:981.

Boni MF, Zhou Y, Taubenberger JK, Holmes EC. 2008. Homologous recombination is very rare or absent in human influenza A virus. J Virol. 82:4807–4811.

Broberg E, Beaute J, Snacken R. 2013. Fortnightly influenza surveillance overview, 24 May 2013 - weeks 19-20/2013. Available from: http://ecdc.europa.eu/en/publications/Publications/influenza-fortnightly-surveillance-overview-24-may-2013.pdf

Burnet SFM. 1955. Principles of animal virology. New York: Academic Press.

Chare ER, Gould EA, Holmes EC. 2003. Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J Gen Virol. 84:2691–2703.

Chen R, Holmes EC. 2008. The evolutionary dynamics of human influenza B virus. J Mol Evol. 66:655–663.

Cobbin JCA, Verity EE, Gilbertson BP, Rockman SP, Brown LE. 2013. The source of the PB1 gene in influenza vaccine reassortants selectively alters the hemagglutinin content of the resulting seed virus. J Virol. 87:5577–5585.

Dobzhansky T. 1937. Genetics and the origin of species. New York: Columbia University Press.

Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4:e88.

Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 29:1969–1973.

Fulvini AA, Ramanunninair M, Le J, Pokorny BA, Arroyo JM, Silverman J, Devis R, Bucher D. 2011. Gene constellation of influenza A virus reassortants with high growth phenotype prepared as seed candidates for vaccine production. PLoS One 6:e20823.

Gregory V, Bennett M, Orkhan M, Hajjar SA, Varsano N, Mendelson E, Zambon M, Ellis J, Hay A, Lin Y. 2002. Emergence of influenza A H1N2 reassortant viruses in the human population during 2001. Virology 300:1–7.

Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696–704.

Han GZ, Boni MF, Li SS. 2010. No observed effect of homologous recombination on influenza C virus evolution. Virol J. 7:227.

Hasegawa M, Kishino H, Yano TA. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 22:160–174.

Hay AJ, Gregory V, Douglas AR, Lin YP. 2001. The evolution of human influenza viruses. Philos Trans R Soc Lond B Biol Sci. 356:1861–1870.


Hedrick PW, Thomson G. 1986. A two-locus neutrality test: applications to humans, E. coli and lodgepole pine. Genetics 112:135–156.

Hoffmann E, Mahmood K, Yang CF, Webster RG, Greenberg HB, Kemble G. 2002. Rescue of influenza B virus from eight plasmids. Proc Natl Acad Sci U S A. 99:11411–11416.

Kanegae Y, Sugita S, Endo A, Ishida M, Senya S, Osako K, Nerome K, Oya A. 1990. Evolutionary pattern of the hemagglutinin gene of influenza B viruses isolated in Japan: cocirculating lineages in the same epidemic season. J Virol. 64:2860–2865.

Kawaoka Y, Krauss S, Webster RG. 1989. Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics. J Virol. 63:4603–4608.

Lee HY, Chou JY, Cheong L, Chang NH, Yang SY, Leu JY. 2008. Incompatibility of nuclear and mitochondrial genomes causes hybrid sterility between two yeast species. Cell 135:1065–1073.

Lemey P, Rambaut A, Drummond AJ, Suchard MA. 2009. Bayesian phylogeography finds its roots. PLoS Comput Biol. 5:e1000520.

Lewontin RC. 1964. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49–67.

Lindstrom SE, Hiromoto Y, Nishimura H, Saito T, Nerome R, Nerome K. 1999. Comparative analysis of evolutionary mechanisms of the hemagglutinin and three internal protein genes of influenza B virus: multiple cocirculating lineages and frequent reassortment of the NP, M, and NS genes. J Virol. 73:4413–4426.

Longdon B, Hadfield JD, Webster CL, Obbard DJ, Jiggins FM. 2011. Host phylogeny determines viral persistence and replication in novel hosts. PLoS Pathog. 7:e1002260.

Lu L, Lycett SJ, Brown AJL. 2014. Reassortment patterns of avian influenza virus internal segments among different subtypes. BMC Evol Biol. 14:16.

Lycett SJ, Baillie G, Coulter E, Bhatt S, Kellam P, McCauley JW, Wood JL, Brown IH, Pybus OG, Leigh Brown AJ, Combating Swine Influenza Initiative-COSI Consortium. 2012. Estimating reassortment rates in co-circulating Eurasian swine influenza viruses. J Gen Virol. 93:2326–2336.

Minin VN, Bloomquist EW, Suchard MA. 2008. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol Biol Evol. 25:1459–1471.

Muller H. 1942. Isolating mechanisms, evolution and temperature. Biol Symp. 6:71–125.

Nakagawa N, Nukuzuma S, Haratome S, Go S, Nakagawa T, Hayashi K. 2002. Emergence of an influenza B virus with antigenic change. J Clin Microbiol. 40:3068–3070.

Nerome R, Hiromoto Y, Sugita S, Tanabe N, Ishida M, Matsumoto M, Lindstrom SE, Takahashi T, Nerome K. 1998. Evolutionary characteristics of influenza B virus since its first isolation in 1940: dynamic circulation of deletion and insertion mechanism. Arch Virol. 143:1569–1583.

O'Brien JD, Minin VN, Suchard MA. 2009. Learning to count: robust estimates for labeled distances between molecular sequences. Mol Biol Evol. 26:801–814.

Osterhaus ADME, Rimmelzwaan GF, Martina BEE, Bestebroer TM, Fouchier RAM. 2000. Influenza B virus in seals. Science 288:1051–1053.

Presgraves DC. 2010. The molecular evolutionary basis of species formation. Nat Rev Genet. 11:175–180.

Rambaut A, Suchard M, Drummond A. 2009. Tracer v1.5 [Internet]. Available from: http://tree.bio.ed.ac.uk/software/tracer/

Reed C, Meltzer MI, Finelli L, Fiore A. 2012. Public health impact of including two lineages of influenza B in a quadrivalent seasonal influenza vaccine. Vaccine 30:1993–1998.

Rota PA, Wallis TR, Harmon MW, Rota JS, Kendal AP, Nerome K. 1990. Cocirculation of two distinct evolutionary lineages of influenza type B virus since 1983. Virology 175:59–68.

Ruane S, Pyron RA, Burbrink FT. 2011. Phylogenetic relationships of the cretaceous frog Beelzebufo from Madagascar and the placement of fossil constraints based on temporal and phylogenetic evidence. J Evol Biol. 24:274–285.

Shaw MW, Xu X, Li Y, Normand S, Ueki RT, Kunimoto GY, Hall H, Klimov A, Cox NJ, Subbarao K. 2002. Reappearance and global spread of variants of influenza B/Victoria/2/87 lineage viruses in the 2000–2001 and 2001–2002 seasons. Virology 303:1–8.

Smith GJD, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, et al. 2009. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459:1122–1125.

Turner TL, Hahn MW, Nuzhdin SV. 2005. Genomic islands of speciation in Anopheles gambiae. PLoS Biol. 3:e285.

Ward MJ, Lycett SJ, Kalish ML, Rambaut A, Brown AJL. 2013. Estimating the rate of intersubtype recombination in early HIV-1 group M strains. J Virol. 87:1967–1973.

World Health Organization. 2009. Influenza fact sheet. Available from: http://www.who.int/mediacentre/factsheets/fs211/en/

Zhao H, Nettleton D, Soller M, Dekkers JCM. 2005. Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genet Res. 86:77–87.


Article

Reassortment between Influenza B Lineages and the Emergence of a Coadapted PB1–PB2–HA Gene Complex

Gytis Dudas,1 Trevor Bedford,2 Samantha Lycett,1,3 and Andrew Rambaut1,4,5

1 Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
2 Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA
3 Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow, United Kingdom
4 Fogarty International Center, National Institutes of Health, Bethesda, MD
5 Centre for Immunology, Infection and Evolution at the University of Edinburgh, Edinburgh, United Kingdom

Corresponding author: E-mail: g.dudas@sms.ed.ac.uk

Associate editor: Robin Bush

Abstract

Influenza B viruses make a considerable contribution to morbidity attributed to seasonal influenza. Currently circulating influenza B isolates are known to belong to two antigenically distinct lineages referred to as B/Victoria and B/Yamagata. Frequent exchange of genomic segments of these two lineages has been noted in the past, but the observed patterns of reassortment have not been formalized in detail. We investigate interlineage reassortments by comparing phylogenetic trees across genomic segments. Our analyses indicate that of the eight segments of influenza B viruses only segments coding for polymerase basic 1 and 2 (PB1 and PB2) and hemagglutinin (HA) proteins have maintained separate Victoria and Yamagata lineages and that currently circulating strains possess PB1, PB2, and HA segments derived entirely from one or the other lineage; other segments have repeatedly reassorted between lineages, thereby reducing genetic diversity. We argue that this difference between segments is due to selection against reassortant viruses with mixed-lineage PB1, PB2, and HA segments. Given sufficient time and continued recruitment to the reassortment-isolated PB1–PB2–HA gene complex, we expect influenza B viruses to eventually undergo sympatric speciation.

Key words: influenza, reassortment, evolution, phylogenetics, speciation.

Introduction

Seasonal influenza causes between 250,000 and 500,000 deaths annually and comprises lineages from three virus types (A, B, and C) cocirculating in humans, of which influenza A is considered to cause the majority of seasonal morbidity and mortality (World Health Organization 2009). Occasionally, influenza B viruses become the predominant circulating virus in some locations; for example, in the 2012–2013 European season as many as 53% of influenza sentinel surveillance samples tested positive for influenza B (Broberg et al. 2013).

Like other members of Orthomyxoviridae, influenza B viruses have segmented genomes, which allow viruses coinfecting the same cell to exchange segments, a process known as reassortment. Influenza A viruses are widely considered to be a major threat to human health worldwide due to their ability to cause pandemics in humans through reassortment of circulating human strains with nonhuman influenza A strains. Although influenza B viruses have been observed to infect seals (Osterhaus et al. 2000; Bodewes et al. 2013) through a reverse zoonosis, they are thought to primarily infect humans and are thus unlikely to exhibit pandemics due to the absence of an animal reservoir from which to acquire antigenic novelty. Both influenza A and B evolve antigenically through time in a process known as antigenic drift, in which mutations to the hemagglutinin (HA) protein allow viruses to escape existing human immunity and persist in the human population, leading to recurrent seasonal epidemics (Burnet 1955; Hay et al. 2001; Bedford et al. 2014).

Currently circulating influenza B viruses comprise two distinct lineages, Victoria and Yamagata (referred to as Vic and Yam, respectively), named after strains B/Victoria/2/87 and B/Yamagata/16/88, that are thought to have genetically diverged in HA around 1983 (Rota et al. 1990). These two lineages now possess antigenically distinct HA surface glycoproteins (Kanegae et al. 1990; Rota et al. 1990; Nerome et al. 1998; Nakagawa et al. 2002; Ansaldi et al. 2003), allowing them to cocirculate in the human population. Phylogenetic analysis of evolutionary rate, selective pressures, and reassortment history of influenza B has shown extensive and often complicated patterns of reassortment between all segments of influenza B viruses, both between and within the Vic and Yam lineages (Chen and Holmes 2008).

Here, we extend previous methods to reveal an intriguing pattern of reassortment in influenza B. In our approach, membership to either the Vic or Yam lineage in one segment is used to label the individual isolates in the tree of the other segments. By modeling the transition between labels on a phylogenetic tree, reassortment events which result in the replacement of one segment's lineage by another show up as label changes along a branch (fig. 1). We use this method to reconstruct major reassortment events and quantify reassortment dynamics over time in a data set of 452 influenza B genomes and conduct secondary analyses in a data set of 1,603 influenza B genomes.

© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com. Mol. Biol. Evol. doi:10.1093/molbev/msu287. Advance Access publication October 15, 2014. MBE Advance Access published October 31, 2014.

We show that, despite extensive reassortment, three of the eight segments (two segments coding for components of the influenza B virus polymerase, PB1 and PB2, and the surface glycoprotein HA) still survive as distinct Vic and Yam lineages, which appear to be codependent to the point where virions which do not contain PB1, PB2, or HA segments derived entirely from either the Vic or the Yam lineage have rarely been isolated and only circulate as transient lineages once isolated. In other segments (PA, NP, NA, MP, and NS), a single lineage has introgressed into the opposing background and replaced the previous lineage. All currently circulating influenza B viruses have PA, NP, NA, and MP segments derived from the Yam lineage and NS segments derived from the Vic lineage.

Results

Analysis of Reassortment Patterns across Vic and Yam Lineages

The differentiation into Vic and Yam lineages can be seen in all segments (fig. 2) and is followed by interlineage reassortment events. In the phylogenetic trees of the PA, NP, NA, MP, and NS segments, either the Vic or Yam lineage has become the "trunk" of the tree, with present-day viruses deriving entirely from the Vic or Yam lineage (yellow vs. purple bars in fig. 2) following reassortment. However, the Vic and Yam lineages of PB1, PB2, and HA segments continue to cocirculate to this day. Periodic loss of diversity in PA, NP, NA, MP, and NS segments is consistent with introgression of one lineage into the other in those segments, whereas maintenance of parallel Vic and Yam lineages results in continually increasing diversity in segments PB1, PB2, and HA (fig. 3). The PB1, PB2, and HA segments from present-day viruses maintain a common ancestor in approximately 1983 and thus accumulate genetic diversity since the split of those segments into Vic and Yam lineages, whereas other segments often lose diversity, with ancestors to present-day viruses appearing between approximately 1991 and approximately 1999.

By measuring mean pairwise diversity between branches in each tree that were assigned either a Vic or Yam label in other segments, we look for reductions in between-lineage diversity, which indicate that an interlineage reassortment event has taken place (fig. 4). This method gives a quantitative measure of reassortment-induced loss of diversity between Vic and Yam lineages in two trees, although care should be taken when interpreting the statistic, as it does not correspond to any real time of most recent common ancestors (TMRCAs) in the tree but can be interpreted as mean coalescence date between Vic and Yam lineages of PB1, PB2, and HA segments in all other trees. We focus only on PB1, PB2, and HA lineage labels, as all other segments eventually become completely derived from either the Vic or the Yam lineage. Losses of diversity (represented by more recent mean pairwise TMRCAs between Vic and Yam labels) in figure 4 indicate that every segment has reassorted with respect to the Vic and Yam lineages of PB1, PB2, and HA segments. However, we also see that the labels for these three segments show reciprocal preservation of diversity after 1997. This suggests that after 1997 no reassortment events have taken place between Vic and Yam lineages of PB1, PB2, and HA segments and their lineage labels only "meet" at the root. We do see reduced diversity between Vic and Yam labels of PB1, PB2, and HA segments in a time period close to the initial split of Vic and Yam lineages (1986–1996). These reductions in diversity represent small clades with reassortant PB1–PB2–HA constellations, which go extinct by 1997 (see fig. 2). We also observe that the assignment of these three segment labels to branches of other segment trees is very similar and often identical after 1997. This suggests that PB1, PB2, and HA lineage labels switch simultaneously in all trees after 1997.

We show the ratio of Vic to Yam sequences in our primary and secondary data sets in different influenza seasons in figure 5, which is based on which lineage each sequence was assigned to (see Materials and Methods). It is evident that losses of diversity in the PA, NP, NA, MP, and NS segments are related to either the Vic (NS) or the Yam (PA, NP, NA, and MP) lineage replacing the other lineage in the influenza B virus population. Similarly, the lack of reassortment between Vic and Yam lineages and maintenance of diversity of PB1, PB2, and HA can be seen where the two lineages have been sequenced at a ratio close to 50% over long periods of time (fig. 5). On a year-to-year basis, however, the ratios for Vic and Yam sequences of PB1, PB2, and HA can fluctuate dramatically, consistent with one lineage predominating within a given season, in agreement with surveillance data (Reed et al. 2012).

FIG. 1. Schematic analysis of reassortment patterns. (A) We begin by assigning sequences falling on either side of a specified bifurcation within each segment tree to different lineages, in this case the Vic and Yam bifurcation that occurred in the early 1980s. (B) We then transfer lineage labels from one tree to the same tips in another tree. Transitions between labels along this second tree thus indicate reassortment events that combine lineages falling on different sides of the Vic/Yam bifurcation in the first tree. (C) A reassortment graph depiction shows that tip number 6 is determined to be a reassortant based on (B).


FIG. 2. Maximum clade credibility trees of all eight genome segments of influenza B viruses isolated since 1980. Trees are colored based on inferred PB1–PB2–HA lineage. Vertical bars indicate the original Vic and Yam lineages within each segment. Each tree is the summarized output of a single analysis comprised of 9,000 trees sampled from the posterior distribution of trees.

FIG. 3. Oldest TMRCA of all surviving branches over time. PA, NP, NA, MP, and NS segments of influenza B viruses show periodic increases in TMRCAs of all surviving branches (indicative of diversity loss), suggesting lineage turnover. PB1, PB2, and HA segments, on the other hand, maintain the diversity dating back to the initial split of Vic and Yam lineages. Each point is the mean TMRCA of all surviving lineages existing at each time slice through the tree, and vertical lines indicating uncertainty are 95% highest posterior densities.


We reconstructed reassortment events that were detected by using lineage labels. Figure 6 focuses only on interlineage reassortments that have occurred after 1990. We identify five major (in terms of persistence) reassortant genome constellations circulating between 1992 and 2011 (fig. 6), given in the order PB1–PB2–PA–HA–NP–NA–MP–NS, with a prime (′) indicating independently acquired segments: B/Alaska/12/1996-like (Y–Y–Y–Y–Y–Y–Y–V), B/Nanchang/2/1997-like (V–V–Y–V–Y–V–Y–V), B/Iowa/03/2002-like (V–V–Y′–V–Y–Y–Y′–V′), B/California/NHRC0001/2006-like (V–V–Y–V–Y′–Y–Y′–V′), and B/Brisbane/33/2008-like (V–V–Y–V–Y′–Y–Y–V).

FIG. 4. Mean pairwise TMRCA between Vic and Yam branches under PB1, PB2, and HA label sets. PB1, PB2, and HA segment labels indicate that these segments show reciprocal preservation of diversity, which dates back to the split of Vic and Yam lineages. All other segments show increasingly more recent TMRCAs between branches labeled as Vic and Yam in PB1, PB2, and HA label sets. All vertical lines indicating uncertainty are 95% highest posterior densities.

In a previous study, B/Alaska/12/1996-like, B/Nanchang/2/1997-like, and B/Iowa/03/2002-like constellations were observed (Chen and Holmes 2008), but sequences from B/California/NHRC0001/2006-like and B/Brisbane/33/2008-like constellations were not available at the time. In their study, Chen and Holmes (2008) also recovered the coassortment pattern of PB1, PB2, and HA lineages but did not remark upon it. Of these five constellations, four (B/Nanchang/2/1997-like, B/Iowa/03/2002-like, B/California/NHRC0001/2006-like, and B/Brisbane/33/2008-like) are derived from introgression of Yam lineage segments into a Vic lineage PB1–PB2–HA background, with only one (B/Alaska/12/1996-like) resulting from introgression of a Vic lineage NS segment into an entirely Yam-derived background. All five interlineage reassortment events described here are marked by the preservation of either entirely Vic- or entirely Yam-derived PB1–PB2–HA segments. Figure 6 also shows that reassorting segments appear to evolve with a considerable degree of autonomy. For example, the NP lineage that entered a largely Vic lineage-derived genome and gave rise to the B/Nanchang/2/1997-like isolates continued circulating until 2010, even though the other segments it coassorted with in 1995–1996 (PA and MP) went extinct following the next round of reassortment that led to the rise of B/Iowa/03/2002-like genome constellations. A more extreme example is the NS segment, where a Vic sublineage was reassorted into an entirely Yam background (B/Alaska/12/1996-like) in 1994–1995, then reassorted back into a mostly Vic background some 5 years later (B/Iowa/03/2002-like), where it has replaced the “original” Vic sublineage (see fig. 6).

FIG. 5. Ratio of Vic and Yam sequences in the data set. The ratio of Vic (yellow) to Yam (purple) sequences in each segment from the primary data set over time. Black lines indicate where this ratio lies in the larger secondary data set. Numbers at the top of the figure show the total number of genomes available for each influenza season in the primary data set, comprised of 452 genomes, from which the ratio was calculated, whereas the numbers in brackets correspond to numbers of sequences in the larger secondary genomes data set. Numbers at the bottom are influenza seasons, from the 1987/1988 (87/88) season to the 2011/2012 season. Yam lineage of PA, NP, NA, and MP segments and Vic lineage of the NS segment eventually become fixed (in the population genetics sense of the word) in the influenza B population. PB1, PB2, and HA segments maintain separate Vic and Yam lineages.

We observe that in all five successful interlineage reassortment events shown in figure 6, none break up the PB1–PB2–HA complex. This is an unlikely outcome: the probability of not breaking up PB1–PB2–HA across five reassortment events is

$$P = \left(\frac{2^{5} \times 2 - 2}{2^{8} - 2}\right)^{5} = 0.0009,$$

where reassortment events are considered to sample from the Vic and Yam lineages at random for each of the eight segments. If we correct for multiple testing with the assumption that coassortment of any three segments is of interest, we find that the probability of not breaking up an arbitrary set of three segments across five reassortment events is

$$P = \binom{8}{3}\left(\frac{2^{5} \times 2 - 2}{2^{8} - 2}\right)^{5} = 0.0485.$$
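The two probabilities above can be checked with a few lines of Python. This is a minimal sketch rather than the authors' code: the function name and its parameters are hypothetical, and it assumes our reading of the formula, in which 2^8 − 2 = 254 counts the possible reassortant constellations (excluding the all-Vic and all-Yam genomes) and 2^5 × 2 − 2 = 62 counts those that keep the three-segment complex within a single lineage.

```python
from math import comb

def p_complex_intact(n_segments=8, complex_size=3, n_events=5):
    """Probability that a fixed set of segments is never split across
    n_events reassortments, with each segment drawn from Vic or Yam at random."""
    # Constellations whose complex is entirely Vic or entirely Yam,
    # minus the two non-reassortant genomes (all Vic, all Yam).
    intact = 2 * 2 ** (n_segments - complex_size) - 2
    # All reassortant constellations, again excluding the two pure genomes.
    total = 2 ** n_segments - 2
    return (intact / total) ** n_events

p_fixed = p_complex_intact()       # PB1-PB2-HA specified in advance
p_any = comb(8, 3) * p_fixed       # corrected for any choice of three segments
print(round(p_fixed, 4), round(p_any, 4))
```

Multiplying by the number of three-segment subsets is a Bonferroni-style correction, so the second value is an upper bound on the probability rather than an exact one.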

Although the vast majority of influenza B isolates possess either Vic or Yam lineage-derived PB1–PB2–HA complexes, on rare occasions mixed-lineage PB1–PB2–HA constellations emerge. Figure 7 shows the sum of branch lengths which were labeled as having entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. Due to the lack of reassortment between Vic and Yam lineages of PB1, PB2, and HA (fig. 4) since 1997, all segments have spent significantly longer periods of evolutionary time with either entirely Vic-derived or entirely Yam-derived than with mixed-lineage PB1, PB2, and HA constellations (fig. 7). We have identified three instances of mixed-lineage PB1–PB2–HA reassortants from the primary data set, with the following PB1–PB2–HA constellations: VVY (B/Bangkok/163/1990-like, 13 sequences isolated 1990–January 5, 1995), VYV (B/Nanchang/630/1994-like, two sequences isolated 1994–1996), and VYY (B/New York/24/1993-like, two sequences isolated January 8, 1993–1994). We detected two new reassortant lineages when investigating the larger secondary data set: B/Waikato/6/2005-like viruses with PB1–PB2–HA constellation YYV (17 sequences isolated May 9–October 12 in 2005) and B/Malaysia/1829782/2007 with PB1–PB2–HA constellation YVY (one sequence isolated August 2, 2007).

Analysis of Reassortment Properties

We attempted to quantify the temporal discordance between lineages reassorting into new genomic constellations. If one were able to recover an influenza “species tree,” including admixture/reassortment events, it would be possible to estimate the reassortment or recombination “distance,” which is the time between a split in the species tree in the past and a reassortment event (see supplementary fig. S17, Supplementary Material online). Although we do not find evidence of differences in the total number of reassortments between segments (see supplementary fig. S4, Supplementary Material online), we find support for a reassortment “distance” effect, in which a pair of tips on one segment has a different TMRCA from the same pair of tips on a different segment. The summary statistic we use that reflects this difference in TMRCAs, ΔTMRCA, is most sensitive when only one of the two trees being compared loses diversity through reassortment and the other acts like a proxy for the “species tree.” We normalize our ΔTMRCA comparisons to arrive at δTMRCA, which accounts for uncertainty in tree topology (see Materials and Methods). Figure 8 shows δTMRCA values for all pairs of trees. Most segment pairs show very low values for this statistic, with δTMRCA ≈ 0.1, indicating that ΔTMRCA measurements between replicate posterior samples from the same segment are up to ten times smaller than ΔTMRCA values between posterior samples from different segments. PB1, PB2, and HA trees, on the other hand, exhibit δTMRCA values that are much higher. This shows that ΔTMRCA differences between trees of PB1, PB2, and HA segments are, though noisy, occasionally very similar to the uncertainty in tip-to-tip TMRCAs between replicate analyses of these segments.

FIG. 6. Schematic plot of reconstructed reassortments between Vic and Yam lineage segments of influenza B virus. Lineages that coassort in genomes are represented by eight parallel lines, with lineages that derive from the original Vic clade colored yellow/brown and lineages that derive from the original Yam clade colored lilac/purple. Interlineage reassortment events are indicated by lines entering a different genome. The angle of incoming lineages represents uncertainty in the timing of the event (mean date of the reassortant node and its parent node). Lineage extinction dates are not shown accurately.

FIG. 7. Amount of evolutionary time each segment has spent under different PB1–PB2–HA constellations. All segments have spent significantly more of their history with entirely Vic- or entirely Yam-derived PB1–PB2–HA complexes. All vertical lines indicating uncertainty are 95% highest posterior densities.

Discussion

Linkage between PB1, PB2, and HA Gene Segments

In this article, we show that the PB1, PB2, and HA segments of influenza B viruses are the only ones that have continuously maintained separate Vic and Yam lineages, whereas other segments have fixed either Vic or Yam lineages (figs. 2, 5, and 6). Evidence suggests that this is a result of a prolonged lack of reassortment between Vic and Yam lineages in PB1, PB2, and HA (fig. 4), which possess coassorting sequences detectable as high linkage disequilibrium (LD) (supplementary fig. S1, Supplementary Material online). The vast majority of the sampled evolutionary history of each segment of influenza B viruses since the split of Vic and Yam lineages has been spent in association with either completely Vic or completely Yam lineage-derived PB1–PB2–HA complexes (fig. 7), suggesting that having “pure” lineage PB1–PB2–HA complexes is important for whole-genome fitness. We propose that this pattern of coassortment is due to the action of selection and not simply biased or rare reassortment.

The origin of the strong genetic linkage between PB1, PB2, and HA segments remains unclear. We believe that there are two alternative but similar explanations for it: mutation-driven coevolution (Presgraves 2010) and Dobzhansky–Muller (DM) incompatibility (Dobzhansky 1937; Muller 1942). Mutation-driven coevolution has been suggested to be the cause of hybrid dysfunction in Saccharomyces hybrids (Lee et al. 2008) and evolves as a byproduct of adaptation. If one or the other influenza B lineage has undergone adaptation, we might expect these changes to be beneficial in its native background and incompatible with a foreign background. DM incompatibility operates in a similar way, but the main difference from the scenario described earlier is that the incompatible alleles are neutral or nearly neutral in their native background and become deleterious or lethal when combined with nonnative backgrounds. Emergence of DM incompatibility is aided by geographic isolation. Interestingly, the Vic lineage of HA was restricted to eastern Asia between 1992 and 2000 (Nerome et al. 1998; Shaw et al. 2002), offering ample time for the budding Vic lineage to accumulate alleles causing reassortment incompatibility. However, without more genomic data from the past, it is difficult to estimate to what extent influenza B virus population structure contributed to the development of the current segment linkage.

Potential Mechanisms for Reassortment Incompatibility

Unfortunately, the limited amount of genomic data available for the early years of the Vic–Yam split precludes any attempt to determine whether selection or drift has led to the current linkage of PB1, PB2, and HA segments. Although the origins of the linkage between these three segments might be difficult to explain, we can speculate on the nature of reassortment incompatibility. For example, it is intuitive why this might be the case for PB1 and PB2: both proteins interact directly as part of the RNA-dependent RNA polymerase heterotrimer. Indeed, we observe that PB1–PB2 reassortants are the rarest and least persistent among mixed-lineage PB1–PB2–HA strains and have not been isolated in great numbers. In fact, most reassortants breaking the PB1–PB2–HA complex apart occurred close to the split of Vic and Yam lineages and have become very rare since.

There is some evidence that the linkage between PB1 and HA might not be a phenomenon restricted to influenza B viruses. It has been established that, at least for the 1957 and the 1968 influenza pandemics caused by A/H2N2 and A/H3N2 subtypes, respectively, the viruses responsible were reassortants possessing PB1 and HA segments derived from avian influenza A viruses (Kawaoka et al. 1989). In addition, outdated techniques for producing vaccine seed strains through selection for HA–NA reassortants often yielded PB1–HA–NA reassortants as a side effect (Bergeron et al. 2010; Fulvini et al. 2011). Recent experiments have found that the presence or absence of a “foreign” PB1 segment can have dramatic effects on HA concentration on the surface of virions and on total virion production (Cobbin et al. 2013). However, there have been reassortant influenza A viruses circulating for prolonged periods of time in humans that did have disparate PB1 and HA segments, for example, H1N2 outbreaks in 2001 (Gregory et al. 2002) and H1N1/09 in 2009 (Smith et al. 2009).

FIG. 8. δTMRCA statistics for different segment pairs. PB1, PB2, and HA trees exhibit reciprocally highly similar TMRCAs, unlike most other pairwise comparisons. All vertical lines indicating uncertainty are 95% highest posterior densities.

We believe that the association between PB1, PB2, and HA segments should be relatively straightforward to explore in the lab. Reverse genetics systems have been developed for influenza B viruses (Hoffmann et al. 2002), which would allow the creation of artificial reassortants. Based on the frequency and persistence times of different reassortant classes we have observed, we expect a hierarchy of reassortant fitness, starting with PB1+PB2+HA reassortants, which should be the most fit, followed by PB1+2/HA, then PB1+HA/PB2, and finally PB2+HA/PB1 reassortants with the lowest fitness. We believe that this is the most direct approach to unraveling the mechanism responsible for the linkage within the PB1–PB2–HA complex.

Will Influenza B Viruses Speciate?

We suggest that the preservation of two PB1–PB2–HA complex lineages is similar to genomic speciation islands, where small numbers of genes resist being homogenized through gene flow (Turner et al. 2005). In this context, we see three potential paths of evolution for influenza B viruses. If more segments get recruited to the PB1–PB2–HA complex, the process could continue until “speciation” occurs, in which none of the segments is able to reassort across the Vic–Yam lineage boundary. Alternatively, the influenza B genome could continue to be homogenized through gene flow, with the exception of PB1, PB2, and HA segments, or one of the two PB1–PB2–HA complexes could go extinct, marking the return of single-strain dynamics in the influenza B virus population. The eventual fate of influenza B viruses will likely be determined by the combined effects of reassortment frequency and the strength of epistatic interactions between segments.

Materials and Methods

We compiled a primary data set of 452 complete influenza B genomes from GISAID (Bogner et al. 2006), dating from 1984 to 2012. The longest protein-coding region of each segment was extracted and used for all further analyses. We thus assume that homologous recombination has not taken place and that the evolutionary history of the whole segment can be inferred from the longest coding sequence in the segment. To date, there has been little evidence of homologous recombination in influenza viruses (Chare et al. 2003; Boni et al. 2008; Han et al. 2010). The segments of each strain were assigned to either the Vic or the Yam lineage by making maximum-likelihood trees of each segment using PhyML (Guindon and Gascuel 2003) and identifying whether the isolate was more closely related to B/Victoria/2/87 or B/Yamagata/16/88 sequences in that segment, with the exception of the NS segment: as B/Victoria/2/87 was a reassortant and possessed a Yam lineage NS (Lindstrom et al. 1999), B/Czechoslovakia/69/1990 was considered as being representative of the Vic lineage for the NS segment. Every segment in each genome thus received either a Vic or a Yam lineage designation; for example, the strain B/Victoria/2/87 received V–V–V–V–V–V–V–Y, as its NS segment is derived from the Yam lineage and the rest of the genome is Vic.
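The lineage designation described above is easy to represent in code. The following is a small illustrative sketch, not taken from the authors' scripts: the names `SEGMENTS`, `parse_constellation`, and `complex_state` are hypothetical, and the segment order PB1–PB2–PA–HA–NP–NA–MP–NS is the one used for constellations in the Results.

```python
SEGMENTS = ("PB1", "PB2", "PA", "HA", "NP", "NA", "MP", "NS")

def parse_constellation(text):
    """Map a string such as 'V-V-V-V-V-V-V-Y' to {segment: lineage}."""
    labels = text.replace("\u2013", "-").split("-")  # accept en dashes too
    if len(labels) != len(SEGMENTS) or any(x not in ("V", "Y") for x in labels):
        raise ValueError(f"expected eight V/Y labels, got {text!r}")
    return dict(zip(SEGMENTS, labels))

def complex_state(constellation, members=("PB1", "PB2", "HA")):
    """Return 'Vic', 'Yam', or 'mixed' for the PB1-PB2-HA complex."""
    found = {constellation[s] for s in members}
    if found == {"V"}:
        return "Vic"
    if found == {"Y"}:
        return "Yam"
    return "mixed"

# B/Victoria/2/87: Vic everywhere except a Yam-derived NS segment.
victoria = parse_constellation("V-V-V-V-V-V-V-Y")
print(victoria["NS"], complex_state(victoria))
```

A genome is classified by its PB1, PB2, and HA labels only, so B/Victoria/2/87 counts as carrying a Vic complex despite its Yam NS segment.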

We also collated a secondary data set from all complete influenza B virus genomes available on GenBank as of May 5, 2014. After removing isolates that had considerable portions of any sequence missing, were isolated prior to 1980, or were suspected of having a contaminant sequence in any segment, we were left with 1,603 sequences. These sequences only became available after all primary analyses were performed, are mainly from Australia, New Zealand, and the United States, and are too numerous to analyze in BEAST (Drummond et al. 2012). PhyML (Guindon and Gascuel 2003) was used to produce phylogenies of each segment, and the lineage of each isolate was determined based on grouping with either B/Victoria/2/87 or B/Yamagata/16/88 sequences, as described above. By associating strains with the lineage identity of each of their segments, we reconstructed the most parsimonious interlineage reassortment history for the secondary data set. The secondary data set was used to check how representative the primary data set was, to estimate LD, and to broadly confirm our results. All analyses pertain to the primary data set unless stated otherwise.

Temporally calibrated phylogenies were recovered for each segment in the primary data set using Markov chain Monte Carlo (MCMC) methods in the BEAST software package (Drummond et al. 2012). We modeled the substitution process using the Hasegawa–Kishino–Yano model of nucleotide substitution (Hasegawa et al. 1985), with separate transition models for each of the three codon partitions, and additionally estimated realized synonymous and nonsynonymous substitution counts (O'Brien et al. 2009). We used a flexible Bayesian skyride demographic model (Minin et al. 2008). We accounted for incomplete sampling dates for 94 sequences (of which 93 had only year and 1 had only year and month of isolation), whereby tip date is estimated as a latent variable in the MCMC integration. A relaxed molecular clock was used, where branch rates are drawn from a lognormal distribution (Drummond et al. 2006). We ran three independent MCMC chains, each with 200 million states, sampled every 20,000 steps, and discarded the first 10% of the MCMC states as burn-in. After assessing convergence of all three MCMC chains by visual inspection using Tracer (Rambaut et al. 2009), we combined samples across chains to give a total of 27,000 samples from the posterior distribution of trees.
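The sample counts quoted above follow directly from the chain settings. The short check below is only a sketch of that arithmetic; the function name is hypothetical, and it assumes for simplicity that the initial state of each chain is not counted as a sample.

```python
def retained_samples(chain_length, sample_every, burnin_percent, n_chains):
    """Posterior samples kept per chain and in total after discarding burn-in."""
    per_chain = chain_length // sample_every
    # Integer arithmetic avoids floating-point rounding in the burn-in count.
    kept_per_chain = per_chain - per_chain * burnin_percent // 100
    return kept_per_chain, kept_per_chain * n_chains

per_chain, combined = retained_samples(200_000_000, 20_000, 10, 3)
print(per_chain, combined)
```

With these settings each chain contributes 9,000 trees, which matches the 9,000 trees summarized per analysis in figure 2, and the three chains together give the 27,000 combined samples.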

Every sequence was assigned seven discrete traits in BEAUti, corresponding to the lineages of all other segments with which a strain was isolated; for example, the PB1 tree had PB2, PA, HA, NP, NA, MP, and NS as traits and V or Y as values for each trait. We inferred the ancestral state of lineages in each segment by modeling transitions between these discrete states using an asymmetric transition matrix (Lemey et al. 2009) with Bayesian stochastic search variable selection to estimate significant rates. Because the posterior set of trees for a single segment has branches labeled with the inferred lineage in the remaining seven segments, we can detect interlineage reassortments between pairs of segments by observing state transitions, that is, Yam to Vic or Vic to Yam (fig. 1). In addition, by reconstructing the ancestral state of all other genomic segments jointly, we can infer coreassortment events when more than one trait transition occurs on the same node in a tree. Interphylogeny labeling approaches have been extensively used in the past to investigate reticulate evolution in influenza A viruses and HIV (Lycett et al. 2012; Ward et al. 2013; Lu et al. 2014).
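To make the label-transition idea concrete, the sketch below scans a single labeled tree for branches on which an inferred lineage label differs between parent and child. It is an illustration under simplifying assumptions, not the authors' implementation: the tree is a plain child-to-parent dictionary rather than a BEAST tree, each node carries one fixed label per trait (no posterior uncertainty), and the function and variable names are hypothetical.

```python
def label_transitions(parent, traits):
    """Find branches on which an inferred lineage label changes.

    parent: {node: parent_node}; the root does not appear as a key.
    traits: {trait_name: {node: 'V' or 'Y'}}, labels borrowed from other segments.
    Returns {child_node: [(trait, parent_label, child_label), ...]}.
    """
    events = {}
    for child, par in parent.items():
        for trait, labels in traits.items():
            if labels[par] != labels[child]:
                events.setdefault(child, []).append(
                    (trait, labels[par], labels[child])
                )
    return events

# Toy HA tree: root -> n1 -> (t1, t2) and root -> t3.
parent = {"n1": "root", "t1": "n1", "t2": "n1", "t3": "root"}
traits = {
    "PB1": {"root": "V", "n1": "V", "t1": "V", "t2": "V", "t3": "V"},
    "NP": {"root": "V", "n1": "V", "t1": "V", "t2": "Y", "t3": "Y"},
    "NA": {"root": "V", "n1": "V", "t1": "V", "t2": "Y", "t3": "V"},
}
events = label_transitions(parent, traits)
# Branches where more than one trait switches are candidate coreassortments.
coreassortments = {node: ev for node, ev in events.items() if len(ev) > 1}
print(events)
print(coreassortments)
```

In the toy data the branch leading to t2 switches both its NP and NA labels, so it is flagged as a coreassortment, whereas the branch to t3 switches NP alone and PB1 never changes.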

Measures of Diversity

We inferred the diversity of each segment from their phylogenetic tree by estimating the date of the most recent common ancestor of all branches at yearly time points, which places an upper bound on the maximum amount of diversity existing at each time point. A version of this lineage turnover metric has previously been used to investigate the tempo and strength of selection in influenza A viruses during seasonal circulation (Bedford et al. 2011). In addition, we calculated mean pairwise TMRCA between branches labeled as Vic and Yam for PB1, PB2, and HA traits. This gave us a measure of how much a particular segment reassorts with respect to Vic and Yam lineages of PB1, PB2, and HA segments. If Vic and Yam lineages of PB1, PB2, and HA segments were to be considered as being separate populations, this measure would be equivalent to “between population” diversity.
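The between-label measure can be illustrated on a toy tree. The sketch below is a simplification and not the authors' code: it averages over pairs of tips rather than over branches present at each time slice, it stores the tree as a child-to-parent dictionary with a date per node, and all names are hypothetical.

```python
from itertools import product

def ancestors(node, parent):
    """Return the path from node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def tmrca(a, b, parent, date):
    """Date of the most recent common ancestor of nodes a and b."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return date[node]
    raise ValueError("nodes are not in the same tree")

def mean_between_label_tmrca(tips, label, parent, date):
    """Mean TMRCA over all (Vic-labeled tip, Yam-labeled tip) pairs."""
    vic = [t for t in tips if label[t] == "V"]
    yam = [t for t in tips if label[t] == "Y"]
    pairs = list(product(vic, yam))
    return sum(tmrca(a, b, parent, date) for a, b in pairs) / len(pairs)

# Toy tree: root (1983) -> n1 (1995) -> a, b and root -> n2 (2000) -> c, d.
parent = {"n1": "root", "n2": "root", "a": "n1", "b": "n1", "c": "n2", "d": "n2"}
date = {"root": 1983.0, "n1": 1995.0, "n2": 2000.0,
        "a": 2005.0, "b": 2005.0, "c": 2005.0, "d": 2005.0}
tips = ["a", "b", "c", "d"]
no_reassortment = {"a": "V", "b": "V", "c": "Y", "d": "Y"}
with_reassortment = {"a": "V", "b": "Y", "c": "Y", "d": "Y"}
print(mean_between_label_tmrca(tips, no_reassortment, parent, date))
print(mean_between_label_tmrca(tips, with_reassortment, parent, date))
```

When the labels split cleanly at the root, every Vic–Yam pair coalesces in 1983; relabeling tip b as Yam (a reassortant) adds a pair that coalesces in 1995 and pulls the mean forward to 1987, which is the kind of diversity loss the statistic is meant to register.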

We also calculated the total amount of sampled evolutionary time spent by each segment with entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. We do this by summing the branch lengths in each tree under three different lineage combinations of the PB1, PB2, and HA segments: PB1–PB2–HA derived entirely from the Yam lineage, PB1–PB2–HA derived entirely from the Vic lineage, and PB1–PB2–HA derived from a mixture of the two lineages. This gives a measure of how successful, over long periods of time, each particular PB1–PB2–HA constellation has been.
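A minimal sketch of this bookkeeping is shown below. It assumes each branch has already been reduced to its length and its three inferred labels; the tuple layout and the function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def time_by_complex(branches):
    """Sum branch lengths (in years) by the state of the PB1-PB2-HA labels.

    branches: iterable of (length, pb1, pb2, ha), each label 'V' or 'Y'.
    """
    totals = defaultdict(float)
    for length, pb1, pb2, ha in branches:
        labels = {pb1, pb2, ha}
        if len(labels) > 1:
            state = "mixed"
        else:
            state = "Vic" if labels == {"V"} else "Yam"
        totals[state] += length
    return dict(totals)

toy_branches = [
    (2.0, "V", "V", "V"),
    (1.5, "Y", "Y", "Y"),
    (0.5, "V", "Y", "V"),   # mixed-lineage complex
    (3.0, "V", "V", "V"),
]
totals = time_by_complex(toy_branches)
print(totals)
```

Applied to every tree in a posterior sample, totals of this kind would give a distribution for each category, from which intervals such as those in figure 7 could be summarized.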

Tree-to-Tree Similarities

We express the normalized distance δTMRCA between trees belonging to two segments A and B, for a particular posterior sample i, following

$$\delta\mathrm{TMRCA}(A_i, B_i) = \frac{\Delta\mathrm{TMRCA}(A_i, A'_i) + \Delta\mathrm{TMRCA}(B_i, B'_i)}{2\,\Delta\mathrm{TMRCA}(A_i, B_i)}, \tag{1}$$

where $\Delta\mathrm{TMRCA}(A_i, B_i) = \frac{1}{n}\sum_{j=1}^{n} g(A_{ij}, B_{ij})$ and $n$ is the total number of pairwise comparisons available between sets of tips. Thus, $g(A_{ij}, B_{ij})$ is the absolute difference in TMRCA of a pair of tips $j$, where the pair is drawn from the $i$th posterior sample of tree A and the $i$th posterior sample of tree B. Additionally, $\Delta\mathrm{TMRCA}(A_i, A'_i)$ is calculated from the $i$th posterior sample of tree A and the $i$th posterior sample of an independent analysis of tree A (which we refer to as A′), which is used in the normalization procedure to control for variability in tree topology stability over the course of the MCMC chain (see supplementary figs. S6 and S7, Supplementary Material online). We had three replicate analyses of each segment, and in order to calculate $\Delta\mathrm{TMRCA}(A_i, A'_i)$ we used analyses numbered 1, 2, and 3 as A and analyses numbered 2, 3, and 1 as A′, in that order. We subsampled our combined posterior distribution of trees to give a total of 2,700 trees on which to analyze δTMRCA.
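Equation (1) can be sketched in a few lines once the pairwise tip TMRCAs have been extracted from each tree. The code below is an illustration under that assumption, not the authors' implementation: each tree is represented only by a dictionary from tip pairs to TMRCA dates, and the function names and toy values are hypothetical.

```python
def delta_tmrca(tmrcas_x, tmrcas_y):
    """Mean absolute difference in TMRCA over the tip pairs shared by two trees.

    Each argument maps a tip pair, e.g. ('t1', 't2'), to its TMRCA in one tree.
    """
    shared = tmrcas_x.keys() & tmrcas_y.keys()
    return sum(abs(tmrcas_x[p] - tmrcas_y[p]) for p in shared) / len(shared)

def normalized_delta_tmrca(a, a_rep, b, b_rep):
    """Equation (1): within-segment noise over between-segment difference."""
    within = delta_tmrca(a, a_rep) + delta_tmrca(b, b_rep)
    between = delta_tmrca(a, b)
    return within / (2 * between)

pairs = [("t1", "t2"), ("t1", "t3"), ("t2", "t3")]
# Segment A and an independent replicate analysis A' (differences of 0.5 years).
tree_a = dict(zip(pairs, [1990.0, 1983.0, 1983.0]))
tree_a_rep = dict(zip(pairs, [1990.5, 1983.5, 1982.5]))
# Segment B, whose history differs from A through reassortment, and replicate B'.
tree_b = dict(zip(pairs, [2000.0, 1983.0, 1988.0]))
tree_b_rep = dict(zip(pairs, [2000.5, 1983.5, 1987.5]))

print(delta_tmrca(tree_a, tree_b))
print(normalized_delta_tmrca(tree_a, tree_a_rep, tree_b, tree_b_rep))
```

In this toy case the replicates differ by 0.5 years on average and the two segments by 5 years, so the normalized statistic is 0.1: the between-segment difference is ten times the replicate noise. Values approaching 1 would instead mean that two segments differ no more than replicate analyses of the same segment do.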

Calculating the normalized δTMRCA(A_i, B_i) for each MCMC state provides us with a posterior distribution of this statistic, allowing specific hypotheses regarding similarities between the trees of different segments to be tested. Our approach exploits the branch scaling used by BEAST (Drummond et al. 2012), as the trees are scaled in absolute time and insensitive to variation in nucleotide substitution rates between segments, allowing for direct comparisons between TMRCAs in different trees. In the absence of reassortment, we expect the tree of every segment to recapitulate the “virus tree,” a concept analogous to “species trees” in population genetics. Our method operates under the assumption that the segment trees capture this “virus tree” of influenza B viruses quite well. It is not an unreasonable assumption given the seasonal bottlenecks influenza viruses experience. This makes it almost certain that influenza viruses circulating at any given time point are derived from a single genome that existed in the recent past. The ΔTMRCA statistic essentially quantifies the temporal distance between admixture events and nodes in the virus tree (see supplementary fig. S17, Supplementary Material online). We normalize ΔTMRCA values to get δTMRCA, a measure which quantifies the extent to which the similarity of two independent trees resembles phylogenetic noise. The ΔTMRCA statistic is an extension of patristic distance methods, which have previously been used to tackle a wide variety of problems, for example as a phylogenetic distance in predicting viral titer in Drosophila infected with viruses from closely related species (Longdon et al. 2011) and to assess temporal incongruence in a phylogenetic tree of amphibian species induced by using different calibrations (Ruane et al. 2011).

LD across the Influenza B Genome

We used the secondary GenBank data set, with 1,603 complete genome sequences, to estimate LD between amino acid loci across the longest proteins encoded by each segment of the influenza B virus genome. To quantify LD, we adapt the $\chi^2_{df}$ statistic from Hedrick and Thomson (1986):

$$\chi^2_{df} = \frac{\chi^2}{N(k-1)(m-1)}, \tag{2}$$

where $\chi^2$ is calculated from a classical contingency table, $N$ is the number of haplotypes, and $(k-1)(m-1)$ are the degrees of freedom. This statistic is equal to the widely used $r^2$ LD statistic at biallelic loci but also quantifies LD when there are more than two alleles per locus (Zhao et al. 2005). LD was estimated only at loci where each nucleotide or amino acid allele was present in at least two isolates. We ignored gaps in the alignment and did not consider them as polymorphisms. In all cases, we used a minor allele frequency cutoff of 1%. We also calculated another LD statistic, $D'$ (Lewontin 1964), as $D'_{ij} = D_{ij}/D^{\max}_{ij}$, where $D_{ij} = p(A_iB_j) - p(A_i)p(B_j)$ and

$$D^{\max}_{ij} = \begin{cases} \min\left[p(A_i)p(B_j),\ (1 - p(A_i))(1 - p(B_j))\right] & \text{when } D_{ij} < 0, \\ \min\left[(1 - p(A_i))p(B_j),\ p(A_i)(1 - p(B_j))\right] & \text{when } D_{ij} \geq 0, \end{cases} \tag{3}$$

where $p(A_i)$ is the frequency of allele $A_i$ at locus A, $p(B_j)$ is the frequency of allele $B_j$ at locus B, and $p(A_iB_j)$ is the frequency of haplotype $A_iB_j$. $D'$ is inflated when some haplotypes are not observed, for example, when the minor allele frequency is low. We find that $D'$ is almost uniformly high across the influenza B virus genome and close to 1.0 for almost any pair of polymorphic loci. This is because most amino acid alleles in the population exist transiently, meaning that they do not get a chance to reassort, and we only observe them within the backgrounds of more persistent alleles, which $D'$ quantifies as complete LD. We think that metrics related to $r^2$, such as $\chi^2_{df}$, perform much better on temporal data such as ours in finding persistent associations between alleles and are easier to interpret.
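Both LD statistics are straightforward to compute from a list of two-locus haplotypes. The sketch below follows equations (2) and (3) as written above but is not the authors' code: the function names, the tuple-based haplotype representation, and the toy amino acid alleles are illustrative assumptions, and the filtering steps (gaps, minor allele frequency cutoff) are omitted.

```python
from collections import Counter

def chi2_df(haplotypes):
    """Equation (2) for a list of (allele_at_locus_A, allele_at_locus_B) pairs."""
    n = len(haplotypes)
    joint = Counter(haplotypes)
    count_a = Counter(a for a, _ in haplotypes)
    count_b = Counter(b for _, b in haplotypes)
    chi2 = 0.0
    for a, na in count_a.items():
        for b, nb in count_b.items():
            expected = na * nb / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    return chi2 / (n * (len(count_a) - 1) * (len(count_b) - 1))

def d_prime(haplotypes, a, b):
    """Equation (3): Lewontin's D' for allele a at locus A and allele b at locus B."""
    n = len(haplotypes)
    p_a = sum(1 for x, _ in haplotypes if x == a) / n
    p_b = sum(1 for _, y in haplotypes if y == b) / n
    p_ab = sum(1 for h in haplotypes if h == (a, b)) / n
    d = p_ab - p_a * p_b
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min((1 - p_a) * p_b, p_a * (1 - p_b))
    return d / d_max

# Two biallelic loci in which all four haplotypes are observed.
persistent = [("K", "T")] * 4 + [("K", "A")] + [("R", "T")] + [("R", "A")] * 4
# A rare allele S that is only ever seen on the R background.
transient = [("K", "T")] * 5 + [("R", "A")] * 4 + [("R", "S")]

print(chi2_df(persistent), d_prime(persistent, "K", "T"))
print(chi2_df(transient), d_prime(transient, "R", "S"))
```

For the first data set the statistic in equation (2) coincides with $r^2$ (0.36 here), while $D'$ is 0.6. In the second, the rare allele S reaches $D' = 1$ simply because it was never observed on the other background, which illustrates why $D'$ saturates when alleles are short-lived.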

Data Availability

Python scripts used to process trees and sequences are available at https://github.com/evogytis/fluB/tree/master/scripts (last accessed October 21, 2014). Output files from scripts, lineage designations, MCC trees, acknowledgment tables, accession numbers, and redacted XML files (per GISAID Data Access Agreement) are publicly available at https://github.com/evogytis/fluB/tree/master/data (last accessed October 21, 2014).

Supplementary Material

Supplementary material and figures S1–S17 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Darren Obbard and Paul Wikramaratna for helpful discussions and anonymous reviewers for comments and suggestions. This study was supported by a Natural Environment Research Council studentship D76739X to G.D. and a Newton International Fellowship from the Royal Society to T.B. The research leading to these results has received funding from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 278433-PREDEMICS and ERC Grant Agreement No. 260864. A.R. and S.L. acknowledge the support of the Wellcome Trust (Grant No. 092807).

References

Ansaldi F, D'Agaro P, de Florentiis D, Puzelli S, Lin YP, Gregory V, Bennett M, Donatelli I, Gasparini R, Crovari P, et al. 2003. Molecular characterization of influenza B viruses circulating in northern Italy during the 2001–2002 epidemic season. J Med Virol. 70:463–469.

Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol. 11:220.

Bedford T, Suchard MA, Lemey P, Dudas G, Gregory V, Hay AJ, McCauley JW, Russell CA, Smith DJ, Rambaut A. 2014. Integrating influenza antigenic dynamics with molecular evolution. eLife 3:e01914.

Bergeron C, Valette M, Lina B, Ottmann M. 2010. Genetic content of influenza H3N2 vaccine seeds. PLoS Curr. 2:RRN1165.

Bodewes R, Morick D, de Mutsert G, Osinga N, Bestebroer T, van der Vliet S, Smits SL, Kuiken T, Rimmelzwaan GF, Fouchier RA, et al. 2013. Recurring influenza B virus infections in seals. Emerg Infect Dis. 19:511–512.

Bogner P, Capua I, Lipman DJ, Cox NJ, et al. 2006. A global initiative on sharing avian flu data. Nature 442:981.

Boni MF, Zhou Y, Taubenberger JK, Holmes EC. 2008. Homologous recombination is very rare or absent in human influenza A virus. J Virol. 82:4807–4811.

Broberg E, Beaute J, Snacken R. 2013. Fortnightly influenza surveillance overview, 24 May 2013 - weeks 19-20/2013. Available from: http://ecdc.europa.eu/en/publications/Publications/influenza-fortnightly-surveillance-overview-24-may-2013.pdf

Burnet SFM. 1955. Principles of animal virology. New York: Academic Press.

Chare ER, Gould EA, Holmes EC. 2003. Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J Gen Virol. 84:2691–2703.

Chen R, Holmes EC. 2008. The evolutionary dynamics of human influenza B virus. J Mol Evol. 66:655–663.

Cobbin JCA, Verity EE, Gilbertson BP, Rockman SP, Brown LE. 2013. The source of the PB1 gene in influenza vaccine reassortants selectively alters the hemagglutinin content of the resulting seed virus. J Virol. 87:5577–5585.

Dobzhansky T. 1937. Genetics and the origin of species. New York: Columbia University Press.

Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4:e88.

Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 29:1969–1973.

Fulvini AA, Ramanunninair M, Le J, Pokorny BA, Arroyo JM, Silverman J, Devis R, Bucher D. 2011. Gene constellation of influenza A virus reassortants with high growth phenotype prepared as seed candidates for vaccine production. PLoS One 6:e20823.

Gregory V, Bennett M, Orkhan M, Hajjar SA, Varsano N, Mendelson E, Zambon M, Ellis J, Hay A, Lin Y. 2002. Emergence of influenza A H1N2 reassortant viruses in the human population during 2001. Virology 300:1–7.

Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696–704.

Han GZ, Boni MF, Li SS. 2010. No observed effect of homologous recombination on influenza C virus evolution. Virol J. 7:227.

Hasegawa M, Kishino H, Yano TA. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 22:160–174.

Hay AJ, Gregory V, Douglas AR, Lin YP. 2001. The evolution of human influenza viruses. Philos Trans R Soc Lond B Biol Sci. 356:1861–1870.

Dudas et al. · doi:10.1093/molbev/msu287 · MBE. Downloaded from http://mbe.oxfordjournals.org/ at Edinburgh University on December 17, 2014.

Hedrick PW, Thomson G. 1986. A two-locus neutrality test: applications to humans, E. coli and lodgepole pine. Genetics 112:135–156.

Hoffmann E, Mahmood K, Yang CF, Webster RG, Greenberg HB, Kemble G. 2002. Rescue of influenza B virus from eight plasmids. Proc Natl Acad Sci U S A. 99:11411–11416.

Kanegae Y, Sugita S, Endo A, Ishida M, Senya S, Osako K, Nerome K, Oya A. 1990. Evolutionary pattern of the hemagglutinin gene of influenza B viruses isolated in Japan: cocirculating lineages in the same epidemic season. J Virol. 64:2860–2865.

Kawaoka Y, Krauss S, Webster RG. 1989. Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics. J Virol. 63:4603–4608.

Lee HY, Chou JY, Cheong L, Chang NH, Yang SY, Leu JY. 2008. Incompatibility of nuclear and mitochondrial genomes causes hybrid sterility between two yeast species. Cell 135:1065–1073.

Lemey P, Rambaut A, Drummond AJ, Suchard MA. 2009. Bayesian phylogeography finds its roots. PLoS Comput Biol. 5:e1000520.

Lewontin RC. 1964. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49–67.

Lindstrom SE, Hiromoto Y, Nishimura H, Saito T, Nerome R, Nerome K. 1999. Comparative analysis of evolutionary mechanisms of the hemagglutinin and three internal protein genes of influenza B virus: multiple cocirculating lineages and frequent reassortment of the NP, M, and NS genes. J Virol. 73:4413–4426.

Longdon B, Hadfield JD, Webster CL, Obbard DJ, Jiggins FM. 2011. Host phylogeny determines viral persistence and replication in novel hosts. PLoS Pathog. 7:e1002260.

Lu L, Lycett SJ, Brown AJL. 2014. Reassortment patterns of avian influenza virus internal segments among different subtypes. BMC Evol Biol. 14:16.

Lycett SJ, Baillie G, Coulter E, Bhatt S, Kellam P, McCauley JW, Wood JL, Brown IH, Pybus OG, Leigh Brown AJ, Combating Swine Influenza Initiative-COSI Consortium. 2012. Estimating reassortment rates in co-circulating Eurasian swine influenza viruses. J Gen Virol. 93:2326–2336.

Minin VN, Bloomquist EW, Suchard MA. 2008. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol Biol Evol. 25:1459–1471.

Muller H. 1942. Isolating mechanisms, evolution and temperature. Biol Symp. 6:71–125.

Nakagawa N, Nukuzuma S, Haratome S, Go S, Nakagawa T, Hayashi K. 2002. Emergence of an influenza B virus with antigenic change. J Clin Microbiol. 40:3068–3070.

Nerome R, Hiromoto Y, Sugita S, Tanabe N, Ishida M, Matsumoto M, Lindstrom SE, Takahashi T, Nerome K. 1998. Evolutionary characteristics of influenza B virus since its first isolation in 1940: dynamic circulation of deletion and insertion mechanism. Arch Virol. 143:1569–1583.

O'Brien JD, Minin VN, Suchard MA. 2009. Learning to count: robust estimates for labeled distances between molecular sequences. Mol Biol Evol. 26:801–814.

Osterhaus ADME, Rimmelzwaan GF, Martina BEE, Bestebroer TM, Fouchier RAM. 2000. Influenza B virus in seals. Science 288:1051–1053.

Presgraves DC. 2010. The molecular evolutionary basis of species formation. Nat Rev Genet. 11:175–180.

Rambaut A, Suchard M, Drummond A. 2009. Tracer v1.5 [Internet]. Available from: http://tree.bio.ed.ac.uk/software/tracer/

Reed C, Meltzer MI, Finelli L, Fiore A. 2012. Public health impact of including two lineages of influenza B in a quadrivalent seasonal influenza vaccine. Vaccine 30:1993–1998.

Rota PA, Wallis TR, Harmon MW, Rota JS, Kendal AP, Nerome K. 1990. Cocirculation of two distinct evolutionary lineages of influenza type B virus since 1983. Virology 175:59–68.

Ruane S, Pyron RA, Burbrink FT. 2011. Phylogenetic relationships of the Cretaceous frog Beelzebufo from Madagascar and the placement of fossil constraints based on temporal and phylogenetic evidence. J Evol Biol. 24:274–285.

Shaw MW, Xu X, Li Y, Normand S, Ueki RT, Kunimoto GY, Hall H, Klimov A, Cox NJ, Subbarao K. 2002. Reappearance and global spread of variants of influenza B/Victoria/2/87 lineage viruses in the 2000–2001 and 2001–2002 seasons. Virology 303:1–8.

Smith GJD, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, et al. 2009. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459:1122–1125.

Turner TL, Hahn MW, Nuzhdin SV. 2005. Genomic islands of speciation in Anopheles gambiae. PLoS Biol. 3:e285.

Ward MJ, Lycett SJ, Kalish ML, Rambaut A, Brown AJL. 2013. Estimating the rate of intersubtype recombination in early HIV-1 group M strains. J Virol. 87:1967–1973.

World Health Organization. 2009. Influenza fact sheet. Available from: http://www.who.int/mediacentre/factsheets/fs211/en/

Zhao H, Nettleton D, Soller M, Dekkers JCM. 2005. Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genet Res. 86:77–87.


genomes and conduct secondary analyses in a data set of 1603 influenza B genomes.

We show that, despite extensive reassortment, three of the eight segments (two coding for components of the influenza B virus polymerase, PB1 and PB2, and one coding for the surface glycoprotein, HA) still survive as distinct Vic and Yam lineages. These lineages appear to be codependent to the point where virions that do not contain PB1, PB2, and HA segments derived entirely from either the Vic or the Yam lineage have rarely been isolated and, once isolated, circulate only as transient lineages. In the other segments (PA, NP, NA, MP, and NS), a single lineage has introgressed into the opposing background and replaced the previous lineage. All currently circulating influenza B viruses have PA, NP, NA, and MP segments derived from the Yam lineage and NS segments derived from the Vic lineage.

Results

Analysis of Reassortment Patterns across Vic and Yam Lineages

The differentiation into Vic and Yam lineages can be seen in all segments (fig. 2) and is followed by interlineage reassortment events. In the phylogenetic trees of the PA, NP, NA, MP, and NS segments, either the Vic or the Yam lineage has become the "trunk" of the tree, with present-day viruses deriving entirely from the Vic or Yam lineage (yellow vs. purple bars in fig. 2) following reassortment. However, the Vic and Yam lineages of the PB1, PB2, and HA segments continue to cocirculate to this day. Periodic loss of diversity in the PA, NP, NA, MP, and NS segments is consistent with introgression of one lineage into the other in those segments, whereas maintenance of parallel Vic and Yam lineages results in continually increasing diversity in segments PB1, PB2, and HA (fig. 3). The PB1, PB2, and HA segments from present-day viruses maintain a common ancestor in approximately 1983 and have thus accumulated genetic diversity since the split of those segments into Vic and Yam lineages, whereas other segments often lose diversity, with ancestors of present-day viruses appearing between approximately 1991 and approximately 1999.

By measuring mean pairwise diversity between branches in each tree that were assigned either a Vic or a Yam label in other segments, we look for reductions in between-lineage diversity, which indicate that an interlineage reassortment event has taken place (fig. 4). This method gives a quantitative measure of reassortment-induced loss of diversity between Vic and Yam lineages in two trees. Care should be taken when interpreting the statistic, as it does not correspond to any real time of most recent common ancestor (TMRCA) in the tree; it can instead be interpreted as the mean coalescence date between Vic and Yam lineages of the PB1, PB2, and HA segments in all other trees. We focus only on PB1, PB2, and HA lineage labels, as all other segments eventually become completely derived from either the Vic or the Yam lineage. Losses of diversity (represented by more recent mean pairwise TMRCAs between Vic and Yam labels) in figure 4 indicate that every segment has reassorted with respect to the Vic and Yam lineages of the PB1, PB2, and HA segments. However, we also see that the labels for these three segments show reciprocal preservation of diversity after 1997. This suggests that after 1997 no reassortment events have taken place between Vic and Yam lineages of the PB1, PB2, and HA segments, and their lineage labels only "meet" at the root. We do see reduced diversity between Vic and Yam labels of the PB1, PB2, and HA segments in a time period close to the initial split of Vic and Yam lineages (1986–1996). These reductions in diversity represent small clades with reassortant PB1–PB2–HA constellations, which go extinct by 1997 (see fig. 2). We also observe that the assignment of these three segment labels to branches of other segment trees is very similar, and often identical, after 1997. This suggests that PB1, PB2, and HA lineage labels switch simultaneously in all trees after 1997.

Figure 5 shows the ratio of Vic to Yam sequences in our primary and secondary data sets in different influenza seasons, based on the lineage to which each sequence was assigned (see Materials and Methods). It is evident that losses of diversity in the PA, NP, NA, MP, and NS segments are related to either the Vic (NS) or the Yam (PA, NP, NA, and MP) lineage replacing the other lineage in the influenza B virus population. Similarly, the lack of reassortment between Vic and Yam lineages and the maintenance of diversity of PB1, PB2, and HA can be seen where the two lineages have been sequenced at a ratio close to 50% over long periods of time (fig. 5). On a year-to-year basis, however, the ratios of Vic and Yam sequences for PB1, PB2, and HA can fluctuate dramatically, consistent with one lineage predominating within a given season, in agreement with surveillance data (Reed et al. 2012).

FIG. 1. Schematic analysis of reassortment patterns. (A) We begin by assigning sequences falling on either side of a specified bifurcation within each segment tree to different lineages, in this case the Vic and Yam bifurcation that occurred in the early 1980s. (B) We then transfer lineage labels from one tree to the same tips in another tree. Transitions between labels along this second tree thus indicate reassortment events that combine lineages falling on different sides of the Vic/Yam bifurcation in the first tree. (C) A reassortment graph depiction shows that tip number 6 is determined to be a reassortant based on (B).


FIG. 2. Maximum clade credibility trees of all eight genome segments of influenza B viruses isolated since 1980. Trees are colored based on inferred PB1–PB2–HA lineage. Vertical bars indicate the original Vic and Yam lineages within each segment. Each tree is the summarized output of a single analysis comprising 9000 trees sampled from the posterior distribution of trees.

FIG. 3. Oldest TMRCA of all surviving branches over time. PA, NP, NA, MP, and NS segments of influenza B viruses show periodic increases in TMRCAs of all surviving branches (indicative of diversity loss), suggesting lineage turnover. PB1, PB2, and HA segments, on the other hand, maintain the diversity dating back to the initial split of Vic and Yam lineages. Each point is the mean TMRCA of all surviving lineages existing at each time slice through the tree, and vertical lines indicating uncertainty are 95% highest posterior densities.


FIG. 4. Mean pairwise TMRCA between Vic and Yam branches under PB1, PB2, and HA label sets. PB1, PB2, and HA segment labels indicate that these segments show reciprocal preservation of diversity, which dates back to the split of Vic and Yam lineages. All other segments show increasingly more recent TMRCAs between branches labeled as Vic and Yam in PB1, PB2, and HA label sets. All vertical lines indicating uncertainty are 95% highest posterior densities.

We reconstructed reassortment events that were detected by using lineage labels. Figure 6 focuses only on interlineage reassortments that have occurred after 1990. We identify five major (in terms of persistence) reassortant genome constellations circulating between 1992 and 2011 (fig. 6). They are given in the order PB1–PB2–PA–HA–NP–NA–MP–NS, with a prime (′) indicating independently acquired segments:

B/Alaska/12/1996-like (Y–Y–Y–Y–Y–Y–Y–V)
B/Nanchang/2/1997-like (V–V–Y–V–Y–V–Y–V)
B/Iowa/03/2002-like (V–V–Y′–V–Y–Y–Y′–V′)
B/California/NHRC0001/2006-like (V–V–Y–V–Y′–Y–Y′–V′)
B/Brisbane/33/2008-like (V–V–Y–V–Y′–Y–Y–V)

In a previous study, B/Alaska/12/1996-like, B/Nanchang/2/1997-like, and B/Iowa/03/2002-like constellations were observed (Chen and Holmes 2008), but sequences from B/California/NHRC0001/2006-like and B/Brisbane/33/2008-like constellations were not available at the time. In their study, Chen and Holmes (2008) also recovered the coassortment pattern of PB1, PB2, and HA lineages but did not remark upon it. Of these five constellations, four (B/Nanchang/2/1997-like, B/Iowa/03/2002-like, B/California/NHRC0001/2006-like, and B/Brisbane/33/2008-like) are derived from introgression of Yam lineage segments into a Vic lineage PB1–PB2–HA background, with only one (B/Alaska/12/1996-like) resulting from introgression of a Vic lineage NS segment into an entirely Yam-derived background. All five interlineage reassortment events described here are marked by the preservation of either entirely Vic- or entirely Yam-derived PB1–PB2–HA segments. Figure 6 also shows that reassorting segments appear to evolve with a considerable degree of autonomy. For example, the NP lineage that entered a largely Vic lineage-derived genome and gave rise to the B/Nanchang/2/1997-like isolates continued circulating until 2010, even though the other segments it coassorted with in 1995–1996 (PA and MP) went extinct following the next round of reassortment that led to the rise of B/Iowa/03/2002-like genome constellations. A more extreme example is the NS segment, where a Vic sublineage was reassorted into an entirely Yam background (B/Alaska/12/1996-like) in 1994–1995 and then reassorted back into a mostly Vic background some 5 years later (B/Iowa/03/2002-like), where it has replaced the "original" Vic sublineage (see fig. 6).

FIG. 5. Ratio of Vic and Yam sequences in the data set. The ratio of Vic (yellow) to Yam (purple) sequences in each segment from the primary data set over time. Black lines indicate where this ratio lies in the larger secondary data set. Numbers at the top of the figure show the total number of genomes available for each influenza season in the primary data set, comprising 452 genomes, from which the ratio was calculated, whereas the numbers in brackets correspond to numbers of sequences in the larger secondary genomes data set. Numbers at the bottom are influenza seasons from the 1987/1988 (87/88) season to the 2011/2012 season. The Yam lineage of the PA, NP, NA, and MP segments and the Vic lineage of the NS segment eventually become fixed (in the population genetics sense of the word) in the influenza B population. PB1, PB2, and HA segments maintain separate Vic and Yam lineages.

We observe that none of the five successful interlineage reassortment events shown in figure 6 breaks up the PB1–PB2–HA complex. This is an unlikely outcome: the probability of not breaking up PB1–PB2–HA across five reassortment events is $P = \left(\frac{2^5 \times 2 - 2}{2^8 - 2}\right)^5 = 0.0009$, where reassortment events are considered to sample from the Vic and Yam lineages at random for each of the eight segments. If we correct for multiple testing with the assumption that coassortment of any three segments is of interest, we find that the probability of not breaking up an arbitrary set of three segments across five reassortment events is $P = \binom{8}{3}\left(\frac{2^5 \times 2 - 2}{2^8 - 2}\right)^5 = 0.0485$.
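The arithmetic behind these two probabilities can be checked with a short sketch. It assumes, as one reading of the expression rather than something stated in the text, that the 2 subtracted in the numerator and denominator removes the two non-reassortant constellations (all-Vic and all-Yam). The function and constant names are illustrative only.

```python
from fractions import Fraction
from math import comb

N_SEGMENTS = 8   # influenza B genome segments
N_LINKED = 3     # PB1, PB2 and HA
N_EVENTS = 5     # successful interlineage reassortment events


def prob_linked_set_intact(n_segments=N_SEGMENTS, n_linked=N_LINKED):
    """Chance that one random reassortant keeps the linked set within one lineage."""
    # Linked set all-Vic or all-Yam, remaining segments free;
    # assumption: the two non-reassortant genomes are excluded.
    favourable = 2 * 2 ** (n_segments - n_linked) - 2
    # Every two-lineage constellation except the two non-reassortant ones.
    total = 2 ** n_segments - 2
    return Fraction(favourable, total)


p_single = prob_linked_set_intact()
p_five = p_single ** N_EVENTS
# Multiple-testing correction: any 3 of the 8 segments could have been of interest.
p_corrected = comb(N_SEGMENTS, N_LINKED) * p_five

print(p_single, round(float(p_five), 4), round(float(p_corrected), 4))
```

Exact fractions are used so that the rounding to four decimal places happens only once, at the end.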

Although the vast majority of influenza B isolates possess either Vic or Yam lineage-derived PB1–PB2–HA complexes, on rare occasions mixed-lineage PB1–PB2–HA constellations emerge. Figure 7 shows the sum of branch lengths that were labeled as having entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. Due to the lack of reassortment between Vic and Yam lineages of PB1, PB2, and HA (fig. 4) since 1997, all segments have spent significantly longer periods of evolutionary time with either entirely Vic-derived or entirely Yam-derived than with mixed-lineage PB1, PB2, and HA constellations (fig. 7). We have identified three instances of mixed-lineage PB1–PB2–HA reassortants from the primary data set, with the following PB1–PB2–HA constellations: VVY (B/Bangkok/163/1990-like, 13 sequences isolated 1990–January 5, 1995), VYV (B/Nanchang/630/1994-like, two sequences isolated 1994–1996), and VYY (B/New York/24/1993-like, two sequences isolated January 8, 1993–1994). We detected two new reassortant lineages when investigating the larger secondary data set: B/Waikato/6/2005-like viruses with PB1–PB2–HA constellation YYV (17 sequences isolated May 9–October 12, 2005) and B/Malaysia/1829782/2007 with PB1–PB2–HA constellation YVY (one sequence isolated August 2, 2007).

Analysis of Reassortment Properties

We attempted to quantify the temporal discordance between lineages reassorting into new genomic constellations. If one were able to recover an influenza "species tree," including admixture/reassortment events, it would be possible to estimate the reassortment or recombination "distance," which is the time between a split in the species tree in the past and a reassortment event (see supplementary fig. S17, Supplementary Material online). Although we do not find evidence of differences in the total number of reassortments between segments (see supplementary fig. S4, Supplementary Material online), we find support for a reassortment "distance" effect, in which a pair of tips on one segment has a different TMRCA from the same pair of tips on a different segment. The summary statistic we use that reflects this difference in TMRCAs, $\Delta_{TMRCA}$, is most sensitive when only one of the two trees being compared loses diversity through reassortment and the other acts as a proxy for the "species tree." We normalize our $\Delta_{TMRCA}$ comparisons to arrive at $\delta_{TMRCA}$, which accounts for uncertainty in tree topology (see Materials and Methods). Figure 8 shows $\delta_{TMRCA}$ values for all pairs of trees. Most segment pairs show very low values for this statistic, with $\delta_{TMRCA} \approx 0.1$, indicating that $\Delta_{TMRCA}$ measurements between replicate posterior samples from the same segment are up to ten times smaller than $\Delta_{TMRCA}$ values between posterior samples from different segments. PB1, PB2, and HA trees, on the other hand, exhibit $\delta_{TMRCA}$ values that are much higher. This shows that TMRCA differences between trees of the PB1, PB2, and HA segments are, though noisy, occasionally very similar to the uncertainty in tip-to-tip TMRCAs between replicate analyses of these segments.

FIG. 6. Schematic plot of reconstructed reassortments between Vic and Yam lineage segments of influenza B virus. Lineages that coassort in genomes are represented by eight parallel lines, with lineages that derive from the original Vic clade colored yellow/brown and lineages that derive from the original Yam clade colored lilac/purple. Interlineage reassortment events are indicated by lines entering a different genome. The angle of incoming lineages represents uncertainty in the timing of the event (mean date of the reassortant node and its parent node). Lineage extinction dates are not shown accurately.

FIG. 7. Amount of evolutionary time each segment has spent under different PB1–PB2–HA constellations. All segments have spent significantly more of their history with entirely Vic- or entirely Yam-derived PB1–PB2–HA complexes. All vertical lines indicating uncertainty are 95% highest posterior densities.

Discussion

Linkage between PB1, PB2, and HA Gene Segments

In this article, we show that the PB1, PB2, and HA segments of influenza B viruses are the only ones that have continuously maintained separate Vic and Yam lineages, whereas other segments have fixed either Vic or Yam lineages (figs. 2, 5, and 6). Evidence suggests that this is a result of a prolonged lack of reassortment between Vic and Yam lineages in PB1, PB2, and HA (fig. 4), which possess coassorting sequences detectable as high linkage disequilibrium (LD) (supplementary fig. S1, Supplementary Material online). The vast majority of the sampled evolutionary history of each segment of influenza B viruses since the split of Vic and Yam lineages has been spent in association with either completely Vic or completely Yam lineage-derived PB1–PB2–HA complexes (fig. 7), suggesting that having "pure" lineage PB1–PB2–HA complexes is important for whole-genome fitness. We propose that this pattern of coassortment is due to the action of selection and not simply biased or rare reassortment.

The origin of the strong genetic linkage between the PB1, PB2, and HA segments remains unclear. We believe that there are two alternative but similar explanations for it: mutation-driven coevolution (Presgraves 2010) and Dobzhansky–Muller (DM) incompatibility (Dobzhansky 1937; Muller 1942). Mutation-driven coevolution has been suggested to be the cause of hybrid dysfunction in Saccharomyces hybrids (Lee et al. 2008) and evolves as a byproduct of adaptation. If one or the other influenza B lineage has undergone adaptation, we might expect these changes to be beneficial in its native background and incompatible with a foreign background. DM incompatibility operates in a similar way, but the main difference from the scenario described above is that the incompatible alleles are neutral or nearly neutral in their native background and become deleterious or lethal when combined with nonnative backgrounds. Emergence of DM incompatibility is aided by geographic isolation. Interestingly, the Vic lineage of HA was restricted to eastern Asia between 1992 and 2000 (Nerome et al. 1998; Shaw et al. 2002), offering ample time for the budding Vic lineage to accumulate alleles causing reassortment incompatibility. However, without more genomic data from the past, it is difficult to estimate to what extent influenza B virus population structure contributed to the development of the current segment linkage.

Potential Mechanisms for Reassortment Incompatibility

Unfortunately, the limited amount of genomic data available for the early years of the Vic–Yam split precludes any attempt to answer whether selection or drift has led to the current linkage of the PB1, PB2, and HA segments. Although the origins of the linkage between these three segments might be difficult to explain, we can speculate on the nature of reassortment incompatibility. For example, it is intuitive why this might be the case for PB1 and PB2: both proteins interact directly as part of the RNA-dependent RNA polymerase heterotrimer. Indeed, we observe that PB1–PB2 reassortants are the rarest and least persistent among mixed-lineage PB1–PB2–HA strains and have not been isolated in great numbers. In fact, most reassortants breaking the PB1–PB2–HA complex apart occurred close to the split of Vic and Yam lineages and have become very rare since.

There is some evidence that the linkage between PB1 and HA might not be a phenomenon restricted to influenza B viruses. It has been established that, at least for the 1957 and the 1968 influenza pandemics, caused by the A/H2N2 and A/H3N2 subtypes, respectively, the viruses responsible were reassortants possessing PB1 and HA segments derived from avian influenza A viruses (Kawaoka et al. 1989). In addition, outdated techniques for producing vaccine seed strains through selection for HA–NA reassortants often yielded PB1–HA–NA reassortants as a side effect (Bergeron et al. 2010; Fulvini et al. 2011). Recent experiments have found that the presence or absence of a "foreign" PB1 segment can have dramatic effects on HA concentration on the surface of virions and on total virion production (Cobbin et al. 2013). However, there have been reassortant influenza A viruses circulating for prolonged periods of time in humans that did have disparate PB1 and HA segments, for example, H1N2 outbreaks in 2001 (Gregory et al. 2002) and H1N1/09 in 2009 (Smith et al. 2009).

FIG. 8. $\delta_{TMRCA}$ statistics for different segment pairs. PB1, PB2, and HA trees exhibit reciprocally highly similar TMRCAs, unlike most other pairwise comparisons. All vertical lines indicating uncertainty are 95% highest posterior densities.

We believe that the association between the PB1, PB2, and HA segments should be relatively straightforward to explore in the lab. Reverse genetics systems have been developed for influenza B viruses (Hoffmann et al. 2002), which would allow the creation of artificial reassortants. Based on the frequency and persistence times of the different reassortant classes we have observed, we expect a hierarchy of reassortant fitness, starting with PB1+PB2+HA reassortants, which should be the most fit, followed by PB1+2/HA, then PB1+HA/PB2, and finally PB2+HA/PB1 reassortants, with the lowest fitness. We believe that this is the most direct approach to unraveling the mechanism responsible for the linkage within the PB1–PB2–HA complex.

Will Influenza B Viruses Speciate?

We suggest that the preservation of two PB1–PB2–HA complex lineages is similar to genomic speciation islands, where small numbers of genes resist being homogenized through gene flow (Turner et al. 2005). In this context, we see three potential paths of evolution for influenza B viruses. If more segments get recruited to the PB1–PB2–HA complex, the process could continue until "speciation" occurs, in which none of the segments is able to reassort across the Vic–Yam lineage boundary. Alternatively, the influenza B genome could continue to be homogenized through gene flow, with the exception of the PB1, PB2, and HA segments, or one of the two PB1–PB2–HA complexes could go extinct, marking the return of single-strain dynamics in the influenza B virus population. The eventual fate of influenza B viruses will likely be determined by the combined effects of reassortment frequency and the strength of epistatic interactions between segments.

Materials and Methods

We compiled a primary data set of 452 complete influenza B genomes from GISAID (Bogner et al. 2006) dating from 1984 to 2012. The longest protein-coding region of each segment was extracted and used for all further analyses. We thus assume that homologous recombination has not taken place and that the evolutionary history of the whole segment can be inferred from the longest coding sequence in the segment. To date, there has been little evidence of homologous recombination in influenza viruses (Chare et al. 2003; Boni et al. 2008; Han et al. 2010). The segments of each strain were assigned to either the Vic or the Yam lineage by making maximum-likelihood trees of each segment using PhyML (Guindon and Gascuel 2003) and identifying whether the isolate was more closely related to B/Victoria/2/87 or B/Yamagata/16/88 sequences in that segment. The exception was the NS segment: because B/Victoria/2/87 was a reassortant and possessed a Yam lineage NS (Lindstrom et al. 1999), B/Czechoslovakia/69/1990 was considered representative of the Vic lineage for the NS segment. Every segment in each genome thus received either a Vic or a Yam lineage designation; for example, the strain B/Victoria/2/87 received V–V–V–V–V–V–V–Y, as its NS segment is derived from the Yam lineage and the rest of the genome is Vic.

We also collated a secondary data set from all complete influenza B virus genomes available on GenBank as of May 5, 2014. After removing isolates that had considerable portions of any sequence missing, were isolated prior to 1980, or were suspected of having a contaminant sequence in any segment, we were left with 1603 sequences. The sequences in this data set only became available after all primary analyses were performed, are mainly from Australia, New Zealand, and the United States, and are too numerous to analyze in BEAST (Drummond et al. 2012). PhyML (Guindon and Gascuel 2003) was used to produce phylogenies of each segment, and the lineage of each isolate was determined based on grouping with either B/Victoria/2/87 or B/Yamagata/16/88 sequences, as described above. By associating strains with the lineage identity of each of their segments, we reconstructed the most parsimonious interlineage reassortment history for the secondary data set. The secondary data set was used to check how representative the primary data set was, to estimate LD, and to broadly confirm our results. All analyses pertain to the primary data set unless stated otherwise.

Temporally calibrated phylogenies were recovered for each segment in the primary data set using Markov chain Monte Carlo (MCMC) methods in the BEAST software package (Drummond et al. 2012). We modeled the substitution process using the Hasegawa–Kishino–Yano model of nucleotide substitution (Hasegawa et al. 1985), with separate transition models for each of the three codon partitions, and additionally estimated realized synonymous and nonsynonymous substitution counts (O'Brien et al. 2009). We used a flexible Bayesian skyride demographic model (Minin et al. 2008). We accounted for incomplete sampling dates for 94 sequences (of which 93 had only year and 1 had only year and month of isolation), whereby the tip date is estimated as a latent variable in the MCMC integration. A relaxed molecular clock was used, where branch rates are drawn from a lognormal distribution (Drummond et al. 2006). We ran three independent MCMC chains, each with 200 million states sampled every 20,000 steps, and discarded the first 10% of the MCMC states as burn-in. After assessing convergence of all three MCMC chains by visual inspection using Tracer (Rambaut et al. 2009), we combined samples across chains to give a total of 27,000 samples from the posterior distribution of trees.
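As a quick consistency check on the sampling scheme just described, the following sketch recomputes the number of retained posterior samples. It assumes the sample logged at state 0 is not counted; the constant names are illustrative.

```python
CHAIN_LENGTH = 200_000_000  # MCMC states per chain
SAMPLE_EVERY = 20_000       # thinning interval
BURNIN_PERCENT = 10         # first 10% of states discarded
N_CHAINS = 3                # independent replicate chains

# Samples logged per chain (assumption: the state-0 sample is ignored).
samples_per_chain = CHAIN_LENGTH // SAMPLE_EVERY
# Integer arithmetic avoids floating-point rounding in the burn-in count.
burnin_samples = samples_per_chain * BURNIN_PERCENT // 100
retained_per_chain = samples_per_chain - burnin_samples
total_retained = retained_per_chain * N_CHAINS

print(retained_per_chain, total_retained)
```

The per-chain figure agrees with the 9000 trees per analysis quoted in the caption of figure 2, and the combined figure with the 27,000 posterior samples stated above.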

Every sequence was assigned seven discrete traits in BEAUti, corresponding to the lineages of all other segments with which a strain was isolated; for example, the PB1 tree had PB2, PA, HA, NP, NA, MP, and NS as traits and V or Y as values for each trait. We inferred the ancestral state of lineages in each segment by modeling transitions between these discrete states using an asymmetric transition matrix (Lemey et al. 2009) with Bayesian stochastic search variable selection to estimate significant rates. Because the posterior set of trees for a single segment has branches labeled with the inferred lineage in the remaining seven segments, we can detect interlineage reassortments between pairs of segments by observing state transitions, that is, Yam to Vic or Vic to Yam (fig. 1). In addition, by reconstructing the ancestral state of all other genomic segments jointly, we can infer coreassortment events when more than one trait transition occurs on the same node in a tree. Interphylogeny labeling approaches have been extensively used in the past to investigate reticulate evolution in influenza A viruses and HIV (Lycett et al. 2012; Ward et al. 2013; Lu et al. 2014).
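The label-transition idea can be illustrated with a minimal sketch. It is not the Bayesian ancestral state reconstruction used in the analysis: it assumes that lineage labels have already been inferred for every node of a single tree, and the tree, trait values, and function name are hypothetical.

```python
def find_reassortments(parent, traits):
    """Map each node to the traits whose label differs from its parent's.

    parent: dict node -> parent node (None for the root)
    traits: dict node -> dict of trait name -> "V" or "Y"
            (labels are assumed to be already inferred for all nodes)
    """
    events = {}
    for node, par in parent.items():
        if par is None:
            continue  # the root has no incoming branch
        switched = sorted(
            t for t in traits[node] if traits[node][t] != traits[par][t]
        )
        if switched:
            events[node] = switched
    return events


# Toy PB1 tree annotated with the lineage of two other segments (made-up data).
parent = {"root": None, "n1": "root", "n2": "root",
          "t1": "n1", "t2": "n1", "t3": "n2"}
traits = {
    "root": {"PB2": "V", "HA": "V"},
    "n1": {"PB2": "V", "HA": "V"},
    "n2": {"PB2": "Y", "HA": "Y"},  # both traits switch on one branch
    "t1": {"PB2": "V", "HA": "V"},
    "t2": {"PB2": "V", "HA": "Y"},  # only HA switches
    "t3": {"PB2": "Y", "HA": "Y"},
}

events = find_reassortments(parent, traits)
# A coreassortment is a branch on which more than one trait changes state.
coreassortments = {n: ts for n, ts in events.items() if len(ts) > 1}
print(events)
print(coreassortments)
```

In a full analysis this comparison would be repeated over every tree in the posterior sample, so that each inferred event carries a posterior support.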

Measures of Diversity

We inferred the diversity of each segment from its phylogenetic tree by estimating the date of the most recent common ancestor of all branches at yearly time points, which places an upper bound on the maximum amount of diversity existing at each time point. A version of this lineage turnover metric has previously been used to investigate the tempo and strength of selection in influenza A viruses during seasonal circulation (Bedford et al. 2011). In addition, we calculated the mean pairwise TMRCA between branches labeled as Vic and Yam for PB1, PB2, and HA traits. This gave us a measure of how much a particular segment reassorts with respect to Vic and Yam lineages of PB1, PB2, and HA segments. If Vic and Yam lineages of PB1, PB2, and HA segments were to be considered as being separate populations, this measure would be equivalent to “between population” diversity.
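A rough sketch of the lineage turnover measure is given below. It assumes a time-scaled tree stored as a dictionary from node name to (parent, date); this representation, the function names, and the toy dates are illustrative assumptions rather than the authors' implementation.

```python
def ancestors(tree, node):
    """Return the node and all of its ancestors, walking up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = tree[node][0]
    return path

def tmrca_at(tree, t):
    """Date of the most recent common ancestor of all branches alive at time t."""
    # A branch is alive at t if it starts before t and ends at or after t.
    crossing = [n for n, (parent, date) in tree.items()
                if parent is not None and tree[parent][1] < t <= date]
    if not crossing:
        return None
    shared = None
    for n in crossing:
        anc = set(ancestors(tree, tree[n][0]))
        shared = anc if shared is None else shared & anc
    # The youngest shared ancestor is the most recent common ancestor.
    return max(tree[a][1] for a in shared)

# Toy tree: node -> (parent, date in calendar years).
toy_tree = {
    "root": (None, 1980.0),
    "A": ("root", 1985.0),
    "t1": ("A", 1990.0),
    "t2": ("A", 1992.0),
    "t3": ("root", 1988.0),
}

for year in (1987.0, 1989.0):
    print(year, tmrca_at(toy_tree, year))
```

In the toy tree the common ancestor date jumps forward once the lineage leading to t3 is no longer present, which is the kind of increase that signals diversity loss.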

We also calculated the total amount of sampled evolutionary time spent by each segment with entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. We do this by summing the branch lengths in each tree under three different lineage combinations of the PB1, PB2, and HA segments: PB1–PB2–HA derived entirely from the Yam lineage, PB1–PB2–HA derived entirely from the Vic lineage, and PB1–PB2–HA derived from a mixture of the two lineages. This gives a measure of how successful, over long periods of time, each particular PB1–PB2–HA constellation has been.
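The branch-length summation can be sketched as follows, assuming each branch is available as a tuple of its length (in years) and the lineage labels of PB1, PB2, and HA on that branch. The tuple layout and function names are hypothetical.

```python
def classify(pb1, pb2, ha):
    """Classify a PB1-PB2-HA constellation as 'Vic', 'Yam' or 'mixed'."""
    labels = {pb1, pb2, ha}
    if labels == {"V"}:
        return "Vic"
    if labels == {"Y"}:
        return "Yam"
    return "mixed"

def time_by_constellation(branches):
    """Sum branch lengths under each PB1-PB2-HA constellation class."""
    totals = {"Vic": 0.0, "Yam": 0.0, "mixed": 0.0}
    for length, pb1, pb2, ha in branches:
        totals[classify(pb1, pb2, ha)] += length
    return totals

# Toy branches: (length in years, PB1 label, PB2 label, HA label).
toy_branches = [
    (2.0, "V", "V", "V"),
    (1.5, "Y", "Y", "Y"),
    (0.5, "V", "Y", "V"),
    (1.0, "V", "V", "V"),
]

totals = time_by_constellation(toy_branches)
print(totals)
```

Repeating the summation over every tree in the posterior sample would give a distribution for each class, from which intervals such as highest posterior densities can be reported.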

Tree-to-Tree Similarities

We express the normalized distance δTMRCA between trees belonging to two segments A and B for a particular posterior sample $i$ following

\[
\delta_{\mathrm{TMRCA}}(A_i, B_i) = \frac{\Delta_{\mathrm{TMRCA}}(A_i, A'_i) + \Delta_{\mathrm{TMRCA}}(B_i, B'_i)}{2\,\Delta_{\mathrm{TMRCA}}(A_i, B_i)}, \tag{1}
\]

where $\Delta_{\mathrm{TMRCA}}(A_i, B_i) = \frac{1}{n}\sum_{j=1}^{n} g(A_{ij}, B_{ij})$ and $n$ is the total number of pairwise comparisons available between sets of tips. Thus, $g(A_{ij}, B_{ij})$ is the absolute difference in TMRCA of a pair of tips $j$, where the pair is drawn from the $i$th posterior sample of tree A and the $i$th posterior sample of tree B. Additionally, $\Delta_{\mathrm{TMRCA}}(A_i, A'_i)$ is calculated from the $i$th posterior sample of tree A and the $i$th posterior sample of an independent analysis of tree A (which we refer to as A′), which is used in the normalization procedure to control for variability in tree topology stability over the course of the MCMC chain (see supplementary figs. S6 and S7, Supplementary Material online). We had three replicate analyses of each segment, and in order to calculate $\Delta_{\mathrm{TMRCA}}(A_i, A'_i)$ we used analyses numbered 1, 2, and 3 as A and analyses numbered 2, 3, and 1 as A′, in that order. We subsampled our combined posterior distribution of trees to give a total of 2,700 trees on which to analyze δTMRCA.
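To make the normalization concrete, here is a minimal sketch that assumes the pairwise tip TMRCAs of each tree have already been extracted into dictionaries keyed by tip pair. The dictionary layout, function names, and numbers are illustrative assumptions, not the authors' code.

```python
def raw_distance(tmrca_a, tmrca_b):
    """Mean absolute difference in TMRCA over tip pairs shared by two trees."""
    pairs = tmrca_a.keys() & tmrca_b.keys()
    return sum(abs(tmrca_a[p] - tmrca_b[p]) for p in pairs) / len(pairs)

def normalized_distance(a, a_rep, b, b_rep):
    """Within-segment replicate noise divided by the between-segment difference."""
    noise = raw_distance(a, a_rep) + raw_distance(b, b_rep)
    return noise / (2 * raw_distance(a, b))

# Toy TMRCAs (years before present) for two tip pairs in one posterior sample.
seg_a     = {("t1", "t2"): 5.0, ("t1", "t3"): 10.0}
seg_a_rep = {("t1", "t2"): 5.5, ("t1", "t3"): 10.5}   # independent replicate of A
seg_b     = {("t1", "t2"): 2.0, ("t1", "t3"): 10.0}
seg_b_rep = {("t1", "t2"): 2.5, ("t1", "t3"): 9.5}    # independent replicate of B

value = normalized_distance(seg_a, seg_a_rep, seg_b, seg_b_rep)
print(round(value, 3))
```

Values near zero mean the two segment trees differ far more than replicate analyses of the same segment do; values approaching one mean the difference between segments is comparable to replicate noise.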

Calculating the normalized δTMRCA(A_i, B_i) for each MCMC state provides us with a posterior distribution of this statistic, allowing specific hypotheses regarding similarities between the trees of different segments to be tested. Our approach exploits the branch scaling used by BEAST (Drummond et al. 2012), as the trees are scaled in absolute time and insensitive to variation in nucleotide substitution rates between segments, allowing for direct comparisons between TMRCAs in different trees. In the absence of reassortment, we expect the tree of every segment to recapitulate the “virus tree,” a concept analogous to “species trees” in population genetics. Our method operates under the assumption that the segment trees capture this “virus tree” of influenza B viruses quite well. It is not an unreasonable assumption, given the seasonal bottlenecks influenza viruses experience. This makes it almost certain that influenza viruses circulating at any given time point are derived from a single genome that existed in the recent past. The ΔTMRCA statistic essentially quantifies the temporal distance between admixture events and nodes in the virus tree (see supplementary fig. S17, Supplementary Material online). We normalize ΔTMRCA values to get δTMRCA, a measure which quantifies the extent to which the similarity of two independent trees resembles phylogenetic noise. The ΔTMRCA statistic is an extension of patristic distance methods and has previously been used to tackle a wide variety of problems, for example as phylogenetic distance in predicting viral titer in Drosophila infected with viruses from closely related species (Longdon et al. 2011) and to assess temporal incongruence in a phylogenetic tree of amphibian species induced by using different calibrations (Ruane et al. 2011).

LD across the Influenza B Genome

We used the secondary GenBank data set with 1,603 complete genome sequences to estimate LD between amino acid loci across the longest proteins encoded by each segment of the influenza B virus genome. To quantify LD, we adapt the $\chi^2_{df}$ statistic from Hedrick and Thomson (1986):

\[
\chi^2_{df} = \frac{\chi^2}{N (k - 1)(m - 1)}, \tag{2}
\]

where $\chi^2$ is calculated from a classical contingency table, $N$ is the number of haplotypes, and $(k - 1)(m - 1)$ are the degrees of freedom, with $k$ and $m$ the numbers of alleles at the two loci. This statistic is equal to the widely used $r^2$ LD statistic at biallelic loci, but also quantifies LD when there are more than two alleles per locus (Zhao et al. 2005). LD was estimated only at loci where each nucleotide or amino acid allele was present in at least two isolates. We ignored gaps in the alignment and did not consider them as polymorphisms. In all cases, we used a minor allele frequency cutoff of 1%. We also calculated another LD statistic, $D'$ (Lewontin 1964), as $D'_{ij} = D_{ij} / D^{\max}_{ij}$, where $D_{ij} = p(A_iB_j) - p(A_i)p(B_j)$ and

\[
D^{\max}_{ij} =
\begin{cases}
\min\left[p(A_i)p(B_j),\ (1 - p(A_i))(1 - p(B_j))\right] & \text{when } D_{ij} < 0, \\
\min\left[(1 - p(A_i))p(B_j),\ p(A_i)(1 - p(B_j))\right] & \text{when } D_{ij} \geq 0,
\end{cases} \tag{3}
\]

where $p(A_i)$ is the frequency of allele $A_i$ at locus A, $p(B_j)$ is the frequency of allele $B_j$ at locus B, and $p(A_iB_j)$ is the frequency of haplotype $A_iB_j$. $D'$ is inflated when some haplotypes are not observed, for example, when the minor allele frequency is low. We find that $D'$ is almost uniformly high across the influenza B virus genome and close to 1.0 for almost any pair of polymorphic loci. This is because most amino acid alleles in the population exist transiently, meaning that they do not get a chance to reassort, and we only observe them within the backgrounds of more persistent alleles, which $D'$ quantifies as complete LD. We think that metrics related to $r^2$, such as $\chi^2_{df}$, perform much better on temporal data such as ours in finding persistent associations between alleles and are easier to interpret.
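The contrast between the two LD statistics can be seen on a toy example. The sketch below computes both from a list of two-locus haplotypes, one (allele at locus A, allele at locus B) tuple per genome; the function names and the toy alleles are hypothetical, and both loci are assumed to be polymorphic.

```python
from collections import Counter

def chi2_df(haplotypes):
    """Chi-squared LD statistic normalized by N and the degrees of freedom."""
    n = len(haplotypes)
    a_counts = Counter(a for a, _ in haplotypes)
    b_counts = Counter(b for _, b in haplotypes)
    ab_counts = Counter(haplotypes)
    k, m = len(a_counts), len(b_counts)
    if k < 2 or m < 2:
        raise ValueError("both loci must be polymorphic")
    chi2 = 0.0
    for a, na in a_counts.items():
        for b, nb in b_counts.items():
            expected = na * nb / n
            chi2 += (ab_counts[(a, b)] - expected) ** 2 / expected
    return chi2 / (n * (k - 1) * (m - 1))

def d_prime(haplotypes, a, b):
    """Lewontin's D' for allele a at locus A and allele b at locus B."""
    n = len(haplotypes)
    p_a = sum(1 for x, _ in haplotypes if x == a) / n
    p_b = sum(1 for _, y in haplotypes if y == b) / n
    p_ab = sum(1 for x, y in haplotypes if (x, y) == (a, b)) / n
    d = p_ab - p_a * p_b
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min((1 - p_a) * p_b, p_a * (1 - p_b))
    return d / d_max

# A rare allele R at locus A seen once, on the common background T at locus B.
transient = [("K", "T")] * 5 + [("K", "A")] * 4 + [("R", "T")]
# Two loci in perfect association.
linked = [("K", "T")] * 5 + [("R", "A")] * 5

print(round(d_prime(transient, "R", "T"), 3), round(chi2_df(transient), 3))
print(round(d_prime(linked, "K", "T"), 3), round(chi2_df(linked), 3))
```

In the first toy data set the rare allele is never observed on the alternative background, so D′ reports complete LD even though the chi-squared-based statistic stays low (about 0.07); in the second both statistics equal 1.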

Data Availability

Python scripts used to process trees and sequences are available at https://github.com/evogytis/fluB/tree/master/scripts (last accessed October 21, 2014). Output files from scripts, lineage designations, MCC trees, acknowledgment tables, accession numbers, and redacted XML files (per GISAID Data Access Agreement) are publicly available at https://github.com/evogytis/fluB/tree/master/data (last accessed October 21, 2014).

Supplementary Material

Supplementary material and figures S1–S17 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Darren Obbard and Paul Wikramaratna for helpful discussions and anonymous reviewers for comments and suggestions. This study was supported by a Natural Environment Research Council studentship D76739X to G.D. and a Newton International Fellowship from the Royal Society to T.B. The research leading to these results has received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 278433-PREDEMICS and ERC Grant Agreement No. 260864. A.R. and S.L. acknowledge the support of the Wellcome Trust (Grant No. 092807).

References

Ansaldi F, D’Agaro P, de Florentiis D, Puzelli S, Lin YP, Gregory V, Bennett M, Donatelli I, Gasparini R, Crovari P, et al. 2003. Molecular characterization of influenza B viruses circulating in northern Italy during the 2001–2002 epidemic season. J Med Virol. 70:463–469.

Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol. 11:220.

Bedford T, Suchard MA, Lemey P, Dudas G, Gregory V, Hay AJ, McCauley JW, Russell CA, Smith DJ, Rambaut A. 2014. Integrating influenza antigenic dynamics with molecular evolution. eLife 3:e01914.

Bergeron C, Valette M, Lina B, Ottmann M. 2010. Genetic content of influenza H3N2 vaccine seeds. PLoS Curr. 2:RRN1165.

Bodewes R, Morick D, de Mutsert G, Osinga N, Bestebroer T, van der Vliet S, Smits SL, Kuiken T, Rimmelzwaan GF, Fouchier RA, et al. 2013. Recurring influenza B virus infections in seals. Emerg Infect Dis. 19:511–512.

Bogner P, Capua I, Lipman DJ, Cox NJ, et al. 2006. A global initiative on sharing avian flu data. Nature 442:981.

Boni MF, Zhou Y, Taubenberger JK, Holmes EC. 2008. Homologous recombination is very rare or absent in human influenza A virus. J Virol. 82:4807–4811.

Broberg E, Beaute J, Snacken R. 2013. Fortnightly influenza surveillance overview, 24 May 2013 - weeks 19-20/2013. Available from: http://ecdc.europa.eu/en/publications/Publications/influenza-fortnightly-surveillance-overview-24-may-2013.pdf

Burnet SFM. 1955. Principles of animal virology. New York: Academic Press.

Chare ER, Gould EA, Holmes EC. 2003. Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J Gen Virol. 84:2691–2703.

Chen R, Holmes EC. 2008. The evolutionary dynamics of human influenza B virus. J Mol Evol. 66:655–663.

Cobbin JCA, Verity EE, Gilbertson BP, Rockman SP, Brown LE. 2013. The source of the PB1 gene in influenza vaccine reassortants selectively alters the hemagglutinin content of the resulting seed virus. J Virol. 87:5577–5585.

Dobzhansky T. 1937. Genetics and the origin of species. New York: Columbia University Press.

Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4:e88.

Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 29:1969–1973.

Fulvini AA, Ramanunninair M, Le J, Pokorny BA, Arroyo JM, Silverman J, Devis R, Bucher D. 2011. Gene constellation of influenza A virus reassortants with high growth phenotype prepared as seed candidates for vaccine production. PLoS One 6:e20823.

Gregory V, Bennett M, Orkhan M, Hajjar SA, Varsano N, Mendelson E, Zambon M, Ellis J, Hay A, Lin Y. 2002. Emergence of influenza A H1N2 reassortant viruses in the human population during 2001. Virology 300:1–7.

Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696–704.

Han GZ, Boni MF, Li SS. 2010. No observed effect of homologous recombination on influenza C virus evolution. Virol J. 7:227.

Hasegawa M, Kishino H, Yano TA. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 22:160–174.

Hay AJ, Gregory V, Douglas AR, Lin YP. 2001. The evolution of human influenza viruses. Philos Trans R Soc Lond B Biol Sci. 356:1861–1870.


Hedrick PW, Thomson G. 1986. A two-locus neutrality test: applications to humans, E. coli and lodgepole pine. Genetics 112:135–156.

Hoffmann E, Mahmood K, Yang CF, Webster RG, Greenberg HB, Kemble G. 2002. Rescue of influenza B virus from eight plasmids. Proc Natl Acad Sci U S A. 99:11411–11416.

Kanegae Y, Sugita S, Endo A, Ishida M, Senya S, Osako K, Nerome K, Oya A. 1990. Evolutionary pattern of the hemagglutinin gene of influenza B viruses isolated in Japan: cocirculating lineages in the same epidemic season. J Virol. 64:2860–2865.

Kawaoka Y, Krauss S, Webster RG. 1989. Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics. J Virol. 63:4603–4608.

Lee HY, Chou JY, Cheong L, Chang NH, Yang SY, Leu JY. 2008. Incompatibility of nuclear and mitochondrial genomes causes hybrid sterility between two yeast species. Cell 135:1065–1073.

Lemey P, Rambaut A, Drummond AJ, Suchard MA. 2009. Bayesian phylogeography finds its roots. PLoS Comput Biol. 5:e1000520.

Lewontin RC. 1964. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49–67.

Lindstrom SE, Hiromoto Y, Nishimura H, Saito T, Nerome R, Nerome K. 1999. Comparative analysis of evolutionary mechanisms of the hemagglutinin and three internal protein genes of influenza B virus: multiple cocirculating lineages and frequent reassortment of the NP, M, and NS genes. J Virol. 73:4413–4426.

Longdon B, Hadfield JD, Webster CL, Obbard DJ, Jiggins FM. 2011. Host phylogeny determines viral persistence and replication in novel hosts. PLoS Pathog. 7:e1002260.

Lu L, Lycett SJ, Brown AJL. 2014. Reassortment patterns of avian influenza virus internal segments among different subtypes. BMC Evol Biol. 14:16.

Lycett SJ, Baillie G, Coulter E, Bhatt S, Kellam P, McCauley JW, Wood JL, Brown IH, Pybus OG, Leigh Brown AJ, Combating Swine Influenza Initiative-COSI Consortium. 2012. Estimating reassortment rates in co-circulating Eurasian swine influenza viruses. J Gen Virol. 93:2326–2336.

Minin VN, Bloomquist EW, Suchard MA. 2008. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol Biol Evol. 25:1459–1471.

Muller H. 1942. Isolating mechanisms, evolution and temperature. Biol Symp. 6:71–125.

Nakagawa N, Nukuzuma S, Haratome S, Go S, Nakagawa T, Hayashi K. 2002. Emergence of an influenza B virus with antigenic change. J Clin Microbiol. 40:3068–3070.

Nerome R, Hiromoto Y, Sugita S, Tanabe N, Ishida M, Matsumoto M, Lindstrom SE, Takahashi T, Nerome K. 1998. Evolutionary characteristics of influenza B virus since its first isolation in 1940: dynamic circulation of deletion and insertion mechanism. Arch Virol. 143:1569–1583.

O’Brien JD, Minin VN, Suchard MA. 2009. Learning to count: robust estimates for labeled distances between molecular sequences. Mol Biol Evol. 26:801–814.

Osterhaus ADME, Rimmelzwaan GF, Martina BEE, Bestebroer TM, Fouchier RAM. 2000. Influenza B virus in seals. Science 288:1051–1053.

Presgraves DC. 2010. The molecular evolutionary basis of species formation. Nat Rev Genet. 11:175–180.

Rambaut A, Suchard M, Drummond A. 2009. Tracer v1.5 [Internet]. Available from: http://tree.bio.ed.ac.uk/software/tracer/

Reed C, Meltzer MI, Finelli L, Fiore A. 2012. Public health impact of including two lineages of influenza B in a quadrivalent seasonal influenza vaccine. Vaccine 30:1993–1998.

Rota PA, Wallis TR, Harmon MW, Rota JS, Kendal AP, Nerome K. 1990. Cocirculation of two distinct evolutionary lineages of influenza type B virus since 1983. Virology 175:59–68.

Ruane S, Pyron RA, Burbrink FT. 2011. Phylogenetic relationships of the Cretaceous frog Beelzebufo from Madagascar and the placement of fossil constraints based on temporal and phylogenetic evidence. J Evol Biol. 24:274–285.

Shaw MW, Xu X, Li Y, Normand S, Ueki RT, Kunimoto GY, Hall H, Klimov A, Cox NJ, Subbarao K. 2002. Reappearance and global spread of variants of influenza B/Victoria/2/87 lineage viruses in the 2000–2001 and 2001–2002 seasons. Virology 303:1–8.

Smith GJD, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, et al. 2009. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459:1122–1125.

Turner TL, Hahn MW, Nuzhdin SV. 2005. Genomic islands of speciation in Anopheles gambiae. PLoS Biol. 3:e285.

Ward MJ, Lycett SJ, Kalish ML, Rambaut A, Brown AJL. 2013. Estimating the rate of intersubtype recombination in early HIV-1 group M strains. J Virol. 87:1967–1973.

World Health Organization. 2009. Influenza fact sheet. Available from: http://www.who.int/mediacentre/factsheets/fs211/en/

Zhao H, Nettleton D, Soller M, Dekkers JCM. 2005. Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genet Res. 86:77–87.


FIG. 2. Maximum clade credibility trees of all eight genome segments of influenza B viruses isolated since 1980. Trees are colored based on inferred PB1–PB2–HA lineage. Vertical bars indicate the original Vic and Yam lineages within each segment. Each tree is the summarized output of a single analysis comprising 9,000 trees sampled from the posterior distribution of trees.

FIG. 3. Oldest TMRCA of all surviving branches over time. PA, NP, NA, MP, and NS segments of influenza B viruses show periodic increases in TMRCAs of all surviving branches (indicative of diversity loss), suggesting lineage turnover. PB1, PB2, and HA segments, on the other hand, maintain the diversity dating back to the initial split of Vic and Yam lineages. Each point is the mean TMRCA of all surviving lineages existing at each time slice through the tree, and vertical lines indicating uncertainty are 95% highest posterior densities.


We reconstructed reassortment events that were detected by using lineage labels. Figure 6 focuses only on interlineage reassortments that have occurred after 1990. We identify five major (in terms of persistence) reassortant genome constellations (given in order PB1–PB2–PA–HA–NP–NA–MP–NS, with prime [′] indicating independently acquired segments) circulating between 1992 and 2011 (fig. 6):

B/Alaska/12/1996-like (Y–Y–Y–Y–Y–Y–Y–V)
B/Nanchang/2/1997-like (V–V–Y–V–Y–V–Y–V)
B/Iowa/03/2002-like (V–V–Y′–V–Y–Y–Y′–V′)
B/California/NHRC0001/2006-like (V–V–Y–V–Y′–Y–Y′–V′)
B/Brisbane/33/2008-like (V–V–Y–V–Y′–Y–Y–V)

FIG. 4. Mean pairwise TMRCA between Vic and Yam branches under PB1, PB2, and HA label sets. PB1, PB2, and HA segment labels indicate that these segments show reciprocal preservation of diversity, which dates back to the split of Vic and Yam lineages. All other segments show increasingly more recent TMRCAs between branches labeled as Vic and Yam in PB1, PB2, and HA label sets. All vertical lines indicating uncertainty are 95% highest posterior densities.

In a previous study, B/Alaska/12/1996-like, B/Nanchang/2/1997-like, and B/Iowa/03/2002-like constellations were observed (Chen and Holmes 2008), but sequences from B/California/NHRC0001/2006-like and B/Brisbane/33/2008-like constellations were not available at the time. In their study, Chen and Holmes (2008) also recovered the coassortment pattern of PB1, PB2, and HA lineages but did not remark upon it. Of these five constellations, four (B/Nanchang/2/1997-like, B/Iowa/03/2002-like, B/California/NHRC0001/2006-like, and B/Brisbane/33/2008-like) are derived from introgression of Yam lineage segments into a Vic lineage PB1–PB2–HA background, with only one (B/Alaska/12/1996-like) resulting from introgression of a Vic lineage NS segment into an entirely Yam-derived background. All five interlineage reassortment events described here are marked by the preservation of either entirely Vic- or Yam-derived PB1–PB2–HA segments. Figure 6 also shows that reassorting segments appear to evolve with a considerable degree of autonomy. For example, the NP lineage that entered a largely Vic lineage-derived genome and gave rise to the B/Nanchang/2/1997-like isolates continued circulating until 2010, even though the other segments it coassorted with in 1995–1996 (PA and MP) went extinct following the next round of reassortment that led to the rise of B/Iowa/03/2002-like genome constellations. A more extreme example is the NS segment, where a Vic sublineage was reassorted into an entirely Yam background (B/Alaska/12/1996-like) in 1994–1995, then reassorted back into a mostly Vic background some 5 years later (B/Iowa/03/2002-like), where it has replaced the “original” Vic sublineage (see fig. 6).

FIG. 5. Ratio of Vic and Yam sequences in the data set. The ratio of Vic (yellow) to Yam (purple) sequences in each segment from the primary data set over time. Black lines indicate where this ratio lies in the larger secondary data set. Numbers at the top of the figure show the total number of genomes available for each influenza season in the primary data set, comprising 452 genomes, from which the ratio was calculated, whereas the numbers in brackets correspond to numbers of sequences in the larger secondary genomes data set. Numbers at the bottom are influenza seasons from the 1987/1988 (87/88) season to the 2011/2012 season. Yam lineage of PA, NP, NA, and MP segments and Vic lineage of the NS segment eventually become fixed (in the population genetics sense of the word) in the influenza B population. PB1, PB2, and HA segments maintain separate Vic and Yam lineages.

We observe that in all five successful interlineage reassortment events shown in figure 6, none break up the PB1–PB2–HA complex. This is an unlikely outcome: the probability of not breaking up PB1–PB2–HA across five reassortment events is

\[
P = \left(\frac{2 \cdot 2^{5} - 2}{2^{8} - 2}\right)^{5} = 0.0009,
\]

where reassortment events are considered to sample from the Vic and Yam lineages at random for each of the eight segments. If we correct for multiple testing, with the assumption that coassortment of any three segments is of interest, we find that the probability of not breaking up an arbitrary set of three segments across five reassortment events is

\[
P = \binom{8}{3}\left(\frac{2 \cdot 2^{5} - 2}{2^{8} - 2}\right)^{5} = 0.0485.
\]
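The two probabilities can be reproduced with a few lines of arithmetic. In the sketch below (the function name and parameters are illustrative), a random reassortant draws each of the eight segments from Vic or Yam, the two non-reassortant genomes are excluded, and the three linked segments are required to share a lineage.

```python
from math import comb

def p_complex_intact(n_segments=8, n_linked=3, n_events=5):
    """Probability that a fixed set of linked segments is never split.

    Each reassortant picks Vic or Yam at random for every segment, and the
    two pure (non-reassortant) genomes are excluded from both counts.
    """
    free = n_segments - n_linked
    intact = 2 * 2 ** free - 2      # linked set all-Vic or all-Yam, minus pure genomes
    total = 2 ** n_segments - 2     # all mixed-lineage genomes
    return (intact / total) ** n_events

p_fixed = p_complex_intact()        # PB1-PB2-HA specifically
p_any = comb(8, 3) * p_fixed        # corrected for any choice of three segments

print(round(p_fixed, 4), round(p_any, 4))
```

The correction multiplies by the 56 possible triplets of segments, which is a conservative, Bonferroni-style bound rather than an exact probability.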

Although the vast majority of influenza B isolates possess either Vic or Yam lineage-derived PB1–PB2–HA complexes, on rare occasions mixed-lineage PB1–PB2–HA constellations emerge. Figure 7 shows the sum of branch lengths which were labeled as having entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. Due to lack of reassortment between Vic and Yam lineages of PB1, PB2, and HA (fig. 4) since 1997, all segments have spent significantly longer periods of evolutionary time with either entirely Vic-derived or entirely Yam-derived than with mixed-lineage PB1, PB2, and HA constellations (fig. 7). We have identified three instances of mixed-lineage PB1–PB2–HA reassortants from the primary data set, with the following PB1–PB2–HA constellations: VVY (B/Bangkok/163/1990-like, 13 sequences, isolated 1990–January 5, 1995), VYV (B/Nanchang/630/1994-like, two sequences, isolated 1994–1996), and VYY (B/New York/24/1993-like, two sequences, isolated January 8, 1993–1994). We detected two new reassortant lineages when investigating the larger secondary data set: B/Waikato/6/2005-like viruses with PB1–PB2–HA constellation YYV (17 sequences, isolated May 9–October 12 in 2005) and B/Malaysia/1829782/2007 with PB1–PB2–HA constellation YVY (one sequence, isolated August 2, 2007).

Analysis of Reassortment Properties

We attempted to quantify the temporal discordance between lineages reassorting into new genomic constellations. If one were able to recover an influenza “species tree,” including admixture/reassortment events, it would be possible to estimate the reassortment or recombination “distance,” which is the time between a split in the species tree in the past and a reassortment event (see supplementary fig. S17, Supplementary Material online). Although we do not find evidence of differences in total number of reassortments between segments (see supplementary fig. S4, Supplementary Material online), we find support for a reassortment “distance” effect, in which a pair of tips on one segment has a different TMRCA from the same pair of tips on a different segment. The summary statistic we use that reflects this difference in TMRCAs, ΔTMRCA, is most sensitive when only one of the two trees being compared loses diversity through reassortment and the other acts like a proxy for the “species tree.” We normalize our ΔTMRCA comparisons to arrive at δTMRCA, which accounts for uncertainty in tree topology (see Materials and Methods). Figure 8 shows δTMRCA values for all pairs of trees. Most segment pairs show very low values for this statistic, with δTMRCA ≈ 0.1, indicating that ΔTMRCA measurements between replicate posterior samples from the same segment are up to ten times smaller than ΔTMRCA values between posterior samples from different segments. PB1, PB2, and HA trees, on the other hand, exhibit δTMRCA values that are much higher. This shows that ΔTMRCA differences between trees of PB1, PB2, and HA segments are, though noisy, occasionally very similar to uncertainty in tip-to-tip TMRCAs between replicate analyses of these segments.

FIG. 6. Schematic plot of reconstructed reassortments between Vic and Yam lineage segments of influenza B virus. Lineages that coassort in genomes are represented by eight parallel lines, with lineages that derive from the original Vic clade colored yellow/brown and lineages that derive from the original Yam clade colored lilac/purple. Interlineage reassortment events are indicated by lines entering a different genome. The angle of incoming lineages represents uncertainty in the timing of the event (mean date of the reassortant node and its parent node). Lineage extinction dates are not shown accurately.

FIG. 7. Amount of evolutionary time each segment has spent under different PB1–PB2–HA constellations. All segments have spent significantly more of their history with entirely Vic- or entirely Yam-derived PB1–PB2–HA complexes. All vertical lines indicating uncertainty are 95% highest posterior densities.

Discussion

Linkage between PB1 PB2 and HA Gene Segments

In this article, we show that the PB1, PB2, and HA segments of influenza B viruses are the only ones that have continuously maintained separate Vic and Yam lineages, whereas other segments have fixed either Vic or Yam lineages (figs. 2, 5, and 6). Evidence suggests that this is a result of prolonged lack of reassortment between Vic and Yam lineages in PB1, PB2, and HA (fig. 4), which possess coassorting sequences detectable as high linkage disequilibrium (LD) (supplementary fig. S1, Supplementary Material online). The vast majority of the sampled evolutionary history of each segment of influenza B viruses since the split of Vic and Yam lineages has been spent in association with either completely Vic or completely Yam lineage-derived PB1–PB2–HA complexes (fig. 7), suggesting that having “pure” lineage PB1–PB2–HA complexes is important for whole-genome fitness. We propose that this pattern of coassortment is due to the action of selection and not simply biased or rare reassortment.

The origin of the strong genetic linkage between PB1, PB2, and HA segments remains unclear. We believe that there are two alternative but similar explanations for it: mutation-driven coevolution (Presgraves 2010) and Dobzhansky–Muller (DM) incompatibility (Dobzhansky 1937; Muller 1942). Mutation-driven coevolution has been suggested to be the cause of hybrid dysfunction in Saccharomyces hybrids (Lee et al. 2008) and evolves as a byproduct of adaptation. If one or the other influenza B lineage has undergone adaptation, we might expect these changes to be beneficial in its native background and incompatible with a foreign background. DM incompatibility operates in a similar way, but the main difference from the scenario described earlier is that the incompatible alleles are neutral or nearly neutral in their native background and become deleterious or lethal when combined with nonnative backgrounds. Emergence of DM incompatibility is aided by geographic isolation. Interestingly, the Vic lineage of HA was restricted to eastern Asia between 1992 and 2000 (Nerome et al. 1998; Shaw et al. 2002), offering ample time for the budding Vic lineage to accumulate alleles causing reassortment incompatibility. However, without more genomic data from the past, it is difficult to estimate to what extent influenza B virus population structure contributed to the development of the current segment linkage.

Potential Mechanisms for Reassortment Incompatibility

Unfortunately, the limited amount of genomic data available for the early years of the Vic–Yam split precludes any attempt to answer whether selection or drift has led to the current linkage of PB1, PB2, and HA segments. Although the origins of the linkage between these three segments might be difficult to explain, we can speculate on the nature of reassortment incompatibility. For example, it is intuitive why this might be the case for PB1 and PB2: both proteins interact directly as part of the RNA-dependent RNA polymerase heterotrimer. Indeed, we observe that PB1–PB2 reassortants are the rarest and least persistent among mixed-lineage PB1–PB2–HA strains and have not been isolated in great numbers. In fact, most reassortants breaking the PB1–PB2–HA complex apart occurred in the past, close to the split of Vic and Yam lineages, and have become very rare since.

There is some evidence that the linkage between PB1 and HA might not be a phenomenon restricted to influenza B viruses. It has been established that, at least for the 1957 and the 1968 influenza pandemics, caused by A/H2N2 and A/H3N2 subtypes, respectively, the viruses responsible were reassortants possessing PB1 and HA segments derived from avian influenza A viruses (Kawaoka et al. 1989). In addition, outdated techniques for producing vaccine seed strains through selection for HA–NA reassortants often yielded PB1–HA–NA reassortants as a side effect (Bergeron et al. 2010; Fulvini et al. 2011). Recent experiments have found that the presence or absence of a “foreign” PB1 segment can have dramatic effects on HA concentration on the surface of virions and total virion production (Cobbin et al. 2013). However, there have been reassortant influenza A viruses circulating for prolonged periods of time in humans that did have disparate PB1 and HA segments, for example, H1N2 outbreaks in 2001 (Gregory et al. 2002) and H1N1/09 in 2009 (Smith et al. 2009).

FIG. 8. δTMRCA statistics for different segment pairs. PB1, PB2, and HA trees exhibit reciprocally highly similar TMRCAs, unlike most other pairwise comparisons. All vertical lines indicating uncertainty are 95% highest posterior densities.

We believe that the association between PB1, PB2, and HA segments should be relatively straightforward to explore in the lab. Reverse genetics systems have been developed for influenza B viruses (Hoffmann et al. 2002), which would allow the creation of artificial reassortants. Based on the frequency and persistence times of different reassortant classes we have observed, we expect a hierarchy of reassortant fitness, starting with PB1+PB2+HA reassortants, which should be the most fit, followed by PB1+2/HA, then PB1+HA/PB2, and finally PB2+HA/PB1 reassortants with the lowest fitness. We believe that this is the most direct approach to unraveling the mechanism responsible for the linkage within the PB1–PB2–HA complex.

Will Influenza B Viruses Speciate?

We suggest that the preservation of two PB1–PB2–HA complex lineages is similar to genomic speciation islands, where small numbers of genes resist being homogenized through gene flow (Turner et al. 2005). In this context, we see three potential paths of evolution for influenza B viruses. If more segments get recruited to the PB1–PB2–HA complex, the process could continue until “speciation” occurs, in which none of the segments is able to reassort across the Vic–Yam lineage boundary. Alternatively, the influenza B genome could continue to be homogenized through gene flow, with the exception of PB1, PB2, and HA segments, or one of the two PB1–PB2–HA complexes could go extinct, marking the return of single-strain dynamics in the influenza B virus population. The eventual fate of influenza B viruses will likely be determined by the combined effects of reassortment frequency and the strength of epistatic interactions between segments.

Materials and Methods

We compiled a primary data set of 452 complete influenza B genomes from GISAID (Bogner et al. 2006), dating from 1984 to 2012. The longest protein-coding region of each segment was extracted and used for all further analyses. We thus assume that homologous recombination has not taken place and that the evolutionary history of the whole segment can be inferred from the longest coding sequence in the segment. To date, there has been little evidence of homologous recombination in influenza viruses (Chare et al. 2003; Boni et al. 2008; Han et al. 2010). The segments of each strain were assigned to either Vic or Yam lineage by making maximum-likelihood trees of each segment using PhyML (Guindon and Gascuel 2003) and identifying whether the isolate was more closely related to B/Victoria/2/87 or B/Yamagata/16/88 sequences in that segment, with the exception of the NS segment, as B/Victoria/2/87 was a reassortant and possessed a Yam lineage NS (Lindstrom et al. 1999). B/Czechoslovakia/69/1990 was considered as being representative of Vic lineage for the NS segment. Every segment in each genome thus received either a Vic or a Yam lineage designation; for example, the strain B/Victoria/2/87 received V–V–V–V–V–V–V–Y, as its NS segment is derived from the Yam lineage and the rest of the genome is Vic.

We also collated a secondary data set from all complete influenza B virus genomes available on GenBank as of May 5, 2014. After removing isolates that had considerable portions of any sequence missing, were isolated prior to 1980, or were suspected of having a contaminant sequence in any segment, we were left with 1,603 sequences. These sequences only became available after all primary analyses were performed, are mainly from Australia, New Zealand, and the United States, and are too numerous to analyze in BEAST (Drummond et al. 2012). PhyML (Guindon and Gascuel 2003) was used to produce phylogenies of each segment, and the lineage of each isolate was determined based on grouping with either B/Victoria/2/87 or B/Yamagata/16/88 sequences, as described above. By associating strains with the lineage identity of each of their segments, we reconstructed the most parsimonious interlineage reassortment history for the secondary data set. The secondary data set was used to check how representative the primary data set was, to estimate LD, and to broadly confirm our results. All analyses pertain to the primary data set unless stated otherwise.

Temporally calibrated phylogenies were recovered for each segment in the primary data set using Markov chain Monte Carlo (MCMC) methods in the BEAST software package (Drummond et al. 2012). We modeled the substitution process using the Hasegawa–Kishino–Yano model of nucleotide substitution (Hasegawa et al. 1985), with separate transition models for each of the three codon partitions, and additionally estimated realized synonymous and nonsynonymous substitution counts (O’Brien et al. 2009). We used a flexible Bayesian skyride demographic model (Minin et al. 2008). We accounted for incomplete sampling dates for 94 sequences (of which 93 had only year and 1 had only year and month of isolation), whereby tip date is estimated as a latent variable in the MCMC integration. A relaxed molecular clock was used, where branch rates are drawn from a lognormal distribution (Drummond et al. 2006). We ran three independent MCMC chains, each with 200 million states, sampled every 20,000 steps, and discarded the first 10% of the MCMC states as burn-in. After assessing convergence of all three MCMC chains by visual inspection using Tracer (Rambaut et al. 2009), we combined samples across chains to give a total of 27,000 samples from the posterior distribution of trees.
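The sample count quoted above can be checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis; the function name and its parameters are hypothetical, and it assumes the initial state of each chain is not counted as a sample.

```python
def posterior_samples(chain_length, sample_every, burnin_percent, n_chains):
    """Count posterior samples retained across independent MCMC chains.

    Assumes one sample per `sample_every` states and that the initial
    state is not counted (an illustrative simplification).
    """
    per_chain = chain_length // sample_every          # samples logged per chain
    burnin = per_chain * burnin_percent // 100        # samples discarded as burn-in
    return (per_chain - burnin) * n_chains


# Three chains of 200 million states, sampled every 20,000 steps, 10% burn-in.
total = posterior_samples(200_000_000, 20_000, 10, 3)
print(total)  # 27000
```

Each chain contributes 9,000 retained samples, which agrees with the combined total of 27,000 stated in the text.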

Every sequence was assigned seven discrete traits in BEAUti, corresponding to the lineages of all other segments with which a strain was isolated; for example, the PB1 tree had PB2, PA, HA, NP, NA, MP, and NS as traits and V or Y as values for each trait. We inferred the ancestral state of lineages in each segment by modeling transitions between these discrete states using an asymmetric transition matrix (Lemey et al. 2009) with Bayesian stochastic search variable selection to estimate significant rates. Because the posterior set of trees for a single segment has branches labeled with the inferred lineage in the remaining seven segments, we can detect interlineage reassortments between pairs of segments by observing state transitions, that is, Yam to Vic or Vic to Yam (fig. 1). In addition, by reconstructing the ancestral state of all other genomic segments jointly, we can infer coreassortment events when more than one trait transition occurs on the same node in a tree. Interphylogeny labeling approaches have been extensively used in the past to investigate reticulate evolution in influenza A viruses and HIV (Lycett et al. 2012; Ward et al. 2013; Lu et al. 2014).
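The idea of reading reassortments off trait transitions can be illustrated with a small sketch. This is not the authors' code: the tree representation (a child-to-parent dictionary), the function names, and the toy labels below are all hypothetical, and only three of the seven traits are shown for brevity.

```python
def find_reassortments(parent_of, traits):
    """Return {node: [traits whose lineage differs from the parent node]}.

    parent_of: maps each non-root node to its parent.
    traits:    maps each node to a dict of trait name -> 'V' or 'Y'.
    """
    events = {}
    for child, parent in parent_of.items():
        changed = sorted(
            name for name, state in traits[child].items()
            if traits[parent][name] != state
        )
        if changed:
            events[child] = changed
    return events


def coreassortments(events):
    """Keep only nodes where more than one trait changes at once."""
    return {node: changed for node, changed in events.items() if len(changed) > 1}


# Toy PB1 tree labeled with (hypothetical) lineages of three other segments.
parent_of = {"n1": "root", "tipA": "n1", "tipB": "n1", "tipC": "root"}
traits = {
    "root": {"PB2": "V", "HA": "V", "NS": "V"},
    "n1":   {"PB2": "V", "HA": "V", "NS": "Y"},
    "tipA": {"PB2": "V", "HA": "V", "NS": "Y"},
    "tipB": {"PB2": "Y", "HA": "Y", "NS": "Y"},
    "tipC": {"PB2": "V", "HA": "V", "NS": "V"},
}
events = find_reassortments(parent_of, traits)
print(events)                   # {'n1': ['NS'], 'tipB': ['HA', 'PB2']}
print(coreassortments(events))  # {'tipB': ['HA', 'PB2']}
```

In this toy example the NS trait changes alone on node n1, whereas PB2 and HA change together on tipB, which would be read as a coreassortment.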

Measures of Diversity

We inferred the diversity of each segment from their phylogenetic tree by estimating the date of the most recent common ancestor of all branches at yearly time points, which places an upper bound on the maximum amount of diversity existing at each time point. A version of this lineage turnover metric has previously been used to investigate the tempo and strength of selection in influenza A viruses during seasonal circulation (Bedford et al. 2011). In addition, we calculated mean pairwise TMRCA between branches labeled as Vic and Yam for PB1, PB2, and HA traits. This gave us a measure of how much a particular segment reassorts with respect to Vic and Yam lineages of PB1, PB2, and HA segments. If Vic and Yam lineages of PB1, PB2, and HA segments were to be considered as being separate populations, this measure would be equivalent to “between population” diversity.

We also calculated the total amount of sampled evolutionary time spent by each segment with entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. We did this by summing the branch lengths in each tree under three different lineage combinations of the PB1, PB2, and HA segments: PB1–PB2–HA derived entirely from Yam lineage, PB1–PB2–HA derived entirely from Vic lineage, and PB1–PB2–HA derived from a mixture of the two lineages. This gives a measure of how successful, over long periods of time, each particular PB1–PB2–HA constellation has been.
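The branch-length summation described above can be sketched as follows. This is an illustrative example only: the input format (a list of branch length and lineage-label tuples) and the function names are assumptions, not the format used by the authors' scripts.

```python
from collections import defaultdict


def classify(pb1, pb2, ha):
    """Label a branch by the lineage make-up of its PB1-PB2-HA complex."""
    lineages = {pb1, pb2, ha}
    if lineages == {"V"}:
        return "Vic"
    if lineages == {"Y"}:
        return "Yam"
    return "mixed"


def time_by_constellation(branches):
    """Sum branch lengths (in years) per PB1-PB2-HA constellation class.

    branches: iterable of (length, pb1, pb2, ha) with labels 'V' or 'Y'.
    """
    totals = defaultdict(float)
    for length, pb1, pb2, ha in branches:
        totals[classify(pb1, pb2, ha)] += length
    return dict(totals)


# Hypothetical branches from one segment tree.
branches = [
    (2.0, "V", "V", "V"),
    (1.5, "Y", "Y", "Y"),
    (0.5, "V", "Y", "V"),
    (1.0, "V", "V", "V"),
]
print(time_by_constellation(branches))  # {'Vic': 3.0, 'Yam': 1.5, 'mixed': 0.5}
```

Repeating this over every tree in a posterior sample would give a distribution of totals for each class rather than a single value.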

Tree-to-Tree Similarities

We express the normalized distance δTMRCA between trees belonging to two segments A and B for a particular posterior sample i following

$$\delta_{\mathrm{TMRCA}}(A_i, B_i) = \frac{\Delta_{\mathrm{TMRCA}}(A_i, A'_i) + \Delta_{\mathrm{TMRCA}}(B_i, B'_i)}{2\,\Delta_{\mathrm{TMRCA}}(A_i, B_i)} \qquad (1)$$

where $\Delta_{\mathrm{TMRCA}}(A_i, B_i) = \frac{1}{n}\sum_{j=1}^{n} g(A_{ij}, B_{ij})$ and n is the total number of pairwise comparisons available between sets of tips. Thus, $g(A_{ij}, B_{ij})$ is the absolute difference in TMRCA of a pair of tips j, where the pair is drawn from the ith posterior sample of tree A and the ith posterior sample of tree B. Additionally, $\Delta_{\mathrm{TMRCA}}(A_i, A'_i)$ is calculated from the ith posterior sample of tree A and the ith posterior sample of an independent analysis of tree A (which we refer to as A′), which is used in the normalization procedure to control for variability in tree topology stability over the course of the MCMC chain (see supplementary figs. S6 and S7, Supplementary Material online). We had three replicate analyses of each segment, and in order to calculate $\Delta_{\mathrm{TMRCA}}(A_i, A'_i)$ we used analyses numbered 1, 2, and 3 as A and analyses numbered 2, 3, and 1 as A′, in that order. We subsampled our combined posterior distribution of trees to give a total of 2,700 trees on which to analyze δTMRCA.
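Equation (1) can be illustrated with a short sketch. It assumes, purely for illustration, that the pairwise TMRCAs of each tree have already been extracted into a dictionary keyed by tip pair; the function names, tip names, and dates are hypothetical and are not taken from the authors' scripts.

```python
def delta_tmrca(tmrca_a, tmrca_b):
    """Mean absolute difference in TMRCA over tip pairs shared by two trees.

    tmrca_a, tmrca_b: dicts mapping a tip pair (tuple) to its TMRCA date.
    """
    pairs = [pair for pair in tmrca_a if pair in tmrca_b]
    return sum(abs(tmrca_a[p] - tmrca_b[p]) for p in pairs) / len(pairs)


def normalized_tmrca(a, a_rep, b, b_rep):
    """Normalized statistic of equation (1).

    a_rep and b_rep are replicate (independent) analyses of trees a and b.
    """
    noise = delta_tmrca(a, a_rep) + delta_tmrca(b, b_rep)
    return noise / (2 * delta_tmrca(a, b))


# Hypothetical TMRCA dates for two tip pairs in each tree.
p1, p2 = ("tip1", "tip2"), ("tip1", "tip3")
a     = {p1: 1990.0, p2: 1985.0}
a_rep = {p1: 1990.5, p2: 1985.5}
b     = {p1: 1995.0, p2: 1990.0}
b_rep = {p1: 1995.5, p2: 1989.5}
print(normalized_tmrca(a, a_rep, b, b_rep))  # 0.1
```

Here the replicate analyses differ by half a year on average, while the two segments differ by five years, so the normalized value is 0.1. Values approaching 1 would indicate that two segment trees differ no more than replicate analyses of the same segment do.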

Calculating the normalized δTMRCA(A_i, B_i) for each MCMC state provides us with a posterior distribution of this statistic, allowing specific hypotheses regarding similarities between the trees of different segments to be tested. Our approach exploits the branch scaling used by BEAST (Drummond et al. 2012), as the trees are scaled in absolute time and are insensitive to variation in nucleotide substitution rates between segments, allowing for direct comparisons between TMRCAs in different trees. In the absence of reassortment, we expect the tree of every segment to recapitulate the “virus tree,” a concept analogous to “species trees” in population genetics. Our method operates under the assumption that the segment trees capture this “virus tree” of influenza B viruses quite well. It is not an unreasonable assumption, given the seasonal bottlenecks influenza viruses experience. This makes it almost certain that influenza viruses circulating at any given time point are derived from a single genome that existed in the recent past. The ΔTMRCA statistic essentially quantifies the temporal distance between admixture events and nodes in the virus tree (see supplementary fig. S17, Supplementary Material online). We normalize ΔTMRCA values to get δTMRCA, a measure which quantifies the extent to which the similarity of two independent trees resembles phylogenetic noise. The ΔTMRCA statistic is an extension of patristic distance methods, which have previously been used to tackle a wide variety of problems, for example, as phylogenetic distance in predicting viral titer in Drosophila infected with viruses from closely related species (Longdon et al. 2011) and to assess temporal incongruence in a phylogenetic tree of amphibian species induced by using different calibrations (Ruane et al. 2011).

LD across the Influenza B Genome

We used the secondary GenBank data set with 1,603 complete genome sequences to estimate LD between amino acid loci across the longest proteins encoded by each segment of the influenza B virus genome. To quantify LD, we adapt the $\chi^2_{df}$ statistic from Hedrick and Thomson (1986):

$$\chi^2_{df} = \frac{\chi^2}{N(k-1)(m-1)} \qquad (2)$$

where $\chi^2$ is calculated from a classical contingency table, N is the number of haplotypes, and $(k-1)(m-1)$ are the degrees of freedom. This statistic is equal to the widely used r² LD statistic at biallelic loci, but also quantifies LD when there are more than two alleles per locus (Zhao et al. 2005). LD was estimated only at loci where each nucleotide or amino acid allele was present in at least two isolates. We ignored gaps in the alignment and did not consider them as polymorphisms. In all cases, we used a minor allele frequency cutoff of 1%. We also calculated another LD statistic, D′ (Lewontin 1964), as $D'_{ij} = D_{ij}/D^{\max}_{ij}$, where $D_{ij} = p(A_iB_j) - p(A_i)p(B_j)$ and

$$D^{\max}_{ij} = \begin{cases} \min\left[p(A_i)p(B_j),\ (1-p(A_i))(1-p(B_j))\right] & \text{when } D_{ij} < 0 \\ \min\left[(1-p(A_i))p(B_j),\ p(A_i)(1-p(B_j))\right] & \text{when } D_{ij} \geq 0 \end{cases} \qquad (3)$$

where $p(A_i)$ is the frequency of allele $A_i$ at locus A, $p(B_j)$ is the frequency of allele $B_j$ at locus B, and $p(A_iB_j)$ is the frequency of haplotype $A_iB_j$. D′ is inflated when some haplotypes are not observed, for example, when the minor allele frequency is low. We find that D′ is almost uniformly high across the influenza B virus genome and close to 1.0 for almost any pair of polymorphic loci. This is because most amino acid alleles in the population exist transiently, meaning that they do not get a chance to reassort, and we only observe them within the backgrounds of more persistent alleles, which D′ quantifies as complete LD. We think that metrics related to r², such as $\chi^2_{df}$, perform much better on temporal data such as ours in finding persistent associations between alleles and are easier to interpret.
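Equations (2) and (3) can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions, not the authors' script: haplotypes are assumed to be given as a list of (allele at locus A, allele at locus B) tuples, k and m are taken to be the numbers of alleles observed at the two loci, and both loci are assumed to be polymorphic (otherwise the denominators are zero). The function names and toy data are hypothetical.

```python
from collections import Counter


def chi2_df(haplotypes):
    """Chi-square LD statistic normalized by N and degrees of freedom (eq. 2)."""
    n = len(haplotypes)
    a_counts = Counter(a for a, _ in haplotypes)
    b_counts = Counter(b for _, b in haplotypes)
    ab_counts = Counter(haplotypes)
    chi2 = 0.0
    for a, n_a in a_counts.items():
        for b, n_b in b_counts.items():
            expected = n_a * n_b / n              # expected count under independence
            chi2 += (ab_counts[(a, b)] - expected) ** 2 / expected
    k, m = len(a_counts), len(b_counts)           # alleles per locus (assumed meaning)
    return chi2 / (n * (k - 1) * (m - 1))


def d_prime(haplotypes, a, b):
    """Lewontin's D' for allele a at locus A and allele b at locus B (eq. 3)."""
    n = len(haplotypes)
    p_a = sum(1 for x, _ in haplotypes if x == a) / n
    p_b = sum(1 for _, y in haplotypes if y == b) / n
    p_ab = sum(1 for h in haplotypes if h == (a, b)) / n
    d = p_ab - p_a * p_b
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min((1 - p_a) * p_b, p_a * (1 - p_b))
    return d / d_max


# Toy data: two loci in complete association, and two independent loci.
linked = [("A", "X")] * 5 + [("G", "Y")] * 5
independent = [("A", "X"), ("A", "Y"), ("G", "X"), ("G", "Y")]
print(chi2_df(linked), d_prime(linked, "A", "X"))            # 1.0 1.0
print(chi2_df(independent), d_prime(independent, "A", "X"))  # 0.0 0.0
```

A real analysis would also apply the filters described in the text (allele present in at least two isolates, gaps ignored, 1% minor allele frequency cutoff) before calling these functions.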

Data Availability

Python scripts used to process trees and sequences are available at https://github.com/evogytis/fluB/tree/master/scripts (last accessed October 21, 2014). Output files from scripts, lineage designations, MCC trees, acknowledgment tables, accession numbers, and redacted XML files (per GISAID Data Access Agreement) are publicly available at https://github.com/evogytis/fluB/tree/master/data (last accessed October 21, 2014).

Supplementary Material

Supplementary material and figures S1–S17 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Darren Obbard and Paul Wikramaratna for helpful discussions and anonymous reviewers for comments and suggestions. This study was supported by a Natural Environment Research Council studentship D76739X to G.D. and a Newton International Fellowship from the Royal Society to T.B. The research leading to these results has received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 278433-PREDEMICS and ERC Grant Agreement No. 260864. A.R. and S.L. acknowledge the support of the Wellcome Trust (Grant No. 092807).

References

Ansaldi F, D’Agaro P, de Florentiis D, Puzelli S, Lin YP, Gregory V, Bennett M, Donatelli I, Gasparini R, Crovari P, et al. 2003. Molecular characterization of influenza B viruses circulating in northern Italy during the 2001–2002 epidemic season. J Med Virol. 70:463–469.

Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol. 11:220.

Bedford T, Suchard MA, Lemey P, Dudas G, Gregory V, Hay AJ, McCauley JW, Russell CA, Smith DJ, Rambaut A. 2014. Integrating influenza antigenic dynamics with molecular evolution. eLife 3:e01914.

Bergeron C, Valette M, Lina B, Ottmann M. 2010. Genetic content of influenza H3N2 vaccine seeds. PLoS Curr. 2:RRN1165.

Bodewes R, Morick D, de Mutsert G, Osinga N, Bestebroer T, van der Vliet S, Smits SL, Kuiken T, Rimmelzwaan GF, Fouchier RA, et al. 2013. Recurring influenza B virus infections in seals. Emerg Infect Dis. 19:511–512.

Bogner P, Capua I, Lipman DJ, Cox NJ, et al. 2006. A global initiative on sharing avian flu data. Nature 442:981.

Boni MF, Zhou Y, Taubenberger JK, Holmes EC. 2008. Homologous recombination is very rare or absent in human influenza A virus. J Virol. 82:4807–4811.

Broberg E, Beaute J, Snacken R. 2013. Fortnightly influenza surveillance overview, 24 May 2013 - weeks 19-20/2013. Available from: http://ecdc.europa.eu/en/publications/Publications/influenza-fortnightly-surveillance-overview-24-may-2013.pdf

Burnet SFM. 1955. Principles of animal virology. New York: Academic Press.

Chare ER, Gould EA, Holmes EC. 2003. Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J Gen Virol. 84:2691–2703.

Chen R, Holmes EC. 2008. The evolutionary dynamics of human influenza B virus. J Mol Evol. 66:655–663.

Cobbin JCA, Verity EE, Gilbertson BP, Rockman SP, Brown LE. 2013. The source of the PB1 gene in influenza vaccine reassortants selectively alters the hemagglutinin content of the resulting seed virus. J Virol. 87:5577–5585.

Dobzhansky T. 1937. Genetics and the origin of species. New York: Columbia University Press.

Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4:e88.

Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 29:1969–1973.

Fulvini AA, Ramanunninair M, Le J, Pokorny BA, Arroyo JM, Silverman J, Devis R, Bucher D. 2011. Gene constellation of influenza A virus reassortants with high growth phenotype prepared as seed candidates for vaccine production. PLoS One 6:e20823.

Gregory V, Bennett M, Orkhan M, Hajjar SA, Varsano N, Mendelson E, Zambon M, Ellis J, Hay A, Lin Y. 2002. Emergence of influenza A H1N2 reassortant viruses in the human population during 2001. Virology 300:1–7.

Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696–704.

Han GZ, Boni MF, Li SS. 2010. No observed effect of homologous recombination on influenza C virus evolution. Virol J. 7:227.

Hasegawa M, Kishino H, Yano TA. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 22:160–174.

Hay AJ, Gregory V, Douglas AR, Lin YP. 2001. The evolution of human influenza viruses. Philos Trans R Soc Lond B Biol Sci. 356:1861–1870.


Hedrick PW, Thomson G. 1986. A two-locus neutrality test: applications to humans, E. coli and lodgepole pine. Genetics 112:135–156.

Hoffmann E, Mahmood K, Yang CF, Webster RG, Greenberg HB, Kemble G. 2002. Rescue of influenza B virus from eight plasmids. Proc Natl Acad Sci U S A. 99:11411–11416.

Kanegae Y, Sugita S, Endo A, Ishida M, Senya S, Osako K, Nerome K, Oya A. 1990. Evolutionary pattern of the hemagglutinin gene of influenza B viruses isolated in Japan: cocirculating lineages in the same epidemic season. J Virol. 64:2860–2865.

Kawaoka Y, Krauss S, Webster RG. 1989. Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics. J Virol. 63:4603–4608.

Lee HY, Chou JY, Cheong L, Chang NH, Yang SY, Leu JY. 2008. Incompatibility of nuclear and mitochondrial genomes causes hybrid sterility between two yeast species. Cell 135:1065–1073.

Lemey P, Rambaut A, Drummond AJ, Suchard MA. 2009. Bayesian phylogeography finds its roots. PLoS Comput Biol. 5:e1000520.

Lewontin RC. 1964. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49–67.

Lindstrom SE, Hiromoto Y, Nishimura H, Saito T, Nerome R, Nerome K. 1999. Comparative analysis of evolutionary mechanisms of the hemagglutinin and three internal protein genes of influenza B virus: multiple cocirculating lineages and frequent reassortment of the NP, M, and NS genes. J Virol. 73:4413–4426.

Longdon B, Hadfield JD, Webster CL, Obbard DJ, Jiggins FM. 2011. Host phylogeny determines viral persistence and replication in novel hosts. PLoS Pathog. 7:e1002260.

Lu L, Lycett SJ, Brown AJL. 2014. Reassortment patterns of avian influenza virus internal segments among different subtypes. BMC Evol Biol. 14:16.

Lycett SJ, Baillie G, Coulter E, Bhatt S, Kellam P, McCauley JW, Wood JL, Brown IH, Pybus OG, Leigh Brown AJ, Combating Swine Influenza Initiative-COSI Consortium. 2012. Estimating reassortment rates in co-circulating Eurasian swine influenza viruses. J Gen Virol. 93:2326–2336.

Minin VN, Bloomquist EW, Suchard MA. 2008. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol Biol Evol. 25:1459–1471.

Muller H. 1942. Isolating mechanisms, evolution, and temperature. Biol Symp. 6:71–125.

Nakagawa N, Nukuzuma S, Haratome S, Go S, Nakagawa T, Hayashi K. 2002. Emergence of an influenza B virus with antigenic change. J Clin Microbiol. 40:3068–3070.

Nerome R, Hiromoto Y, Sugita S, Tanabe N, Ishida M, Matsumoto M, Lindstrom SE, Takahashi T, Nerome K. 1998. Evolutionary characteristics of influenza B virus since its first isolation in 1940: dynamic circulation of deletion and insertion mechanism. Arch Virol. 143:1569–1583.

O’Brien JD, Minin VN, Suchard MA. 2009. Learning to count: robust estimates for labeled distances between molecular sequences. Mol Biol Evol. 26:801–814.

Osterhaus ADME, Rimmelzwaan GF, Martina BEE, Bestebroer TM, Fouchier RAM. 2000. Influenza B virus in seals. Science 288:1051–1053.

Presgraves DC. 2010. The molecular evolutionary basis of species formation. Nat Rev Genet. 11:175–180.

Rambaut A, Suchard M, Drummond A. 2009. Tracer v1.5 [Internet]. Available from: http://tree.bio.ed.ac.uk/software/tracer

Reed C, Meltzer MI, Finelli L, Fiore A. 2012. Public health impact of including two lineages of influenza B in a quadrivalent seasonal influenza vaccine. Vaccine 30:1993–1998.

Rota PA, Wallis TR, Harmon MW, Rota JS, Kendal AP, Nerome K. 1990. Cocirculation of two distinct evolutionary lineages of influenza type B virus since 1983. Virology 175:59–68.

Ruane S, Pyron RA, Burbrink FT. 2011. Phylogenetic relationships of the Cretaceous frog Beelzebufo from Madagascar and the placement of fossil constraints based on temporal and phylogenetic evidence. J Evol Biol. 24:274–285.

Shaw MW, Xu X, Li Y, Normand S, Ueki RT, Kunimoto GY, Hall H, Klimov A, Cox NJ, Subbarao K. 2002. Reappearance and global spread of variants of influenza B/Victoria/2/87 lineage viruses in the 2000–2001 and 2001–2002 seasons. Virology 303:1–8.

Smith GJD, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, et al. 2009. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459:1122–1125.

Turner TL, Hahn MW, Nuzhdin SV. 2005. Genomic islands of speciation in Anopheles gambiae. PLoS Biol. 3:e285.

Ward MJ, Lycett SJ, Kalish ML, Rambaut A, Brown AJL. 2013. Estimating the rate of intersubtype recombination in early HIV-1 group M strains. J Virol. 87:1967–1973.

World Health Organization. 2009. Influenza fact sheet. Available from: http://www.who.int/mediacentre/factsheets/fs211/en

Zhao H, Nettleton D, Soller M, Dekkers JCM. 2005. Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genet Res. 86:77–87.


We reconstructed reassortment events that were detected by using lineage labels. Figure 6 focuses only on interlineage reassortments that have occurred after 1990. We identify five major (in terms of persistence) reassortant genome constellations (given in order PB1–PB2–PA–HA–NP–NA–MP–NS, with prime [′] indicating independently acquired segments) circulating between 1992 and 2011 (fig. 6):

B/Alaska/12/1996-like (Y–Y–Y–Y–Y–Y–Y–V)
B/Nanchang/2/1997-like (V–V–Y–V–Y–V–Y–V)
B/Iowa/03/2002-like (V–V–Y′–V–Y–Y–Y′–V′)
B/California/NHRC0001/2006-like (V–V–Y–V–Y′–Y–Y′–V′)
B/Brisbane/33/2008-like (V–V–Y–V–Y′–Y–Y–V)

FIG. 4. Mean pairwise TMRCA between Vic and Yam branches under PB1, PB2, and HA label sets. PB1, PB2, and HA segment labels indicate that these segments show reciprocal preservation of diversity, which dates back to the split of Vic and Yam lineages. All other segments show increasingly more recent TMRCAs between branches labeled as Vic and Yam in PB1, PB2, and HA label sets. All vertical lines indicating uncertainty are 95% highest posterior densities.

In a previous study, B/Alaska/12/1996-like, B/Nanchang/2/1997-like, and B/Iowa/03/2002-like constellations were observed (Chen and Holmes 2008), but sequences from B/California/NHRC0001/2006-like and B/Brisbane/33/2008-like constellations were not available at the time. In their study, Chen and Holmes (2008) also recovered the coassortment pattern of PB1, PB2, and HA lineages but did not remark upon it. Of these five constellations, four (B/Nanchang/2/1997-like, B/Iowa/03/2002-like, B/California/NHRC0001/2006-like, and B/Brisbane/33/2008-like) are derived from introgression of Yam lineage segments into a Vic lineage PB1–PB2–HA background, with only one (B/Alaska/12/1996-like) resulting from introgression of a Vic lineage NS segment into an entirely Yam-derived background. All five interlineage reassortment events described here are marked by the preservation of either entirely Vic- or Yam-derived PB1–PB2–HA segments. Figure 6 also shows that reassorting segments appear to evolve with a considerable degree of autonomy. For example, the NP lineage that entered a largely Vic lineage-derived genome and gave rise to the B/Nanchang/2/1997-like isolates continued circulating until 2010, even though the other segments it coassorted with in 1995–1996 (PA and MP) went extinct following the next round of reassortment that led to the rise of B/Iowa/03/2002-like genome constellations. A more extreme example is the NS segment, where a Vic sublineage was reassorted into an entirely Yam background (B/Alaska/12/1996-like) in 1994–1995, then reassorted back into a mostly Vic background some 5 years later (B/Iowa/03/2002-like), where it has replaced the “original” Vic sublineage (see fig. 6).

FIG. 5. Ratio of Vic and Yam sequences in the data set. The ratio of Vic (yellow) to Yam (purple) sequences in each segment from the primary data set over time. Black lines indicate where this ratio lies in the larger secondary data set. Numbers at the top of the figure show the total number of genomes available for each influenza season in the primary data set, comprised of 452 genomes, from which the ratio was calculated, whereas the numbers in brackets correspond to numbers of sequences in the larger secondary genomes data set. Numbers at the bottom are influenza seasons, from the 1987/1988 (87/88) season to the 2011/2012 season. Yam lineage of PA, NP, NA, and MP segments and Vic lineage of the NS segment eventually become fixed (in the population genetics sense of the word) in the influenza B population. PB1, PB2, and HA segments maintain separate Vic and Yam lineages.

We observe that in all five successful interlineage reassortment events shown in figure 6, none break up the PB1–PB2–HA complex. This is an unlikely outcome: the probability of not breaking up PB1–PB2–HA across five reassortment events is $P = \left(\frac{2^5 \times 2 - 2}{2^8 - 2}\right)^5 = 0.0009$, where reassortment events are considered to sample from the Vic and Yam lineages at random for each of the eight segments. If we correct for multiple testing with the assumption that coassortment of any three segments is of interest, we find that the probability of not breaking up an arbitrary set of three segments across five reassortment events is $P = \binom{8}{3}\left(\frac{2^5 \times 2 - 2}{2^8 - 2}\right)^5 = 0.0485$.
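The two probabilities above can be reproduced with a short calculation. The sketch below is an illustration only; the function name and parameters are hypothetical. It follows the counting in the formula: of the 2^8 − 2 ways to mix two lineages across eight segments (excluding the two non-reassortant genomes), 2^5 × 2 − 2 keep a given three-segment complex intact.

```python
from math import comb


def p_complex_intact(n_segments=8, complex_size=3, n_events=5):
    """Probability that a fixed set of segments is never split by reassortment.

    Each reassortment event is assumed to assign Vic or Yam at random to
    every segment, excluding the all-Vic and all-Yam (non-reassortant) cases.
    """
    intact = 2 * 2 ** (n_segments - complex_size) - 2   # complex all-Vic or all-Yam
    total = 2 ** n_segments - 2                         # all reassortant constellations
    return (intact / total) ** n_events


p_fixed = p_complex_intact()
p_any_triplet = comb(8, 3) * p_fixed    # correction for any set of three segments
print(round(p_fixed, 4), round(p_any_triplet, 4))  # 0.0009 0.0485
```

The per-event probability is 62/254, roughly 0.244, so five independent events give about 0.0009, and multiplying by the 56 possible three-segment sets gives about 0.0485.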

Although the vast majority of influenza B isolates possess either Vic or Yam lineage-derived PB1–PB2–HA complexes, on rare occasions mixed-lineage PB1–PB2–HA constellations emerge. Figure 7 shows the sum of branch lengths which were labeled as having entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. Due to lack of reassortment between Vic and Yam lineages of PB1, PB2, and HA (fig. 4) since 1997, all segments have spent significantly longer periods of evolutionary time with either entirely Vic-derived or entirely Yam-derived than with mixed-lineage PB1, PB2, and HA constellations (fig. 7). We have identified three instances of mixed-lineage PB1–PB2–HA reassortants from the primary data set, with the following PB1–PB2–HA constellations: VVY (B/Bangkok/163/1990-like, 13 sequences isolated 1990–January 5, 1995), VYV (B/Nanchang/630/1994-like, two sequences isolated 1994–1996), and VYY (B/New York/24/1993-like, two sequences isolated January 8, 1993–1994). We detected two new reassortant lineages when investigating the larger secondary data set: B/Waikato/6/2005-like viruses with PB1–PB2–HA constellation YYV (17 sequences isolated May 9–October 12 in 2005) and B/Malaysia/1829782/2007 with PB1–PB2–HA constellation YVY (one sequence isolated August 2, 2007).

Analysis of Reassortment Properties

We attempted to quantify the temporal discordance between lineages reassorting into new genomic constellations. If one were able to recover an influenza “species tree,” including admixture/reassortment events, it would be possible to estimate the reassortment or recombination “distance,” which is the time between a split in the species tree in the past and a reassortment event (see supplementary fig. S17, Supplementary Material online). Although we do not find evidence of differences in total number of reassortments between segments (see supplementary fig. S4, Supplementary Material online), we find support for a reassortment “distance” effect, in which a pair of tips on one segment has a different TMRCA from the same pair of tips on a different segment. The summary statistic we use that reflects this difference in TMRCAs, ΔTMRCA, is most sensitive when only one of the two trees being compared loses diversity through reassortment and the other acts like a proxy for the “species tree.” We normalize our ΔTMRCA comparisons to arrive at δTMRCA, which accounts for uncertainty in tree topology (see Materials and Methods). Figure 8 shows δTMRCA values for all pairs of trees. Most segment pairs show very low values for this statistic, with δTMRCA ≈ 0.1, indicating that ΔTMRCA measurements between replicate posterior samples from the same segment are up to ten times smaller than ΔTMRCA values between posterior samples from different segments. PB1, PB2, and HA trees, on the other hand, exhibit δTMRCA values that are much higher. This shows that ΔTMRCA differences between trees of PB1, PB2, and HA segments are, though noisy, occasionally very similar to uncertainty in tip-to-tip TMRCAs between replicate analyses of these segments.

FIG. 6. Schematic plot of reconstructed reassortments between Vic and Yam lineage segments of influenza B virus. Lineages that coassort in genomes are represented by eight parallel lines, with lineages that derive from the original Vic clade colored yellow/brown and lineages that derive from the original Yam clade colored lilac/purple. Interlineage reassortment events are indicated by lines entering a different genome. The angle of incoming lineages represents uncertainty in the timing of the event (mean date of the reassortant node and its parent node). Lineage extinction dates are not shown accurately.

FIG. 7. Amount of evolutionary time each segment has spent under different PB1–PB2–HA constellations. All segments have spent significantly more of their history with entirely Vic- or entirely Yam-derived PB1–PB2–HA complexes. All vertical lines indicating uncertainty are 95% highest posterior densities.

Discussion

Linkage between PB1, PB2, and HA Gene Segments

In this article, we show that the PB1, PB2, and HA segments of influenza B viruses are the only ones that have continuously maintained separate Vic and Yam lineages, whereas other segments have fixed either Vic or Yam lineages (figs. 2, 5, and 6). Evidence suggests that this is a result of prolonged lack of reassortment between Vic and Yam lineages in PB1, PB2, and HA (fig. 4), which possess coassorting sequences detectable as high linkage disequilibrium (LD) (supplementary fig. S1, Supplementary Material online). The vast majority of the sampled evolutionary history of each segment of influenza B viruses since the split of Vic and Yam lineages has been spent in association with either completely Vic or completely Yam lineage-derived PB1–PB2–HA complexes (fig. 7), suggesting that having “pure” lineage PB1–PB2–HA complexes is important for whole-genome fitness. We propose that this pattern of coassortment is due to the action of selection and not simply biased or rare reassortment.

The origin of the strong genetic linkage between PB1, PB2, and HA segments remains unclear. We see two alternative but similar explanations for it: mutation-driven coevolution (Presgraves 2010) and Dobzhansky–Muller (DM) incompatibility (Dobzhansky 1937; Muller 1942). Mutation-driven coevolution (Presgraves 2010) has been suggested to be the cause of hybrid dysfunction in Saccharomyces hybrids (Lee et al. 2008) and evolves as a byproduct of adaptation. If one or the other influenza B lineage has undergone adaptation, we might expect these changes to be beneficial in its native background and incompatible with a foreign background. DM incompatibility operates in a similar way, but the main difference from the scenario described earlier is that the incompatible alleles are neutral or nearly neutral in their native background and become deleterious or lethal when combined with nonnative backgrounds. Emergence of DM incompatibility is aided by geographic isolation. Interestingly, the Vic lineage of HA was restricted to eastern Asia between 1992 and 2000 (Nerome et al. 1998; Shaw et al. 2002), offering ample time for the budding Vic lineage to accumulate alleles causing reassortment incompatibility. However, without more genomic data from the past, it is difficult to estimate to what extent influenza B virus population structure contributed to the development of the current segment linkage.

Potential Mechanisms for Reassortment Incompatibility

Unfortunately, the limited amount of genomic data available for the early years of the Vic–Yam split precludes any attempt to determine whether selection or drift has led to the current linkage of PB1, PB2, and HA segments. Although the origins of the linkage between these three segments might be difficult to explain, we can speculate on the nature of reassortment incompatibility. For example, it is intuitive why this might be the case for PB1 and PB2: both proteins interact directly as part of the RNA-dependent RNA polymerase heterotrimer. Indeed, we observe that PB1–PB2 reassortants are the rarest and least persistent among mixed-lineage PB1–PB2–HA strains and have not been isolated in great numbers. In fact, most reassortants breaking the PB1–PB2–HA complex apart occurred in the past, close to the split of Vic and Yam lineages, and have become very rare since.

There is some evidence that the linkage between PB1 and HA might not be a phenomenon restricted to influenza B viruses. It has been established that, at least for the 1957 and the 1968 influenza pandemics caused by A/H2N2 and A/H3N2 subtypes, respectively, the viruses responsible were reassortants possessing PB1 and HA segments derived from avian influenza A viruses (Kawaoka et al. 1989). In addition, outdated techniques for producing vaccine seed strains through selection for HA–NA reassortants often yielded PB1–HA–NA reassortants as a side-effect (Bergeron et al. 2010; Fulvini et al. 2011). Recent experiments have found that the presence or absence of a “foreign” PB1 segment can have dramatic effects on HA concentration on the surface of virions and total virion production (Cobbin et al. 2013). However, there have been reassortant influenza A viruses circulating for prolonged periods of time in humans that did have disparate PB1 and HA segments, for example, H1N2 outbreaks in 2001 (Gregory et al. 2002) and H1N1/09 in 2009 (Smith et al. 2009).

FIG. 8. δTMRCA statistics for different segment pairs. PB1, PB2, and HA trees exhibit reciprocally highly similar TMRCAs, unlike most other pairwise comparisons. All vertical lines indicating uncertainty are 95% highest posterior densities.

We believe that the association between PB1, PB2, and HA segments should be relatively straightforward to explore in the lab. Reverse genetics systems have been developed for influenza B viruses (Hoffmann et al. 2002), which would allow the creation of artificial reassortants. Based on the frequency and persistence times of different reassortant classes we have observed, we expect a hierarchy of reassortant fitness, starting with PB1+PB2+HA reassortants, which should be the most fit, followed by PB1+2/HA, then PB1+HA/PB2, and finally PB2+HA/PB1 reassortants with the lowest fitness. We believe that this is the most direct approach to unraveling the mechanism responsible for the linkage within the PB1–PB2–HA complex.

Will Influenza B Viruses Speciate?

We suggest that the preservation of two PB1–PB2–HA complex lineages is similar to genomic speciation islands, where small numbers of genes resist being homogenized through gene flow (Turner et al. 2005). In this context, we see three potential paths of evolution for influenza B viruses. If more segments get recruited to the PB1–PB2–HA complex, the process could continue until “speciation” occurs, in which none of the segments is able to reassort across the Vic–Yam lineage boundary. Alternatively, the influenza B genome could continue to be homogenized through gene flow, with the exception of PB1, PB2, and HA segments, or one of the two PB1–PB2–HA complexes could go extinct, marking the return of single-strain dynamics in the influenza B virus population. The eventual fate of influenza B viruses will likely be determined by the combined effects of reassortment frequency and the strength of epistatic interactions between segments.

Materials and Methods

We compiled a primary data set of 452 complete influenza B genomes from GISAID (Bogner et al. 2006), dating from 1984 to 2012. The longest protein-coding region of each segment was extracted and used for all further analyses. We thus assume that homologous recombination has not taken place and that the evolutionary history of the whole segment can be inferred from the longest coding sequence in the segment. To date, there has been little evidence of homologous recombination in influenza viruses (Chare et al. 2003; Boni et al. 2008; Han et al. 2010). The segments of each strain were assigned to either Vic or Yam lineage by making maximum-likelihood trees of each segment using PhyML (Guindon and Gascuel 2003) and identifying whether the isolate was more closely related to B/Victoria/2/87 or B/Yamagata/16/88 sequences in that segment, with the exception of the NS segment, as B/Victoria/2/87 was a reassortant and possessed a Yam lineage NS (Lindstrom et al. 1999). B/Czechoslovakia/69/1990 was considered as being representative of Vic lineage for the NS segment. Every segment in each genome thus received either a Vic or a Yam lineage designation; for example, the strain B/Victoria/2/87 received V–V–V–V–V–V–V–Y, as its NS segment is derived from the Yam lineage and the rest of the genome is Vic.

We also collated a secondary data set from all complete influenza B virus genomes available on GenBank as of May 5, 2014. After removing isolates that had considerable portions of any sequence missing, were isolated prior to 1980, or were suspected of having a contaminant sequence in any segment, we were left with 1,603 sequences. These sequences only became available after all primary analyses were performed, are mainly from Australia, New Zealand, and the United States, and are too numerous to analyze in BEAST (Drummond et al. 2012). PhyML (Guindon and Gascuel 2003) was used to produce phylogenies of each segment, and the lineage of each isolate was determined based on grouping with either B/Victoria/2/87 or B/Yamagata/16/88 sequences, as described above. By associating strains with the lineage identity of each of their segments, we reconstructed the most parsimonious interlineage reassortment history for the secondary data set. The secondary data set was used to check how representative the primary data set was, to estimate LD, and to broadly confirm our results. All analyses pertain to the primary data set unless stated otherwise.

Temporally calibrated phylogenies were recovered for each segment in the primary data set using Markov chain Monte Carlo (MCMC) methods in the BEAST software package (Drummond et al. 2012). We modeled the substitution process using the Hasegawa–Kishino–Yano model of nucleotide substitution (Hasegawa et al. 1985), with separate transition models for each of the three codon partitions, and additionally estimated realized synonymous and nonsynonymous substitution counts (O'Brien et al. 2009). We used a flexible Bayesian skyride demographic model (Minin et al. 2008). We accounted for incomplete sampling dates for 94 sequences (of which 93 had only year and 1 had only year and month of isolation), whereby tip date is estimated as a latent variable in the MCMC integration. A relaxed molecular clock was used, where branch rates are drawn from a lognormal distribution (Drummond et al. 2006). We ran three independent MCMC chains, each with 200 million states, sampled every 20,000 steps, and discarded the first 10% of the MCMC states as burn-in. After assessing convergence of all three MCMC chains by visual inspection using Tracer (Rambaut et al. 2009), we combined samples across chains to give a total of 27,000 samples from the posterior distribution of trees.
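The sample count quoted above follows directly from the chain settings. The short Python sketch below only reproduces that arithmetic; the variable names are illustrative and are not taken from the BEAST configuration.

```python
# Arithmetic check of the posterior sample count described in the text.
# Variable names are hypothetical; the values come from the paragraph above.
chains = 3
states_per_chain = 200_000_000
sampling_interval = 20_000
burnin_percent = 10

# Each chain logs one tree every `sampling_interval` states.
samples_per_chain = states_per_chain // sampling_interval

# Discard the first 10% of each chain as burn-in (integer arithmetic).
kept_per_chain = samples_per_chain * (100 - burnin_percent) // 100

# Combine the post-burn-in samples of all chains.
total_samples = chains * kept_per_chain
print(samples_per_chain, kept_per_chain, total_samples)
```

Each chain therefore contributes 9,000 post-burn-in trees, which gives the combined total of 27,000.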

Every sequence was assigned seven discrete traits in BEAUti, corresponding to the lineages of all other segments with which a strain was isolated; for example, the PB1 tree had PB2, PA, HA, NP, NA, MP, and NS as traits and V or Y as values for each trait. We inferred the ancestral state of lineages in each segment by modeling transitions between these discrete states using an asymmetric transition matrix (Lemey et al. 2009), with Bayesian stochastic search variable selection to estimate significant rates. Because the posterior set of trees for a single segment has branches labeled with the inferred lineage in the remaining seven segments, we can detect interlineage reassortments between pairs of segments by observing state transitions, that is, Yam to Vic or Vic to Yam (fig. 1). In addition, by reconstructing the ancestral state of all other genomic segments jointly, we can infer coreassortment events when more than one trait transition occurs on the same node in a tree. Interphylogeny labeling approaches have been extensively used in the past to investigate reticulate evolution in influenza A viruses and HIV (Lycett et al. 2012; Ward et al. 2013; Lu et al. 2014).
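The idea of reading reassortments off trait transitions can be illustrated on a single labeled tree. The Python sketch below is a toy illustration under stated assumptions, not the analysis pipeline: the tree, the node names, and the `find_reassortments` helper are all hypothetical, and the real analysis works on a posterior set of BEAST trees rather than one fixed tree.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy representation of one labeled PB1 tree:
# node name -> (parent name or None, inferred lineage of other segments).
Node = Tuple[Optional[str], Dict[str, str]]


def find_reassortments(tree: Dict[str, Node]) -> Dict[str, List[str]]:
    """Return {node: traits whose state differs from the parent's state}."""
    events: Dict[str, List[str]] = {}
    for name, (parent, traits) in tree.items():
        if parent is None:
            continue  # the root has no parent to compare against
        parent_traits = tree[parent][1]
        changed = [t for t in traits if traits[t] != parent_traits[t]]
        if changed:
            events[name] = changed
    return events


toy_pb1_tree: Dict[str, Node] = {
    "root": (None, {"PB2": "V", "HA": "V", "NP": "V", "NS": "V"}),
    "n1": ("root", {"PB2": "V", "HA": "V", "NP": "Y", "NS": "Y"}),
    "n2": ("root", {"PB2": "V", "HA": "V", "NP": "V", "NS": "V"}),
    "tipA": ("n1", {"PB2": "V", "HA": "V", "NP": "Y", "NS": "V"}),
}

events = find_reassortments(toy_pb1_tree)
# A coreassortment is more than one trait transition on the same node.
coreassortments = {n: ts for n, ts in events.items() if len(ts) > 1}
print(events)
print(coreassortments)
```

In this toy tree, node `n1` carries a joint NP and NS transition (a coreassortment in the sense used above), while `tipA` carries a single NS transition.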

Measures of Diversity

We inferred the diversity of each segment from their phylogenetic tree by estimating the date of the most recent common ancestor of all branches at yearly time points, which places an upper bound on the maximum amount of diversity existing at each time point. A version of this lineage turnover metric has previously been used to investigate the tempo and strength of selection in influenza A viruses during seasonal circulation (Bedford et al. 2011). In addition, we calculated mean pairwise TMRCA between branches labeled as Vic and Yam for PB1, PB2, and HA traits. This gave us a measure of how much a particular segment reassorts with respect to Vic and Yam lineages of PB1, PB2, and HA segments. If Vic and Yam lineages of PB1, PB2, and HA segments were to be considered as being separate populations, this measure would be equivalent to “between population” diversity.

We also calculated the total amount of sampled evolutionary time spent by each segment with entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. We do this by summing the branch lengths in each tree under three different lineage combinations of the PB1, PB2, and HA segments: PB1–PB2–HA derived entirely from Yam lineage, PB1–PB2–HA entirely derived from Vic lineage, and PB1–PB2–HA derived from a mixture of the two lineages. This gives a measure of how successful, over long periods of time, each particular PB1–PB2–HA constellation has been.
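The branch-length summation can be sketched in a few lines of Python. This is a minimal illustration that assumes each branch has already been reduced to a length and three lineage labels; the `classify` and `time_by_constellation` helpers and the toy branch list are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical branch record: (length in years, PB1, PB2, HA lineage labels).
Branch = Tuple[float, str, str, str]


def classify(pb1: str, pb2: str, ha: str) -> str:
    """Label a PB1-PB2-HA constellation as 'Vic', 'Yam', or 'mixed'."""
    states = {pb1, pb2, ha}
    if states == {"V"}:
        return "Vic"
    if states == {"Y"}:
        return "Yam"
    return "mixed"


def time_by_constellation(branches: Iterable[Branch]) -> Dict[str, float]:
    """Sum branch lengths separately for each constellation class."""
    totals: Dict[str, float] = defaultdict(float)
    for length, pb1, pb2, ha in branches:
        totals[classify(pb1, pb2, ha)] += length
    return dict(totals)


toy_branches = [
    (2.0, "V", "V", "V"),
    (1.5, "Y", "Y", "Y"),
    (0.5, "V", "V", "Y"),
    (3.0, "Y", "Y", "Y"),
]
totals = time_by_constellation(toy_branches)
print(totals)
```

Applied to every tree in a posterior sample, a summation of this kind would yield a distribution of totals per class, from which intervals such as those in figure 7 could be summarized.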

Tree-to-Tree Similarities

We express the normalized distance $\delta_{\mathrm{TMRCA}}$ between trees belonging to two segments A and B for a particular posterior sample i following

$$\delta_{\mathrm{TMRCA}}(A_i, B_i) = \frac{\Delta_{\mathrm{TMRCA}}(A_i, A'_i) + \Delta_{\mathrm{TMRCA}}(B_i, B'_i)}{2\,\Delta_{\mathrm{TMRCA}}(A_i, B_i)}, \tag{1}$$

where $\Delta_{\mathrm{TMRCA}}(A_i, B_i) = \frac{1}{n}\sum_{j=1}^{n} g(A_{ij}, B_{ij})$ and n is the total number of pairwise comparisons available between sets of tips. Thus, $g(A_{ij}, B_{ij})$ is the absolute difference in TMRCA of a pair of tips j, where the pair is drawn from the ith posterior sample of tree A and the ith posterior sample of tree B. Additionally, $\Delta_{\mathrm{TMRCA}}(A_i, A'_i)$ is calculated from the ith posterior sample of tree A and the ith posterior sample of an independent analysis of tree A (which we refer to as A′), which is used in the normalization procedure to control for variability in tree topology stability over the course of the MCMC chain (see supplementary figs. S6 and S7, Supplementary Material online). We had three replicate analyses of each segment, and in order to calculate $\Delta_{\mathrm{TMRCA}}(A_i, A'_i)$, we used analyses numbered 1, 2, and 3 as A and analyses numbered 2, 3, and 1 as A′, in that order. We subsampled our combined posterior distribution of trees to give a total of 2,700 trees on which to analyze $\delta_{\mathrm{TMRCA}}$.
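Equation (1) can be made concrete with a small numerical sketch. The Python code below assumes that tip-pair TMRCAs have already been extracted from each tree into dictionaries keyed by tip pair; the function names and the toy values are hypothetical and serve only to show how the within-replicate and between-segment terms combine.

```python
from typing import Dict, Tuple

# Hypothetical input: {(tip1, tip2): TMRCA in years} for one posterior tree.
PairTMRCA = Dict[Tuple[str, str], float]


def raw_delta(x: PairTMRCA, y: PairTMRCA) -> float:
    """Mean absolute TMRCA difference over tip pairs present in both trees."""
    pairs = sorted(set(x) & set(y))
    return sum(abs(x[p] - y[p]) for p in pairs) / len(pairs)


def normalized_delta(a: PairTMRCA, a_rep: PairTMRCA,
                     b: PairTMRCA, b_rep: PairTMRCA) -> float:
    """Within-replicate differences divided by twice the A-versus-B difference."""
    within = raw_delta(a, a_rep) + raw_delta(b, b_rep)
    between = 2.0 * raw_delta(a, b)
    return within / between


p12, p13, p23 = ("s1", "s2"), ("s1", "s3"), ("s2", "s3")
tree_a = {p12: 5.0, p13: 10.0, p23: 10.0}
tree_a_rep = {p12: 5.5, p13: 10.0, p23: 9.5}   # independent replicate of A
tree_b = {p12: 5.0, p13: 20.0, p23: 20.0}
tree_b_rep = {p12: 5.0, p13: 20.5, p23: 19.5}  # independent replicate of B

delta = normalized_delta(tree_a, tree_a_rep, tree_b, tree_b_rep)
print(round(delta, 3))
```

In this toy case, the replicates differ by a third of a year on average while the two segments differ by about 6.7 years, so the normalized value is small. Values near 1 would indicate that two segment trees differ no more than replicate analyses of the same segment do.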

Calculating the normalized $\delta_{\mathrm{TMRCA}}(A_i, B_i)$ for each MCMC state provides us with a posterior distribution of this statistic, allowing specific hypotheses regarding similarities between the trees of different segments to be tested. Our approach exploits the branch scaling used by BEAST (Drummond et al. 2012), as the trees are scaled in absolute time and insensitive to variation in nucleotide substitution rates between segments, allowing for direct comparisons between TMRCAs in different trees. In the absence of reassortment, we expect the tree of every segment to recapitulate the “virus tree,” a concept analogous to “species trees” in population genetics. Our method operates under the assumption that the segment trees capture this “virus tree” of influenza B viruses quite well. It is not an unreasonable assumption, given the seasonal bottlenecks influenza viruses experience. This makes it almost certain that influenza viruses circulating at any given time point are derived from a single genome that existed in the recent past. The ΔTMRCA statistic essentially quantifies the temporal distance between admixture events and nodes in the virus tree (see supplementary fig. S17, Supplementary Material online). We normalize ΔTMRCA values to get δTMRCA, a measure which quantifies the extent to which the similarity of two independent trees resembles phylogenetic noise. The ΔTMRCA statistic is an extension of patristic distance methods, which have previously been used to tackle a wide variety of problems, for example, as phylogenetic distance in predicting viral titer in Drosophila infected with viruses from closely related species (Longdon et al. 2011) and to assess temporal incongruence in a phylogenetic tree of amphibian species induced by using different calibrations (Ruane et al. 2011).

LD across the Influenza B Genome

We used the secondary GenBank data set with 1,603 complete genome sequences to estimate LD between amino acid loci across the longest proteins encoded by each segment of the influenza B virus genome. To quantify LD, we adapt the $\chi^2_{df}$ statistic from Hedrick and Thomson (1986):

$$\chi^2_{df} = \frac{\chi^2}{N(k-1)(m-1)}, \tag{2}$$

where $\chi^2$ is calculated from a classical contingency table, N is the number of haplotypes, and $(k-1)(m-1)$ are the degrees of freedom. This statistic is equal to the widely used $r^2$ LD statistic at biallelic loci but also quantifies LD when there are more than two alleles per locus (Zhao et al. 2005). LD was estimated only at loci where each nucleotide or amino acid allele was present in at least two isolates. We ignored gaps in the alignment and did not consider them as polymorphisms. In all cases, we used a minor allele frequency cutoff of 1%. We also calculated another LD statistic, $D'$ (Lewontin 1964), as $D'_{ij} = D_{ij}/D^{\max}_{ij}$, where $D_{ij} = p(A_iB_j) - p(A_i)\,p(B_j)$ and

$$D^{\max}_{ij} = \begin{cases} \min\left[\,p(A_i)\,p(B_j),\ (1-p(A_i))(1-p(B_j))\,\right] & \text{when } D_{ij} < 0, \\ \min\left[\,(1-p(A_i))\,p(B_j),\ p(A_i)(1-p(B_j))\,\right] & \text{when } D_{ij} \geq 0, \end{cases} \tag{3}$$

where $p(A_i)$ is the frequency of allele $A_i$ at locus A, $p(B_j)$ is the frequency of allele $B_j$ at locus B, and $p(A_iB_j)$ is the frequency of haplotype $A_iB_j$. $D'$ is inflated when some haplotypes are not observed, for example, when the minor allele frequency is low. We find that $D'$ is almost uniformly high across the influenza B virus genome and close to 1.0 for almost any pair of polymorphic loci. This is because most amino acid alleles in the population exist transiently, meaning that they do not get a chance to reassort, and we only observe them within the backgrounds of more persistent alleles, which $D'$ quantifies as complete LD. We think that metrics related to $r^2$, such as $\chi^2_{df}$, perform much better on temporal data such as ours in finding persistent associations between alleles and are easier to interpret.
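The contrast between the two LD statistics can be seen on a toy example. The Python sketch below is a minimal illustration that assumes haplotypes are given as (allele at locus A, allele at locus B) pairs for two polymorphic loci; the function names and the data are hypothetical, and the paper's filtering steps (gaps, minor allele frequency cutoff) are not reproduced.

```python
from collections import Counter
from typing import List, Tuple

Haplotype = Tuple[str, str]  # (allele at locus A, allele at locus B)


def chi2_df(haplotypes: List[Haplotype]) -> float:
    """Contingency-table chi-square divided by N(k-1)(m-1), as in eq. (2)."""
    n = len(haplotypes)
    observed = Counter(haplotypes)
    count_a = Counter(a for a, _ in haplotypes)
    count_b = Counter(b for _, b in haplotypes)
    chi2 = 0.0
    for a in count_a:
        for b in count_b:
            expected = count_a[a] * count_b[b] / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    k, m = len(count_a), len(count_b)  # both loci assumed polymorphic
    return chi2 / (n * (k - 1) * (m - 1))


def d_prime(haplotypes: List[Haplotype], a: str, b: str) -> float:
    """Lewontin's D' for one allele pair, following eq. (3)."""
    n = len(haplotypes)
    p_a = sum(1 for x, _ in haplotypes if x == a) / n
    p_b = sum(1 for _, y in haplotypes if y == b) / n
    p_ab = sum(1 for h in haplotypes if h == (a, b)) / n
    d = p_ab - p_a * p_b
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min((1 - p_a) * p_b, p_a * (1 - p_b))
    return d / d_max


# Toy data: the haplotype ("R", "T") is never observed.
toy = [("K", "T")] * 4 + [("K", "A")] * 1 + [("R", "A")] * 5
ld_chi2 = chi2_df(toy)
ld_dprime = d_prime(toy, "K", "T")
print(round(ld_chi2, 3), round(ld_dprime, 3))
```

Because one of the four possible haplotypes is missing from the toy data, $D'$ reaches its maximum of 1 even though the association is incomplete, whereas $\chi^2_{df}$ (equal to $r^2$ for these biallelic loci) stays at about 0.67. This mirrors the inflation of $D'$ described above.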

Data Availability

Python scripts used to process trees and sequences are available at https://github.com/evogytis/fluB/tree/master/scripts (last accessed October 21, 2014). Output files from scripts, lineage designations, MCC trees, acknowledgment tables, accession numbers, and redacted XML files (per GISAID Data Access Agreement) are publicly available at https://github.com/evogytis/fluB/tree/master/data (last accessed October 21, 2014).

Supplementary Material

Supplementary material and figures S1–S17 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Darren Obbard and Paul Wikramaratna for helpful discussions and anonymous reviewers for comments and suggestions. This study was supported by a Natural Environment Research Council studentship D76739X to G.D. and a Newton International Fellowship from the Royal Society to T.B. The research leading to these results has received funding from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 278433-PREDEMICS and ERC Grant Agreement No. 260864. A.R. and S.L. acknowledge the support of the Wellcome Trust (Grant No. 092807).

References

Ansaldi F, D'Agaro P, de Florentiis D, Puzelli S, Lin YP, Gregory V, Bennett M, Donatelli I, Gasparini R, Crovari P, et al. 2003. Molecular characterization of influenza B viruses circulating in northern Italy during the 2001–2002 epidemic season. J Med Virol. 70:463–469.

Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol. 11:220.

Bedford T, Suchard MA, Lemey P, Dudas G, Gregory V, Hay AJ, McCauley JW, Russell CA, Smith DJ, Rambaut A. 2014. Integrating influenza antigenic dynamics with molecular evolution. eLife 3:e01914.

Bergeron C, Valette M, Lina B, Ottmann M. 2010. Genetic content of influenza H3N2 vaccine seeds. PLoS Curr. 2:RRN1165.

Bodewes R, Morick D, de Mutsert G, Osinga N, Bestebroer T, van der Vliet S, Smits SL, Kuiken T, Rimmelzwaan GF, Fouchier RA, et al. 2013. Recurring influenza B virus infections in seals. Emerg Infect Dis. 19:511–512.

Bogner P, Capua I, Lipman DJ, Cox NJ, et al. 2006. A global initiative on sharing avian flu data. Nature 442:981.

Boni MF, Zhou Y, Taubenberger JK, Holmes EC. 2008. Homologous recombination is very rare or absent in human influenza A virus. J Virol. 82:4807–4811.

Broberg E, Beaute J, Snacken R. 2013. Fortnightly influenza surveillance overview, 24 May 2013 - weeks 19-20/2013. Available from: http://ecdc.europa.eu/en/publications/Publications/influenza-fortnightly-surveillance-overview-24-may-2013.pdf

Burnet SFM. 1955. Principles of animal virology. New York: Academic Press.

Chare ER, Gould EA, Holmes EC. 2003. Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J Gen Virol. 84:2691–2703.

Chen R, Holmes EC. 2008. The evolutionary dynamics of human influenza B virus. J Mol Evol. 66:655–663.

Cobbin JCA, Verity EE, Gilbertson BP, Rockman SP, Brown LE. 2013. The source of the PB1 gene in influenza vaccine reassortants selectively alters the hemagglutinin content of the resulting seed virus. J Virol. 87:5577–5585.

Dobzhansky T. 1937. Genetics and the origin of species. New York: Columbia University Press.

Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4:e88.

Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 29:1969–1973.

Fulvini AA, Ramanunninair M, Le J, Pokorny BA, Arroyo JM, Silverman J, Devis R, Bucher D. 2011. Gene constellation of influenza A virus reassortants with high growth phenotype prepared as seed candidates for vaccine production. PLoS One 6:e20823.

Gregory V, Bennett M, Orkhan M, Hajjar SA, Varsano N, Mendelson E, Zambon M, Ellis J, Hay A, Lin Y. 2002. Emergence of influenza A H1N2 reassortant viruses in the human population during 2001. Virology 300:1–7.

Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696–704.

Han GZ, Boni MF, Li SS. 2010. No observed effect of homologous recombination on influenza C virus evolution. Virol J. 7:227.

Hasegawa M, Kishino H, Yano TA. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 22:160–174.

Hay AJ, Gregory V, Douglas AR, Lin YP. 2001. The evolution of human influenza viruses. Philos Trans R Soc Lond B Biol Sci. 356:1861–1870.


Hedrick PW, Thomson G. 1986. A two-locus neutrality test: applications to humans, E. coli and lodgepole pine. Genetics 112:135–156.

Hoffmann E, Mahmood K, Yang CF, Webster RG, Greenberg HB, Kemble G. 2002. Rescue of influenza B virus from eight plasmids. Proc Natl Acad Sci U S A. 99:11411–11416.

Kanegae Y, Sugita S, Endo A, Ishida M, Senya S, Osako K, Nerome K, Oya A. 1990. Evolutionary pattern of the hemagglutinin gene of influenza B viruses isolated in Japan: cocirculating lineages in the same epidemic season. J Virol. 64:2860–2865.

Kawaoka Y, Krauss S, Webster RG. 1989. Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics. J Virol. 63:4603–4608.

Lee HY, Chou JY, Cheong L, Chang NH, Yang SY, Leu JY. 2008. Incompatibility of nuclear and mitochondrial genomes causes hybrid sterility between two yeast species. Cell 135:1065–1073.

Lemey P, Rambaut A, Drummond AJ, Suchard MA. 2009. Bayesian phylogeography finds its roots. PLoS Comput Biol. 5:e1000520.

Lewontin RC. 1964. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49–67.

Lindstrom SE, Hiromoto Y, Nishimura H, Saito T, Nerome R, Nerome K. 1999. Comparative analysis of evolutionary mechanisms of the hemagglutinin and three internal protein genes of influenza B virus: multiple cocirculating lineages and frequent reassortment of the NP, M, and NS genes. J Virol. 73:4413–4426.

Longdon B, Hadfield JD, Webster CL, Obbard DJ, Jiggins FM. 2011. Host phylogeny determines viral persistence and replication in novel hosts. PLoS Pathog. 7:e1002260.

Lu L, Lycett SJ, Brown AJL. 2014. Reassortment patterns of avian influenza virus internal segments among different subtypes. BMC Evol Biol. 14:16.

Lycett SJ, Baillie G, Coulter E, Bhatt S, Kellam P, McCauley JW, Wood JL, Brown IH, Pybus OG, Leigh Brown AJ; Combating Swine Influenza Initiative-COSI Consortium. 2012. Estimating reassortment rates in co-circulating Eurasian swine influenza viruses. J Gen Virol. 93:2326–2336.

Minin VN, Bloomquist EW, Suchard MA. 2008. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol Biol Evol. 25:1459–1471.

Muller H. 1942. Isolating mechanisms, evolution, and temperature. Biol Symp. 6:71–125.

Nakagawa N, Nukuzuma S, Haratome S, Go S, Nakagawa T, Hayashi K. 2002. Emergence of an influenza B virus with antigenic change. J Clin Microbiol. 40:3068–3070.

Nerome R, Hiromoto Y, Sugita S, Tanabe N, Ishida M, Matsumoto M, Lindstrom SE, Takahashi T, Nerome K. 1998. Evolutionary characteristics of influenza B virus since its first isolation in 1940: dynamic circulation of deletion and insertion mechanism. Arch Virol. 143:1569–1583.

O'Brien JD, Minin VN, Suchard MA. 2009. Learning to count: robust estimates for labeled distances between molecular sequences. Mol Biol Evol. 26:801–814.

Osterhaus ADME, Rimmelzwaan GF, Martina BEE, Bestebroer TM, Fouchier RAM. 2000. Influenza B virus in seals. Science 288:1051–1053.

Presgraves DC. 2010. The molecular evolutionary basis of species formation. Nat Rev Genet. 11:175–180.

Rambaut A, Suchard M, Drummond A. 2009. Tracer v1.5 [Internet]. Available from: http://tree.bio.ed.ac.uk/software/tracer/

Reed C, Meltzer MI, Finelli L, Fiore A. 2012. Public health impact of including two lineages of influenza B in a quadrivalent seasonal influenza vaccine. Vaccine 30:1993–1998.

Rota PA, Wallis TR, Harmon MW, Rota JS, Kendal AP, Nerome K. 1990. Cocirculation of two distinct evolutionary lineages of influenza type B virus since 1983. Virology 175:59–68.

Ruane S, Pyron RA, Burbrink FT. 2011. Phylogenetic relationships of the cretaceous frog Beelzebufo from Madagascar and the placement of fossil constraints based on temporal and phylogenetic evidence. J Evol Biol. 24:274–285.

Shaw MW, Xu X, Li Y, Normand S, Ueki RT, Kunimoto GY, Hall H, Klimov A, Cox NJ, Subbarao K. 2002. Reappearance and global spread of variants of influenza B/Victoria/2/87 lineage viruses in the 2000–2001 and 2001–2002 seasons. Virology 303:1–8.

Smith GJD, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, et al. 2009. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459:1122–1125.

Turner TL, Hahn MW, Nuzhdin SV. 2005. Genomic islands of speciation in Anopheles gambiae. PLoS Biol. 3:e285.

Ward MJ, Lycett SJ, Kalish ML, Rambaut A, Brown AJL. 2013. Estimating the rate of intersubtype recombination in early HIV-1 group M strains. J Virol. 87:1967–1973.

World Health Organization. 2009. Influenza fact sheet. Available from: http://www.who.int/mediacentre/factsheets/fs211/en/

Zhao H, Nettleton D, Soller M, Dekkers JCM. 2005. Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genet Res. 86:77–87.



B/Iowa/03/2002-like (V–V–Y′–V–Y–Y–Y′–V′), B/California/NHRC0001/2006-like (V–V–Y–V–Y′–Y–Y′–V′), B/Brisbane/33/2008-like (V–V–Y–V–Y′–Y–Y–V)

In a previous study, B/Alaska/12/1996-like, B/Nanchang/2/1997-like, and B/Iowa/03/2002-like constellations were observed (Chen and Holmes 2008), but sequences from B/California/NHRC0001/2006-like and B/Brisbane/33/2008-like constellations were not available at the time. In their study, Chen and Holmes (2008) also recovered the coassortment pattern of PB1, PB2, and HA lineages but did not remark upon it. Of these five constellations, four (B/Nanchang/2/1997-like, B/Iowa/03/2002-like, B/California/NHRC0001/2006-like, and B/Brisbane/33/2008-like) are derived from introgression of Yam lineage segments into a Vic lineage PB1–PB2–HA background, with only one (B/Alaska/12/1996-like) resulting from introgression of a Vic lineage NS segment into an entirely Yam-derived background. All five interlineage reassortment events described here are marked by the preservation of either entirely Vic- or entirely Yam-derived PB1–PB2–HA segments. Figure 6 also shows that reassorting segments appear to evolve with a considerable degree of autonomy. For example, the NP lineage that entered a largely Vic lineage-derived genome and gave rise to the B/Nanchang/2/1997-like isolates continued circulating until 2010, even though the other segments it coassorted with in 1995–1996 (PA and MP) went extinct following the next round of reassortment that led to the rise of B/Iowa/03/2002-like genome constellations. A more extreme example is the NS segment, where a Vic sublineage was reassorted into an entirely Yam background (B/Alaska/12/1996-like) in 1994–1995, then reassorted back into a mostly Vic background some 5 years later (B/Iowa/03/2002-like), where it has replaced the “original” Vic sublineage (see fig. 6).

FIG. 5. Ratio of Vic and Yam sequences in the data set. The ratio of Vic (yellow) to Yam (purple) sequences in each segment from the primary data set over time. Black lines indicate where this ratio lies in the larger secondary data set. Numbers at the top of the figure show the total number of genomes available for each influenza season in the primary data set, comprised of 452 genomes, from which the ratio was calculated, whereas the numbers in brackets correspond to numbers of sequences in the larger secondary genomes data set. Numbers at the bottom are influenza seasons, from the 1987/1988 (87/88) season to the 2011/2012 season. Yam lineage of PA, NP, NA, and MP segments and Vic lineage of the NS segment eventually become fixed (in the population genetics sense of the word) in the influenza B population. PB1, PB2, and HA segments maintain separate Vic and Yam lineages.

We observe that in all five successful interlineage reassortment events shown in figure 6, none break up the PB1–PB2–HA complex. This is an unlikely outcome: the probability of not breaking up PB1–PB2–HA across five reassortment events is $P = \left(\frac{2^5 \times 2 - 2}{2^8 - 2}\right)^5 = 0.0009$, where reassortment events are considered to sample from the Vic and Yam lineages at random for each of the eight segments. If we correct for multiple testing with the assumption that coassortment of any three segments is of interest, we find that the probability of not breaking up an arbitrary set of three segments across five reassortment events is $P = \binom{8}{3}\left(\frac{2^5 \times 2 - 2}{2^8 - 2}\right)^5 = 0.0485$.
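The two probabilities above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the variable names are illustrative, and reading the subtracted 2 as the exclusion of the two non-reassortant (entirely Vic or entirely Yam) genomes is an assumption made here rather than something stated in the text.

```python
from math import comb

n_segments = 8   # segments in the influenza B genome
n_linked = 3     # PB1, PB2, and HA
n_events = 5     # interlineage reassortment events in figure 6

# Constellations that keep the three focal segments together:
# 2 choices for the trio (all Vic or all Yam) times 2^5 for the other
# segments, minus the 2 genomes assumed to be non-reassortant.
unbroken = 2 ** (n_segments - n_linked) * 2 - 2

# All two-lineage constellations, minus the same 2 non-reassortant genomes.
total = 2 ** n_segments - 2

p_focal_trio = (unbroken / total) ** n_events
p_any_trio = comb(n_segments, n_linked) * p_focal_trio
print(round(p_focal_trio, 4), round(p_any_trio, 4))
```

The correction multiplies by the 56 possible ways of choosing three of the eight segments, which is a conservative, Bonferroni-style adjustment.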

Although the vast majority of influenza B isolates possess either Vic or Yam lineage-derived PB1–PB2–HA complexes, on rare occasions mixed-lineage PB1–PB2–HA constellations emerge. Figure 7 shows the sum of branch lengths which were labeled as having entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. Due to lack of reassortment between Vic and Yam lineages of PB1, PB2, and HA (fig. 4) since 1997, all segments have spent significantly longer periods of evolutionary time with either entirely Vic-derived or entirely Yam-derived than with mixed-lineage PB1, PB2, and HA constellations (fig. 7). We have identified three instances of mixed-lineage PB1–PB2–HA reassortants from the primary data set, with the following PB1–PB2–HA constellations: VVY (B/Bangkok/163/1990-like, 13 sequences isolated 1990–January 5, 1995), VYV (B/Nanchang/630/1994-like, two sequences isolated 1994–1996), and VYY (B/New York/24/1993-like, two sequences isolated January 8, 1993–1994). We detected two new reassortant lineages when investigating the larger secondary data set: B/Waikato/6/2005-like viruses with PB1–PB2–HA constellation YYV (17 sequences isolated May 9–October 12 in 2005) and B/Malaysia/1829782/2007 with PB1–PB2–HA constellation YVY (one sequence isolated August 2, 2007).

Analysis of Reassortment Properties

We attempted to quantify the temporal discordance between lineages reassorting into new genomic constellations. If one were able to recover an influenza “species tree,” including admixture/reassortment events, it would be possible to estimate the reassortment or recombination “distance,” which is the time between a split in the species tree in the past and a reassortment event (see supplementary fig. S17, Supplementary Material online). Although we do not find evidence of differences in total number of reassortments between segments (see supplementary fig. S4, Supplementary Material online), we find support for a reassortment “distance” effect, in which a pair of tips on one segment has a different TMRCA from the same pair of tips on a different segment. The summary statistic we use that reflects this difference in TMRCAs, ΔTMRCA, is most sensitive when only one of the two trees being compared loses diversity through reassortment and the other acts like a proxy for the “species tree.” We normalize our ΔTMRCA comparisons to arrive at δTMRCA, which accounts for uncertainty in tree topology (see Materials and Methods). Figure 8 shows δTMRCA values for all pairs of trees. Most segment pairs show very low values for this statistic, with δTMRCA ≈ 0.1, indicating that ΔTMRCA measurements between replicate posterior samples from the same segment are up to ten times smaller than ΔTMRCA values between posterior samples from different segments. PB1, PB2, and HA trees, on the other hand, exhibit δTMRCA values that are much higher. This shows that ΔTMRCA differences between trees of PB1, PB2, and HA segments are, though noisy, occasionally very similar to uncertainty in tip-to-tip TMRCAs between replicate analyses of these segments.

FIG. 6. Schematic plot of reconstructed reassortments between Vic and Yam lineage segments of influenza B virus. Lineages that coassort in genomes are represented by eight parallel lines, with lineages that derive from the original Vic clade colored yellow/brown and lineages that derive from the original Yam clade colored lilac/purple. Interlineage reassortment events are indicated by lines entering a different genome. The angle of incoming lineages represents uncertainty in the timing of the event (mean date of the reassortant node and its parent node). Lineage extinction dates are not shown accurately.

FIG. 7. Amount of evolutionary time each segment has spent under different PB1–PB2–HA constellations. All segments have spent significantly more of their history with entirely Vic- or entirely Yam-derived PB1–PB2–HA complexes. All vertical lines indicating uncertainty are 95% highest posterior densities.

Discussion

Linkage between PB1, PB2, and HA Gene Segments

In this article, we show that the PB1, PB2, and HA segments of influenza B viruses are the only ones that have continuously maintained separate Vic and Yam lineages, whereas other segments have fixed either Vic or Yam lineages (figs. 2, 5, and 6). Evidence suggests that this is a result of prolonged lack of reassortment between Vic and Yam lineages in PB1, PB2, and HA (fig. 4), which possess coassorting sequences, detectable as high linkage disequilibrium (LD) (supplementary fig. S1, Supplementary Material online). The vast majority of the sampled evolutionary history of each segment of influenza B viruses since the split of Vic and Yam lineages has been spent in association with either completely Vic or completely Yam lineage-derived PB1–PB2–HA complexes (fig. 7), suggesting that having “pure” lineage PB1–PB2–HA complexes is important for whole-genome fitness. We propose that this pattern of coassortment is due to the action of selection and not simply biased or rare reassortment.

The origin of the strong genetic linkage between PB1, PB2, and HA segments remains unclear. We believe that there are two alternative, but similar, explanations for the origins of the strong genetic linkage between these segments: mutation-driven coevolution (Presgraves 2010) and Dobzhansky–Muller (DM) incompatibility (Dobzhansky 1937; Muller 1942). Mutation-driven coevolution (Presgraves 2010) has been suggested to be the cause of hybrid dysfunction in Saccharomyces hybrids (Lee et al. 2008) and evolves as a byproduct of adaptation. If one or the other influenza B lineage has undergone adaptation, we might expect these changes to be beneficial in its native background and incompatible with a foreign background. DM incompatibility operates in a similar way, but the main difference from the scenario described earlier is that the incompatible alleles are neutral or nearly neutral in their native background and become deleterious or lethal when combined with nonnative backgrounds. Emergence of DM incompatibility is aided by geographic isolation. Interestingly, the Vic lineage of HA was restricted to eastern Asia between 1992 and 2000 (Nerome et al. 1998; Shaw et al. 2002), offering ample time for the budding Vic lineage to accumulate alleles causing reassortment incompatibility. However, without more genomic data from the past, it is difficult to estimate to what extent influenza B virus population structure contributed to the development of the current segment linkage.

Potential Mechanisms for Reassortment Incompatibility

Unfortunately, the limited amount of genomic data available for the early years of the Vic–Yam split precludes any attempt to determine whether selection or drift has led to the current linkage of PB1, PB2, and HA segments. Although the origins of the linkage between these three segments might be difficult to explain, we can speculate on the nature of reassortment incompatibility. For example, it is intuitive why this might be the case for PB1 and PB2: both proteins interact directly as part of the RNA-dependent RNA polymerase heterotrimer. Indeed, we observe that PB1–PB2 reassortants are the rarest and least persistent among mixed-lineage PB1–PB2–HA strains and have not been isolated in great numbers. In fact, most reassortants breaking the PB1–PB2–HA complex apart have occurred in the past, close to the split of Vic and Yam lineages, and have become very rare since.

There is some evidence that the linkage between PB1 and HA might not be a phenomenon restricted to influenza B viruses. It has been established that, at least for the 1957 and the 1968 influenza pandemics caused by A/H2N2 and A/H3N2 subtypes, respectively, the viruses responsible were reassortants possessing PB1 and HA segments derived from avian influenza A viruses (Kawaoka et al. 1989). In addition, outdated techniques for producing vaccine seed strains through selection for HA–NA reassortants often yielded PB1–HA–NA reassortants as a side-effect (Bergeron et al. 2010; Fulvini et al. 2011). Recent experiments have found that the presence or absence of a “foreign” PB1 segment can have dramatic effects on HA concentration on the surface of virions and total virion production (Cobbin et al. 2013). However, there have been reassortant influenza A viruses circulating for prolonged periods of time in humans that did have disparate PB1 and HA segments, for example, H1N2 outbreaks in 2001 (Gregory et al. 2002) and H1N1/09 in 2009 (Smith et al. 2009).

FIG. 8. δTMRCA statistics for different segment pairs. PB1, PB2, and HA trees exhibit reciprocally highly similar TMRCAs, unlike most other pairwise comparisons. All vertical lines indicating uncertainty are 95% highest posterior densities.

We believe that the association between PB1, PB2, and HA segments should be relatively straightforward to explore in the lab. Reverse genetics systems have been developed for influenza B viruses (Hoffmann et al. 2002), which would allow the creation of artificial reassortants. Based on the frequency and persistence times of different reassortant classes we have observed, we expect a hierarchy of reassortant fitness, starting with PB1+PB2+HA reassortants, which should be the most fit, followed by PB1+2/HA, then PB1+HA/PB2, and finally PB2+HA/PB1 reassortants with the lowest fitness. We believe that this is the most direct approach to unraveling the mechanism responsible for the linkage within the PB1–PB2–HA complex.

Will Influenza B Viruses Speciate?

We suggest that the preservation of two PB1–PB2–HA complex lineages is similar to genomic speciation islands, where small numbers of genes resist being homogenized through gene flow (Turner et al. 2005). In this context, we see three potential paths of evolution for influenza B viruses. If more segments get recruited to the PB1–PB2–HA complex, the process could continue until “speciation” occurs, in which none of the segments is able to reassort across the Vic–Yam lineage boundary. Alternatively, the influenza B genome could continue to be homogenized through gene flow with the exception of PB1, PB2, and HA segments, or one of the two PB1–PB2–HA complexes could go extinct, marking the return of single-strain dynamics in the influenza B virus population. The eventual fate of influenza B viruses will likely be determined by the combined effects of reassortment frequency and the strength of epistatic interactions between segments.

Materials and Methods

We compiled a primary data set of 452 complete influenza B genomes from GISAID (Bogner et al. 2006) dating from 1984 to 2012. The longest protein-coding region of each segment was extracted and used for all further analyses. We thus assume that homologous recombination has not taken place and that the evolutionary history of the whole segment can be inferred from the longest coding sequence in the segment. To date, there has been little evidence of homologous recombination in influenza viruses (Chare et al. 2003; Boni et al. 2008; Han et al. 2010). The segments of each strain were assigned to either Vic or Yam lineage by making maximum-likelihood trees of each segment using PhyML (Guindon and Gascuel 2003) and identifying whether the isolate was more closely related to B/Victoria/2/87 or B/Yamagata/16/88 sequences in that segment, with the exception of the NS segment, as B/Victoria/2/87 was a reassortant and possessed a Yam lineage NS (Lindstrom et al. 1999). B/Czechoslovakia/69/1990 was considered as being representative of Vic lineage for the NS segment. Every segment in each genome thus received either a Vic or a Yam lineage designation; for example, the strain B/Victoria/2/87 received V–V–V–V–V–V–V–Y, as its NS segment is derived from the Yam lineage and the rest of the genome is Vic.
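The lineage designation scheme can be illustrated with a short Python sketch. It is not taken from the authors' scripts: the segment order (borrowed from the order in which segments are listed elsewhere in the Methods), the dictionary layout, and the function names `constellation` and `pb1_pb2_ha_class` are assumptions made for this example.

```python
# Segment order assumed for illustration only.
SEGMENTS = ["PB1", "PB2", "PA", "HA", "NP", "NA", "MP", "NS"]


def constellation(lineages):
    """Join per-segment lineage calls ('V' or 'Y') into one genome string."""
    return "-".join(lineages[seg] for seg in SEGMENTS)


def pb1_pb2_ha_class(lineages):
    """Classify the PB1-PB2-HA complex as 'Vic', 'Yam', or 'mixed'."""
    calls = {lineages[seg] for seg in ("PB1", "PB2", "HA")}
    if calls == {"V"}:
        return "Vic"
    if calls == {"Y"}:
        return "Yam"
    return "mixed"


# B/Victoria/2/87: every segment is Vic except NS, which derives from Yam.
b_victoria_2_87 = {seg: "V" for seg in SEGMENTS}
b_victoria_2_87["NS"] = "Y"

print(constellation(b_victoria_2_87))     # V-V-V-V-V-V-V-Y
print(pb1_pb2_ha_class(b_victoria_2_87))  # Vic
```

A string of this kind is only a compact label; the lineage calls themselves come from the maximum-likelihood trees described above.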

We also collated a secondary data set from all complete influenza B virus genomes available on GenBank as of May 5, 2014. After removing isolates that had considerable portions of any sequence missing, were isolated prior to 1980, or were suspected of having a contaminant sequence in any segment, we were left with 1,603 sequences. This data set only became available after all primary analyses were performed; its sequences are mainly from Australia, New Zealand, and the United States and are too numerous to analyze in BEAST (Drummond et al. 2012). PhyML (Guindon and Gascuel 2003) was used to produce phylogenies of each segment, and the lineage of each isolate was determined based on grouping with either B/Victoria/2/87 or B/Yamagata/16/88 sequences, as described above. By associating strains with the lineage identity of each of their segments, we reconstructed the most parsimonious interlineage reassortment history for the secondary data set. The secondary data set was used to check how representative the primary data set was, to estimate LD, and to broadly confirm our results. All analyses pertain to the primary data set unless stated otherwise.

Temporally calibrated phylogenies were recovered for each segment in the primary data set using Markov chain Monte Carlo (MCMC) methods in the BEAST software package (Drummond et al. 2012). We modeled the substitution process using the Hasegawa–Kishino–Yano model of nucleotide substitution (Hasegawa et al. 1985), with separate transition models for each of the three codon partitions, and additionally estimated realized synonymous and nonsynonymous substitution counts (O'Brien et al. 2009). We used a flexible Bayesian skyride demographic model (Minin et al. 2008). We accounted for incomplete sampling dates for 94 sequences (of which 93 had only year and 1 had only year and month of isolation), whereby tip date is estimated as a latent variable in the MCMC integration. A relaxed molecular clock was used, where branch rates are drawn from a lognormal distribution (Drummond et al. 2006). We ran three independent MCMC chains, each with 200 million states, sampled every 20,000 steps, and discarded the first 10% of the MCMC states as burn-in. After assessing convergence of all three MCMC chains by visual inspection using Tracer (Rambaut et al. 2009), we combined samples across chains to give a total of 27,000 samples from the posterior distribution of trees.
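As a quick arithmetic check, the reported total of 27,000 posterior samples follows from the chain length, sampling interval, and burn-in stated above. The variable names below are ours, and the sketch assumes the initial state of each chain is not counted as an extra sample.

```python
states_per_chain = 200_000_000   # MCMC states per chain
sampling_interval = 20_000       # one tree logged every 20,000 states
burnin_percent = 10              # first 10% discarded as burn-in
n_chains = 3                     # independent replicate chains

samples_per_chain = states_per_chain // sampling_interval
burnin_samples = samples_per_chain * burnin_percent // 100
retained_per_chain = samples_per_chain - burnin_samples
total_samples = n_chains * retained_per_chain

print(samples_per_chain, retained_per_chain, total_samples)  # 10000 9000 27000
```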

Every sequence was assigned seven discrete traits in BEAUti, corresponding to the lineages of all other segments with which a strain was isolated; for example, the PB1 tree had PB2, PA, HA, NP, NA, MP, and NS as traits and V or Y as values for each trait. We inferred the ancestral state of lineages in each segment by modeling transitions between these discrete states using an asymmetric transition matrix (Lemey et al. 2009) with Bayesian stochastic search variable selection to estimate significant rates. Because the posterior set of trees for a single segment has branches labeled with the inferred lineage in the remaining seven segments, we can detect interlineage reassortments between pairs of segments by observing state transitions, that is, Yam to Vic or Vic to Yam (fig. 1). In addition, by reconstructing the ancestral state of all other genomic segments jointly, we can infer coreassortment events when more than one trait transition occurs on the same node in a tree. Interphylogeny labeling approaches have been extensively used in the past to investigate reticulate evolution in influenza A viruses and HIV (Lycett et al. 2012; Ward et al. 2013; Lu et al. 2014).
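The idea of reading reassortments off trait transitions can be sketched on a toy tree. The `Node` class and `reassortment_events` function below are hypothetical illustrations, not part of BEAST or of the authors' scripts; they assume each node already carries the inferred lineage ("V" or "Y") of the other segments.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    traits: dict                      # inferred lineage of the other segments
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child


def reassortment_events(root):
    """List (node name, {trait: (parent state, node state)}) for each node
    whose trait states differ from those of its parent."""
    events = []
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if node.parent is None:
            continue
        changed = {
            trait: (node.parent.traits[trait], state)
            for trait, state in node.traits.items()
            if node.parent.traits[trait] != state
        }
        if changed:
            events.append((node.name, changed))
    return events


# Toy PB1 tree labeled with PB2 and HA lineages only.
root = Node("root", {"PB2": "V", "HA": "V"})
a = root.add_child(Node("a", {"PB2": "V", "HA": "V"}))
b = root.add_child(Node("b", {"PB2": "Y", "HA": "Y"}))  # two traits change together
c = a.add_child(Node("c", {"PB2": "V", "HA": "Y"}))     # a single trait changes

for name, changed in reassortment_events(root):
    kind = "coreassortment" if len(changed) > 1 else "reassortment"
    print(name, kind, changed)
```

In the actual analysis such transitions are summarized across the posterior sample of trees rather than read from a single tree.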

Measures of Diversity

We inferred the diversity of each segment from their phylogenetic tree by estimating the date of the most recent common ancestor of all branches at yearly time points, which places an upper bound on the maximum amount of diversity existing at each time point. A version of this lineage turnover metric has previously been used to investigate the tempo and strength of selection in influenza A viruses during seasonal circulation (Bedford et al. 2011). In addition, we calculated mean pairwise TMRCA between branches labeled as Vic and Yam for PB1, PB2, and HA traits. This gave us a measure of how much a particular segment reassorts with respect to Vic and Yam lineages of PB1, PB2, and HA segments. If Vic and Yam lineages of PB1, PB2, and HA segments were to be considered as being separate populations, this measure would be equivalent to “between population” diversity.

We also calculated the total amount of sampled evolutionary time spent by each segment with entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. We do this by summing the branch lengths in each tree under three different lineage combinations of the PB1, PB2, and HA segments: PB1–PB2–HA derived entirely from Yam lineage, PB1–PB2–HA entirely derived from Vic lineage, and PB1–PB2–HA derived from a mixture of the two lineages. This gives a measure of how successful, over long periods of time, each particular PB1–PB2–HA constellation has been.
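The branch-length summation can be sketched as follows. The input format, a list of `(branch_length, pb1, pb2, ha)` tuples, and the function name `time_by_constellation` are assumptions for illustration; in practice the labels would be read from the annotated posterior trees.

```python
from collections import defaultdict


def time_by_constellation(branches):
    """Sum branch lengths (in years) by PB1-PB2-HA lineage constellation."""
    totals = defaultdict(float)
    for length, pb1, pb2, ha in branches:
        states = {pb1, pb2, ha}
        if states == {"V"}:
            key = "Vic"
        elif states == {"Y"}:
            key = "Yam"
        else:
            key = "mixed"
        totals[key] += length
    return dict(totals)


# Hypothetical branches from one tree: (length, PB1, PB2, HA lineage).
branches = [
    (2.5, "V", "V", "V"),
    (1.0, "Y", "Y", "Y"),
    (0.5, "V", "Y", "V"),
    (3.0, "Y", "Y", "Y"),
]
print(time_by_constellation(branches))  # {'Vic': 2.5, 'Yam': 4.0, 'mixed': 0.5}
```

Repeating this over every tree in the posterior sample would yield a distribution of totals, from which intervals like those in figure 7 could be summarized.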

Tree-to-Tree Similarities

We express the normalized distance δTMRCA between trees belonging to two segments A and B for a particular posterior sample i following

\[
\delta\mathrm{TMRCA}(A_i, B_i) = \frac{\Delta\mathrm{TMRCA}(A_i, A'_i) + \Delta\mathrm{TMRCA}(B_i, B'_i)}{2\,\Delta\mathrm{TMRCA}(A_i, B_i)}, \tag{1}
\]

where $\Delta\mathrm{TMRCA}(A_i, B_i) = \frac{1}{n}\sum_{j=1}^{n} g(A_{ij}, B_{ij})$ and $n$ is the total number of pairwise comparisons available between sets of tips. Thus, $g(A_{ij}, B_{ij})$ is the absolute difference in TMRCA of a pair of tips $j$, where the pair is drawn from the $i$th posterior sample of tree A and the $i$th posterior sample of tree B. Additionally, $\Delta\mathrm{TMRCA}(A_i, A'_i)$ is calculated from the $i$th posterior sample of tree A and the $i$th posterior sample of an independent analysis of tree A (which we refer to as A′), which is used in the normalization procedure to control for variability in tree topology stability over the course of the MCMC chain (see supplementary figs. S6 and S7, Supplementary Material online). We had three replicate analyses of each segment, and in order to calculate $\Delta\mathrm{TMRCA}(A_i, A'_i)$, we used analyses numbered 1, 2, and 3 as A and analyses numbered 2, 3, and 1 as A′, in that order. We subsampled our combined posterior distribution of trees to give a total of 2,700 trees on which to analyze δTMRCA.
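Equation (1) can be sketched in Python, assuming each posterior tree has already been reduced to a dictionary mapping a pair of tip names to the TMRCA date of that pair. This data layout, the function names, and the toy dates are illustrative assumptions rather than the authors' implementation.

```python
def delta_tmrca_raw(tree_x, tree_y):
    """Mean absolute difference in TMRCA over tip pairs present in both trees."""
    pairs = tree_x.keys() & tree_y.keys()
    if not pairs:
        raise ValueError("the two trees share no tip pairs")
    return sum(abs(tree_x[p] - tree_y[p]) for p in pairs) / len(pairs)


def delta_tmrca_normalized(a, a_rep, b, b_rep):
    """Within-segment replicate noise divided by twice the between-segment
    difference, as in equation (1). Identical A and B trees give a zero
    denominator, which this sketch does not handle."""
    within = delta_tmrca_raw(a, a_rep) + delta_tmrca_raw(b, b_rep)
    between = delta_tmrca_raw(a, b)
    return within / (2.0 * between)


# Toy pairwise TMRCA dates (calendar years) for three tips.
a     = {("t1", "t2"): 2000.0, ("t1", "t3"): 1990.0, ("t2", "t3"): 1990.0}
a_rep = {("t1", "t2"): 2000.5, ("t1", "t3"): 1990.0, ("t2", "t3"): 1990.0}
b     = {("t1", "t2"): 1985.0, ("t1", "t3"): 1990.0, ("t2", "t3"): 1985.0}
b_rep = {("t1", "t2"): 1985.0, ("t1", "t3"): 1989.5, ("t2", "t3"): 1985.0}

# Small value: the two segments differ far more than replicates of one segment.
print(delta_tmrca_normalized(a, a_rep, b, b_rep))

# Replicate pairing described in the text: analyses 1, 2, 3 against 2, 3, 1.
replicates = [1, 2, 3]
partners = replicates[1:] + replicates[:1]
print(list(zip(replicates, partners)))  # [(1, 2), (2, 3), (3, 1)]
```

Values near 1 would indicate that two segments' trees differ no more than replicate analyses of the same segment do.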

Calculating the normalized δTMRCA(A_i, B_i) for each MCMC state provides us with a posterior distribution of this statistic, allowing specific hypotheses regarding similarities between the trees of different segments to be tested. Our approach exploits the branch scaling used by BEAST (Drummond et al. 2012), as the trees are scaled in absolute time and insensitive to variation in nucleotide substitution rates between segments, allowing for direct comparisons between TMRCAs in different trees. In the absence of reassortment, we expect the tree of every segment to recapitulate the “virus tree,” a concept analogous to “species trees” in population genetics. Our method operates under the assumption that the segment trees capture this “virus tree” of influenza B viruses quite well. It is not an unreasonable assumption, given the seasonal bottlenecks influenza viruses experience. This makes it almost certain that influenza viruses circulating at any given time point are derived from a single genome that existed in the recent past. The ΔTMRCA statistic essentially quantifies the temporal distance between admixture events and nodes in the virus tree (see supplementary fig. S17, Supplementary Material online). We normalize ΔTMRCA values to get δTMRCA, a measure which quantifies the extent to which the similarity of two independent trees resembles phylogenetic noise. The ΔTMRCA statistic is an extension of patristic distance methods and has previously been used to tackle a wide variety of problems, for example, as phylogenetic distance in predicting viral titer in Drosophila infected with viruses from closely related species (Longdon et al. 2011) and to assess temporal incongruence in a phylogenetic tree of amphibian species induced by using different calibrations (Ruane et al. 2011).

LD across the Influenza B Genome

We used the secondary GenBank data set with 1,603 complete genome sequences to estimate LD between amino acid loci across the longest proteins encoded by each segment of the influenza B virus genome. To quantify LD, we adapt the $\chi^2_{df}$ statistic from Hedrick and Thomson (1986):

\[
\chi^2_{df} = \frac{\chi^2}{N\,(k-1)(m-1)}, \tag{2}
\]

where $\chi^2$ is calculated from a classical contingency table, $N$ is the number of haplotypes, and $(k-1)(m-1)$ are the degrees of freedom. This statistic is equal to the widely used $r^2$ LD statistic at biallelic loci but also quantifies LD when there are more than two alleles per locus (Zhao et al. 2005). LD was estimated only at loci where each nucleotide or amino acid allele was present in at least two isolates. We ignored gaps in the alignment and did not consider them as polymorphisms. In all cases, we used a minor allele frequency cutoff of 1%. We also calculated another LD statistic, D′ (Lewontin 1964), as $D'_{ij} = D_{ij} / D^{\max}_{ij}$, where $D_{ij} = p(A_iB_j) - p(A_i)\,p(B_j)$ and

\[
D^{\max}_{ij} =
\begin{cases}
\min\left[\,p(A_i)\,p(B_j),\ (1 - p(A_i))(1 - p(B_j))\,\right] & \text{when } D_{ij} < 0, \\
\min\left[\,(1 - p(A_i))\,p(B_j),\ p(A_i)\,(1 - p(B_j))\,\right] & \text{when } D_{ij} \geq 0,
\end{cases} \tag{3}
\]

where $p(A_i)$ is the frequency of allele $A_i$ at locus A, $p(B_j)$ is the frequency of allele $B_j$ at locus B, and $p(A_iB_j)$ is the frequency of haplotype $A_iB_j$. D′ is inflated when some haplotypes are not observed, for example, when the minor allele frequency is low. We find that D′ is almost uniformly high across the influenza B virus genome and close to 1.0 for almost any pair of polymorphic loci. This is because most amino acid alleles in the population exist transiently, meaning that they do not get a chance to reassort, and we only observe them within the backgrounds of more persistent alleles, which D′ quantifies as complete LD. We think that metrics related to $r^2$, such as $\chi^2_{df}$, perform much better on temporal data such as ours in finding persistent associations between alleles and are easier to interpret.
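The two LD statistics in equations (2) and (3) can be sketched with the Python standard library. The function names and the toy haplotype list are illustrative assumptions; the code takes k and m to be the numbers of distinct alleles observed at the two loci, requires both loci to be polymorphic, and does not apply the allele-count or frequency filters described above.

```python
from collections import Counter


def chi2_df(haplotypes):
    """Contingency-table chi-square divided by N(k-1)(m-1), as in equation (2).
    haplotypes: list of (allele at locus A, allele at locus B), one per isolate."""
    n = len(haplotypes)
    count_a = Counter(a for a, _ in haplotypes)
    count_b = Counter(b for _, b in haplotypes)
    count_ab = Counter(haplotypes)
    k, m = len(count_a), len(count_b)
    if k < 2 or m < 2:
        raise ValueError("both loci must be polymorphic")
    chi2 = 0.0
    for a, n_a in count_a.items():
        for b, n_b in count_b.items():
            expected = n_a * n_b / n
            observed = count_ab.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return chi2 / (n * (k - 1) * (m - 1))


def d_prime(haplotypes, allele_a, allele_b):
    """D' for one allele pair, following equation (3)."""
    n = len(haplotypes)
    p_a = sum(1 for a, _ in haplotypes if a == allele_a) / n
    p_b = sum(1 for _, b in haplotypes if b == allele_b) / n
    p_ab = sum(1 for h in haplotypes if h == (allele_a, allele_b)) / n
    d = p_ab - p_a * p_b
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min((1 - p_a) * p_b, p_a * (1 - p_b))
    return d / d_max


# Toy data: haplotype R-T is never observed, so D' reaches its maximum
# even though the association between the loci is far from complete.
haplotypes = [("K", "T")] * 4 + [("R", "A")] * 4 + [("K", "A")] * 2
print(chi2_df(haplotypes))            # about 0.444, equal to r^2 for biallelic loci
print(d_prime(haplotypes, "K", "T"))  # about 1.0
```

The toy example mirrors the behavior described in the text: one unobserved haplotype is enough to push D′ to 1, whereas the $r^2$-related statistic stays moderate.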

Data Availability

Python scripts used to process trees and sequences are available at https://github.com/evogytis/fluB/tree/master/scripts (last accessed October 21, 2014). Output files from scripts, lineage designations, MCC trees, acknowledgment tables, accession numbers, and redacted XML files (per GISAID Data Access Agreement) are publicly available at https://github.com/evogytis/fluB/tree/master/data (last accessed October 21, 2014).

Supplementary Material

Supplementary material and figures S1–S17 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Darren Obbard and Paul Wikramaratna for helpful discussions and anonymous reviewers for comments and suggestions. This study was supported by a Natural Environment Research Council studentship D76739X to G.D. and a Newton International Fellowship from the Royal Society to T.B. The research leading to these results has received funding from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 278433-PREDEMICS and ERC Grant Agreement No. 260864. A.R. and S.L. acknowledge the support of the Wellcome Trust (Grant No. 092807).

References

Ansaldi F, D'Agaro P, de Florentiis D, Puzelli S, Lin YP, Gregory V, Bennett M, Donatelli I, Gasparini R, Crovari P, et al. 2003. Molecular characterization of influenza B viruses circulating in northern Italy during the 2001–2002 epidemic season. J Med Virol. 70:463–469.

Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol. 11:220.

Bedford T, Suchard MA, Lemey P, Dudas G, Gregory V, Hay AJ, McCauley JW, Russell CA, Smith DJ, Rambaut A. 2014. Integrating influenza antigenic dynamics with molecular evolution. eLife 3:e01914.

Bergeron C, Valette M, Lina B, Ottmann M. 2010. Genetic content of influenza H3N2 vaccine seeds. PLoS Curr. 2:RRN1165.

Bodewes R, Morick D, de Mutsert G, Osinga N, Bestebroer T, van der Vliet S, Smits SL, Kuiken T, Rimmelzwaan GF, Fouchier RA, et al. 2013. Recurring influenza B virus infections in seals. Emerg Infect Dis. 19:511–512.

Bogner P, Capua I, Lipman DJ, Cox NJ, et al. 2006. A global initiative on sharing avian flu data. Nature 442:981.

Boni MF, Zhou Y, Taubenberger JK, Holmes EC. 2008. Homologous recombination is very rare or absent in human influenza A virus. J Virol. 82:4807–4811.

Broberg E, Beaute J, Snacken R. 2013. Fortnightly influenza surveillance overview, 24 May 2013 - weeks 19-20/2013. Available from: http://ecdc.europa.eu/en/publications/Publications/influenza-fortnightly-surveillance-overview-24-may-2013.pdf

Burnet SFM. 1955. Principles of animal virology. New York: Academic Press.

Chare ER, Gould EA, Holmes EC. 2003. Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J Gen Virol. 84:2691–2703.

Chen R, Holmes EC. 2008. The evolutionary dynamics of human influenza B virus. J Mol Evol. 66:655–663.

Cobbin JCA, Verity EE, Gilbertson BP, Rockman SP, Brown LE. 2013. The source of the PB1 gene in influenza vaccine reassortants selectively alters the hemagglutinin content of the resulting seed virus. J Virol. 87:5577–5585.

Dobzhansky T. 1937. Genetics and the origin of species. New York: Columbia University Press.

Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4:e88.

Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 29:1969–1973.

Fulvini AA, Ramanunninair M, Le J, Pokorny BA, Arroyo JM, Silverman J, Devis R, Bucher D. 2011. Gene constellation of influenza A virus reassortants with high growth phenotype prepared as seed candidates for vaccine production. PLoS One 6:e20823.

Gregory V, Bennett M, Orkhan M, Hajjar SA, Varsano N, Mendelson E, Zambon M, Ellis J, Hay A, Lin Y. 2002. Emergence of influenza A H1N2 reassortant viruses in the human population during 2001. Virology 300:1–7.

Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696–704.

Han GZ, Boni MF, Li SS. 2010. No observed effect of homologous recombination on influenza C virus evolution. Virol J. 7:227.

Hasegawa M, Kishino H, Yano TA. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 22:160–174.

Hay AJ, Gregory V, Douglas AR, Lin YP. 2001. The evolution of human influenza viruses. Philos Trans R Soc Lond B Biol Sci. 356:1861–1870.


Hedrick PW, Thomson G. 1986. A two-locus neutrality test: applications to humans, E. coli and lodgepole pine. Genetics 112:135–156.

Hoffmann E, Mahmood K, Yang CF, Webster RG, Greenberg HB, Kemble G. 2002. Rescue of influenza B virus from eight plasmids. Proc Natl Acad Sci U S A. 99:11411–11416.

Kanegae Y, Sugita S, Endo A, Ishida M, Senya S, Osako K, Nerome K, Oya A. 1990. Evolutionary pattern of the hemagglutinin gene of influenza B viruses isolated in Japan: cocirculating lineages in the same epidemic season. J Virol. 64:2860–2865.

Kawaoka Y, Krauss S, Webster RG. 1989. Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics. J Virol. 63:4603–4608.

Lee HY, Chou JY, Cheong L, Chang NH, Yang SY, Leu JY. 2008. Incompatibility of nuclear and mitochondrial genomes causes hybrid sterility between two yeast species. Cell 135:1065–1073.

Lemey P, Rambaut A, Drummond AJ, Suchard MA. 2009. Bayesian phylogeography finds its roots. PLoS Comput Biol. 5:e1000520.

Lewontin RC. 1964. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49–67.

Lindstrom SE, Hiromoto Y, Nishimura H, Saito T, Nerome R, Nerome K. 1999. Comparative analysis of evolutionary mechanisms of the hemagglutinin and three internal protein genes of influenza B virus: multiple cocirculating lineages and frequent reassortment of the NP, M, and NS genes. J Virol. 73:4413–4426.

Longdon B, Hadfield JD, Webster CL, Obbard DJ, Jiggins FM. 2011. Host phylogeny determines viral persistence and replication in novel hosts. PLoS Pathog. 7:e1002260.

Lu L, Lycett SJ, Brown AJL. 2014. Reassortment patterns of avian influenza virus internal segments among different subtypes. BMC Evol Biol. 14:16.

Lycett SJ, Baillie G, Coulter E, Bhatt S, Kellam P, McCauley JW, Wood JL, Brown IH, Pybus OG, Leigh Brown AJ; Combating Swine Influenza Initiative-COSI Consortium. 2012. Estimating reassortment rates in co-circulating Eurasian swine influenza viruses. J Gen Virol. 93:2326–2336.

Minin VN, Bloomquist EW, Suchard MA. 2008. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol Biol Evol. 25:1459–1471.

Muller H. 1942. Isolating mechanisms, evolution, and temperature. Biol Symp. 6:71–125.

Nakagawa N, Nukuzuma S, Haratome S, Go S, Nakagawa T, Hayashi K. 2002. Emergence of an influenza B virus with antigenic change. J Clin Microbiol. 40:3068–3070.

Nerome R, Hiromoto Y, Sugita S, Tanabe N, Ishida M, Matsumoto M, Lindstrom SE, Takahashi T, Nerome K. 1998. Evolutionary characteristics of influenza B virus since its first isolation in 1940: dynamic circulation of deletion and insertion mechanism. Arch Virol. 143:1569–1583.

O'Brien JD, Minin VN, Suchard MA. 2009. Learning to count: robust estimates for labeled distances between molecular sequences. Mol Biol Evol. 26:801–814.

Osterhaus ADME, Rimmelzwaan GF, Martina BEE, Bestebroer TM, Fouchier RAM. 2000. Influenza B virus in seals. Science 288:1051–1053.

Presgraves DC. 2010. The molecular evolutionary basis of species formation. Nat Rev Genet. 11:175–180.

Rambaut A, Suchard M, Drummond A. 2009. Tracer v1.5 [Internet]. Available from: http://tree.bio.ed.ac.uk/software/tracer/

Reed C, Meltzer MI, Finelli L, Fiore A. 2012. Public health impact of including two lineages of influenza B in a quadrivalent seasonal influenza vaccine. Vaccine 30:1993–1998.

Rota PA, Wallis TR, Harmon MW, Rota JS, Kendal AP, Nerome K. 1990. Cocirculation of two distinct evolutionary lineages of influenza type B virus since 1983. Virology 175:59–68.

Ruane S, Pyron RA, Burbrink FT. 2011. Phylogenetic relationships of the cretaceous frog Beelzebufo from Madagascar and the placement of fossil constraints based on temporal and phylogenetic evidence. J Evol Biol. 24:274–285.

Shaw MW, Xu X, Li Y, Normand S, Ueki RT, Kunimoto GY, Hall H, Klimov A, Cox NJ, Subbarao K. 2002. Reappearance and global spread of variants of influenza B/Victoria/2/87 lineage viruses in the 2000–2001 and 2001–2002 seasons. Virology 303:1–8.

Smith GJD, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, et al. 2009. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459:1122–1125.

Turner TL, Hahn MW, Nuzhdin SV. 2005. Genomic islands of speciation in Anopheles gambiae. PLoS Biol. 3:e285.

Ward MJ, Lycett SJ, Kalish ML, Rambaut A, Brown AJL. 2013. Estimating the rate of intersubtype recombination in early HIV-1 group M strains. J Virol. 87:1967–1973.

World Health Organization. 2009. Influenza fact sheet. Available from: http://www.who.int/mediacentre/factsheets/fs211/en/

Zhao H, Nettleton D, Soller M, Dekkers JCM. 2005. Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genet Res. 86:77–87.


Page 7: Edinburgh Research Explorer · 2014. 12. 18. · Edinburgh Research Explorer Reassortment between Influenza B Lineages and the Emergence of a Coadapted PB1-PB2-HA Gene Complex Citation

is $P = \left(\frac{2^5 \cdot 2 - 2}{2^8 - 2}\right)^5 = 0.0009$, where reassortment events are considered to sample from the Vic and Yam lineages at random for each of the eight segments. If we correct for multiple testing, with the assumption that coassortment of any three segments is of interest, we find that the probability of not breaking up an arbitrary set of three segments across five reassortment events is $P = \binom{8}{3}\left(\frac{2^5 \cdot 2 - 2}{2^8 - 2}\right)^5 = 0.0485$.
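The two probabilities can be reproduced with a few lines of Python. The counting argument in the comments is our reading of the formula (each segment independently drawn from Vic or Yam, with the two non-reassortant genomes excluded), and the variable names are illustrative.

```python
from math import comb

n_segments = 8   # segments in the influenza B genome
n_linked = 3     # PB1, PB2, and HA
n_events = 5     # reassortment events considered

# All Vic/Yam assignments of eight segments, minus the two pure genomes.
reassortant_genomes = 2 ** n_segments - 2                    # 254
# Assignments keeping the three linked segments on one lineage:
# 2 choices for the linked set times 2^5 for the rest, minus the two pure genomes.
linked_intact = 2 * 2 ** (n_segments - n_linked) - 2         # 62

p_single = (linked_intact / reassortant_genomes) ** n_events
p_corrected = comb(n_segments, n_linked) * p_single          # any 3 of 8 segments

print(round(p_single, 4), round(p_corrected, 4))  # 0.0009 0.0485
```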

Although the vast majority of influenza B isolates possess either Vic or Yam lineage-derived PB1–PB2–HA complexes, on rare occasions mixed-lineage PB1–PB2–HA constellations emerge. Figure 7 shows the sum of branch lengths which were labeled as having entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. Due to lack of reassortment between Vic and Yam lineages of PB1, PB2, and HA (fig. 4) since 1997, all segments have spent significantly longer periods of evolutionary time with either entirely Vic-derived or entirely Yam-derived than with mixed-lineage PB1, PB2, and HA constellations (fig. 7). We have identified three instances of mixed-lineage PB1–PB2–HA reassortants from the primary data set with the following PB1–PB2–HA constellations: VVY (B/Bangkok/163/1990-like, 13 sequences isolated 1990–January 5, 1995), VYV (B/Nanchang/630/1994-like, two sequences isolated 1994–1996), and VYY (B/New York/24/1993-like, two sequences isolated January 8, 1993–1994). We detected two new reassortant lineages when investigating the larger secondary data set: B/Waikato/6/2005-like viruses with PB1–PB2–HA constellation YYV (17 sequences isolated May 9–October 12 in 2005) and B/Malaysia/1829782/2007 with PB1–PB2–HA constellation YVY (one sequence isolated August 2, 2007).

Analysis of Reassortment Properties

We attempted to quantify the temporal discordance between lineages reassorting into new genomic constellations. If one

FIG 6 Schematic plot of reconstructed reassortments between Vic and Yam lineage segments of influenza B virus Lineages that coassort in genomesare represented by eight parallel lines with lineages that derive from the original Vic clade colored yellowbrown and lineages that derive from theoriginal Yam clade colored lilacpurple Interlineage reassortment events are indicated by lines entering a different genome The angle of incominglineages represents uncertainty in the timing of the event (mean date of the reassortant node and its parent node) Lineage extinction dates are notshown accurately

FIG 7 Amount of evolutionary time each segment has spent underdifferent PB1ndashPB2ndashHA constellations All segments have spent signifi-cantly more of their history with entirely Vic- or entirely Yam-derivedPB1ndashPB2ndashHA complexes All vertical lines indicating uncertainty are95 highest posterior densities

6

Dudas et al doi101093molbevmsu287 MBE at E

dinburgh University on D

ecember 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

were able to recover an influenza “species tree” including admixture/reassortment events, it would be possible to estimate the reassortment or recombination “distance,” which is the time between a split in the species tree in the past and a reassortment event (see supplementary fig. S17, Supplementary Material online). Although we do not find evidence of differences in the total number of reassortments between segments (see supplementary fig. S4, Supplementary Material online), we find support for a reassortment “distance” effect, in which a pair of tips on one segment has a different TMRCA from the same pair of tips on a different segment. The summary statistic we use that reflects this difference in TMRCAs, ΔTMRCA, is most sensitive when only one of the two trees being compared loses diversity through reassortment and the other acts as a proxy for the “species tree.” We normalize our ΔTMRCA comparisons to arrive at δTMRCA, which accounts for uncertainty in tree topology (see Materials and Methods). Figure 8 shows δTMRCA values for all pairs of trees. Most segment pairs show very low values for this statistic, with δTMRCA ≈ 0.1, indicating that ΔTMRCA measurements between replicate posterior samples from the same segment are up to ten times smaller than ΔTMRCA values between posterior samples from different segments. PB1, PB2, and HA trees, on the other hand, exhibit δTMRCA values that are much higher. This shows that ΔTMRCA differences between trees of PB1, PB2, and HA segments are, though noisy, occasionally very similar to uncertainty in tip-to-tip TMRCAs between replicate analyses of these segments.

Discussion

Linkage between PB1, PB2, and HA Gene Segments

In this article, we show that the PB1, PB2, and HA segments of influenza B viruses are the only ones that have continuously maintained separate Vic and Yam lineages, whereas other segments have fixed either Vic or Yam lineages (figs. 2, 5, and 6). Evidence suggests that this is a result of a prolonged lack of reassortment between Vic and Yam lineages in PB1, PB2, and HA (fig. 4), which possess coassorting sequences detectable as high linkage disequilibrium (LD) (supplementary fig. S1, Supplementary Material online). The vast majority of the sampled evolutionary history of each segment of influenza B viruses since the split of Vic and Yam lineages has been spent in association with either completely Vic or completely Yam lineage-derived PB1–PB2–HA complexes (fig. 7), suggesting that having “pure” lineage PB1–PB2–HA complexes is important for whole-genome fitness. We propose that this pattern of coassortment is due to the action of selection and not simply biased or rare reassortment.

The origin of the strong genetic linkage between PB1, PB2, and HA segments remains unclear. We see two alternative, but similar, explanations for it: mutation-driven coevolution (Presgraves 2010) and Dobzhansky–Muller (DM) incompatibility (Dobzhansky 1937; Muller 1942). Mutation-driven coevolution has been suggested to be the cause of hybrid dysfunction in Saccharomyces hybrids (Lee et al. 2008) and evolves as a byproduct of adaptation. If one or the other influenza B lineage has undergone adaptation, we might expect these changes to be beneficial in its native background and incompatible with a foreign background. DM incompatibility operates in a similar way, but the main difference from the scenario described earlier is that the incompatible alleles are neutral or nearly neutral in their native background and become deleterious or lethal when combined with nonnative backgrounds. Emergence of DM incompatibility is aided by geographic isolation. Interestingly, the Vic lineage of HA was restricted to eastern Asia between 1992 and 2000 (Nerome et al. 1998; Shaw et al. 2002), offering ample time for the budding Vic lineage to accumulate alleles causing reassortment incompatibility. However, without more genomic data from the past, it is difficult to estimate to what extent influenza B virus population structure contributed to the development of the current segment linkage.

Potential Mechanisms for Reassortment Incompatibility

Unfortunately, the limited amount of genomic data available for the early years of the Vic–Yam split precludes any attempt to determine whether selection or drift has led to the current linkage of PB1, PB2, and HA segments. Although the origins of the linkage between these three segments might be difficult to explain, we can speculate on the nature of reassortment incompatibility. For example, it is intuitive why incompatibility might arise between PB1 and PB2: both proteins interact directly as part of the RNA-dependent RNA polymerase heterotrimer. Indeed, we observe that PB1–PB2 reassortants are the rarest and least persistent among mixed-lineage PB1–PB2–HA strains and have not been isolated in great numbers. In fact, most reassortants breaking the PB1–PB2–HA complex apart occurred in the past, close to the split of Vic and Yam lineages, and have become very rare since.

There is some evidence that the linkage between PB1 and HA might not be a phenomenon restricted to influenza B viruses. It has been established that, at least for the 1957 and the 1968 influenza pandemics, caused by A/H2N2 and A/H3N2 subtypes, respectively, the viruses responsible were reassortants possessing PB1 and HA segments derived from avian influenza A viruses (Kawaoka et al. 1989). In addition, outdated techniques for producing vaccine seed strains through selection for HA–NA reassortants often yielded PB1–HA–NA reassortants as a side effect (Bergeron et al. 2010; Fulvini et al. 2011). Recent experiments have found that the presence or absence of a “foreign” PB1 segment can have dramatic effects on HA concentration on the surface of virions and total virion production (Cobbin et al. 2013). However, there have been reassortant influenza A viruses circulating for prolonged periods of time in humans that did have disparate PB1 and HA segments, for example, H1N2 outbreaks in 2001 (Gregory et al. 2002) and H1N1/09 in 2009 (Smith et al. 2009).

FIG. 8. δTMRCA statistics for different segment pairs. PB1, PB2, and HA trees exhibit reciprocally highly similar TMRCAs, unlike most other pairwise comparisons. All vertical lines indicating uncertainty are 95% highest posterior densities.

We believe that the association between PB1, PB2, and HA segments should be relatively straightforward to explore in the lab. Reverse genetics systems have been developed for influenza B viruses (Hoffmann et al. 2002), which would allow the creation of artificial reassortants. Based on the frequency and persistence times of the different reassortant classes we have observed, we expect a hierarchy of reassortant fitness, starting with PB1+PB2+HA reassortants, which should be the most fit, followed by PB1+2/HA, then PB1+HA/PB2, and finally PB2+HA/PB1 reassortants with the lowest fitness. We believe that this is the most direct approach to unraveling the mechanism responsible for the linkage within the PB1–PB2–HA complex.

Will Influenza B Viruses Speciate?

We suggest that the preservation of two PB1–PB2–HA complex lineages is similar to genomic speciation islands, where small numbers of genes resist being homogenized through gene flow (Turner et al. 2005). In this context, we see three potential paths of evolution for influenza B viruses. If more segments get recruited to the PB1–PB2–HA complex, the process could continue until “speciation” occurs, in which none of the segments is able to reassort across the Vic–Yam lineage boundary. Alternatively, the influenza B genome could continue to be homogenized through gene flow, with the exception of PB1, PB2, and HA segments, or one of the two PB1–PB2–HA complexes could go extinct, marking the return of single-strain dynamics in the influenza B virus population. The eventual fate of influenza B viruses will likely be determined by the combined effects of reassortment frequency and the strength of epistatic interactions between segments.

Materials and Methods

We compiled a primary data set of 452 complete influenza B genomes from GISAID (Bogner et al. 2006) dating from 1984 to 2012. The longest protein-coding region of each segment was extracted and used for all further analyses. We thus assume that homologous recombination has not taken place and that the evolutionary history of the whole segment can be inferred from the longest coding sequence in the segment. To date, there has been little evidence of homologous recombination in influenza viruses (Chare et al. 2003; Boni et al. 2008; Han et al. 2010). The segments of each strain were assigned to either Vic or Yam lineage by making maximum-likelihood trees of each segment using PhyML (Guindon and Gascuel 2003) and identifying whether the isolate was more closely related to B/Victoria/2/87 or B/Yamagata/16/88 sequences in that segment. The NS segment was an exception: as B/Victoria/2/87 was a reassortant and possessed a Yam lineage NS (Lindstrom et al. 1999), B/Czechoslovakia/69/1990 was considered as being representative of Vic lineage for the NS segment. Every segment in each genome thus received either a Vic or a Yam lineage designation; for example, the strain B/Victoria/2/87 received V–V–V–V–V–V–V–Y, as its NS segment is derived from the Yam lineage and the rest of the genome is Vic.

We also collated a secondary data set from all complete influenza B virus genomes available on GenBank as of May 5, 2014. After removing isolates that had considerable portions of any sequence missing, were isolated prior to 1980, or were suspected of having a contaminant sequence in any segment, we were left with 1,603 sequences. These sequences only became available after all primary analyses were performed, are mainly from Australia, New Zealand, and the United States, and are too numerous to analyze in BEAST (Drummond et al. 2012). PhyML (Guindon and Gascuel 2003) was used to produce phylogenies of each segment, and the lineage of each isolate was determined based on grouping with either B/Victoria/2/87 or B/Yamagata/16/88 sequences, as described above. By associating strains with the lineage identity of each of their segments, we reconstructed the most parsimonious interlineage reassortment history for the secondary data set. The secondary data set was used to check how representative the primary data set was, to estimate LD, and to broadly confirm our results. All analyses pertain to the primary data set unless stated otherwise.

Temporally calibrated phylogenies were recovered for each segment in the primary data set using Markov chain Monte Carlo (MCMC) methods in the BEAST software package (Drummond et al. 2012). We modeled the substitution process using the Hasegawa–Kishino–Yano model of nucleotide substitution (Hasegawa et al. 1985) with separate transition models for each of the three codon partitions and additionally estimated realized synonymous and nonsynonymous substitution counts (O’Brien et al. 2009). We used a flexible Bayesian skyride demographic model (Minin et al. 2008). We accounted for incomplete sampling dates for 94 sequences (of which 93 had only year and 1 had only year and month of isolation), whereby tip date is estimated as a latent variable in the MCMC integration. A relaxed molecular clock was used, where branch rates are drawn from a lognormal distribution (Drummond et al. 2006). We ran three independent MCMC chains, each with 200 million states sampled every 20,000 steps, and discarded the first 10% of the MCMC states as burn-in. After assessing convergence of all three MCMC chains by visual inspection using Tracer (Rambaut et al. 2009), we combined samples across chains to give a total of 27,000 samples from the posterior distribution of trees.
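The pooled sample count quoted above follows from simple arithmetic, sketched here in Python. The function name and its arguments are illustrative only and are not part of BEAST or of the authors' scripts; the sketch also assumes that the initial state (state 0) is not counted among the logged samples.

```python
def retained_samples(chain_length, sample_every, burnin_percent, n_chains):
    """Posterior samples left after burn-in, pooled across independent chains."""
    per_chain = chain_length // sample_every        # logged states per chain
    burnin = per_chain * burnin_percent // 100      # samples discarded as burn-in
    return n_chains * (per_chain - burnin)


# Three chains of 200 million states, logged every 20,000 steps, 10% burn-in.
total = retained_samples(200_000_000, 20_000, 10, 3)
print(total)  # 27000
```

Each chain contributes 10,000 logged states, of which 9,000 remain after burn-in, giving 27,000 pooled samples.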

Every sequence was assigned seven discrete traits in BEAUti, corresponding to the lineages of all other segments with which a strain was isolated; for example, the PB1 tree had PB2, PA, HA, NP, NA, MP, and NS as traits and V or Y as values for each trait. We inferred the ancestral state of lineages in each segment by modeling transitions between these discrete states using an asymmetric transition matrix (Lemey et al. 2009) with Bayesian stochastic search variable selection to estimate significant rates. Because the posterior set of trees for a single segment has branches labeled with the inferred lineage in the remaining seven segments, we can detect interlineage reassortments between pairs of segments by observing state transitions, that is, Yam to Vic or Vic to Yam (fig. 1). In addition, by reconstructing the ancestral state of all other genomic segments jointly, we can infer coreassortment events when more than one trait transition occurs on the same node in a tree. Interphylogeny labeling approaches have been extensively used in the past to investigate reticulate evolution in influenza A viruses and HIV (Lycett et al. 2012; Ward et al. 2013; Lu et al. 2014).
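The idea of reading reassortments off trait transitions can be illustrated with a toy example. The sketch below is not the authors' implementation: the `Node` class, the `find_transitions` function, and the four-node tree are hypothetical, and it assumes a single tree whose nodes already carry one inferred lineage (V or Y) per trait, whereas the actual analysis works on a posterior sample of trees.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    traits: dict                                   # lineage of other segments, e.g. {"PB2": "V"}
    children: list = field(default_factory=list)


def find_transitions(root):
    """List (node name, {trait: (parent state, child state)}) for every
    branch on which at least one trait changes lineage."""
    events = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            changed = {
                trait: (parent.traits.get(trait), state)
                for trait, state in child.traits.items()
                if parent.traits.get(trait) != state
            }
            if changed:
                events.append((child.name, changed))
            stack.append(child)
    return events


# Hypothetical PB1 tree annotated with the lineages of PB2 and HA.
c = Node("c", {"PB2": "V", "HA": "Y"})
a = Node("a", {"PB2": "V", "HA": "V"}, [c])
b = Node("b", {"PB2": "Y", "HA": "Y"})
root = Node("root", {"PB2": "V", "HA": "V"}, [a, b])

events = dict(find_transitions(root))
# Nodes where more than one trait changes at once are candidate coreassortments.
coreassortments = sorted(name for name, changed in events.items() if len(changed) > 1)
print(events)
print(coreassortments)
```

In this toy tree, node `c` shows a single HA transition, while node `b` changes both PB2 and HA on the same branch and is therefore flagged as a coreassortment.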

Measures of Diversity

We inferred the diversity of each segment from its phylogenetic tree by estimating the date of the most recent common ancestor of all branches at yearly time points, which places an upper bound on the maximum amount of diversity existing at each time point. A version of this lineage turnover metric has previously been used to investigate the tempo and strength of selection in influenza A viruses during seasonal circulation (Bedford et al. 2011). In addition, we calculated mean pairwise TMRCA between branches labeled as Vic and Yam for PB1, PB2, and HA traits. This gave us a measure of how much a particular segment reassorts with respect to Vic and Yam lineages of PB1, PB2, and HA segments. If Vic and Yam lineages of PB1, PB2, and HA segments were to be considered as being separate populations, this measure would be equivalent to “between population” diversity.

We also calculated the total amount of sampled evolutionary time spent by each segment with entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. We do this by summing the branch lengths in each tree under three different lineage combinations of the PB1, PB2, and HA segments: PB1–PB2–HA derived entirely from Yam lineage, PB1–PB2–HA derived entirely from Vic lineage, and PB1–PB2–HA derived from a mixture of the two lineages. This gives a measure of how successful, over long periods of time, each particular PB1–PB2–HA constellation has been.
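The branch-length summation described above can be sketched as follows. This is a minimal illustration under stated assumptions: branches are given as hypothetical tuples of (length in years, PB1 lineage, PB2 lineage, HA lineage), and the function names are invented for this example rather than taken from the authors' scripts.

```python
from collections import defaultdict


def constellation(pb1, pb2, ha):
    """Classify a PB1-PB2-HA lineage combination as Vic, Yam, or mixed."""
    states = {pb1, pb2, ha}
    if states == {"V"}:
        return "Vic"
    if states == {"Y"}:
        return "Yam"
    return "mixed"


def time_by_constellation(branches):
    """Sum branch lengths under each PB1-PB2-HA constellation."""
    totals = defaultdict(float)
    for length, pb1, pb2, ha in branches:
        totals[constellation(pb1, pb2, ha)] += length
    return dict(totals)


# Hypothetical branches: (length in years, PB1, PB2, HA lineage labels).
branches = [
    (1.5, "V", "V", "V"),
    (2.0, "Y", "Y", "Y"),
    (0.5, "V", "Y", "V"),
    (1.0, "V", "V", "V"),
]
totals = time_by_constellation(branches)
print(totals)  # {'Vic': 2.5, 'Yam': 2.0, 'mixed': 0.5}
```

Repeating the calculation for every tree in a posterior sample would give a distribution of these totals, from which intervals such as those in figure 7 could be summarized.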

Tree-to-Tree Similarities

We express the normalized distance δTMRCA between trees belonging to two segments A and B for a particular posterior sample i following

$$\delta_{\mathrm{TMRCA}}(A_i, B_i) = \frac{\Delta_{\mathrm{TMRCA}}(A_i, A'_i) + \Delta_{\mathrm{TMRCA}}(B_i, B'_i)}{2\,\Delta_{\mathrm{TMRCA}}(A_i, B_i)}, \qquad (1)$$

where $\Delta_{\mathrm{TMRCA}}(A_i, B_i) = \frac{1}{n}\sum_{j=1}^{n} g(A_{ij}, B_{ij})$ and n is the total number of pairwise comparisons available between sets of tips. Thus, $g(A_{ij}, B_{ij})$ is the absolute difference in TMRCA of a pair of tips j, where the pair is drawn from the ith posterior sample of tree A and the ith posterior sample of tree B. Additionally, $\Delta_{\mathrm{TMRCA}}(A_i, A'_i)$ is calculated from the ith posterior sample of tree A and the ith posterior sample of an independent analysis of tree A (which we refer to as A′), which is used in the normalization procedure to control for variability in tree topology stability over the course of the MCMC chain (see supplementary figs. S6 and S7, Supplementary Material online). We had three replicate analyses of each segment, and in order to calculate $\Delta_{\mathrm{TMRCA}}(A_i, A'_i)$, we used analyses numbered 1, 2, and 3 as A and analyses numbered 2, 3, and 1 as A′, in that order. We subsampled our combined posterior distribution of trees to give a total of 2,700 trees on which to analyze δTMRCA.
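Equation (1) can be illustrated with a small numerical sketch. The representation below is an assumption made for this example, not the authors' code: each posterior tree sample is reduced to a dictionary mapping a pair of tip names to that pair's TMRCA, and the function names and toy values are hypothetical.

```python
def raw_delta(tmrca_a, tmrca_b):
    """Mean absolute difference in TMRCA over tip pairs shared by two trees."""
    pairs = tmrca_a.keys() & tmrca_b.keys()
    if not pairs:
        raise ValueError("the two trees share no tip pairs")
    return sum(abs(tmrca_a[p] - tmrca_b[p]) for p in pairs) / len(pairs)


def normalized_delta(a, a_rep, b, b_rep):
    """Within-segment replicate noise divided by the between-segment difference.

    a_rep and b_rep are samples from independent replicate analyses of the
    same segments as a and b. Raises ZeroDivisionError if a and b are identical.
    """
    return (raw_delta(a, a_rep) + raw_delta(b, b_rep)) / (2 * raw_delta(a, b))


# Toy pairwise TMRCAs (in years) for three tips x, y, z.
xy, xz, yz = ("x", "y"), ("x", "z"), ("y", "z")
a = {xy: 2.0, xz: 10.0, yz: 10.0}
a_rep = {xy: 2.5, xz: 10.5, yz: 10.5}
b = {xy: 7.0, xz: 15.0, yz: 15.0}
b_rep = {xy: 7.5, xz: 15.5, yz: 15.5}

value = normalized_delta(a, a_rep, b, b_rep)
print(round(value, 3))  # 0.1
```

Here the replicates of each segment differ by 0.5 years on average while the two segments differ by 5 years, so the normalized value is 0.1: the between-segment difference is ten times larger than replicate noise. Values near 1 would indicate that two segments' trees differ no more than replicate analyses of a single segment do.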

Calculating the normalized δTMRCA(A_i, B_i) for each MCMC state provides us with a posterior distribution of this statistic, allowing specific hypotheses regarding similarities between the trees of different segments to be tested. Our approach exploits the branch scaling used by BEAST (Drummond et al. 2012), as the trees are scaled in absolute time and are insensitive to variation in nucleotide substitution rates between segments, allowing for direct comparisons between TMRCAs in different trees. In the absence of reassortment, we expect the tree of every segment to recapitulate the “virus tree,” a concept analogous to “species trees” in population genetics. Our method operates under the assumption that the segment trees capture this “virus tree” of influenza B viruses quite well. This is not an unreasonable assumption, given the seasonal bottlenecks influenza viruses experience, which make it almost certain that influenza viruses circulating at any given time point are derived from a single genome that existed in the recent past. The ΔTMRCA statistic essentially quantifies the temporal distance between admixture events and nodes in the virus tree (see supplementary fig. S17, Supplementary Material online). We normalize ΔTMRCA values to get δTMRCA, a measure that quantifies the extent to which the similarity of two independent trees resembles phylogenetic noise. The ΔTMRCA statistic is an extension of patristic distance methods and has previously been used to tackle a wide variety of problems, such as phylogenetic distance in predicting viral titer in Drosophila infected with viruses from closely related species (Longdon et al. 2011) and assessing temporal incongruence in a phylogenetic tree of amphibian species induced by using different calibrations (Ruane et al. 2011).

LD across the Influenza B Genome

We used the secondary GenBank data set with 1,603 complete genome sequences to estimate LD between amino acid loci across the longest proteins encoded by each segment of the influenza B virus genome. To quantify LD, we adapt the $\chi^2_{df}$ statistic from Hedrick and Thomson (1986):

$$\chi^2_{df} = \frac{\chi^2}{N\,(k-1)(m-1)}, \qquad (2)$$

where $\chi^2$ is calculated from a classical contingency table, N is the number of haplotypes, and (k − 1)(m − 1) are the degrees of freedom. This statistic is equal to the widely used r² LD statistic at biallelic loci but also quantifies LD when there are more than two alleles per locus (Zhao et al. 2005). LD was estimated only at loci where each nucleotide or amino acid allele was present in at least two isolates. We ignored gaps in the alignment and did not consider them as polymorphisms. In all cases, we used a minor allele frequency cutoff of 1%. We also calculated another LD statistic, D′ (Lewontin 1964), as $D'_{ij} = D_{ij}/D^{\max}_{ij}$, where $D_{ij} = p(A_iB_j) - p(A_i)\,p(B_j)$ and

$$D^{\max}_{ij} = \begin{cases} \min\left[\,p(A_i)\,p(B_j),\ (1-p(A_i))(1-p(B_j))\,\right] & \text{when } D_{ij} < 0, \\ \min\left[\,(1-p(A_i))\,p(B_j),\ p(A_i)\,(1-p(B_j))\,\right] & \text{when } D_{ij} \geq 0, \end{cases} \qquad (3)$$

where $p(A_i)$ is the frequency of allele $A_i$ at locus A, $p(B_j)$ is the frequency of allele $B_j$ at locus B, and $p(A_iB_j)$ is the frequency of haplotype $A_iB_j$. D′ is inflated when some haplotypes are not observed, for example, when the minor allele frequency is low. We find that D′ is almost uniformly high across the influenza B virus genome and close to 1.0 for almost any pair of polymorphic loci. This is because most amino acid alleles in the population exist transiently, meaning that they do not get a chance to reassort, and we only observe them within the backgrounds of more persistent alleles, which D′ quantifies as complete LD. We think that metrics related to r², such as $\chi^2_{df}$, perform much better on temporal data such as ours in finding persistent associations between alleles and are easier to interpret.
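The contrast between the two LD statistics of equations (2) and (3) can be made concrete with a short sketch. It is written under stated assumptions: haplotypes are supplied as a list of (allele at locus A, allele at locus B) tuples, both loci are polymorphic, and the function names and toy data are hypothetical rather than taken from the authors' scripts.

```python
from collections import Counter


def chi2_df(haplotypes):
    """Chi-squared statistic scaled by N and the degrees of freedom, as in equation (2)."""
    n = len(haplotypes)
    count_a = Counter(a for a, _ in haplotypes)
    count_b = Counter(b for _, b in haplotypes)
    observed = Counter(haplotypes)
    k, m = len(count_a), len(count_b)
    if k < 2 or m < 2:
        raise ValueError("both loci must be polymorphic")
    chi2 = 0.0
    for a in count_a:
        for b in count_b:
            expected = count_a[a] * count_b[b] / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    return chi2 / (n * (k - 1) * (m - 1))


def d_prime(haplotypes, a, b):
    """D' for allele a at locus A and allele b at locus B, as in equation (3)."""
    n = len(haplotypes)
    p_a = sum(1 for x, _ in haplotypes if x == a) / n
    p_b = sum(1 for _, y in haplotypes if y == b) / n
    p_ab = sum(1 for h in haplotypes if h == (a, b)) / n
    d = p_ab - p_a * p_b
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min((1 - p_a) * p_b, p_a * (1 - p_b))
    return d / d_max


# Perfectly associated biallelic loci: both statistics equal 1.
linked = [("K", "T")] * 5 + [("R", "A")] * 5
# A rare allele R seen only once, always on the A background.
rare = [("K", "T")] * 5 + [("K", "A")] * 4 + [("R", "A")] * 1

print(chi2_df(linked), d_prime(linked, "K", "T"))
print(chi2_df(rare), d_prime(rare, "R", "A"))
```

In the second toy data set the rare allele is observed on a single background, so D′ reports complete LD (a value of 1), whereas the chi-squared-based statistic is only about 0.11. This mirrors the behavior described above for transient alleles.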

Data Availability

Python scripts used to process trees and sequences are available at https://github.com/evogytis/fluB/tree/master/scripts (last accessed October 21, 2014). Output files from scripts, lineage designations, MCC trees, acknowledgment tables, accession numbers, and redacted XML files (per GISAID Data Access Agreement) are publicly available at https://github.com/evogytis/fluB/tree/master/data (last accessed October 21, 2014).

Supplementary Material

Supplementary material and figures S1–S17 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Darren Obbard and Paul Wikramaratna for helpful discussions and anonymous reviewers for comments and suggestions. This study was supported by a Natural Environment Research Council studentship D76739X to G.D. and a Newton International Fellowship from the Royal Society to T.B. The research leading to these results has received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 278433-PREDEMICS and ERC Grant Agreement No. 260864. A.R. and S.L. acknowledge the support of the Wellcome Trust (Grant No. 092807).

References

Ansaldi F, D’Agaro P, de Florentiis D, Puzelli S, Lin YP, Gregory V, Bennett M, Donatelli I, Gasparini R, Crovari P, et al. 2003. Molecular characterization of influenza B viruses circulating in northern Italy during the 2001–2002 epidemic season. J Med Virol. 70:463–469.

Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol. 11:220.

Bedford T, Suchard MA, Lemey P, Dudas G, Gregory V, Hay AJ, McCauley JW, Russell CA, Smith DJ, Rambaut A. 2014. Integrating influenza antigenic dynamics with molecular evolution. eLife 3:e01914.

Bergeron C, Valette M, Lina B, Ottmann M. 2010. Genetic content of influenza H3N2 vaccine seeds. PLoS Curr. 2:RRN1165.

Bodewes R, Morick D, de Mutsert G, Osinga N, Bestebroer T, van der Vliet S, Smits SL, Kuiken T, Rimmelzwaan GF, Fouchier RA, et al. 2013. Recurring influenza B virus infections in seals. Emerg Infect Dis. 19:511–512.

Bogner P, Capua I, Lipman DJ, Cox NJ, et al. 2006. A global initiative on sharing avian flu data. Nature 442:981.

Boni MF, Zhou Y, Taubenberger JK, Holmes EC. 2008. Homologous recombination is very rare or absent in human influenza A virus. J Virol. 82:4807–4811.

Broberg E, Beaute J, Snacken R. 2013. Fortnightly influenza surveillance overview, 24 May 2013, weeks 19–20/2013. Available from: http://ecdc.europa.eu/en/publications/Publications/influenza-fortnightly-surveillance-overview-24-may-2013.pdf

Burnet SFM. 1955. Principles of animal virology. New York: Academic Press.

Chare ER, Gould EA, Holmes EC. 2003. Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J Gen Virol. 84:2691–2703.

Chen R, Holmes EC. 2008. The evolutionary dynamics of human influenza B virus. J Mol Evol. 66:655–663.

Cobbin JCA, Verity EE, Gilbertson BP, Rockman SP, Brown LE. 2013. The source of the PB1 gene in influenza vaccine reassortants selectively alters the hemagglutinin content of the resulting seed virus. J Virol. 87:5577–5585.

Dobzhansky T. 1937. Genetics and the origin of species. New York: Columbia University Press.

Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4:e88.

Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 29:1969–1973.

Fulvini AA, Ramanunninair M, Le J, Pokorny BA, Arroyo JM, Silverman J, Devis R, Bucher D. 2011. Gene constellation of influenza A virus reassortants with high growth phenotype prepared as seed candidates for vaccine production. PLoS One 6:e20823.

Gregory V, Bennett M, Orkhan M, Hajjar SA, Varsano N, Mendelson E, Zambon M, Ellis J, Hay A, Lin Y. 2002. Emergence of influenza A H1N2 reassortant viruses in the human population during 2001. Virology 300:1–7.

Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696–704.

Han GZ, Boni MF, Li SS. 2010. No observed effect of homologous recombination on influenza C virus evolution. Virol J. 7:227.

Hasegawa M, Kishino H, Yano TA. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 22:160–174.

Hay AJ, Gregory V, Douglas AR, Lin YP. 2001. The evolution of human influenza viruses. Philos Trans R Soc Lond B Biol Sci. 356:1861–1870.


Hedrick PW, Thomson G. 1986. A two-locus neutrality test: applications to humans, E. coli and lodgepole pine. Genetics 112:135–156.

Hoffmann E, Mahmood K, Yang CF, Webster RG, Greenberg HB, Kemble G. 2002. Rescue of influenza B virus from eight plasmids. Proc Natl Acad Sci U S A. 99:11411–11416.

Kanegae Y, Sugita S, Endo A, Ishida M, Senya S, Osako K, Nerome K, Oya A. 1990. Evolutionary pattern of the hemagglutinin gene of influenza B viruses isolated in Japan: cocirculating lineages in the same epidemic season. J Virol. 64:2860–2865.

Kawaoka Y, Krauss S, Webster RG. 1989. Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics. J Virol. 63:4603–4608.

Lee HY, Chou JY, Cheong L, Chang NH, Yang SY, Leu JY. 2008. Incompatibility of nuclear and mitochondrial genomes causes hybrid sterility between two yeast species. Cell 135:1065–1073.

Lemey P, Rambaut A, Drummond AJ, Suchard MA. 2009. Bayesian phylogeography finds its roots. PLoS Comput Biol. 5:e1000520.

Lewontin RC. 1964. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49–67.

Lindstrom SE, Hiromoto Y, Nishimura H, Saito T, Nerome R, Nerome K. 1999. Comparative analysis of evolutionary mechanisms of the hemagglutinin and three internal protein genes of influenza B virus: multiple cocirculating lineages and frequent reassortment of the NP, M, and NS genes. J Virol. 73:4413–4426.

Longdon B, Hadfield JD, Webster CL, Obbard DJ, Jiggins FM. 2011. Host phylogeny determines viral persistence and replication in novel hosts. PLoS Pathog. 7:e1002260.

Lu L, Lycett SJ, Brown AJL. 2014. Reassortment patterns of avian influenza virus internal segments among different subtypes. BMC Evol Biol. 14:16.

Lycett SJ, Baillie G, Coulter E, Bhatt S, Kellam P, McCauley JW, Wood JL, Brown IH, Pybus OG, Leigh Brown AJ; Combating Swine Influenza Initiative-COSI Consortium. 2012. Estimating reassortment rates in co-circulating Eurasian swine influenza viruses. J Gen Virol. 93:2326–2336.

Minin VN, Bloomquist EW, Suchard MA. 2008. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol Biol Evol. 25:1459–1471.

Muller H. 1942. Isolating mechanisms, evolution and temperature. Biol Symp. 6:71–125.

Nakagawa N, Nukuzuma S, Haratome S, Go S, Nakagawa T, Hayashi K. 2002. Emergence of an influenza B virus with antigenic change. J Clin Microbiol. 40:3068–3070.

Nerome R, Hiromoto Y, Sugita S, Tanabe N, Ishida M, Matsumoto M, Lindstrom SE, Takahashi T, Nerome K. 1998. Evolutionary characteristics of influenza B virus since its first isolation in 1940: dynamic circulation of deletion and insertion mechanism. Arch Virol. 143:1569–1583.

O’Brien JD, Minin VN, Suchard MA. 2009. Learning to count: robust estimates for labeled distances between molecular sequences. Mol Biol Evol. 26:801–814.

Osterhaus ADME, Rimmelzwaan GF, Martina BEE, Bestebroer TM, Fouchier RAM. 2000. Influenza B virus in seals. Science 288:1051–1053.

Presgraves DC. 2010. The molecular evolutionary basis of species formation. Nat Rev Genet. 11:175–180.

Rambaut A, Suchard M, Drummond A. 2009. Tracer v1.5 [Internet]. Available from: http://tree.bio.ed.ac.uk/software/tracer/

Reed C, Meltzer MI, Finelli L, Fiore A. 2012. Public health impact of including two lineages of influenza B in a quadrivalent seasonal influenza vaccine. Vaccine 30:1993–1998.

Rota PA, Wallis TR, Harmon MW, Rota JS, Kendal AP, Nerome K. 1990. Cocirculation of two distinct evolutionary lineages of influenza type B virus since 1983. Virology 175:59–68.

Ruane S, Pyron RA, Burbrink FT. 2011. Phylogenetic relationships of the Cretaceous frog Beelzebufo from Madagascar and the placement of fossil constraints based on temporal and phylogenetic evidence. J Evol Biol. 24:274–285.

Shaw MW, Xu X, Li Y, Normand S, Ueki RT, Kunimoto GY, Hall H, Klimov A, Cox NJ, Subbarao K. 2002. Reappearance and global spread of variants of influenza B/Victoria/2/87 lineage viruses in the 2000–2001 and 2001–2002 seasons. Virology 303:1–8.

Smith GJD, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, et al. 2009. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459:1122–1125.

Turner TL, Hahn MW, Nuzhdin SV. 2005. Genomic islands of speciation in Anopheles gambiae. PLoS Biol. 3:e285.

Ward MJ, Lycett SJ, Kalish ML, Rambaut A, Brown AJL. 2013. Estimating the rate of intersubtype recombination in early HIV-1 group M strains. J Virol. 87:1967–1973.

World Health Organization. 2009. Influenza fact sheet. Available from: http://www.who.int/mediacentre/factsheets/fs211/en/

Zhao H, Nettleton D, Soller M, Dekkers JCM. 2005. Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genet Res. 86:77–87.


Page 8: Edinburgh Research Explorer · 2014. 12. 18. · Edinburgh Research Explorer Reassortment between Influenza B Lineages and the Emergence of a Coadapted PB1-PB2-HA Gene Complex Citation

were able to recover an influenza ldquospecies treerdquo includingadmixturereassortment events it would be possible to esti-mate the reassortment or recombination ldquodistancerdquo whichis the time between a split in the species tree in the pastand a reassortment event (see supplementary fig S17Supplementary Material online) Although we do not findevidence of differences in total number of reassortments be-tween segments (see supplementary fig S4 SupplementaryMaterial online) we find support for a reassortment ldquodis-tancerdquo effect in which a pair of tips on one segment has adifferent TMRCA from the same pair of tips on a differentsegment The summary statistic we use that reflects this dif-ference in TMRCAs TMRCA is most sensitive when only oneof the two trees being compared loses diversity through reas-sortment and the other acts like a proxy for the ldquospecies treerdquoWe normalize our TMRCA comparisons to arrive at TMRCAwhich accounts for uncertainty in tree topology (seeMaterials and Methods) Figure 8 shows TMRCA values forall pairs of trees Most segment pairs show very low values forthis statistic with TMRCAamp01 indicating that TMRCA mea-surements between replicate posterior samples from thesame segment are up to ten times smaller than TMRCA

values between posterior samples from different segmentsPB1 PB2 and HA trees on the other hand exhibit TMRCA

values that are much higher This shows that TMRCA differ-ences between trees of PB1 PB2 and HA segments arethough noisy occasionally very similar to uncertainty in tip-to-tip TMRCAs between replicate analyses of these segments

Discussion

Linkage between PB1 PB2 and HA Gene Segments

In this article we show that the PB1 PB2 and HA segments ofinfluenza B viruses are the only ones that have continuouslymaintained separate Vic and Yam lineages whereas othersegments have fixed either Vic or Yam lineages (figs 2 5and 6) Evidence suggests that this is a result of prolongedlack of reassortment between Vic and Yam lineages in PB1PB2 and HA (fig 4) which possess coassorting sequencesdetectable as high linkage disequilibrium (LD) (supplemen-tary fig S1 Supplementary Material online) The vast majority

of the sampled evolutionary history of each segment of influ-enza B viruses since the split of Vic and Yam lineages has beenspent in association with either completely Vic or completelyYam lineage-derived PB1ndashPB2ndashHA complexes (fig 7) sug-gesting that having ldquopurerdquo lineage PB1ndashPB2ndashHA complexesis important for whole-genome fitness We propose that thispattern of coassortment is due to the action of selection andnot simply biased or rare reassortment

The origin of the strong genetic linkage between PB1 PB2and HA segments remains unclear We believe that there aretwo alternative but similar explanations for the origins of thestrong genetic linkage between these segments Mutation-driven coevolution (Presgraves 2010) and DobzhanskyndashMuller (DM) incompatibility (Dobzhansky 1937 Muller1942) Mutation-driven coevolution (Presgraves 2010) hasbeen suggested to be the cause of hybrid dysfunction inSaccharomyces hybrids (Lee et al 2008) and evolves as abyproduct of adaptation If one or the other influenza B lin-eage has undergone adaptation we might expect thesechanges to be beneficial in its native background and incom-patible with a foreign background DM incompatibility oper-ates in a similar way but the main difference from thescenario described earlier is that the incompatible alleles areneutral or nearly neutral in their native background andbecome deleterious or lethal when combined with nonnativebackgrounds Emergence of DM incompatibility is aided bygeographic isolation Interestingly the Vic lineage of HA wasrestricted to eastern Asia between 1992 and 2000 (Neromeet al 1998 Shaw et al 2002) offering ample time for thebudding Vic lineage to accumulate alleles causing reassort-ment incompatibility However without more genomic datafrom the past it is difficult to estimate to what extent influ-enza B virus population structure contributed to the devel-opment of the current segment linkage

Potential Mechanisms for Reassortment Incompatibility

Unfortunately, the limited amount of genomic data available for the early years of the Vic–Yam split precludes any attempt to answer whether selection or drift has led to the current linkage of PB1, PB2, and HA segments. Although the origins of the linkage between these three segments might be difficult to explain, we can speculate on the nature of reassortment incompatibility. For example, it is intuitive why this might be the case for PB1 and PB2: both proteins interact directly as part of the RNA-dependent RNA polymerase heterotrimer. Indeed, we observe that PB1–PB2 reassortants are the rarest and least persistent among mixed-lineage PB1–PB2–HA strains and have not been isolated in great numbers. In fact, most reassortants breaking the PB1–PB2–HA complex apart occurred in the past, close to the split of Vic and Yam lineages, and have become very rare since.

There is some evidence that the linkage between PB1 and HA might not be a phenomenon restricted to influenza B viruses. It has been established that, at least for the 1957 and the 1968 influenza pandemics caused by A/H2N2 and A/H3N2 subtypes, respectively, the viruses responsible were reassortants possessing PB1 and HA segments derived from avian influenza A viruses (Kawaoka et al. 1989). In addition, outdated techniques for producing vaccine seed strains through selection for HA–NA reassortants often yielded PB1–HA–NA reassortants as a side-effect (Bergeron et al. 2010; Fulvini et al. 2011). Recent experiments have found that the presence or absence of a "foreign" PB1 segment can have dramatic effects on HA concentration on the surface of virions and total virion production (Cobbin et al. 2013). However, there have been reassortant influenza A viruses circulating for prolonged periods of time in humans that did have disparate PB1 and HA segments, for example, H1N2 outbreaks in 2001 (Gregory et al. 2002) and H1N1/09 in 2009 (Smith et al. 2009).

FIG. 8. δTMRCA statistics for different segment pairs. PB1, PB2, and HA trees exhibit reciprocally highly similar TMRCAs, unlike most other pairwise comparisons. All vertical lines indicating uncertainty are 95% highest posterior densities.

We believe that the association between PB1, PB2, and HA segments should be relatively straightforward to explore in the lab. Reverse genetics systems have been developed for influenza B viruses (Hoffmann et al. 2002), which would allow the creation of artificial reassortants. Based on the frequency and persistence times of different reassortant classes we have observed, we expect a hierarchy of reassortant fitness, starting with PB1+PB2+HA reassortants, which should be the most fit, followed by PB1+2/HA, then PB1+HA/PB2, and finally PB2+HA/PB1 reassortants with the lowest fitness. We believe that this is the most direct approach to unraveling the mechanism responsible for the linkage within the PB1–PB2–HA complex.

Will Influenza B Viruses Speciate?

We suggest that the preservation of two PB1–PB2–HA complex lineages is similar to genomic speciation islands, where small numbers of genes resist being homogenized through gene flow (Turner et al. 2005). In this context, we see three potential paths of evolution for influenza B viruses. If more segments get recruited to the PB1–PB2–HA complex, the process could continue until "speciation" occurs, in which none of the segments is able to reassort across the Vic–Yam lineage boundary. Alternatively, the influenza B genome could continue to be homogenized through gene flow, with the exception of PB1, PB2, and HA segments, or one of the two PB1–PB2–HA complexes could go extinct, marking the return of single-strain dynamics in the influenza B virus population. The eventual fate of influenza B viruses will likely be determined by the combined effects of reassortment frequency and the strength of epistatic interactions between segments.

Materials and Methods

We compiled a primary data set of 452 complete influenza B genomes from GISAID (Bogner et al. 2006) dating from 1984 to 2012. The longest protein-coding region of each segment was extracted and used for all further analyses. We thus assume that homologous recombination has not taken place and that the evolutionary history of the whole segment can be inferred from the longest coding sequence in the segment. To date, there has been little evidence of homologous recombination in influenza viruses (Chare et al. 2003; Boni et al. 2008; Han et al. 2010). The segments of each strain were assigned to either Vic or Yam lineage by making maximum-likelihood trees of each segment using PhyML (Guindon and Gascuel 2003) and identifying whether the isolate was more closely related to B/Victoria/2/87 or B/Yamagata/16/88 sequences in that segment, with the exception of the NS segment, as B/Victoria/2/87 was a reassortant and possessed a Yam lineage NS (Lindstrom et al. 1999). B/Czechoslovakia/69/1990 was considered as being representative of Vic lineage for the NS segment. Every segment in each genome thus received either a Vic or a Yam lineage designation; for example, the strain B/Victoria/2/87 received V–V–V–V–V–V–V–Y, as its NS segment is derived from the Yam lineage and the rest of the genome is Vic.

We also collated a secondary data set from all complete influenza B virus genomes available on GenBank as of May 5, 2014. After removing isolates that had considerable portions of any sequence missing, were isolated prior to 1980, or were suspected of having a contaminant sequence in any segment, we were left with 1,603 sequences. The sequences in this data set only became available after all primary analyses were performed, are mainly from Australia, New Zealand, and the United States, and are too numerous to analyze in BEAST (Drummond et al. 2012). PhyML (Guindon and Gascuel 2003) was used to produce phylogenies of each segment, and the lineage of each isolate was determined based on grouping with either B/Victoria/2/87 or B/Yamagata/16/88 sequences, as described above. By associating strains with the lineage identity of each of their segments, we reconstructed the most parsimonious interlineage reassortment history for the secondary data set. The secondary data set was used to check how representative the primary data set was, to estimate LD, and to broadly confirm our results. All analyses pertain to the primary data set unless stated otherwise.

Temporally calibrated phylogenies were recovered for each segment in the primary data set using Markov chain Monte Carlo (MCMC) methods in the BEAST software package (Drummond et al. 2012). We modeled the substitution process using the Hasegawa–Kishino–Yano model of nucleotide substitution (Hasegawa et al. 1985), with separate transition models for each of the three codon partitions, and additionally estimated realized synonymous and nonsynonymous substitution counts (O'Brien et al. 2009). We used a flexible Bayesian skyride demographic model (Minin et al. 2008). We accounted for incomplete sampling dates for 94 sequences (of which 93 had only year and 1 had only year and month of isolation), whereby tip date is estimated as a latent variable in the MCMC integration. A relaxed molecular clock was used, where branch rates are drawn from a lognormal distribution (Drummond et al. 2006). We ran three independent MCMC chains, each with 200 million states, sampled every 20,000 steps, and discarded the first 10% of the MCMC states as burn-in. After assessing convergence of all three MCMC chains by visual inspection using Tracer (Rambaut et al. 2009), we combined samples across chains to give a total of 27,000 samples from the posterior distribution of trees.
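The sample count quoted above follows directly from the chain length, sampling interval, and burn-in. A minimal sketch of that bookkeeping is given below; the function and constant names are illustrative and not taken from the authors' scripts, and the count assumes exactly one sample per interval (it ignores any extra state logged at step 0).

```python
# Bookkeeping for the MCMC sampling scheme described in the text.
# All names here are hypothetical; only the numbers come from the article.
N_CHAINS = 3
CHAIN_LENGTH = 200_000_000   # states per chain
SAMPLE_EVERY = 20_000        # sampling interval, in states
BURNIN_FRACTION = 0.10       # first 10% of each chain discarded


def retained_samples(n_chains, chain_length, sample_every, burnin_fraction):
    """Number of posterior samples left after burn-in, pooled over chains."""
    per_chain = chain_length // sample_every           # samples drawn per chain
    burnin = int(round(per_chain * burnin_fraction))   # samples discarded per chain
    return n_chains * (per_chain - burnin)


total = retained_samples(N_CHAINS, CHAIN_LENGTH, SAMPLE_EVERY, BURNIN_FRACTION)
print(total)  # 27000
```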

Every sequence was assigned seven discrete traits in BEAUti, corresponding to the lineages of all other segments with which a strain was isolated; for example, the PB1 tree had PB2, PA, HA, NP, NA, MP, and NS as traits and V or Y as values for each trait. We inferred the ancestral state of lineages in each segment by modeling transitions between these discrete states using an asymmetric transition matrix (Lemey et al. 2009) with Bayesian stochastic search variable selection to estimate significant rates. Because the posterior set of trees for a single segment has branches labeled with the inferred lineage in the remaining seven segments, we can detect interlineage reassortments between pairs of segments by observing state transitions, that is, Yam to Vic or Vic to Yam (fig. 1). In addition, by reconstructing the ancestral state of all other genomic segments jointly, we can infer coreassortment events when more than one trait transition occurs on the same node in a tree. Interphylogeny labeling approaches have been extensively used in the past to investigate reticulate evolution in influenza A viruses and HIV (Lycett et al. 2012; Ward et al. 2013; Lu et al. 2014).
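The labeling logic described above can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a tree has already been reduced to a dictionary of nodes, each carrying its parent and the inferred lineage (V or Y) of the other segments, and the data structure, function names, and toy tree are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: node name -> (parent name or None, {trait: "V" or "Y"}).
Node = Tuple[Optional[str], Dict[str, str]]
Change = Tuple[str, str, str]  # (trait, parent state, child state)


def trait_transitions(tree: Dict[str, Node]) -> Dict[str, List[Change]]:
    """For each non-root node, list the traits whose state differs from the parent's."""
    events: Dict[str, List[Change]] = {}
    for name, (parent, traits) in tree.items():
        if parent is None:
            continue  # the root has no incoming branch
        parent_traits = tree[parent][1]
        changed = [(t, parent_traits[t], s)
                   for t, s in sorted(traits.items())
                   if parent_traits[t] != s]
        if changed:
            events[name] = changed
    return events


def coreassortments(events: Dict[str, List[Change]]) -> Dict[str, List[Change]]:
    """Keep only nodes where more than one trait changes state at once."""
    return {node: changes for node, changes in events.items() if len(changes) > 1}


# Toy PB1 tree labeled with three of the seven traits.
toy_pb1_tree: Dict[str, Node] = {
    "root": (None, {"PB2": "V", "HA": "V", "NA": "V"}),
    "n1": ("root", {"PB2": "V", "HA": "V", "NA": "Y"}),   # NA alone switches V -> Y
    "n2": ("root", {"PB2": "Y", "HA": "Y", "NA": "V"}),   # PB2 and HA switch together
    "tipA": ("n1", {"PB2": "V", "HA": "V", "NA": "Y"}),
    "tipB": ("n2", {"PB2": "Y", "HA": "Y", "NA": "V"}),
}

events = trait_transitions(toy_pb1_tree)
joint = coreassortments(events)
print(events)
print(joint)
```

In a real analysis the same comparison would be repeated over every tree in the posterior sample, so that each inferred event carries a posterior frequency rather than a single yes/no call.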

Measures of Diversity

We inferred the diversity of each segment from its phylogenetic tree by estimating the date of the most recent common ancestor of all branches at yearly time points, which places an upper bound on the maximum amount of diversity existing at each time point. A version of this lineage turnover metric has previously been used to investigate the tempo and strength of selection in influenza A viruses during seasonal circulation (Bedford et al. 2011). In addition, we calculated mean pairwise TMRCA between branches labeled as Vic and Yam for PB1, PB2, and HA traits. This gave us a measure of how much a particular segment reassorts with respect to Vic and Yam lineages of PB1, PB2, and HA segments. If Vic and Yam lineages of PB1, PB2, and HA segments were to be considered as being separate populations, this measure would be equivalent to "between population" diversity.

We also calculated the total amount of sampled evolutionary time spent by each segment with entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. We did this by summing the branch lengths in each tree under three different lineage combinations of the PB1, PB2, and HA segments: PB1–PB2–HA derived entirely from Yam lineage, PB1–PB2–HA entirely derived from Vic lineage, and PB1–PB2–HA derived from a mixture of the two lineages. This gives a measure of how successful, over long periods of time, each particular PB1–PB2–HA constellation has been.
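The branch-length summation described above amounts to a grouped sum. A minimal sketch follows, assuming each branch has already been reduced to its length (in years) and the inferred lineage of PB1, PB2, and HA along it; the tuple layout, function names, and toy numbers are assumptions for illustration only.

```python
from collections import defaultdict


def classify_complex(pb1, pb2, ha):
    """Label a PB1-PB2-HA constellation as pure Vic, pure Yam, or mixed."""
    states = {pb1, pb2, ha}
    if states == {"V"}:
        return "Vic"
    if states == {"Y"}:
        return "Yam"
    return "mixed"


def time_by_constellation(branches):
    """Sum branch lengths under each of the three PB1-PB2-HA constellations.

    branches: iterable of (length, pb1_lineage, pb2_lineage, ha_lineage).
    """
    totals = defaultdict(float)
    for length, pb1, pb2, ha in branches:
        totals[classify_complex(pb1, pb2, ha)] += length
    return dict(totals)


# Toy branches (lengths in years), not real data.
toy_branches = [
    (2.0, "V", "V", "V"),
    (1.5, "Y", "Y", "Y"),
    (0.5, "V", "Y", "V"),   # mixed-lineage complex
    (3.0, "Y", "Y", "Y"),
]

totals = time_by_constellation(toy_branches)
print(totals)  # {'Vic': 2.0, 'Yam': 4.5, 'mixed': 0.5}
```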

Tree-to-Tree Similarities

We express the normalized distance δTMRCA between trees belonging to two segments A and B for a particular posterior sample i following

$$\delta_{\mathrm{TMRCA}}(A_i, B_i) = \frac{\Delta_{\mathrm{TMRCA}}(A_i, A'_i) + \Delta_{\mathrm{TMRCA}}(B_i, B'_i)}{2\,\Delta_{\mathrm{TMRCA}}(A_i, B_i)} \tag{1}$$

where $\Delta_{\mathrm{TMRCA}}(A_i, B_i) = \frac{1}{n}\sum_{j=1}^{n} g(A_{ij}, B_{ij})$ and n is the total number of pairwise comparisons available between sets of tips. Thus, $g(A_{ij}, B_{ij})$ is the absolute difference in TMRCA of a pair of tips j, where the pair is drawn from the ith posterior sample of tree A and the ith posterior sample of tree B. Additionally, $\Delta_{\mathrm{TMRCA}}(A_i, A'_i)$ is calculated from the ith posterior sample of tree A and the ith posterior sample of an independent analysis of tree A (which we refer to as A′), which is used in the normalization procedure to control for variability in tree topology stability over the course of the MCMC chain (see supplementary figs. S6 and S7, Supplementary Material online). We had three replicate analyses of each segment, and in order to calculate $\Delta_{\mathrm{TMRCA}}(A_i, A'_i)$ we used analyses numbered 1, 2, and 3 as A and analyses numbered 2, 3, and 1 as A′, in that order. We subsampled our combined posterior distribution of trees to give a total of 2,700 trees on which to analyze δTMRCA.
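Equation (1) can be made concrete with a small numerical sketch. The code below assumes each tree has already been summarized as a dictionary mapping a pair of tip names to the date of their most recent common ancestor; that representation, the function names, and the toy values are illustrative assumptions rather than the authors' code.

```python
def delta_tmrca(tmrca_a, tmrca_b):
    """Raw distance: mean absolute TMRCA difference over tip pairs shared by two trees."""
    pairs = [p for p in tmrca_a if p in tmrca_b]
    return sum(abs(tmrca_a[p] - tmrca_b[p]) for p in pairs) / len(pairs)


def normalized_delta(a, a_prime, b, b_prime):
    """Normalized distance of eq. (1): replicate-to-replicate noise over between-segment distance.

    Undefined (division by zero) if the two segment trees have identical pairwise TMRCAs.
    """
    noise = delta_tmrca(a, a_prime) + delta_tmrca(b, b_prime)
    return noise / (2 * delta_tmrca(a, b))


# Toy pairwise TMRCA dates (calendar years) for three tips in two segments,
# each with an independent replicate (the primed trees).
a = {("t1", "t2"): 2010.0, ("t1", "t3"): 2000.0, ("t2", "t3"): 2000.0}
a_prime = {("t1", "t2"): 2010.5, ("t1", "t3"): 2000.5, ("t2", "t3"): 2000.5}
b = {("t1", "t2"): 1990.0, ("t1", "t3"): 2000.0, ("t2", "t3"): 1990.0}
b_prime = {("t1", "t2"): 1990.5, ("t1", "t3"): 2000.5, ("t2", "t3"): 1990.5}

different = normalized_delta(a, a_prime, b, b_prime)
same = normalized_delta(a, a_prime, a_prime, a)
print(different, same)
```

Under this formula, two segments whose trees differ only as much as independent replicates of the same segment give a value near 1, whereas trees with very different pairwise TMRCAs give a value near 0.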

Calculating the normalized δTMRCA(A_i, B_i) for each MCMC state provides us with a posterior distribution of this statistic, allowing specific hypotheses regarding similarities between the trees of different segments to be tested. Our approach exploits the branch scaling used by BEAST (Drummond et al. 2012), as the trees are scaled in absolute time and insensitive to variation in nucleotide substitution rates between segments, allowing for direct comparisons between TMRCAs in different trees. In the absence of reassortment, we expect the tree of every segment to recapitulate the "virus tree," a concept analogous to "species trees" in population genetics. Our method operates under the assumption that the segment trees capture this "virus tree" of influenza B viruses quite well. It is not an unreasonable assumption, given the seasonal bottlenecks influenza viruses experience. This makes it almost certain that influenza viruses circulating at any given time point are derived from a single genome that existed in the recent past. The ΔTMRCA statistic essentially quantifies the temporal distance between admixture events and nodes in the virus tree (see supplementary fig. S17, Supplementary Material online). We normalize ΔTMRCA values to get δTMRCA, a measure which quantifies the extent to which the similarity of two independent trees resembles phylogenetic noise. The ΔTMRCA statistic is an extension of patristic distance methods, which have previously been used to tackle a wide variety of problems, for example, as phylogenetic distance in predicting viral titer in Drosophila infected with viruses from closely related species (Longdon et al. 2011) and to assess temporal incongruence in a phylogenetic tree of amphibian species induced by using different calibrations (Ruane et al. 2011).

LD across the Influenza B Genome

We used the secondary GenBank data set with 1,603 complete genome sequences to estimate LD between amino acid loci across the longest proteins encoded by each segment of the influenza B virus genome. To quantify LD, we adapt the $\chi^2_{df}$ statistic from Hedrick and Thomson (1986):

$$\chi^2_{df} = \frac{\chi^2}{N(k-1)(m-1)} \tag{2}$$

where $\chi^2$ is calculated from a classical contingency table, N is the number of haplotypes, and (k − 1)(m − 1) are the degrees of freedom. This statistic is equal to the widely used r² LD statistic at biallelic loci but also quantifies LD when there are more than two alleles per locus (Zhao et al. 2005). LD was estimated only at loci where each nucleotide or amino acid allele was present in at least two isolates. We ignored gaps in the alignment and did not consider them as polymorphisms. In all cases, we used a minor allele frequency cutoff of 1%. We also calculated another LD statistic, D′ (Lewontin 1964), as $D'_{ij} = D_{ij}/D^{\max}_{ij}$, where $D_{ij} = p(A_iB_j) - p(A_i)p(B_j)$ and

$$D^{\max}_{ij} = \begin{cases} \min\left[p(A_i)p(B_j),\ (1-p(A_i))(1-p(B_j))\right] & \text{when } D_{ij} < 0 \\ \min\left[(1-p(A_i))p(B_j),\ p(A_i)(1-p(B_j))\right] & \text{when } D_{ij} \ge 0 \end{cases} \tag{3}$$

where p(A_i) is the frequency of allele A_i at locus A, p(B_j) is the frequency of allele B_j at locus B, and p(A_iB_j) is the frequency of haplotype A_iB_j. D′ is inflated when some haplotypes are not observed, for example, when the minor allele frequency is low. We find that D′ is almost uniformly high across the influenza B virus genome and close to 1.0 for almost any pair of polymorphic loci. This is because most amino acid alleles in the population exist transiently, meaning that they do not get a chance to reassort and we only observe them within the backgrounds of more persistent alleles, which D′ quantifies as complete LD. We think that metrics related to r², such as χ²_df, perform much better on temporal data such as ours in finding persistent associations between alleles and are easier to interpret.
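The contrast between the two LD statistics can be seen on a toy data set. The sketch below computes the contingency-table statistic of equation (2) and D′ of equation (3) from a list of two-locus haplotypes; the function names and the example haplotypes are illustrative assumptions, and the code expects both loci to be polymorphic.

```python
from collections import Counter


def chi2_df(haplotypes):
    """Chi-square of the allele contingency table divided by N(k-1)(m-1), as in eq. (2).

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B) tuples.
    """
    n = len(haplotypes)
    count_ab = Counter(haplotypes)
    count_a = Counter(a for a, _ in haplotypes)
    count_b = Counter(b for _, b in haplotypes)
    k, m = len(count_a), len(count_b)
    chi2 = 0.0
    for a in count_a:
        for b in count_b:
            expected = count_a[a] * count_b[b] / n
            chi2 += (count_ab[(a, b)] - expected) ** 2 / expected
    return chi2 / (n * (k - 1) * (m - 1))


def d_prime(haplotypes, a, b):
    """Lewontin's D' for allele a at locus A and allele b at locus B, as in eq. (3)."""
    n = len(haplotypes)
    p_a = sum(1 for x, _ in haplotypes if x == a) / n
    p_b = sum(1 for _, y in haplotypes if y == b) / n
    p_ab = sum(1 for h in haplotypes if h == (a, b)) / n
    d = p_ab - p_a * p_b
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min((1 - p_a) * p_b, p_a * (1 - p_b))
    return d / d_max


# Two persistent, perfectly associated alleles at each locus.
complete = [("A", "X")] * 5 + [("G", "Y")] * 5
# A rare allele G seen only on the Y background (the G-X haplotype is never observed).
transient = [("A", "X")] * 10 + [("A", "Y")] * 8 + [("G", "Y")] * 2

print(chi2_df(complete), d_prime(complete, "A", "X"))
print(chi2_df(transient), d_prime(transient, "G", "Y"))
```

In the second toy data set D′ still reports complete LD because one haplotype is missing, while the contingency-table statistic is close to 0.11, which matches the behavior the text attributes to transient alleles.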

Data Availability

Python scripts used to process trees and sequences are available at https://github.com/evogytis/fluB/tree/master/scripts (last accessed October 21, 2014). Output files from scripts, lineage designations, MCC trees, acknowledgment tables, accession numbers, and redacted XML files (per GISAID Data Access Agreement) are publicly available at https://github.com/evogytis/fluB/tree/master/data (last accessed October 21, 2014).

Supplementary Material

Supplementary material and figures S1–S17 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Darren Obbard and Paul Wikramaratna for helpful discussions and anonymous reviewers for comments and suggestions. This study was supported by a Natural Environment Research Council studentship D76739X to G.D. and a Newton International Fellowship from the Royal Society to T.B. The research leading to these results has received funding from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 278433-PREDEMICS and ERC Grant Agreement No. 260864. A.R. and S.L. acknowledge the support of the Wellcome Trust (Grant No. 092807).

References

Ansaldi F, D'Agaro P, de Florentiis D, Puzelli S, Lin YP, Gregory V, Bennett M, Donatelli I, Gasparini R, Crovari P, et al. 2003. Molecular characterization of influenza B viruses circulating in northern Italy during the 2001–2002 epidemic season. J Med Virol. 70:463–469.

Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol. 11:220.

Bedford T, Suchard MA, Lemey P, Dudas G, Gregory V, Hay AJ, McCauley JW, Russell CA, Smith DJ, Rambaut A. 2014. Integrating influenza antigenic dynamics with molecular evolution. eLife 3:e01914.

Bergeron C, Valette M, Lina B, Ottmann M. 2010. Genetic content of influenza H3N2 vaccine seeds. PLoS Curr. 2:RRN1165.

Bodewes R, Morick D, de Mutsert G, Osinga N, Bestebroer T, van der Vliet S, Smits SL, Kuiken T, Rimmelzwaan GF, Fouchier RA, et al. 2013. Recurring influenza B virus infections in seals. Emerg Infect Dis. 19:511–512.

Bogner P, Capua I, Lipman DJ, Cox NJ, et al. 2006. A global initiative on sharing avian flu data. Nature 442:981.

Boni MF, Zhou Y, Taubenberger JK, Holmes EC. 2008. Homologous recombination is very rare or absent in human influenza A virus. J Virol. 82:4807–4811.

Broberg E, Beaute J, Snacken R. 2013. Fortnightly influenza surveillance overview, 24 May 2013 - weeks 19-20/2013. Available from: http://ecdc.europa.eu/en/publications/Publications/influenza-fortnightly-surveillance-overview-24-may-2013.pdf

Burnet SFM. 1955. Principles of animal virology. New York: Academic Press.

Chare ER, Gould EA, Holmes EC. 2003. Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J Gen Virol. 84:2691–2703.

Chen R, Holmes EC. 2008. The evolutionary dynamics of human influenza B virus. J Mol Evol. 66:655–663.

Cobbin JCA, Verity EE, Gilbertson BP, Rockman SP, Brown LE. 2013. The source of the PB1 gene in influenza vaccine reassortants selectively alters the hemagglutinin content of the resulting seed virus. J Virol. 87:5577–5585.

Dobzhansky T. 1937. Genetics and the origin of species. New York: Columbia University Press.

Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4:e88.

Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 29:1969–1973.

Fulvini AA, Ramanunninair M, Le J, Pokorny BA, Arroyo JM, Silverman J, Devis R, Bucher D. 2011. Gene constellation of influenza A virus reassortants with high growth phenotype prepared as seed candidates for vaccine production. PLoS One 6:e20823.

Gregory V, Bennett M, Orkhan M, Hajjar SA, Varsano N, Mendelson E, Zambon M, Ellis J, Hay A, Lin Y. 2002. Emergence of influenza A H1N2 reassortant viruses in the human population during 2001. Virology 300:1–7.

Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696–704.

Han GZ, Boni MF, Li SS. 2010. No observed effect of homologous recombination on influenza C virus evolution. Virol J. 7:227.

Hasegawa M, Kishino H, Yano TA. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 22:160–174.

Hay AJ, Gregory V, Douglas AR, Lin YP. 2001. The evolution of human influenza viruses. Philos Trans R Soc Lond B Biol Sci. 356:1861–1870.


Hedrick PW, Thomson G. 1986. A two-locus neutrality test: applications to humans, E. coli and lodgepole pine. Genetics 112:135–156.

Hoffmann E, Mahmood K, Yang CF, Webster RG, Greenberg HB, Kemble G. 2002. Rescue of influenza B virus from eight plasmids. Proc Natl Acad Sci U S A. 99:11411–11416.

Kanegae Y, Sugita S, Endo A, Ishida M, Senya S, Osako K, Nerome K, Oya A. 1990. Evolutionary pattern of the hemagglutinin gene of influenza B viruses isolated in Japan: cocirculating lineages in the same epidemic season. J Virol. 64:2860–2865.

Kawaoka Y, Krauss S, Webster RG. 1989. Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics. J Virol. 63:4603–4608.

Lee HY, Chou JY, Cheong L, Chang NH, Yang SY, Leu JY. 2008. Incompatibility of nuclear and mitochondrial genomes causes hybrid sterility between two yeast species. Cell 135:1065–1073.

Lemey P, Rambaut A, Drummond AJ, Suchard MA. 2009. Bayesian phylogeography finds its roots. PLoS Comput Biol. 5:e1000520.

Lewontin RC. 1964. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49–67.

Lindstrom SE, Hiromoto Y, Nishimura H, Saito T, Nerome R, Nerome K. 1999. Comparative analysis of evolutionary mechanisms of the hemagglutinin and three internal protein genes of influenza B virus: multiple cocirculating lineages and frequent reassortment of the NP, M, and NS genes. J Virol. 73:4413–4426.

Longdon B, Hadfield JD, Webster CL, Obbard DJ, Jiggins FM. 2011. Host phylogeny determines viral persistence and replication in novel hosts. PLoS Pathog. 7:e1002260.

Lu L, Lycett SJ, Brown AJL. 2014. Reassortment patterns of avian influenza virus internal segments among different subtypes. BMC Evol Biol. 14:16.

Lycett SJ, Baillie G, Coulter E, Bhatt S, Kellam P, McCauley JW, Wood JL, Brown IH, Pybus OG, Leigh Brown AJ, Combating Swine Influenza Initiative-COSI Consortium. 2012. Estimating reassortment rates in co-circulating Eurasian swine influenza viruses. J Gen Virol. 93:2326–2336.

Minin VN, Bloomquist EW, Suchard MA. 2008. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol Biol Evol. 25:1459–1471.

Muller H. 1942. Isolating mechanisms, evolution and temperature. Biol Symp. 6:71–125.

Nakagawa N, Nukuzuma S, Haratome S, Go S, Nakagawa T, Hayashi K. 2002. Emergence of an influenza B virus with antigenic change. J Clin Microbiol. 40:3068–3070.

Nerome R, Hiromoto Y, Sugita S, Tanabe N, Ishida M, Matsumoto M, Lindstrom SE, Takahashi T, Nerome K. 1998. Evolutionary characteristics of influenza B virus since its first isolation in 1940: dynamic circulation of deletion and insertion mechanism. Arch Virol. 143:1569–1583.

O'Brien JD, Minin VN, Suchard MA. 2009. Learning to count: robust estimates for labeled distances between molecular sequences. Mol Biol Evol. 26:801–814.

Osterhaus ADME, Rimmelzwaan GF, Martina BEE, Bestebroer TM, Fouchier RAM. 2000. Influenza B virus in seals. Science 288:1051–1053.

Presgraves DC. 2010. The molecular evolutionary basis of species formation. Nat Rev Genet. 11:175–180.

Rambaut A, Suchard M, Drummond A. 2009. Tracer v1.5 [Internet]. Available from: http://tree.bio.ed.ac.uk/software/tracer/

Reed C, Meltzer MI, Finelli L, Fiore A. 2012. Public health impact of including two lineages of influenza B in a quadrivalent seasonal influenza vaccine. Vaccine 30:1993–1998.

Rota PA, Wallis TR, Harmon MW, Rota JS, Kendal AP, Nerome K. 1990. Cocirculation of two distinct evolutionary lineages of influenza type B virus since 1983. Virology 175:59–68.

Ruane S, Pyron RA, Burbrink FT. 2011. Phylogenetic relationships of the cretaceous frog Beelzebufo from Madagascar and the placement of fossil constraints based on temporal and phylogenetic evidence. J Evol Biol. 24:274–285.

Shaw MW, Xu X, Li Y, Normand S, Ueki RT, Kunimoto GY, Hall H, Klimov A, Cox NJ, Subbarao K. 2002. Reappearance and global spread of variants of influenza B/Victoria/2/87 lineage viruses in the 2000–2001 and 2001–2002 seasons. Virology 303:1–8.

Smith GJD, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, et al. 2009. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459:1122–1125.

Turner TL, Hahn MW, Nuzhdin SV. 2005. Genomic islands of speciation in Anopheles gambiae. PLoS Biol. 3:e285.

Ward MJ, Lycett SJ, Kalish ML, Rambaut A, Brown AJL. 2013. Estimating the rate of intersubtype recombination in early HIV-1 group M strains. J Virol. 87:1967–1973.

World Health Organization. 2009. Influenza fact sheet. Available from: http://www.who.int/mediacentre/factsheets/fs211/en/

Zhao H, Nettleton D, Soller M, Dekkers JCM. 2005. Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genet Res. 86:77–87.


Page 9: Edinburgh Research Explorer · 2014. 12. 18. · Edinburgh Research Explorer Reassortment between Influenza B Lineages and the Emergence of a Coadapted PB1-PB2-HA Gene Complex Citation

reassortants possessing PB1 and HA segments derived fromavian influenza A viruses (Kawaoka et al 1989) In additionoutdated techniques for producing vaccine seed strainsthrough selection for HAndashNA reassortants often yieldedPB1ndashHAndashNA reassortants as a side-effect (Bergeron et al2010 Fulvini et al 2011) Recent experiments have foundthat the presence or absence of a ldquoforeignrdquo PB1 segmentcan have dramatic effects on HA concentration on the surfaceof virions and total virion production (Cobbin et al 2013)However there have been reassortant influenza A virusescirculating for prolonged periods of time in humans thatdid have disparate PB1 and HA segments for exampleH1N2 outbreaks in 2001 (Gregory et al 2002) and H1N109in 2009 (Smith et al 2009)

We believe that the association between PB1 PB2 and HAsegments should be relatively straightforward to explore inthe lab Reverse genetics systems have been developed forinfluenza B viruses (Hoffmann et al 2002) which wouldallow the creation of artificial reassortants Based on the fre-quency and persistence times of different reassortant classeswe have observed we expect a hierarchy of reassortant fitnessstarting with PB1+PB2+HA reassortants which should bethe most fit followed by PB1+2HA then PB1+HAPB2and finally PB2+HAPB1 reassortants with the lowest fitnessWe believe that this is the most direct approach to unravelingthe mechanism responsible for the linkage within the PB1ndashPB2ndashHA complex

Will Influenza B Viruses Speciate

We suggest that the preservation of two PB1ndashPB2ndashHA com-plex lineages is similar to genomic speciation islands wheresmall numbers of genes resist being homogenized throughgene flow (Turner et al 2005) In this context we see threepotential paths of evolution for influenza B viruses If moresegments get recruited to the PB1ndashPB2ndashHA complex theprocess could continue until ldquospeciationrdquo occurs in whichnone of the segments is able to reassort across the VicndashYam lineage boundary Alternatively the influenza Bgenome could continue to be homogenized through geneflow with the exception of PB1 PB2 and HA segments orone of the two PB1ndashPB2ndashHA complexes could go extinctmarking the return of single-strain dynamics in the influenzaB virus population The eventual fate of influenza B viruseswill likely be determined by the combined effects of reassort-ment frequency and the strength of epistatic interactionsbetween segments

Materials and MethodsWe compiled a primary data set of 452 complete influenza Bgenomes from GISAID (Bogner et al 2006) dating from 1984to 2012 The longest protein-coding region of each segmentwas extracted and used for all further analyses We thusassume that homologous recombination has not takenplace and that the evolutionary history of the whole segmentcan be inferred from the longest coding sequence in the seg-ment To date there has been little evidence of homologousrecombination in influenza viruses (Chare et al 2003

Boni et al 2008 Han et al 2010) The segments of eachstrain were assigned to either Vic or Yam lineage by makingmaximum-likelihood trees of each segment using PhyML(Guindon and Gascuel 2003) and identifying whether theisolate was more closely related to BVictoria287 orBYamagata1688 sequences in that segment with the ex-ception of the NS segment as BVictoria287 was a reassor-tant and possessed a Yam lineage NS (Lindstrom et al 1999)BCzechoslovakia691990 was considered as being represen-tative of Vic lineage for the NS segment Every segment ineach genome thus received either a Vic or a Yam lineagedesignation for example the strain BVictoria287 receivedVndashVndashVndashVndashVndashVndashVndashY as its NS segment is derived from theYam lineage and the rest of the genome is Vic

We also collated a secondary data set from all completeinfluenza B virus genomes available on GenBank as of May 52014 After removing isolates that had considerable portionsof any sequence missing were isolated prior to 1980 or weresuspected of having a contaminant sequence in any segmentwe were left with 1603 sequences This data set only becameavailable after all primary analyses were performed are mainlyfrom Australia New Zealand and the United States and aretoo numerous to analyze in BEAST (Drummond et al 2012)PhyML (Guindon and Gascuel 2003) was used to producephylogenies of each segment and the lineage of each isolatewas determined based on grouping with either BVictoria287 or BYamagata1688 sequences as described above Byassociating strains with lineage identity of each of their seg-ments we reconstructed the most parsimonious interlineagereassortment history for the secondary data set The second-ary data set was used to check how representative the pri-mary data set was to estimate LD and to broadly confirm ourresults All analyses pertain to the primary data set unlessstated otherwise

Temporally calibrated phylogenies were recovered for eachsegment in the primary data set using Markov chain MonteCarlo (MCMC) methods in the BEAST software package(Drummond et al 2012) We modeled the substitution pro-cess using the HasegawandashKishinondashYano model of nucleotidesubstitution (Hasegawa et al 1985) with separate transitionmodels for each of the three codon partitions and addition-ally estimated realized synonymous and nonsynonymoussubstitution counts (OrsquoBrien et al 2009) We used a flexibleBayesian skyride demographic model (Minin et al 2008) Weaccounted for incomplete sampling dates for 94 sequences(of which 93 had only year and 1 had only year and month ofisolation) whereby tip date is estimated as a latent variable inthe MCMC integration A relaxed molecular clock was usedwhere branch rates are drawn from a lognormal distribution(Drummond et al 2006) We ran three independent MCMCchains each with 200 million states sampled every 20000steps and discarded the first 10 of the MCMC states asburn-in After assessing convergence of all three MCMCchains by visual inspection using Tracer (Rambaut et al2009) we combined samples across chains to give a total of27000 samples from the posterior distribution of trees

Every sequence was assigned seven discrete traits in BEAUticorresponding to the lineages of all other segments with

8

Dudas et al doi101093molbevmsu287 MBE at E

dinburgh University on D

ecember 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

which a strain was isolated for example PB1 tree had PB2 PAHA NP NA MP and NS as traits and V or Y as values for eachtrait We inferred the ancestral state of lineages in each seg-ment by modeling transitions between these discrete statesusing an asymmetric transition matrix (Lemey et al 2009)with Bayesian stochastic search variable selection to estimatesignificant rates Because the posterior set of trees for a singlesegment has branches labeled with the inferred lineage in theremaining seven segments we can detect interlineage reas-sortments between pairs of segments by observing state tran-sitions that is Yam to Vic or Vic to Yam (fig 1) In additionby reconstructing the ancestral state of all other genomicsegments jointly we can infer coreassortment events whenmore than one trait transition occurs on the same node in atree Interphylogeny labeling approaches have been exten-sively used in the past to investigate reticulate evolution ininfluenza A viruses and HIV (Lycett et al 2012 Ward et al2013 Lu et al 2014)

Measures of Diversity

We inferred the diversity of each segment from its phylogenetic tree by estimating the date of the most recent common ancestor of all branches at yearly time points, which places an upper bound on the amount of diversity existing at each time point. A version of this lineage turnover metric has previously been used to investigate the tempo and strength of selection in influenza A viruses during seasonal circulation (Bedford et al. 2011). In addition, we calculated mean pairwise TMRCA between branches labeled as Vic and Yam for PB1, PB2, and HA traits. This gave us a measure of how much a particular segment reassorts with respect to Vic and Yam lineages of PB1, PB2, and HA segments. If Vic and Yam lineages of PB1, PB2, and HA segments were to be considered as being separate populations, this measure would be equivalent to "between population" diversity.
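The lineage turnover measure described above can be illustrated on a toy tree. The sketch below is written under stated assumptions: the tree is a hypothetical child-to-parent dictionary with node dates in calendar years, and the function reports the age of the common ancestor relative to the time point rather than its date. It is not the authors' code.

```python
def ancestors(node, parent):
    """Path from a node up to the root, including the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def tmrca_at(t, parent, date):
    """Time from t back to the common ancestor of all branches existing at time t."""
    # A branch exists at t if its parent is older than t and its child is not.
    crossing = [n for n, p in parent.items() if p is not None and date[p] < t <= date[n]]
    if not crossing:
        return None   # no lineage exists at this time point
    if len(crossing) == 1:
        return 0.0    # a single lineage carries no diversity
    common = set(ancestors(crossing[0], parent))
    for n in crossing[1:]:
        common &= set(ancestors(n, parent))
    mrca = max(common, key=lambda n: date[n])  # youngest shared ancestor
    return t - date[mrca]


# Hypothetical toy tree: dates increase from the root towards the tips.
parent = {"root": None, "n1": "root", "t1": "n1", "t2": "n1", "t3": "root"}
date = {"root": 1980.0, "n1": 1990.0, "t1": 2000.0, "t2": 2002.0, "t3": 1995.0}

for year in (1993.0, 1998.0, 2001.0):
    print(year, tmrca_at(year, parent, date))
```

In this example the value drops from 13 years in 1993 to 8 years in 1998 because the lineage leading to t3 is no longer present, which is the kind of turnover the metric is meant to capture.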

We also calculated the total amount of sampled evolutionary time spent by each segment with entirely Vic, entirely Yam, or mixed-lineage PB1, PB2, and HA segments. We do this by summing the branch lengths in each tree under three different lineage combinations of the PB1, PB2, and HA segments: PB1–PB2–HA derived entirely from Yam lineage, PB1–PB2–HA entirely derived from Vic lineage, and PB1–PB2–HA derived from a mixture of the two lineages. This gives a measure of how successful, over long periods of time, each particular PB1–PB2–HA constellation has been.
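A minimal sketch of this branch-length summation is given below. It assumes each branch has already been reduced to a length in years plus its inferred PB1, PB2, and HA lineage labels; the data layout, function name, and numbers are hypothetical and chosen only for illustration.

```python
def time_by_constellation(branches):
    """Sum branch lengths by PB1-PB2-HA constellation (all Yam, all Vic, or mixed)."""
    totals = {"Yam": 0.0, "Vic": 0.0, "mixed": 0.0}
    for length, labels in branches:
        states = {labels["PB1"], labels["PB2"], labels["HA"]}
        if states == {"Y"}:
            key = "Yam"
        elif states == {"V"}:
            key = "Vic"
        else:
            key = "mixed"
        totals[key] += length
    return totals


# Hypothetical branches: (length in years, inferred lineage of PB1, PB2, and HA).
branches = [
    (1.5, {"PB1": "Y", "PB2": "Y", "HA": "Y"}),
    (2.0, {"PB1": "V", "PB2": "V", "HA": "V"}),
    (0.5, {"PB1": "V", "PB2": "Y", "HA": "V"}),
    (3.0, {"PB1": "Y", "PB2": "Y", "HA": "Y"}),
]

print(time_by_constellation(branches))
```

With these toy values, 4.5 years of sampled evolutionary time fall on entirely Yam backgrounds, 2.0 on entirely Vic backgrounds, and 0.5 on mixed ones.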

Tree-to-Tree Similarities

We express the normalized distance $\delta\mathrm{TMRCA}$ between trees belonging to two segments, A and B, for a particular posterior sample i, following

$$\delta\mathrm{TMRCA}(A_i, B_i) = \frac{\Delta\mathrm{TMRCA}(A_i, A'_i) + \Delta\mathrm{TMRCA}(B_i, B'_i)}{2\,\Delta\mathrm{TMRCA}(A_i, B_i)}, \tag{1}$$

where $\Delta\mathrm{TMRCA}(A_i, B_i) = \frac{1}{n}\sum_{j=1}^{n} g(A_{ij}, B_{ij})$ and n is the total number of pairwise comparisons available between sets of tips. Thus, $g(A_{ij}, B_{ij})$ is the absolute difference in TMRCA of a pair of tips j, where the pair is drawn from the ith posterior sample of tree A and the ith posterior sample of tree B. Additionally, $\Delta\mathrm{TMRCA}(A_i, A'_i)$ is calculated from the ith posterior sample of tree A and the ith posterior sample of an independent analysis of tree A (which we refer to as $A'$), which is used in the normalization procedure to control for variability in tree topology stability over the course of the MCMC chain (see supplementary figs. S6 and S7, Supplementary Material online). We had three replicate analyses of each segment and, in order to calculate $\Delta\mathrm{TMRCA}(A_i, A'_i)$, we used analyses numbered 1, 2, and 3 as A and analyses numbered 2, 3, and 1 as $A'$, in that order. We subsampled our combined posterior distribution of trees to give a total of 2,700 trees on which to analyze $\delta\mathrm{TMRCA}$.
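Equation (1) can be illustrated with a small numerical sketch. It assumes that pairwise TMRCAs have already been extracted from each tree into dictionaries keyed by tip pair; the function names, tip names, and values below are hypothetical and are not the authors' scripts.

```python
def delta_tmrca(x, y):
    """Mean absolute difference in pairwise TMRCA between two trees."""
    pairs = x.keys() & y.keys()  # tip pairs available in both trees
    return sum(abs(x[p] - y[p]) for p in pairs) / len(pairs)


def normalized_tmrca(a, a_rep, b, b_rep):
    """Replicate noise within segments divided by the distance between segments."""
    noise = delta_tmrca(a, a_rep) + delta_tmrca(b, b_rep)
    return noise / (2 * delta_tmrca(a, b))


# Hypothetical pairwise TMRCAs (in years) for three tips in one posterior sample.
a = {("s1", "s2"): 2.0, ("s1", "s3"): 10.0, ("s2", "s3"): 10.0}
a_rep = {("s1", "s2"): 2.5, ("s1", "s3"): 9.5, ("s2", "s3"): 10.0}   # replicate of A
b = {("s1", "s2"): 8.0, ("s1", "s3"): 10.0, ("s2", "s3"): 4.0}
b_rep = {("s1", "s2"): 8.0, ("s1", "s3"): 10.5, ("s2", "s3"): 4.5}   # replicate of B

print(delta_tmrca(a, b))                     # 4.0
print(normalized_tmrca(a, a_rep, b, b_rep))  # about 0.083
```

Here the two segment trees differ by 4 years on average, whereas replicates of the same segment differ by only a third of a year, so the normalized value is far below 1. Under the definition in equation (1), values close to 1 would indicate that the two trees differ no more than independent analyses of the same segment do.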

Calculating the normalized $\delta\mathrm{TMRCA}(A_i, B_i)$ for each MCMC state provides us with a posterior distribution of this statistic, allowing specific hypotheses regarding similarities between the trees of different segments to be tested. Our approach exploits the branch scaling used by BEAST (Drummond et al. 2012), as the trees are scaled in absolute time and insensitive to variation in nucleotide substitution rates between segments, allowing for direct comparisons between TMRCAs in different trees. In the absence of reassortment, we expect the tree of every segment to recapitulate the "virus tree," a concept analogous to "species trees" in population genetics. Our method operates under the assumption that the segment trees capture this "virus tree" of influenza B viruses quite well. It is not an unreasonable assumption, given the seasonal bottlenecks influenza viruses experience. This makes it almost certain that influenza viruses circulating at any given time point are derived from a single genome that existed in the recent past. The $\Delta\mathrm{TMRCA}$ statistic essentially quantifies the temporal distance between admixture events and nodes in the virus tree (see supplementary fig. S17, Supplementary Material online). We normalize $\Delta\mathrm{TMRCA}$ values to get $\delta\mathrm{TMRCA}$, a measure which quantifies the extent to which the similarity of two independent trees resembles phylogenetic noise. The $\Delta\mathrm{TMRCA}$ statistic is an extension of patristic distance methods and has previously been used to tackle a wide variety of problems, for example, as phylogenetic distance in predicting viral titer in Drosophila infected with viruses from closely related species (Longdon et al. 2011) and to assess temporal incongruence in a phylogenetic tree of amphibian species induced by using different calibrations (Ruane et al. 2011).

LD across the Influenza B Genome

We used the secondary GenBank data set with 1,603 complete genome sequences to estimate LD between amino acid loci across the longest proteins encoded by each segment of the influenza B virus genome. To quantify LD, we adapt the $\chi^2_{df}$ statistic from Hedrick and Thomson (1986):

$$\chi^2_{df} = \frac{\chi^2}{N(k-1)(m-1)}, \tag{2}$$

where $\chi^2$ is calculated from a classical contingency table, N is the number of haplotypes, and $(k-1)(m-1)$ are the degrees of freedom, with k and m being the numbers of alleles at the two loci. This statistic is equal to the widely used $r^2$ LD statistic at biallelic loci but also quantifies LD when there are more than two alleles per locus (Zhao et al. 2005). LD was estimated only at loci where each nucleotide or amino acid allele was present in at least two isolates. We ignored gaps in the alignment and did not consider them as polymorphisms. In all cases, we used a minor allele frequency cutoff of 1%. We also calculated another LD statistic, $D'$ (Lewontin 1964), as $D'_{ij} = D_{ij}/D^{\max}_{ij}$, where $D_{ij} = p(A_iB_j) - p(A_i)p(B_j)$ and

$$D^{\max}_{ij} = \begin{cases} \min[\,p(A_i)p(B_j),\ (1-p(A_i))(1-p(B_j))\,] & \text{when } D_{ij} < 0 \\ \min[\,(1-p(A_i))p(B_j),\ p(A_i)(1-p(B_j))\,] & \text{when } D_{ij} \geq 0 \end{cases} \tag{3}$$

where $p(A_i)$ is the frequency of allele $A_i$ at locus A, $p(B_j)$ is the frequency of allele $B_j$ at locus B, and $p(A_iB_j)$ is the frequency of haplotype $A_iB_j$. $D'$ is inflated when some haplotypes are not observed, for example, when the minor allele frequency is low. We find that $D'$ is almost uniformly high across the influenza B virus genome and close to 1.0 for almost any pair of polymorphic loci. This is because most amino acid alleles in the population exist transiently, meaning that they do not get a chance to reassort and we only observe them within the backgrounds of more persistent alleles, which $D'$ quantifies as complete LD. We think that metrics related to $r^2$, such as $\chi^2_{df}$, perform much better on temporal data such as ours in finding persistent associations between alleles and are easier to interpret.
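The contrast between the two LD statistics can be seen in a small worked example. The sketch below implements equations (2) and (3) directly for a list of two-locus haplotypes; the function names and the toy haplotype counts are hypothetical, and both loci are assumed to be polymorphic so that no denominator is zero.

```python
from collections import Counter


def chi2_df(haplotypes):
    """Chi-square from the allele contingency table, scaled by N(k-1)(m-1)."""
    n = len(haplotypes)
    a_counts = Counter(a for a, _ in haplotypes)
    b_counts = Counter(b for _, b in haplotypes)
    ab_counts = Counter(haplotypes)
    chi2 = 0.0
    for a, n_a in a_counts.items():
        for b, n_b in b_counts.items():
            expected = n_a * n_b / n
            chi2 += (ab_counts[(a, b)] - expected) ** 2 / expected
    k, m = len(a_counts), len(b_counts)
    return chi2 / (n * (k - 1) * (m - 1))


def d_prime(haplotypes, a, b):
    """Lewontin's D' for allele a at the first locus and allele b at the second."""
    n = len(haplotypes)
    p_a = sum(1 for x, _ in haplotypes if x == a) / n
    p_b = sum(1 for _, y in haplotypes if y == b) / n
    p_ab = sum(1 for h in haplotypes if h == (a, b)) / n
    d = p_ab - p_a * p_b
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min((1 - p_a) * p_b, p_a * (1 - p_b))
    return d / d_max


# Hypothetical sample of 10 haplotypes; the R-T combination is never observed.
haps = [("K", "T")] * 4 + [("K", "A")] * 1 + [("R", "A")] * 5

print(chi2_df(haps))            # about 0.667, equal to r^2 at these biallelic loci
print(d_prime(haps, "K", "T"))  # about 1.0, because one haplotype is missing
```

Because the R-T haplotype is absent, D' reports complete LD even though the association is imperfect, whereas the scaled chi-square statistic gives two thirds.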

Data Availability

Python scripts used to process trees and sequences are available at https://github.com/evogytis/fluB/tree/master/scripts (last accessed October 21, 2014). Output files from scripts, lineage designations, MCC trees, acknowledgment tables, accession numbers, and redacted XML files (per GISAID Data Access Agreement) are publicly available at https://github.com/evogytis/fluB/tree/master/data (last accessed October 21, 2014).

Supplementary Material

Supplementary material and figures S1–S17 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Darren Obbard and Paul Wikramaratna for helpful discussions and anonymous reviewers for comments and suggestions. This study was supported by a Natural Environment Research Council studentship D76739X to G.D. and a Newton International Fellowship from the Royal Society to T.B. The research leading to these results has received funding from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 278433-PREDEMICS and ERC Grant Agreement No. 260864. A.R. and S.L. acknowledge the support of the Wellcome Trust (Grant No. 092807).

References

Ansaldi F, D'Agaro P, de Florentiis D, Puzelli S, Lin YP, Gregory V, Bennett M, Donatelli I, Gasparini R, Crovari P, et al. 2003. Molecular characterization of influenza B viruses circulating in northern Italy during the 2001–2002 epidemic season. J Med Virol. 70:463–469.

Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol. 11:220.

Bedford T, Suchard MA, Lemey P, Dudas G, Gregory V, Hay AJ, McCauley JW, Russell CA, Smith DJ, Rambaut A. 2014. Integrating influenza antigenic dynamics with molecular evolution. eLife 3:e01914.

Bergeron C, Valette M, Lina B, Ottmann M. 2010. Genetic content of influenza H3N2 vaccine seeds. PLoS Curr. 2:RRN1165.

Bodewes R, Morick D, de Mutsert G, Osinga N, Bestebroer T, van der Vliet S, Smits SL, Kuiken T, Rimmelzwaan GF, Fouchier RA, et al. 2013. Recurring influenza B virus infections in seals. Emerg Infect Dis. 19:511–512.

Bogner P, Capua I, Lipman DJ, Cox NJ, et al. 2006. A global initiative on sharing avian flu data. Nature 442:981.

Boni MF, Zhou Y, Taubenberger JK, Holmes EC. 2008. Homologous recombination is very rare or absent in human influenza A virus. J Virol. 82:4807–4811.

Broberg E, Beaute J, Snacken R. 2013. Fortnightly influenza surveillance overview, 24 May 2013 - weeks 19-20/2013. Available from: http://ecdc.europa.eu/en/publications/Publications/influenza-fortnightly-surveillance-overview-24-may-2013.pdf

Burnet SFM. 1955. Principles of animal virology. New York: Academic Press.

Chare ER, Gould EA, Holmes EC. 2003. Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J Gen Virol. 84:2691–2703.

Chen R, Holmes EC. 2008. The evolutionary dynamics of human influenza B virus. J Mol Evol. 66:655–663.

Cobbin JCA, Verity EE, Gilbertson BP, Rockman SP, Brown LE. 2013. The source of the PB1 gene in influenza vaccine reassortants selectively alters the hemagglutinin content of the resulting seed virus. J Virol. 87:5577–5585.

Dobzhansky T. 1937. Genetics and the origin of species. New York: Columbia University Press.

Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4:e88.

Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 29:1969–1973.

Fulvini AA, Ramanunninair M, Le J, Pokorny BA, Arroyo JM, Silverman J, Devis R, Bucher D. 2011. Gene constellation of influenza A virus reassortants with high growth phenotype prepared as seed candidates for vaccine production. PLoS One 6:e20823.

Gregory V, Bennett M, Orkhan M, Hajjar SA, Varsano N, Mendelson E, Zambon M, Ellis J, Hay A, Lin Y. 2002. Emergence of influenza A H1N2 reassortant viruses in the human population during 2001. Virology 300:1–7.

Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696–704.

Han GZ, Boni MF, Li SS. 2010. No observed effect of homologous recombination on influenza C virus evolution. Virol J. 7:227.

Hasegawa M, Kishino H, Yano TA. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 22:160–174.

Hay AJ, Gregory V, Douglas AR, Lin YP. 2001. The evolution of human influenza viruses. Philos Trans R Soc Lond B Biol Sci. 356:1861–1870.


Hedrick PW, Thomson G. 1986. A two-locus neutrality test: applications to humans, E. coli and lodgepole pine. Genetics 112:135–156.

Hoffmann E, Mahmood K, Yang CF, Webster RG, Greenberg HB, Kemble G. 2002. Rescue of influenza B virus from eight plasmids. Proc Natl Acad Sci U S A. 99:11411–11416.

Kanegae Y, Sugita S, Endo A, Ishida M, Senya S, Osako K, Nerome K, Oya A. 1990. Evolutionary pattern of the hemagglutinin gene of influenza B viruses isolated in Japan: cocirculating lineages in the same epidemic season. J Virol. 64:2860–2865.

Kawaoka Y, Krauss S, Webster RG. 1989. Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics. J Virol. 63:4603–4608.

Lee HY, Chou JY, Cheong L, Chang NH, Yang SY, Leu JY. 2008. Incompatibility of nuclear and mitochondrial genomes causes hybrid sterility between two yeast species. Cell 135:1065–1073.

Lemey P, Rambaut A, Drummond AJ, Suchard MA. 2009. Bayesian phylogeography finds its roots. PLoS Comput Biol. 5:e1000520.

Lewontin RC. 1964. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49–67.

Lindstrom SE, Hiromoto Y, Nishimura H, Saito T, Nerome R, Nerome K. 1999. Comparative analysis of evolutionary mechanisms of the hemagglutinin and three internal protein genes of influenza B virus: multiple cocirculating lineages and frequent reassortment of the NP, M, and NS genes. J Virol. 73:4413–4426.

Longdon B, Hadfield JD, Webster CL, Obbard DJ, Jiggins FM. 2011. Host phylogeny determines viral persistence and replication in novel hosts. PLoS Pathog. 7:e1002260.

Lu L, Lycett SJ, Brown AJL. 2014. Reassortment patterns of avian influenza virus internal segments among different subtypes. BMC Evol Biol. 14:16.

Lycett SJ, Baillie G, Coulter E, Bhatt S, Kellam P, McCauley JW, Wood JL, Brown IH, Pybus OG, Leigh Brown AJ, Combating Swine Influenza Initiative-COSI Consortium. 2012. Estimating reassortment rates in co-circulating Eurasian swine influenza viruses. J Gen Virol. 93:2326–2336.

Minin VN, Bloomquist EW, Suchard MA. 2008. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol Biol Evol. 25:1459–1471.

Muller H. 1942. Isolating mechanisms, evolution and temperature. Biol Symp. 6:71–125.

Nakagawa N, Nukuzuma S, Haratome S, Go S, Nakagawa T, Hayashi K. 2002. Emergence of an influenza B virus with antigenic change. J Clin Microbiol. 40:3068–3070.

Nerome R, Hiromoto Y, Sugita S, Tanabe N, Ishida M, Matsumoto M, Lindstrom SE, Takahashi T, Nerome K. 1998. Evolutionary characteristics of influenza B virus since its first isolation in 1940: dynamic circulation of deletion and insertion mechanism. Arch Virol. 143:1569–1583.

O'Brien JD, Minin VN, Suchard MA. 2009. Learning to count: robust estimates for labeled distances between molecular sequences. Mol Biol Evol. 26:801–814.

Osterhaus ADME, Rimmelzwaan GF, Martina BEE, Bestebroer TM, Fouchier RAM. 2000. Influenza B virus in seals. Science 288:1051–1053.

Presgraves DC. 2010. The molecular evolutionary basis of species formation. Nat Rev Genet. 11:175–180.

Rambaut A, Suchard M, Drummond A. 2009. Tracer v1.5 [Internet]. Available from: http://tree.bio.ed.ac.uk/software/tracer/

Reed C, Meltzer MI, Finelli L, Fiore A. 2012. Public health impact of including two lineages of influenza B in a quadrivalent seasonal influenza vaccine. Vaccine 30:1993–1998.

Rota PA, Wallis TR, Harmon MW, Rota JS, Kendal AP, Nerome K. 1990. Cocirculation of two distinct evolutionary lineages of influenza type B virus since 1983. Virology 175:59–68.

Ruane S, Pyron RA, Burbrink FT. 2011. Phylogenetic relationships of the cretaceous frog Beelzebufo from Madagascar and the placement of fossil constraints based on temporal and phylogenetic evidence. J Evol Biol. 24:274–285.

Shaw MW, Xu X, Li Y, Normand S, Ueki RT, Kunimoto GY, Hall H, Klimov A, Cox NJ, Subbarao K. 2002. Reappearance and global spread of variants of influenza B/Victoria/2/87 lineage viruses in the 2000–2001 and 2001–2002 seasons. Virology 303:1–8.

Smith GJD, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, et al. 2009. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459:1122–1125.

Turner TL, Hahn MW, Nuzhdin SV. 2005. Genomic islands of speciation in Anopheles gambiae. PLoS Biol. 3:e285.

Ward MJ, Lycett SJ, Kalish ML, Rambaut A, Brown AJL. 2013. Estimating the rate of intersubtype recombination in early HIV-1 group M strains. J Virol. 87:1967–1973.

World Health Organization. 2009. Influenza fact sheet. Available from: http://www.who.int/mediacentre/factsheets/fs211/en/

Zhao H, Nettleton D, Soller M, Dekkers JCM. 2005. Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genet Res. 86:77–87.



Page 11: Edinburgh Research Explorer · 2014. 12. 18. · Edinburgh Research Explorer Reassortment between Influenza B Lineages and the Emergence of a Coadapted PB1-PB2-HA Gene Complex Citation

degrees of freedom. This statistic is equal to the widely used $r^2$ LD statistic at biallelic loci but also quantifies LD when there are more than two alleles per locus (Zhao et al. 2005). LD was estimated only at loci where each nucleotide or amino acid allele was present in at least two isolates. We ignored gaps in the alignment and did not consider them as polymorphisms. In all cases we used a minor allele frequency cutoff of 1%. We also calculated another LD statistic, $D'$ (Lewontin 1964), as $D'_{ij} = D_{ij} / D^{\max}_{ij}$, where $D_{ij} = p(A_iB_j) - p(A_i)\,p(B_j)$ and

$$
D^{\max}_{ij} =
\begin{cases}
\min\left[\,p(A_i)\,p(B_j),\ (1 - p(A_i))(1 - p(B_j))\,\right] & \text{when } D_{ij} < 0,\\
\min\left[\,(1 - p(A_i))\,p(B_j),\ p(A_i)(1 - p(B_j))\,\right] & \text{when } D_{ij} \ge 0,
\end{cases}
\tag{3}
$$

where $p(A_i)$ is the frequency of allele $A_i$ at locus $A$, $p(B_j)$ is the frequency of allele $B_j$ at locus $B$, and $p(A_iB_j)$ is the frequency of haplotype $A_iB_j$. $D'$ is inflated when some haplotypes are not observed, for example, when the minor allele frequency is low. We find that $D'$ is almost uniformly high across the influenza B virus genome and close to 1.0 for almost any pair of polymorphic loci. This is because most amino acid alleles in the population exist transiently, meaning that they do not get a chance to reassort, and we only observe them within the backgrounds of more persistent alleles, which $D'$ quantifies as complete LD. We think that metrics related to $r^2$, such as $\chi^2_{df}$, perform much better on temporal data such as ours in finding persistent associations between alleles and are easier to interpret.
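To make the contrast between the two statistics concrete, the following Python sketch computes $D'$ for one allele pair using equation (3), and a chi-square-based LD statistic for a pair of alignment columns. It is an illustration only and is not taken from the authors' scripts. The function names `d_prime` and `chi2_df`, the toy columns, and the normalisation $\chi^2 / (N(k-1)(m-1))$ (with $N$ haplotypes and $k$ and $m$ alleles at the two loci) are assumptions made for this example. The toy data place a rare allele on a single background, so one haplotype is never observed.

```python
from collections import Counter
from itertools import product


def d_prime(col_a, col_b, allele_a, allele_b):
    """Lewontin's D' for one allele pair (A_i, B_j), following equation (3)."""
    n = len(col_a)
    p_a = sum(1 for a in col_a if a == allele_a) / n
    p_b = sum(1 for b in col_b if b == allele_b) / n
    p_ab = sum(
        1 for a, b in zip(col_a, col_b) if a == allele_a and b == allele_b
    ) / n
    d = p_ab - p_a * p_b
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min((1 - p_a) * p_b, p_a * (1 - p_b))
    # d_max is zero only if one of the alleles is fixed or absent.
    return d / d_max if d_max > 0 else 0.0


def chi2_df(col_a, col_b):
    """chi^2 / (N (k-1)(m-1)) from the k x m haplotype contingency table.

    Assumed normalisation: N is the number of haplotypes, k and m are the
    numbers of alleles observed at the two loci.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    k, m = len(count_a), len(count_b)
    if k < 2 or m < 2:
        raise ValueError("both loci must be polymorphic")
    chi2 = 0.0
    for a, b in product(count_a, count_b):
        expected = count_a[a] * count_b[b] / n
        chi2 += (count_ab[(a, b)] - expected) ** 2 / expected
    return chi2 / (n * (k - 1) * (m - 1))


# Hypothetical toy data: 20 isolates; the rare allele T at locus A occurs
# only on the G background at locus B, so haplotype T-A is never observed.
col_a = ["T"] * 2 + ["C"] * 18
col_b = ["G"] * 10 + ["A"] * 10

print(round(d_prime(col_a, col_b, "T", "G"), 3))  # 1.0
print(round(chi2_df(col_a, col_b), 3))            # 0.111
```

In this toy case $D'$ reports complete LD simply because the rare allele has only been seen on one background, whereas the chi-square-based value stays low (and, the loci being biallelic, equals $r^2$).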

Data Availability

Python scripts used to process trees and sequences are available at https://github.com/evogytis/fluB/tree/master/scripts (last accessed October 21, 2014). Output files from scripts, lineage designations, MCC trees, acknowledgment tables, accession numbers, and redacted XML files (per GISAID Data Access Agreement) are publicly available at https://github.com/evogytis/fluB/tree/master/data (last accessed October 21, 2014).

Supplementary Material

Supplementary material and figures S1–S17 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Darren Obbard and Paul Wikramaratna for helpful discussions and anonymous reviewers for comments and suggestions. This study was supported by a Natural Environment Research Council studentship D76739X to G.D. and a Newton International Fellowship from the Royal Society to T.B. The research leading to these results has received funding from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 278433-PREDEMICS and ERC Grant Agreement No. 260864. A.R. and S.L. acknowledge the support of the Wellcome Trust (Grant No. 092807).

References

Ansaldi F, D'Agaro P, de Florentiis D, Puzelli S, Lin YP, Gregory V, Bennett M, Donatelli I, Gasparini R, Crovari P, et al. 2003. Molecular characterization of influenza B viruses circulating in northern Italy during the 2001–2002 epidemic season. J Med Virol 70:463–469.

Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol 11:220.

Bedford T, Suchard MA, Lemey P, Dudas G, Gregory V, Hay AJ, McCauley JW, Russell CA, Smith DJ, Rambaut A. 2014. Integrating influenza antigenic dynamics with molecular evolution. eLife 3:e01914.

Bergeron C, Valette M, Lina B, Ottmann M. 2010. Genetic content of influenza H3N2 vaccine seeds. PLoS Curr 2:RRN1165.

Bodewes R, Morick D, de Mutsert G, Osinga N, Bestebroer T, van der Vliet S, Smits SL, Kuiken T, Rimmelzwaan GF, Fouchier RA, et al. 2013. Recurring influenza B virus infections in seals. Emerg Infect Dis 19:511–512.

Bogner P, Capua I, Lipman DJ, Cox NJ, et al. 2006. A global initiative on sharing avian flu data. Nature 442:981.

Boni MF, Zhou Y, Taubenberger JK, Holmes EC. 2008. Homologous recombination is very rare or absent in human influenza A virus. J Virol 82:4807–4811.

Broberg E, Beaute J, Snacken R. 2013. Fortnightly influenza surveillance overview, 24 May 2013 - weeks 19-20/2013. Available from: http://ecdc.europa.eu/en/publications/Publications/influenza-fortnightly-surveillance-overview-24-may-2013.pdf

Burnet SFM. 1955. Principles of animal virology. New York: Academic Press.

Chare ER, Gould EA, Holmes EC. 2003. Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J Gen Virol 84:2691–2703.

Chen R, Holmes EC. 2008. The evolutionary dynamics of human influenza B virus. J Mol Evol 66:655–663.

Cobbin JCA, Verity EE, Gilbertson BP, Rockman SP, Brown LE. 2013. The source of the PB1 gene in influenza vaccine reassortants selectively alters the hemagglutinin content of the resulting seed virus. J Virol 87:5577–5585.

Dobzhansky T. 1937. Genetics and the origin of species. New York: Columbia University Press.

Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol 4:e88.

Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol 29:1969–1973.

Fulvini AA, Ramanunninair M, Le J, Pokorny BA, Arroyo JM, Silverman J, Devis R, Bucher D. 2011. Gene constellation of influenza A virus reassortants with high growth phenotype prepared as seed candidates for vaccine production. PLoS One 6:e20823.

Gregory V, Bennett M, Orkhan M, Hajjar SA, Varsano N, Mendelson E, Zambon M, Ellis J, Hay A, Lin Y. 2002. Emergence of influenza A H1N2 reassortant viruses in the human population during 2001. Virology 300:1–7.

Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696–704.

Han GZ, Boni MF, Li SS. 2010. No observed effect of homologous recombination on influenza C virus evolution. Virol J 7:227.

Hasegawa M, Kishino H, Yano TA. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–174.

Hay AJ, Gregory V, Douglas AR, Lin YP. 2001. The evolution of human influenza viruses. Philos Trans R Soc Lond B Biol Sci 356:1861–1870.


Hedrick PW, Thomson G. 1986. A two-locus neutrality test: applications to humans, E. coli and lodgepole pine. Genetics 112:135–156.

Hoffmann E, Mahmood K, Yang CF, Webster RG, Greenberg HB, Kemble G. 2002. Rescue of influenza B virus from eight plasmids. Proc Natl Acad Sci U S A 99:11411–11416.

Kanegae Y, Sugita S, Endo A, Ishida M, Senya S, Osako K, Nerome K, Oya A. 1990. Evolutionary pattern of the hemagglutinin gene of influenza B viruses isolated in Japan: cocirculating lineages in the same epidemic season. J Virol 64:2860–2865.

Kawaoka Y, Krauss S, Webster RG. 1989. Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics. J Virol 63:4603–4608.

Lee HY, Chou JY, Cheong L, Chang NH, Yang SY, Leu JY. 2008. Incompatibility of nuclear and mitochondrial genomes causes hybrid sterility between two yeast species. Cell 135:1065–1073.

Lemey P, Rambaut A, Drummond AJ, Suchard MA. 2009. Bayesian phylogeography finds its roots. PLoS Comput Biol 5:e1000520.

Lewontin RC. 1964. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49–67.

Lindstrom SE, Hiromoto Y, Nishimura H, Saito T, Nerome R, Nerome K. 1999. Comparative analysis of evolutionary mechanisms of the hemagglutinin and three internal protein genes of influenza B virus: multiple cocirculating lineages and frequent reassortment of the NP, M, and NS genes. J Virol 73:4413–4426.

Longdon B, Hadfield JD, Webster CL, Obbard DJ, Jiggins FM. 2011. Host phylogeny determines viral persistence and replication in novel hosts. PLoS Pathog 7:e1002260.

Lu L, Lycett SJ, Brown AJL. 2014. Reassortment patterns of avian influenza virus internal segments among different subtypes. BMC Evol Biol 14:16.

Lycett SJ, Baillie G, Coulter E, Bhatt S, Kellam P, McCauley JW, Wood JL, Brown IH, Pybus OG, Leigh Brown AJ, Combating Swine Influenza Initiative-COSI Consortium. 2012. Estimating reassortment rates in co-circulating Eurasian swine influenza viruses. J Gen Virol 93:2326–2336.

Minin VN, Bloomquist EW, Suchard MA. 2008. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol Biol Evol 25:1459–1471.

Muller H. 1942. Isolating mechanisms, evolution and temperature. Biol Symp 6:71–125.

Nakagawa N, Nukuzuma S, Haratome S, Go S, Nakagawa T, Hayashi K. 2002. Emergence of an influenza B virus with antigenic change. J Clin Microbiol 40:3068–3070.

Nerome R, Hiromoto Y, Sugita S, Tanabe N, Ishida M, Matsumoto M, Lindstrom SE, Takahashi T, Nerome K. 1998. Evolutionary characteristics of influenza B virus since its first isolation in 1940: dynamic circulation of deletion and insertion mechanism. Arch Virol 143:1569–1583.

O'Brien JD, Minin VN, Suchard MA. 2009. Learning to count: robust estimates for labeled distances between molecular sequences. Mol Biol Evol 26:801–814.

Osterhaus ADME, Rimmelzwaan GF, Martina BEE, Bestebroer TM, Fouchier RAM. 2000. Influenza B virus in seals. Science 288:1051–1053.

Presgraves DC. 2010. The molecular evolutionary basis of species formation. Nat Rev Genet 11:175–180.

Rambaut A, Suchard M, Drummond A. 2009. Tracer v1.5 [Internet]. Available from: http://tree.bio.ed.ac.uk/software/tracer/

Reed C, Meltzer MI, Finelli L, Fiore A. 2012. Public health impact of including two lineages of influenza B in a quadrivalent seasonal influenza vaccine. Vaccine 30:1993–1998.

Rota PA, Wallis TR, Harmon MW, Rota JS, Kendal AP, Nerome K. 1990. Cocirculation of two distinct evolutionary lineages of influenza type B virus since 1983. Virology 175:59–68.

Ruane S, Pyron RA, Burbrink FT. 2011. Phylogenetic relationships of the Cretaceous frog Beelzebufo from Madagascar and the placement of fossil constraints based on temporal and phylogenetic evidence. J Evol Biol 24:274–285.

Shaw MW, Xu X, Li Y, Normand S, Ueki RT, Kunimoto GY, Hall H, Klimov A, Cox NJ, Subbarao K. 2002. Reappearance and global spread of variants of influenza B/Victoria/2/87 lineage viruses in the 2000–2001 and 2001–2002 seasons. Virology 303:1–8.

Smith GJD, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, et al. 2009. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459:1122–1125.

Turner TL, Hahn MW, Nuzhdin SV. 2005. Genomic islands of speciation in Anopheles gambiae. PLoS Biol 3:e285.

Ward MJ, Lycett SJ, Kalish ML, Rambaut A, Brown AJL. 2013. Estimating the rate of intersubtype recombination in early HIV-1 group M strains. J Virol 87:1967–1973.

World Health Organization. 2009. Influenza fact sheet. Available from: http://www.who.int/mediacentre/factsheets/fs211/en/

Zhao H, Nettleton D, Soller M, Dekkers JCM. 2005. Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genet Res 86:77–87.
