Journal pre-proof - Cell · 2020-03-10 · Journal pre-proof DOI: 10.1016/j.cub.2020.03.022 This is...

Journal pre-proof DOI: 10.1016/j.cub.2020.03.022 This is a PDF file of an accepted peer-reviewed article but is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 The Author(s).


Title: Probable pangolin origin of SARS-CoV-2 associated with the COVID-19 outbreak

Authors: Tao Zhang1†, Qunfu Wu1†, Zhigang Zhang1,2*

Affiliations:

1State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, School of Life Sciences, Yunnan University, No.2 North Cuihu Road, Kunming, Yunnan, 650091, China

2Lead Contact

†These authors contributed equally to this work

*Correspondence: [email protected]

Summary:

An outbreak of coronavirus disease 2019 (COVID-19) caused by the 2019 novel coronavirus (SARS-CoV-2) began in the city of Wuhan in China and has spread widely worldwide. Currently, it is vital to explore potential intermediate hosts of SARS-CoV-2 to control COVID-19 spread. Therefore, we reinvestigated published data from pangolin lung samples from which SARS-CoV-like CoVs were detected by Liu et al.[1]. We found genomic and evolutionary evidence of the occurrence of a SARS-CoV-2-like CoV (named Pangolin-CoV) in dead Malayan pangolins. Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole genome level. Aside from RaTG13, Pangolin-CoV is the most closely related CoV to SARS-CoV-2. The S1 protein of Pangolin-CoV is much more closely related to SARS-CoV-2 than to RaTG13. Five key amino acid residues involved in the interaction with human ACE2 are completely consistent between Pangolin-CoV and SARS-CoV-2, but four amino acid mutations are present in RaTG13. Both Pangolin-CoV and RaTG13 lack the putative furin recognition sequence motif at the S1/S2 cleavage site that is observed in SARS-CoV-2. In conclusion, this study suggests that pangolin species are a natural reservoir of SARS-CoV-2-like CoVs.

Keywords: Pangolin; SARS-CoV-2; COVID-19; Origin.

Results and Discussion

Similar to the case for SARS-CoV and MERS-CoV[2], the bat is still a probable species of origin for SARS-CoV-2 because SARS-CoV-2 shares 96% whole-genome identity with a bat coronavirus (CoV), BatCoV RaTG13, from Rhinolophus affinis from Yunnan Province[3]. However, SARS-CoV and MERS-CoV usually pass into intermediate hosts, such as civets or camels, before leaping to humans[4]. This fact indicates that SARS-CoV-2 was probably transmitted to humans by other animals. Considering that the earliest COVID-19 patient reported no exposure at the seafood market[5], it is vital to find the intermediate SARS-CoV-2 host to block interspecies transmission. On 24 October 2019, Liu and his colleagues from the Guangdong Wildlife Rescue Center of China[1] first detected the existence of a SARS-CoV-like CoV from lung samples of two dead Malayan pangolins with a frothy liquid in their lungs and pulmonary fibrosis, a finding made close to the time when the COVID-19 outbreak occurred. Using their published results, we showed that all virus contigs assembled from two lung samples (lung07, lung08) exhibited low identities, ranging from 80.24% to 88.93%, with known SARSr-CoVs. Hence, we conjectured that the dead Malayan pangolins may carry a new CoV closely related to SARS-CoV-2.

Assessing the probability of SARS-CoV-2-like CoV presence in pangolin species

To confirm our assumption, we downloaded raw RNA-seq data (sequence read archive (SRA) accession number PRJNA573298) for those two lung samples from the SRA and performed quality control and contaminant removal consistent with those described in Liu’s study[1]. We found 1882 clean reads from the lung08 sample that mapped to the SARS-CoV-2 reference genome (GenBank accession MN908947)[6] and covered 76.02% of the SARS-CoV-2 genome. We performed de novo assembly of those reads and obtained 36 contigs with lengths ranging from 287 bp to 2187 bp, with a mean length of 700 bp. Via BLAST analysis against proteins from 2845 CoV reference genomes, including RaTG13, SARS-CoV-2s and other known CoVs, we found that 22 contigs were best matched to SARS-CoV-2s (70.6%-100% amino acid identity; average: 95.41%) and that 12 contigs matched to bat SARS-CoV-like CoV (92.7%-100% amino acid identity; average: 97.48%) (Table S1). These results indicate that the Malayan pangolin might carry a novel CoV (here named Pangolin-CoV) that is similar to SARS-CoV-2.

Draft genome of Pangolin-CoV and its genomic characteristics

Using a reference-guided scaffolding approach, we created a Pangolin-CoV draft genome (19,587 bp) based on the above 34 contigs. To reduce the effect of raw read errors on scaffolding quality, small fragments that aligned against the reference genome with a length of less than 25 bp were manually discarded if they were not covered by any large fragment or by the reference genome. Remapping the 1882 reads against the draft genome resulted in 99.99% genome coverage (coverage depth range: 1X-47X) (Figure 1A). The mean coverage depth was 7.71X across the whole genome, which was more than twice the lowest common 3X read coverage depth for single-nucleotide polymorphism (SNP) calling based on low-coverage sequencing in the 1000 Genomes Project pilot phase[7]. Similar coverage levels are also sufficient to detect rare or low-abundance microbial species from metagenomic datasets[8], indicating that our assembled Pangolin-CoV draft genome is reliable for further analyses. Based on Simplot analysis[9], Pangolin-CoV showed high overall genome sequence identity to RaTG13 (90.55%) and SARS-CoV-2 (91.02%) throughout the genome (Figure 1B), although there was a higher identity (96.2%) between SARS-CoV-2 and RaTG13[3]. Other SARS-CoV-like CoVs similar to Pangolin-CoV were bat SARSr-CoV ZXC21 (85.65%) and bat SARSr-CoV ZC45 (85.01%). While this manuscript was under review, two similar preprint studies found that CoVs in pangolins shared 90.3%[10] and 92.4%[11] DNA identity with SARS-CoV-2, approximating the 91.02% identity to SARS-CoV-2 observed here and supporting our findings. Taken together, these results indicate that Pangolin-CoV might be the common origin of SARS-CoV-2 and RaTG13.
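
The coverage figures quoted in this section (breadth of coverage and mean depth) can be illustrated with a minimal Python sketch. It assumes per-position depths in a three-column, tab-separated layout (sequence name, position, depth), such as the layout printed by `samtools depth`; the function name `coverage_summary` and the toy input are hypothetical and are not taken from the study, which reports only that Samtools was used.

```python
def coverage_summary(depth_lines, genome_length):
    """Return (breadth_percent, mean_depth) for one sequence.

    depth_lines: iterable of 'name<TAB>position<TAB>depth' strings.
    Positions that are absent from the input are treated as depth 0.
    """
    covered_positions = 0
    total_depth = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        depth = int(fields[2])
        total_depth += depth
        if depth > 0:
            covered_positions += 1
    breadth_percent = 100.0 * covered_positions / genome_length
    mean_depth = total_depth / genome_length
    return breadth_percent, mean_depth


# Toy 10-bp sequence (hypothetical data): three covered positions out of ten.
toy_depths = ["ctg\t1\t3", "ctg\t2\t3", "ctg\t3\t0", "ctg\t4\t5"]
toy_breadth, toy_mean = coverage_summary(toy_depths, genome_length=10)

# Breadth implied by the counts shown in Figure 1A (19585 of 19587 nt covered).
reported_breadth = 100.0 * 19585 / 19587
print(round(reported_breadth, 2))
```

Dividing by the full genome length, rather than by the number of covered positions, is a design choice of this sketch; it matches a mean "across the whole genome" but may differ from how a given tool reports its average.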

The Pangolin-CoV genome organization was characterized by sequence alignment against SARS-CoV-2 (GenBank accession MN908947) and RaTG13. The Pangolin-CoV genome consists of six major open reading frames (ORFs) common to CoVs and four other accessory genes (Figure 1C and Table S2). Further analysis indicated that Pangolin-CoV genes aligned to SARS-CoV-2 genes with coverage ranging from 45.8% to 100% (average coverage 76.9%). Pangolin-CoV genes shared high average nucleotide and amino acid identity with both SARS-CoV-2 (MN908947) (93.2% nucleotide/94.1% amino acid identity) and RaTG13 (92.8% nucleotide/93.5% amino acid identity) genes (Figure 1C and Table S2). Surprisingly, some Pangolin-CoV genes showed higher amino acid sequence identity to SARS-CoV-2 genes than to RaTG13 genes, including orf1b (73.4%/72.8%), the spike (S) protein (97.5%/95.4%), orf7a (96.9%/93.6%), and orf10 (97.3%/94.6%). The high S protein amino acid identity implies functional similarity between Pangolin-CoV and SARS-CoV-2.
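
As an illustration of what a pairwise identity percentage measures, the following Python sketch computes identity over the aligned, non-gap columns of two pre-aligned sequences. This is only one common convention and is an assumption here; the study does not state how gaps were treated, and the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gap columns are excluded from the denominator
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment (hypothetical): 9 comparable columns, 8 of them identical.
toy_identity = percent_identity("ATGC-TAGGA", "ATGCATAGCA")
print(round(toy_identity, 2))
```

The same function works for amino acid alignments, since it only compares characters column by column.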

Phylogenetic relationships among Pangolin-CoV, RaTG13 and SARS-CoV-2

To determine the evolutionary relationships among Pangolin-CoV, SARS-CoV-2 and previously identified CoVs, we estimated phylogenetic trees based on the nucleotide sequences of the whole genome sequence, RNA-dependent RNA polymerase gene (RdRp), non-structural protein genes ORF1a and ORF1b, and main structural proteins encoded by the S and M genes. In all phylogenies, Pangolin-CoV, RaTG13 and SARS-CoV-2 were clustered into a well-supported group, here named the “SARS-CoV-2 group” (Figure 2 and Figures S1 to S2). This group represents a novel Betacoronavirus group. Within this group, RaTG13 and SARS-CoV-2 were grouped together, and Pangolin-CoV was their closest common ancestor. However, whether the basal position of the SARS-CoV-2 group is SARSr-CoV ZXC21 and/or SARSr-CoV ZC45 is still under debate. Such debate also occurred in both the Wu et al.[6] and Zhou et al.[3] studies. A possible explanation is a past history of recombination in the Betacoronavirus group[6]. It is noteworthy that the discovered evolutionary relationships of CoVs shown by the whole genome, RdRp gene, and S gene were highly consistent with those exhibited by complete genome information in the Zhou et al. study[3]. This correspondence indicates that our Pangolin-CoV draft genome has enough genomic information to trace the true evolutionary position of Pangolin-CoV in CoVs.

Dualism of the S protein of Pangolin-CoV

The CoV S protein consists of two subunits (S1 and S2), mediates infection of receptor-expressing host cells and is a critical target for antiviral neutralizing antibodies[12]. S1 contains a receptor-binding domain (RBD) that consists of an approximately 193 amino acid fragment, which is responsible for recognizing and binding the cell surface receptor[13, 14]. Zhou et al. experimentally confirmed that SARS-CoV-2 is able to use human, Chinese horseshoe bat, civet, and pig ACE2 proteins as an entry receptor in ACE2-expressing cells[3], suggesting that the RBD of SARS-CoV-2 mediates infection in humans and other animals. To gain sequence-level insight into the pathogenic potential of Pangolin-CoV, we first investigated the amino acid variation pattern of the S1 proteins from Pangolin-CoV, SARS-CoV-2, RaTG13, and other representative SARS/SARSr-CoVs. The amino acid phylogenetic tree showed that the S1 protein of Pangolin-CoV is more closely related to that of SARS-CoV-2 than to that of RaTG13. Within the RBD, we further found that Pangolin-CoV and SARS-CoV-2 were highly conserved, with only one amino acid change (500H/500Q) (Figure 3), which is not one of the five key residues involved in the interaction with human ACE2[3, 14]. These results indicate that Pangolin-CoV could have pathogenic potential similar to that of SARS-CoV-2. In contrast, RaTG13 has changes in 17 amino acid residues, 4 of which are among the key amino acid residues (Figure 3). There is evidence suggesting that the change of 472L (SARS-CoV) to 486F (SARS-CoV-2) (corresponding to the second key amino acid residue change in Figure 3) may make stronger van der Waals contact with M82 (ACE2)[15]. In addition, the major substitution of 404V in the SARS-CoV RBD with 417K in the SARS-CoV-2 RBD (see alignment position 420 in Figure 3; this residue does not differ between SARS-CoV-2 and RaTG13) may result in tighter association because of salt bridge formation between 417K and 30D of ACE2[15]. Nevertheless, whether those mutations affect the affinity for ACE2 needs further investigation. Whether Pangolin-CoV or RaTG13 is a potential infectious agent in humans remains to be determined.

The S1/S2 cleavage site in the S protein is also an important determinant of the transmissibility and pathogenicity of SARS-CoV/SARSr-CoV viruses[16]. The trimeric S protein is processed at the S1/S2 cleavage site by host cell proteases during infection. Following cleavage, also known as priming, the protein is divided into an N-terminal S1-ectodomain that recognizes a cognate cell surface receptor and a C-terminal S2-membrane anchored protein that drives fusion of the viral envelope with a cellular membrane. We found that the SARS-CoV-2 S protein contains a putative furin recognition motif (PRRARSV) (Figure 4) similar to that of MERS-CoV, which has a PRSVRSV motif that is likely cleaved by furin[16, 17] during virus egress. Conversely, the furin sequence motif at the S1/S2 site is missing in the S protein of Pangolin-CoV and all other SARS/SARSr-CoVs. This difference indicates that SARS-CoV-2 might have gained a distinct mechanism to promote its entry into host cells[18]. Interestingly, aside from MERS-CoV, sequence patterns similar to that of SARS-CoV-2 are also present in some members of Alphacoronavirus, Betacoronavirus, and Gammacoronavirus[19], raising an interesting question regarding whether this furin sequence motif in SARS-CoV-2 might be derived from the existing S proteins of other coronaviruses or, alternatively, whether SARS-CoV-2 might be a recombinant of Pangolin-CoV or RaTG13 and other coronaviruses with a similar furin recognition motif in an unknown intermediate host.
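
A motif comparison of this kind can be sketched as a simple pattern scan. The Python example below assumes the minimal furin consensus R-X-X-R (arginine, any two residues, arginine) as the search pattern; that pattern, the function name `find_furin_like_sites`, and the motif-free toy string are assumptions made for illustration and do not reproduce the authors' procedure. The two positive inputs are the motifs quoted in the text.

```python
import re

# Minimal furin-like pattern R-X-X-R. The lookahead lets overlapping hits be found.
MINIMAL_FURIN_PATTERN = re.compile(r"(?=(R..R))")


def find_furin_like_sites(protein_sequence):
    """Return a list of (0-based start index, matched 4-mer) for R-X-X-R hits."""
    sequence = protein_sequence.upper()
    return [(m.start(), m.group(1)) for m in MINIMAL_FURIN_PATTERN.finditer(sequence)]


sars_cov_2_hits = find_furin_like_sites("PRRARSV")   # motif quoted for SARS-CoV-2
mers_cov_hits = find_furin_like_sites("PRSVRSV")     # motif quoted for MERS-CoV
no_motif_hits = find_furin_like_sites("AAAASV")      # hypothetical motif-free string

print(sars_cov_2_hits, mers_cov_hits, no_motif_hits)
```

A pattern match of this kind only flags candidate sites; whether a site is actually cleaved by furin is an experimental question.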

Amino acid variations in the nucleocapsid (N) protein for potential diagnosis

The N protein is the most abundant protein in CoVs. The N protein is a highly immunogenic phosphoprotein, and it is normally very conserved. The CoV N protein is often used as a marker in diagnostic assays. To gain further insight into the diagnostic potential of Pangolin-CoV, we investigated the amino acid variation pattern of the N proteins from Pangolin-CoV, SARS-CoV-2, RaTG13, and other representative SARS-CoVs. Phylogenetic analysis based on the N protein supported the classification of Pangolin-CoV as a sister taxon of SARS-CoV-2 and RaTG13 (Figure S3). We further found seven amino acid mutations that differentiated our defined “SARS-CoV-2 group” CoVs (12N, 26G, 27S, 104D, 218A, 335T, 346N, and 350Q) from other known SARS-CoVs (12S, 26D, 27N, 104E, 218T, 335H, 346Q, and 350N). Two amino acid sites (38P and 268Q) are shared by Pangolin-CoV, RaTG13 and SARS-CoVs, which are mutated to 38S and 268A in SARS-CoV-2. Only one amino acid residue shared by Pangolin-CoV and other SARS-CoVs (129E) is consistently different in both SARS-CoV-2 and RaTG13 (129D). The observed amino acid changes in the N protein would be useful for developing antigens with improved sensitivity for SARS-CoV-2 serological detection.
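
The group-defining N protein sites listed above can be written down as a small lookup table. The Python sketch below scores a residue profile against the two site lists exactly as they appear in the text; the function name `score_n_profile` and the partial example profile are hypothetical, and the sketch is not a diagnostic method.

```python
# Residues at the N protein sites listed in the text.
SARS_COV_2_GROUP_SITES = {12: "N", 26: "G", 27: "S", 104: "D",
                          218: "A", 335: "T", 346: "N", 350: "Q"}
OTHER_SARS_COV_SITES = {12: "S", 26: "D", 27: "N", 104: "E",
                        218: "T", 335: "H", 346: "Q", 350: "N"}


def score_n_profile(residues):
    """Count how many listed sites match each group.

    residues: dict mapping N protein position -> one-letter amino acid.
    Returns (matches_to_sars_cov_2_group, matches_to_other_sars_covs).
    """
    group_matches = sum(1 for pos, aa in SARS_COV_2_GROUP_SITES.items()
                        if residues.get(pos) == aa)
    other_matches = sum(1 for pos, aa in OTHER_SARS_COV_SITES.items()
                        if residues.get(pos) == aa)
    return group_matches, other_matches


# Hypothetical partial profile: three sites observed, two of them group-like.
example_profile = {12: "N", 26: "G", 27: "N"}
example_scores = score_n_profile(example_profile)
print(example_scores)
```

Positions missing from a profile simply count toward neither group, which keeps the function usable on partial sequences such as a draft genome.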

Conclusion

Based on published metagenomic data, this study provides the first report on a potential closely related kin (Pangolin-CoV) of SARS-CoV-2, which was discovered from dead Malayan pangolins after extensive rescue efforts. Aside from RaTG13, Pangolin-CoV is the CoV most closely related to SARS-CoV-2. Due to the unavailability of the original sample, we did not perform further experiments to confirm our findings, including PCR validation, serological detection, or even isolation of the virus particles. Our discovered Pangolin-CoV genome showed 91.02% nucleotide identity with the SARS-CoV-2 genome. However, whether pangolin species are good candidates for SARS-CoV-2 origin is still under debate. Considering the wide spread of SARSr-CoVs in natural reservoirs, such as bats, camels, and pangolins, our findings would be meaningful for finding novel intermediate SARS-CoV-2 hosts to block interspecies transmission.

Acknowledgements

This study was supported by the Second Tibetan Plateau Scientific Expedition and Research (STEP) program (no. 2019QZKK0503), the National Key Research and Development Program of China (no. 2018YFC2000500), the Key Research Program of the Chinese Academy of Sciences (no. KFZD-SW-219), and the Chinese National Natural Science Foundation (no. 31970571).

Author Contributions

Z.Z. performed project planning, coordination, execution, and facilitation. T.Z. and W.Q. performed the metagenomic analysis. T.Z. carried out assemblies, gene prediction, and annotation. W.Q. processed data collection and phylogenetic analysis. Z.Z., T.Z., and W.Q. prepared the manuscript.

Declaration of Interests

The authors declare no competing interests.

Figure Legends

Figure 1 Genome-related analysis. (A) Sequence depth of reads remapped to Pangolin-CoV. (B) Similarity plot based on the full-length genome sequence of Pangolin-CoV. Full-length genome sequences of SARS-CoV-2 (Beta-CoV/Wuhan-Hu-1), BatCoV RaTG13, bat SARSr-CoV ZXC21, bat SARSr-CoV ZC45, bat SARSr-CoV WIV1, and SARS-CoV BJ01 were used as reference sequences. (C) Comparison of common genome organization similarity among SARS-CoV-2, Pangolin-CoV and BatCoV RaTG13. Related to Table S2.

Figure 2 Phylogenetic relationship of CoVs based on the whole genome and RdRp gene nucleotide sequences. Red text denotes the Malayan Pangolin-CoV. Pink text denotes SARS-CoV-2. Green text denotes a bat CoV with 96% similarity at the genome level to SARS-CoV-2. Blue text denotes the reference CoVs used in Figure 1B. Detailed information can be found in the STAR Methods. Related to Figures S1 to S3.

Figure 3 Amino acid sequence alignment of the S1 protein and its phylogeny. The receptor-binding motif of SARS-CoV and the homologous region of other CoVs are indicated by the grey box. The key amino acid residues involved in the interaction with human ACE2 are marked with the orange box. Bat SARS-CoV-like CoVs have been reported not to use ACE2 and have amino acid deletions at two motifs marked by the yellow box. Detailed information can be found in the STAR Methods.

Figure 4 CoV S protein S1/S2 cleavage sites. Four amino acid insertions (SPRRs) unique to SARS-CoV-2 are marked in yellow. Conserved S1/S2 cleavage sites are marked in green.

STAR Methods

KEY RESOURCES TABLE

LEAD CONTACT AND MATERIALS AVAILABILITY

Requests for further information and data resources should be directed to and will be fulfilled by the Lead Contact, Zhigang Zhang ([email protected]). This study did not generate new unique reagents.

METHOD DETAILS

Data collection and preprocessing

We downloaded raw data for the lung08 and lung07 samples published in Liu’s study[1] from the NCBI SRA under BioProject PRJNA573298. Raw reads were first adaptor and quality trimmed using the Trimmomatic program (version 0.39)[20]. To remove host contamination, Bowtie2 (version 2.3.4.3)[21] was used to map clean reads to the host reference genome of Manis javanica (NCBI Project ID: PRJNA256023). Only unmapped reads were mapped to the SARS-CoV-2 reference genome (GenBank accession MN908947) for identifying virus reads.
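
The host-removal step rests on telling mapped from unmapped reads. In the SAM format, a read is flagged as unmapped when bit 0x4 of its FLAG field is set, and the following Python sketch uses that rule to collect the names of unmapped reads from SAM text. The function name `unmapped_read_names` and the toy records are hypothetical; the study does not describe how unmapped reads were extracted, and aligners such as Bowtie2 also offer their own options for writing unaligned reads.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_read_names(sam_lines):
    """Return names of reads whose SAM FLAG has the unmapped bit set."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & UNMAPPED_FLAG:
            names.append(fields[0])
    return names


# Toy SAM text (hypothetical reads): read1 maps to the host, read2 does not.
toy_sam = [
    "@SQ\tSN:host\tLN:100",
    "read1\t0\thost\t5\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGGCC\tIIIIIIIIII",
]
candidate_virus_reads = unmapped_read_names(toy_sam)
print(candidate_virus_reads)
```

Reads kept by such a filter are only candidates; in the workflow described above they are then mapped to the SARS-CoV-2 reference genome to identify virus reads.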

Genome assembly and gene prediction

Virus-mapped reads were assembled de novo using MEGAHIT (version 1.1.3)[22]. Read remapping to assembled contigs was performed by using Bowtie2[21]. Mapping coverage and depth were determined using Samtools (version 1.9)[23]. Contigs were taxonomically annotated using BLAST 2.9.0+[24] against 2845 CoV reference genomes (Table S1). The BatCoV RaTG13 genome was downloaded from the NGDC database (https://bigd.big.ac.cn/) (accession no. GWHABKP00000000)[3]. The SARS-CoV-2 reference genome was downloaded from NCBI (accession no. MN908947)[6]. Other CoV genomes were downloaded from the ViPR database (https://www.viprbrc.org/brc/home.spg?decorator=corona) on 6 February 2020. We further used a reference-guided strategy to construct a draft genome based on contigs taxonomically annotated to SARS-CoV-2s, SARS-CoV, and bat SARS-CoV-like CoV. Each contig was aligned against the SARS-CoV-2 reference genome with MUSCLE software (version 3.8.31)[25]. Aligned contigs were merged into consensus scaffolds with BioEdit version 7.2.5 (http://www.mybiosoftware.com/bioedit-7-0-9-biological-sequence-alignment-editor.html) following manual quality checking. Small fragments less than 25 bp in length were discarded if these fragments were not covered by any large fragments. The potential ORFs of the final draft genome obtained were annotated by alignment to the SARS-CoV-2 reference genome (accession no. MN908947). SimPlot 3.5.1[9] was used to analyse whole genome nucleotide identity.
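
The small-fragment rule in this section can be expressed as an interval filter. The Python sketch below assumes each aligned fragment is given as an inclusive (start, end) pair on the reference and interprets "covered" as lying entirely within a fragment of at least 25 bp; both the representation and that interpretation are assumptions, since the filtering in the study was done manually. The function name `filter_small_fragments` and the toy coordinates are hypothetical.

```python
def filter_small_fragments(fragments, min_length=25):
    """Keep large fragments, plus small ones contained in some large fragment.

    fragments: list of (start, end) reference coordinates, inclusive.
    """
    large = [(s, e) for s, e in fragments if e - s + 1 >= min_length]
    kept = []
    for start, end in fragments:
        length = end - start + 1
        if length >= min_length:
            kept.append((start, end))
        elif any(ls <= start and end <= le for ls, le in large):
            kept.append((start, end))  # small but supported by a large fragment
        # otherwise: small and unsupported, so it is discarded
    return kept


# Toy coordinates (hypothetical): (400, 410) is small and unsupported.
toy_fragments = [(1, 300), (100, 110), (400, 410), (500, 800)]
kept_fragments = filter_small_fragments(toy_fragments)
print(kept_fragments)
```

The input order is preserved, so the output can be passed directly to a later merging step that expects fragments sorted by reference position.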

Phylogeny

Sequence alignment was carried out using MUSCLE software[25]. Alignment accuracy was checked manually base by base. Gblocks[26] was used to process gaps in the aligned sequences. Using MegaX (version 10.1.7)[27], we inferred all maximum likelihood (ML) phylogenetic trees.

QUANTIFICATION AND STATISTICAL ANALYSIS

Using MegaX software[27], we constructed all maximum likelihood (ML) phylogenetic trees under the best-fit DNA/amino acid substitution model with 1000 bootstrap replications. Phylogenetic analyses were performed using the nucleotide sequences of various CoV gene datasets: the whole genome, ORF1a, ORF1b, and the membrane (M), S and RdRp genes. The best model of M was GTR+G, and the best for all the others was GTR+G+I. Two additional protein-based trees were constructed under WAG+G (S1 subunit of the S protein) and JTT+G (N protein). Branches with bootstrap values < 70% were hidden in all phylogenetic trees.

DATA AND CODE AVAILABILITY

The dataset used in this study is provided as supplementary material (Tables S1 and S2). This study did not generate code.

Supplemental Information

Table S1 Contigs taxonomically annotated by using BLASTx against 2845 CoV reference genomes. Related to STAR Methods.

Table S2 Comparing nucleotide and amino acid sequence identity differences of ten genes among Pangolin-CoV, SARS-CoV-2, and RaTG13. Related to Figure 1C.

References

1. Liu, P., Chen, W., and Chen, J.-P. (2019). Viral metagenomics revealed sendai virus and coronavirus infection of Malayan Pangolins (Manis javanica). Viruses 11, 979.

2. Li, W., Shi, Z., Yu, M., Ren, W., Smith, C., Epstein, J.H., Wang, H., Crameri, G., Hu, Z., Zhang, H., et al. (2005). Bats are natural reservoirs of SARS-like coronaviruses. Science 310, 676-679.

3. Zhou, P., Yang, X.-L., Wang, X.-G., Hu, B., Zhang, L., Zhang, W., Si, H.-R., Zhu, Y., Li, B., Huang, C.-L., et al. (2020). A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. doi: https://doi.org/10.1038/s41586-020-2012-7.

4. Cui, J., Li, F., and Shi, Z.-L. (2019). Origin and evolution of pathogenic coronaviruses. Nat. Rev. Microbiol. 17, 181-192.

5. Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y., Zhang, L., Fan, G., Xu, J., Gu, X., et al. (2020). Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395, 497-506.

6. Wu, F., Zhao, S., Yu, B., Chen, Y.-M., Wang, W., Song, Z.-G., Hu, Y., Tao, Z.-W., Tian, J.-H., Pei, Y.-Y., et al. (2020). A new coronavirus associated with human respiratory disease in China. Nature. doi: https://doi.org/10.1038/s41586-020-2008-3.

7. Durbin, R.M., Altshuler, D., Durbin, R.M., Abecasis, G.R., Bentley, D.R., Chakravarti, A., Clark, A.G., Collins, F.S., De La Vega, F.M., Donnelly, P., et al. (2010). A map of human genome variation from population-scale sequencing. Nature 467, 1061-1073.

8. Albertsen, M., Hugenholtz, P., Skarshewski, A., Nielsen, K.L., Tyson, G.W., and Nielsen, P.H. (2013). Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533-538.

9. Lole, K.S., Bollinger, R.C., Paranjape, R.S., Gadkari, D., Kulkarni, S.S., Novak, N.G., Ingersoll, R., Sheppard, H.W., and Ray, S.C. (1999). Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J. Virol. 73, 152-160.

10. Xiao, K., Zhai, J., Feng, Y., Zhou, N., Zhang, X., Zou, J.-J., Li, N., Guo, Y., Li, X., Shen, X., et al. (2020). Isolation and characterization of 2019-nCoV-like coronavirus from Malayan pangolins. bioRxiv, 2020.2002.2017.951335.

11. Lam, T.T.-Y., Shum, M.H.-H., Zhu, H.-C., Tong, Y.-G., Ni, X.-B., Liao, Y.-S., Wei, W., Cheung, W.Y.-M., Li, W.-J., Li, L.-F., et al. (2020). Identification of 2019-nCoV related coronaviruses in Malayan pangolins in southern China. bioRxiv, 2020.2002.2013.945485.

12. Tortorici, M.A., and Veesler, D. (2019). Structural insights into coronavirus entry. In Advances in Virus Research, F.A. Rey, ed. (Academic Press), pp. 93-116.

13. Ge, X.-Y., Li, J.-L., Yang, X.-L., Chmura, A.A., Zhu, G., Epstein, J.H., Mazet, J.K., Hu, B., Zhang, W., Peng, C., et al. (2013). Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature 503, 535-538.

14. Wong, S.K., Li, W., Moore, M.J., Choe, H., and Farzan, M. (2004). A 193-amino acid fragment of the SARS coronavirus S protein efficiently binds angiotensin-converting enzyme 2. J. Biol. Chem. 279, 3197-3201.

15. Yan, R., Zhang, Y., Li, Y., Xia, L., Guo, Y., and Zhou, Q. (2020). Structural basis for the recognition of the SARS-CoV-2 by full-length human ACE2. Science, eabb2762.

16. Millet, J.K., and Whittaker, G.R. (2014). Host cell entry of Middle East respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein. Proc. Natl. Acad. Sci. USA. 111, 15214-15219.

17. Coutard, B., Valle, C., de Lamballerie, X., Canard, B., Seidah, N.G., and Decroly, E. (2020). The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade. Antiviral Res. 176, 104742.

18. Hoffmann, M., Kleine-Weber, H., Schroeder, S., Krüger, N., Herrler, T., Erichsen, S., Schiergens, T.S., Herrler, G., Wu, N.-H., Nitsche, A., et al. (2020). SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor. Cell. doi:10.1016/j.cell.2020.02.052.

19. Millet, J.K., and Whittaker, G.R. (2015). Host cell proteases: Critical determinants of coronavirus tropism and pathogenesis. Virus Res. 202, 120-134.

20. Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.

21. Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Meth. 9, 357-359.

22. Li, D., Liu, C.-M., Luo, R., Sadakane, K., and Lam, T.-W. (2015). MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674-1676.

23. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.

24. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., and Madden, T.L. (2009). BLAST+: architecture and applications. BMC Bioinformatics 10, 421.

25. Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792-1797.

26. Talavera, G., and Castresana, J. (2007). Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 56, 564-577.

27. Kumar, S., Stecher, G., Li, M., Knyaz, C., and Tamura, K. (2018). MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547-1549.

KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Deposited Data
Raw and analyzed data | [1] | PRJNA573298
Manis javanica reference genome | NCBI sequence read archive (SRA) | PRJNA256023
SARS-CoV-2 reference genome | GenBank | MN908947
BatCov-RaTG13 genome | NGDC (https://bigd.big.ac.cn/) | GWHABKP00000000
2845 Coronavirus reference genomes set | ViPR | https://www.viprbrc.org/brc/home.spg?decorator=corona

Software and Algorithms
Simplot | [9] | https://www.mybiosoftware.com/simplot-3-5-1-sequence-similarity-plotting.html
Trimmomatic | [20] | http://www.usadellab.org/cms/index.php?page=trimmomatic
Bowtie2 | [21] | http://bowtie-bio.sourceforge.net/bowtie2
MEGAHIT | [22] | https://github.com/voutcn/megahit
BLAST+ | [24] | ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST
SAMtools | [23] | http://samtools.sourceforge.net/
MUSCLE | [25] | http://drive5.com/muscle/
BioEdit | San Diego Supercomputer Center | http://www.mybiosoftware.com/bioedit-7-0-9-biological-sequence-alignment-editor.html
Gblocks | [26] | http://molevol.cmima.csic.es/castresana/Gblocks.html
MEGA X | [27] | https://www.megasoftware.net/

[Figure 1, panels A-C. Labels recoverable from the image:
Sequencing-depth plot: x-axis, genome nucleotide position (0-20000); y-axis, sequencing depth (0-40). Genome coverage: 99.99% (19585 nt/19587 nt). Mean depth: 7.71.
Similarity plot: x-axis, genome nucleotide position (0-30000); y-axis, percentage nucleotide identity (40-100). Query: Pangolin CoV. Legend: Beta-CoV/Wuhan-Hu-1 (91.02%), BatCoV RaTG13 (90.55%), Bat SARSr-CoV ZXC21 (85.65%), Bat SARSr-CoV ZC45 (85.01%), Bat SARSr-CoV WIV1 (73.94%), SARS-CoV BJ01 (73.62%).
Genome organization: Pangolin CoV (pangolin, 19,587 bp), BatCoV RaTG13 (bat, 29,855 bp), SARS-CoV-2 (human, 29,903 bp); gene labels 1a, 1b, S, 3a, E, M, 6, 7a, 8, N, 10, and Mean (DNA/AA %).
Identity values (DNA/AA %), two rows in extracted order with the mean last:
90.3/97.0, 90.4/72.8, 87.3/95.4, 95.0/96.8, 98.3/97.4, 93.0/98.6, 93.8/96.3, 90.4/93.6, 91.5/96.6, 95.4/96.4, 98.3/94.6, 92.8/93.5
89.9/96.8, 90.7/73.4, 88.3/97.5, 94.5/96.8, 98.3/97.4, 93.0/98.6, 95.1/96.3, 92.0/96.9, 91.7/94.8, 94.9/96.4, 99.1/97.3, 93.2/94.1]

Figure 1
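The coverage value annotated on the depth plot follows from the two counts printed beside it (19585 of 19587 positions). The following is a minimal Python sketch of that arithmetic. It assumes that coverage means the share of genome positions with at least one aligned read and that mean depth is averaged over all positions; the figure does not spell out either definition. The function names and the toy depth vector are illustrative only.

```python
from typing import Sequence, Tuple


def coverage_percent(covered_nt: int, genome_nt: int) -> float:
    """Percentage of genome positions covered by at least one read."""
    if genome_nt <= 0:
        raise ValueError("genome length must be positive")
    return 100.0 * covered_nt / genome_nt


def summarize_depth(depths: Sequence[int]) -> Tuple[float, float]:
    """Return (coverage %, mean depth) from a per-position depth vector.

    Assumption: mean depth is taken over every position, covered or not.
    """
    if not depths:
        raise ValueError("depth vector must not be empty")
    covered = sum(1 for d in depths if d > 0)
    mean_depth = sum(depths) / len(depths)
    return coverage_percent(covered, len(depths)), mean_depth


# Counts taken from the Figure 1 annotation.
reported = coverage_percent(19585, 19587)
print(f"{reported:.2f}%")  # 99.99%

# Hypothetical five-position depth vector, for illustration only.
cov, mean = summarize_depth([0, 4, 8, 12, 6])
print(cov, mean)  # 80.0 6.0
```

In practice the per-position depths would come from an alignment tool's depth report rather than a hand-written list.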

[Figure 2. Two phylogenetic trees with panel titles "Whole Genome" and "RdRp"; clade labels Alpha-CoV, Beta-CoV and SARS-CoV-2 group; scale bars 0.20 and 0.50; node support values shown on branches.
Taxa: Beta-CoV/Wuhan-Hu-1; Beta-CoV/Wuhan/IPBCAMS-WH-01 to -05; Pangolin-CoV; Bat CoV RaTG13; SARS-CoV SZ3; SARS-CoV BJ01; Bat SARSr-CoV YNLF31C, WIV1, WIV16, SHC014, Rs4231, Rp3, GX2013, SC2018, Rf1, SX2013, HuB2013, ZXC21, ZC45, HKU3-1, Longquan-140 and BM48-31; Bat Hp-BetaCoV Zhejiang2013; MERS-CoV; Tylonycteris bat CoV HKU4; Pipistrellus bat CoV HKU5; Rousettus bat CoV HKU9; Human CoV HKU1; Mus musculus MHV-1; Human CoV OC43; TGEV; Mink-CoV; PEDV; Scotophilus bat CoV 512; Rhinolophus bat CoV HKU2; Miniopterus bat CoV HKU8; Miniopterus bat CoV1; Human CoV NL63; Human CoV 229E.]

Figure 2

[Figure 3. Amino acid sequence alignment in two blocks (ruler positions 340-430 and 440-520) with an accompanying phylogenetic tree and node support values. Reference row: Beta-CoV/Wuhan-Hu-1; dots mark residues identical to the reference and dashes mark gaps.
Rows, in figure order: Beta-CoV/Wuhan-Hu-1; Beta-CoV/Wuhan/IPBCAMS-WH-02, -05, -03, -04, -01; Pangolin-CoV; Bat_CoV_RaTG13; SARS-CoV_SZ3; SARS-CoV_BJ01; Bat_SARSr-CoV_WIV1, WIV16, ZXC21, ZC45, YNLF31C, SX2013, Rf1, Rp3, GX2013, HKU3-1, Longquan-140, SC2018, HuB2013.]

Figure 3

[Figure 4. Amino acid alignment around the S1/S2 cleavage site (ruler positions 680-700); dots mark residues identical to the reference row and dashes mark gaps.
Reference row (Beta-CoV/Wuhan-Hu-1): YQTQTNSPRRARSVASQSIIAYTMSLG. The five other SARS-CoV-2 isolates are identical to the reference; every other aligned sequence shows a four-residue gap within this motif.
Sequences, in figure order: Beta-CoV/Wuhan-Hu-1; Beta-CoV/Wuhan/IPBCAMS-WH-02, -03, -05, -01, -04; Bat_CoV_RaTG13; Pangolin-CoV; Bat_SARSr-CoV_ZXC21; Bat_SARSr-CoV_ZC45; SARS-CoV_SZ3; SARS-CoV_BJ01; Bat_SARSr-CoV_WIV1, SHC014, Rs4231, WIV16, YNLF31C, SX2013, Rf1, Rp3, GX2013, HKU3-1, Longquan-140, SC2018, HuB2013.]

Figure 4
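The alignment in this figure uses dot notation: a dot stands for a residue identical to the reference row at that column, and a dash for a gap. The following is a minimal Python sketch of expanding such a row and tallying identities, substitutions and gaps. The reference string is the first row of the figure. The query is one of the gapped rows as extracted, and which virus it belongs to is not asserted here. The function names are hypothetical.

```python
from typing import Dict

# First (reference) row of the Figure 4 alignment.
REFERENCE = "YQTQTNSPRRARSVASQSIIAYTMSLG"


def expand_row(reference: str, row: str) -> str:
    """Replace each '.' in a dot-notation row with the reference residue."""
    if len(reference) != len(row):
        raise ValueError("row and reference must have the same length")
    return "".join(ref if ch == "." else ch for ref, ch in zip(reference, row))


def tally(reference: str, row: str) -> Dict[str, int]:
    """Count identical, substituted and gap columns in a dot-notation row."""
    if len(reference) != len(row):
        raise ValueError("row and reference must have the same length")
    counts = {"identical": 0, "substituted": 0, "gap": 0}
    for ref, ch in zip(reference, row):
        if ch == "." or ch == ref:
            counts["identical"] += 1
        elif ch == "-":
            counts["gap"] += 1
        else:
            counts["substituted"] += 1
    return counts


# One gapped row from the figure: 6 dots, 4 gaps, 'S', then 16 dots.
query = "." * 6 + "-" * 4 + "S" + "." * 16

print(expand_row(REFERENCE, query))  # YQTQTN----SRSVASQSIIAYTMSLG
print(tally(REFERENCE, query))  # {'identical': 22, 'substituted': 1, 'gap': 4}
```

Whether gap columns count toward the denominator of a percent-identity figure is a convention that varies between tools, so the sketch reports raw counts only.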

Supplemental Data

[Figure image: two phylogenetic trees with panel titles "S" and "M" as extracted; clade labels Beta-CoV, Alpha-CoV and SARS-CoV-2 group; scale bar 0.50; node support values shown on branches. Taxa: SARS-CoV-2 isolates (Beta-CoV/Wuhan-Hu-1, Beta-CoV/Wuhan/IPBCAMS-WH-01 to -05), Pangolin-CoV, Bat CoV RaTG13, SARS-CoV SZ3 and BJ01, bat SARSr-CoVs, and other alpha- and betacoronaviruses.]

Figure S1. Phylogenetic relationship of CoVs based on the ORF1a gene (A) and ORF1b gene (B) nucleotide sequences. Related to Figure 2.

[Figure image: two phylogenetic trees with panel titles "orf1a" and "orf1b" as extracted; clade labels Beta-CoV, Alpha-CoV and SARS-CoV-2 group; scale bars 0.20; node support values shown on branches. Taxa: SARS-CoV-2 isolates (Beta-CoV/Wuhan-Hu-1, Beta-CoV/Wuhan/IPBCAMS-WH-01 to -05), Pangolin-CoV, Bat CoV RaTG13, SARS-CoV SZ3 and BJ01, bat SARSr-CoVs, and other alpha- and betacoronaviruses.]

Figure S2. Phylogenetic relationship of CoVs based on the S gene (A) and M gene (B) nucleotide sequences. Related to Figure 2.

[Figure image: amino acid sequence alignment in two blocks (ruler positions 10-220 and 230-420) with an accompanying phylogenetic tree and node support values. Reference row: Beta-CoV/Wuhan-Hu-1; dots mark residues identical to the reference, dashes mark gaps, and an asterisk marks the stop codon.
Rows, in figure order: Beta-CoV/Wuhan-Hu-1; Beta-CoV/Wuhan/IPBCAMS-WH-02, -03, -05, -01, -04; Bat_CoV_RaTG13; Pangolin-CoV; Bat_SARSr-CoV_ZXC21; Bat_SARSr-CoV_ZC45; SARS-CoV_SZ3; SARS-CoV_BJ01; Bat_SARSr-CoV_WIV1, SHC014, WIV16, YNLF31C, Rs4231, Longquan-140, HKU3-1, GX2013, Rf1, SX2013, HuB2013, Rp3, SC2018.]

Figure S3. Amino acid sequence alignment of the N protein and its phylogeny. Related to Figure 2. Highly conserved amino acid residues in the N protein marked by colours have diagnostic potential.