Journal pre-proof - Cell · 2020-03-10 · DOI: 10.1016/j.cub.2020.03.022
Journal pre-proof DOI: 10.1016/j.cub.2020.03.022 This is a PDF file of an accepted peer-reviewed article but is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 The Author(s).
Title: Probable pangolin origin of SARS-CoV-2 associated with the COVID-19 outbreak
Authors: Tao Zhang1†, Qunfu Wu1†, Zhigang Zhang1,2*
Affiliations:
1State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, School of Life Sciences, Yunnan University, No.2 North Cuihu Road, Kunming, Yunnan, 650091, China
2Lead Contact
†These authors contributed equally to this work
*Correspondence: [email protected]
Summary:
An outbreak of coronavirus disease 2019 (COVID-19) caused by the 2019 novel coronavirus (SARS-CoV-2) began in the city of Wuhan in China and has spread widely worldwide. Identifying potential intermediate hosts of SARS-CoV-2 is currently vital for controlling the spread of COVID-19. We therefore reinvestigated published data from pangolin lung samples in which SARS-CoV-like CoVs were detected by Liu et al.[1]. We found genomic and evolutionary evidence of the occurrence of a SARS-CoV-2-like CoV (named Pangolin-CoV) in dead Malayan pangolins. At the whole-genome level, Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively. Aside from RaTG13, Pangolin-CoV is the CoV most closely related to SARS-CoV-2. The S1 protein of Pangolin-CoV is much more closely related to SARS-CoV-2 than to RaTG13. Five key amino acid residues involved in the interaction with human ACE2 are identical between Pangolin-CoV and SARS-CoV-2, whereas RaTG13 differs at four of them. Both Pangolin-CoV and RaTG13 lack the putative furin recognition sequence motif at the S1/S2 cleavage site that is present in SARS-CoV-2. In conclusion, this study suggests that pangolin species are a natural reservoir of SARS-CoV-2-like CoVs.
Keywords: Pangolin; SARS-CoV-2; COVID-19; Origin.
Results and Discussion
Similar to the case for SARS-CoV and MERS-CoV[2], the bat is still a probable species of origin for SARS-CoV-2, because SARS-CoV-2 shares 96% whole-genome identity with a bat coronavirus (CoV), BatCoV RaTG13, from Rhinolophus affinis from Yunnan Province[3]. However, SARS-CoV and MERS-CoV passed through intermediate hosts, civets and camels, respectively, before leaping to humans[4]. This suggests that SARS-CoV-2 was probably also transmitted to humans by other animals. Considering that the earliest COVID-19 patient reported no exposure at the seafood market[5], it is vital to find the intermediate SARS-CoV-2 host to block interspecies transmission. On 24 October 2019, close to the time when the COVID-19 outbreak occurred, Liu and colleagues from the Guangdong Wildlife Rescue Center of China[1] first detected a SARS-CoV-like CoV in lung samples from two dead Malayan pangolins with frothy liquid in their lungs and pulmonary fibrosis. Using their published results, we found that all virus contigs assembled from the two lung samples (lung07, lung08) showed low identities, ranging from 80.24% to 88.93%, with known SARSr-CoVs. Hence, we conjectured that the dead Malayan pangolins may carry a new CoV closely related to SARS-CoV-2.
Assessing the probability of SARS-CoV-2-like CoV presence in pangolin species
To test this hypothesis, we downloaded the raw RNA-seq data for those two lung samples from the Sequence Read Archive (SRA; BioProject accession PRJNA573298) and performed quality control and contaminant removal consistent with those described in Liu's study[1]. We found 1882 clean reads from the lung08 sample that mapped to the SARS-CoV-2 reference genome (GenBank accession MN908947)[6] and covered 76.02% of the SARS-CoV-2 genome. De novo assembly of these reads yielded 36 contigs with lengths ranging from 287 bp to 2187 bp (mean length: 700 bp). BLAST analysis against proteins from 2845 CoV reference genomes, including RaTG13, SARS-CoV-2s and other known CoVs, showed that 22 contigs best matched SARS-CoV-2s (70.6%-100% amino acid identity; average: 95.41%) and 12 contigs best matched bat SARS-CoV-like CoVs (92.7%-100% amino acid identity; average: 97.48%) (Table S1). These results indicate that the Malayan pangolin might carry a novel CoV (here named Pangolin-CoV) that is similar to SARS-CoV-2.
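Assigning each contig to the reference it matches best, as in the BLAST step above, can be sketched as keeping the top hit per contig and tallying hits by reference group. This is a minimal sketch, assuming BLAST+ tabular output (`-outfmt 6` with its 12 default columns) and ranking hits by bit score; the `group_of` mapping and all IDs are hypothetical, and the authors' exact filtering criteria are not stated.

```python
from collections import Counter

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines):
    """Return {contig_id: (subject_id, pident)} keeping the highest-bit-score hit."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        contig, score = row["qseqid"], float(row["bitscore"])
        if contig not in best or score > best[contig][2]:
            best[contig] = (row["sseqid"], float(row["pident"]), score)
    return {c: (subject, pident) for c, (subject, pident, _) in best.items()}


def tally_groups(hits, group_of):
    """Count contigs per reference group; unmapped subject IDs count as 'other'."""
    return Counter(group_of.get(subject, "other") for subject, _ in hits.values())


# Hypothetical example: contig1 hits two references, contig2 hits one.
rows = [
    "contig1\tsarscov2_A\t95.0\t200\t10\t0\t1\t600\t1\t200\t1e-90\t350",
    "contig1\tbatcov_B\t90.0\t200\t20\t0\t1\t600\t1\t200\t1e-80\t300",
    "contig2\tbatcov_B\t98.0\t150\t3\t0\t1\t450\t1\t150\t1e-70\t280",
]
groups = {"sarscov2_A": "SARS-CoV-2", "batcov_B": "bat SARS-CoV-like"}
print(tally_groups(best_hits(rows), groups))
```

Ranking by bit score rather than raw percent identity avoids favouring short, high-identity fragments over longer alignments.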
Draft genome of Pangolin-CoV and its genomic characteristics
Using a reference-guided scaffolding approach, we created a Pangolin-CoV draft genome (19,587 bp) from the 34 contigs described above. To reduce the effect of raw read errors on scaffolding quality, small fragments shorter than 25 bp that aligned to the reference genome were manually discarded if they were not covered by any larger fragment or by the reference genome. Remapping the 1882 reads to the draft genome resulted in 99.99% genome coverage (coverage depth range: 1X-47X) (Figure 1A). The mean coverage depth was 7.71X across the whole genome, more than twice the lowest common read coverage depth (3X) used for single-nucleotide polymorphism (SNP) calling from low-coverage sequencing in the 1000 Genomes Project pilot phase[7]. Similar coverage levels are also sufficient to detect rare or low-abundance microbial species in metagenomic datasets[8], indicating that our assembled Pangolin-CoV draft genome is reliable for further analyses. Based on SimPlot analysis[9], Pangolin-CoV showed high overall genome sequence identity to RaTG13 (90.55%) and SARS-CoV-2 (91.02%) throughout the genome (Figure 1B), although the identity between SARS-CoV-2 and RaTG13 is higher (96.2%)[3]. Other SARS-CoV-like CoVs similar to Pangolin-CoV were bat SARSr-CoV ZXC21 (85.65%) and bat SARSr-CoV ZC45 (85.01%). While this manuscript was under review, two similar preprint studies found that CoVs in pangolins shared 90.3%[10] and 92.4%[11] DNA identity with SARS-CoV-2, approximating the 91.02% identity observed here and supporting our findings. Taken together, these results indicate that Pangolin-CoV might be the common origin of SARS-CoV-2 and RaTG13.
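Breadth of coverage (fraction of positions with depth ≥ 1) and mean depth, the two statistics quoted above, can be computed from per-position depths such as the three-column output of `samtools depth -a` (reference, 1-based position, depth). This sketch assumes that input format; the toy records are illustrative and are not the study's data.

```python
def coverage_stats(depth_lines, min_depth=1):
    """Summarise per-position depths from `samtools depth -a`-style lines.

    Each line is 'reference<TAB>position<TAB>depth'. Returns the breadth
    (fraction of positions with depth >= min_depth), mean, min and max depth.
    """
    depths = [int(line.split("\t")[2]) for line in depth_lines if line.strip()]
    if not depths:
        raise ValueError("no depth records")
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "breadth": covered / len(depths),
        "mean_depth": sum(depths) / len(depths),
        "min_depth": min(depths),
        "max_depth": max(depths),
    }


# Hypothetical four-position "genome": one uncovered position.
toy = ["draft\t1\t0", "draft\t2\t4", "draft\t3\t8", "draft\t4\t12"]
print(coverage_stats(toy))
```

Applied to the reported values, 7.71 / 3 ≈ 2.57, so the mean depth is roughly 2.6 times the 3X threshold.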
The Pangolin-CoV genome organization was characterized by sequence alignment against SARS-CoV-2 (GenBank accession MN908947) and RaTG13. The Pangolin-CoV genome consists of six major open reading frames (ORFs) common to CoVs and four other accessory genes (Figure 1C and Table S2). Pangolin-CoV genes aligned to SARS-CoV-2 genes with coverage ranging from 45.8% to 100% (average coverage: 76.9%). Pangolin-CoV genes shared high average nucleotide and amino acid identity with both SARS-CoV-2 (MN908947) genes (93.2% nucleotide/94.1% amino acid identity) and RaTG13 genes (92.8% nucleotide/93.5% amino acid identity) (Figure 1C and Table S2). Surprisingly, some Pangolin-CoV genes showed higher amino acid sequence identity to SARS-CoV-2 genes than to RaTG13 genes, including orf1b (73.4%/72.8%), the spike (S) protein (97.5%/95.4%), orf7a (96.9%/93.6%), and orf10 (97.3%/94.6%). The high S protein amino acid identity suggests functional similarity between Pangolin-CoV and SARS-CoV-2.
Phylogenetic relationships among Pangolin-CoV, RaTG13 and SARS-CoV-2
To determine the evolutionary relationships among Pangolin-CoV, SARS-CoV-2 and previously identified CoVs, we estimated phylogenetic trees based on the nucleotide sequences of the whole genome, the RNA-dependent RNA polymerase (RdRp) gene, the non-structural protein genes ORF1a and ORF1b, and the genes encoding the main structural proteins S and M. In all phylogenies, Pangolin-CoV, RaTG13 and SARS-CoV-2 clustered into a well-supported group, here named the "SARS-CoV-2 group" (Figure 2 and Figures S1 and S2). This group represents a novel Betacoronavirus group. Within this group, RaTG13 and SARS-CoV-2 were grouped together, and Pangolin-CoV was their closest common ancestor. However, whether SARSr-CoV ZXC21 and/or SARSr-CoV ZC45 occupy the basal position of the SARS-CoV-2 group is still under debate; the same question arose in both the Wu et al.[6] and Zhou et al.[3] studies. A possible explanation is a past history of recombination in the Betacoronavirus group[6]. Notably, the evolutionary relationships of CoVs shown by the whole genome, RdRp gene, and S gene were highly consistent with those obtained from complete genome information in the Zhou et al. study[3]. This correspondence indicates that our Pangolin-CoV draft genome contains enough genomic information to trace the true evolutionary position of Pangolin-CoV among CoVs.
Dualism of the S protein of Pangolin-CoV
The CoV S protein consists of two subunits (S1 and S2), mediates infection of receptor-expressing host cells and is a critical target for antiviral neutralizing antibodies[12]. S1 contains a receptor-binding domain (RBD), a fragment of approximately 193 amino acids that is responsible for recognizing and binding the cell surface receptor[13, 14]. Zhou et al. experimentally confirmed that SARS-CoV-2 can use human, Chinese horseshoe bat, civet, and pig ACE2 proteins as an entry receptor in ACE2-expressing cells[3], suggesting that the RBD of SARS-CoV-2 mediates infection in humans and other animals. To gain sequence-level insight into the pathogenic potential of Pangolin-CoV, we first investigated the amino acid variation pattern of the S1 proteins from Pangolin-CoV, SARS-CoV-2, RaTG13, and other representative SARS/SARSr-CoVs. The amino acid phylogenetic tree showed that the S1 protein of Pangolin-CoV is more closely related to that of SARS-CoV-2 than to that of RaTG13. Within the RBD, we further found that Pangolin-CoV and SARS-CoV-2 were highly conserved, with only one amino acid change (500H/500Q) (Figure 3), which is not one of the five key residues involved in the interaction with human ACE2[3, 14]. These results indicate that Pangolin-CoV could have pathogenic potential similar to that of SARS-CoV-2. In contrast, RaTG13 differs at 17 amino acid residues, 4 of which are among the key residues (Figure 3). There is evidence suggesting that the change from 472L (SARS-CoV) to 486F (SARS-CoV-2) (corresponding to the second key residue change in Figure 3) may create stronger van der Waals contacts with M82 of ACE2[15]. In addition, the major substitution of 404V in the SARS-CoV RBD with 417K in the SARS-CoV-2 RBD (alignment position 420 in Figure 3; this residue is identical in SARS-CoV-2 and RaTG13) may result in tighter association because of salt bridge formation between 417K and 30D of ACE2[15]. Nevertheless, whether these mutations affect the affinity for ACE2 requires further investigation, and whether Pangolin-CoV or RaTG13 could act as infectious agents in humans remains to be determined.
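Counting RBD substitutions relative to SARS-CoV-2, and checking which of them fall on ACE2-contact residues, reduces to a column-by-column comparison of aligned sequences. A minimal sketch, assuming pre-aligned, equal-length protein sequences and 1-based alignment columns; the toy sequences and key columns in the usage are hypothetical and are not the Figure 3 coordinates.

```python
def compare_to_reference(reference, query, key_columns):
    """Compare two aligned protein sequences of equal length.

    Returns (all_diffs, key_diffs), each a list of (column, ref_aa, query_aa)
    with 1-based alignment columns. Columns where either sequence has a gap
    ('-') are skipped. `key_columns` is a set of 1-based columns to flag.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    all_diffs = []
    for col, (r, q) in enumerate(zip(reference, query), start=1):
        if r != q and r != "-" and q != "-":
            all_diffs.append((col, r, q))
    key_diffs = [d for d in all_diffs if d[0] in key_columns]
    return all_diffs, key_diffs


# Hypothetical ten-residue alignment with two flagged "key" columns.
diffs, key = compare_to_reference("ACDEFGHIKL", "ACDQFGHIRL", key_columns={4, 7})
print(len(diffs), "substitutions;", len(key), "at key columns:", key)
```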
The S1/S2 cleavage site in the S protein is also an important determinant of the transmissibility and pathogenicity of SARS-CoV/SARSr-CoV viruses[16]. The trimeric S protein is processed at the S1/S2 cleavage site by host cell proteases during infection. Following cleavage, also known as priming, the protein is divided into an N-terminal S1 ectodomain that recognizes a cognate cell surface receptor and a C-terminal membrane-anchored S2 subunit that drives fusion of the viral envelope with a cellular membrane. We found that the SARS-CoV-2 S protein contains a putative furin recognition motif (PRRARSV) (Figure 4) similar to that of MERS-CoV, which has a PRSVRSV motif that is likely cleaved by furin[16, 17] during virus egress. Conversely, the furin sequence motif at the S1/S2 site is missing from the S proteins of Pangolin-CoV and all other SARS/SARSr-CoVs. This difference suggests that SARS-CoV-2 might have acquired a distinct mechanism to promote its entry into host cells[18]. Interestingly, aside from MERS-CoV, sequence patterns similar to that of SARS-CoV-2 are also present in some members of Alphacoronavirus, Betacoronavirus, and Gammacoronavirus[19]. This raises the question of whether the furin sequence motif in SARS-CoV-2 was derived from the S proteins of other existing coronaviruses, or whether SARS-CoV-2 is instead a recombinant of Pangolin-CoV or RaTG13 and other coronaviruses carrying a similar furin recognition motif in an unknown intermediate host.
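Screening an S1/S2 region for a furin-like site can be approximated with a simple motif scan. The sketch assumes the commonly cited minimal furin consensus R-X-X-R and the preferred R-X-[K/R]-R form; a match marks a candidate motif only, since actual cleavage also depends on structural context. The first two query strings are the SARS-CoV-2 and MERS-CoV motifs quoted above; the third is a hypothetical motif-free sequence.

```python
import re

# Minimal furin consensus R-X-X-R; the preferred form has K or R at the third
# position (R-X-[KR]-R). Cleavage, if it occurs, is after the final R.
# Lookaheads let overlapping motifs be reported.
MINIMAL = re.compile(r"(?=(R..R))")
PREFERRED = re.compile(r"(?=(R.[KR]R))")


def find_furin_sites(protein, preferred_only=False):
    """Return (start, end, motif) tuples with 1-based inclusive coordinates."""
    pattern = PREFERRED if preferred_only else MINIMAL
    return [(m.start() + 1, m.start() + 4, m.group(1))
            for m in pattern.finditer(protein)]


for name, seq in [("SARS-CoV-2", "PRRARSV"), ("MERS-CoV", "PRSVRSV"),
                  ("hypothetical", "SQSIIAYTMS")]:
    print(name, find_furin_sites(seq))
```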
Amino acid variations in the nucleocapsid (N) protein for potential diagnosis
The N protein is the most abundant protein in CoVs. It is a highly immunogenic phosphoprotein that is normally very conserved, and the CoV N protein is often used as a marker in diagnostic assays. To gain further insight into the diagnostic potential of Pangolin-CoV, we investigated the amino acid variation pattern of the N proteins from Pangolin-CoV, SARS-CoV-2, RaTG13, and other representative SARS-CoVs. Phylogenetic analysis based on the N protein supported the classification of Pangolin-CoV as a sister taxon of SARS-CoV-2 and RaTG13 (Figure S3). We further found eight amino acid mutations that differentiated CoVs in our defined "SARS-CoV-2 group" (12N, 26G, 27S, 104D, 218A, 335T, 346N, and 350Q) from other known SARS-CoVs (12S, 26D, 27N, 104E, 218T, 335H, 346Q, and 350N). Two amino acid sites (38P and 268Q) are shared by Pangolin-CoV, RaTG13 and SARS-CoVs but are mutated to 38S and 268A in SARS-CoV-2. Only one amino acid residue shared by Pangolin-CoV and other SARS-CoVs (129E) is consistently different in both SARS-CoV-2 and RaTG13 (129D). These amino acid changes in the N protein could be useful for developing antigens with improved sensitivity for SARS-CoV-2 serological detection.
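The signature residues listed above can be used to assign an N protein sequence to the "SARS-CoV-2 group" or to other SARS-CoVs by counting matches at each position. A minimal sketch: the residues come from the text, but treating the positions as 1-based indices into the query sequence is an assumption (they may refer to alignment columns), and `toy_sequence` is a hypothetical helper for testing.

```python
# Signature residues from the text (position -> residue). Whether positions
# are SARS-CoV-2 numbering or alignment columns is an assumption here.
GROUP_SIGNATURE = {12: "N", 26: "G", 27: "S", 104: "D",
                   218: "A", 335: "T", 346: "N", 350: "Q"}
OTHER_SARS_SIGNATURE = {12: "S", 26: "D", 27: "N", 104: "E",
                        218: "T", 335: "H", 346: "Q", 350: "N"}


def signature_matches(seq, signature):
    """Count signature positions at which `seq` carries the expected residue."""
    return sum(1 for pos, aa in signature.items()
               if pos <= len(seq) and seq[pos - 1] == aa)


def classify_n_protein(seq):
    """Assign a sequence to whichever signature it matches at more positions."""
    a = signature_matches(seq, GROUP_SIGNATURE)
    b = signature_matches(seq, OTHER_SARS_SIGNATURE)
    if a > b:
        return "SARS-CoV-2 group"
    if b > a:
        return "other SARS-CoVs"
    return "ambiguous"


def toy_sequence(signature, length=360, filler="X"):
    """Hypothetical helper: a filler sequence carrying the given residues."""
    residues = [filler] * length
    for pos, aa in signature.items():
        residues[pos - 1] = aa
    return "".join(residues)


print(classify_n_protein(toy_sequence(GROUP_SIGNATURE)))
```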
Conclusion
Based on published metagenomic data, this study provides the first report of a potential close relative (Pangolin-CoV) of SARS-CoV-2, which was discovered in dead Malayan pangolins after extensive rescue efforts. Aside from RaTG13, Pangolin-CoV is the CoV most closely related to SARS-CoV-2. Because the original samples were unavailable, we did not perform further experiments to confirm our findings, such as PCR validation, serological detection, or isolation of virus particles. The Pangolin-CoV genome we identified showed 91.02% nucleotide identity with the SARS-CoV-2 genome. However, whether pangolin species are good candidates for the origin of SARS-CoV-2 is still under debate. Considering the wide distribution of SARSr-CoVs in natural reservoirs such as bats, camels, and pangolins, our findings may help in identifying novel intermediate SARS-CoV-2 hosts to block interspecies transmission.
Acknowledgements
This study was supported by the Second Tibetan Plateau Scientific Expedition and Research (STEP) program (no. 2019QZKK0503), the National Key Research and Development Program of China (no. 2018YFC2000500), the Key Research Program of the Chinese Academy of Sciences (no. KFZD-SW-219), and the Chinese National Natural Science Foundation (no. 31970571).
Author Contributions
Z.Z. performed project planning, coordination, execution, and facilitation. T.Z. and Q.W. performed the metagenomic analysis. T.Z. carried out assemblies, gene prediction, and annotation. Q.W. performed data collection and phylogenetic analysis. Z.Z., T.Z., and Q.W. prepared the manuscript.
Declaration of Interests
The authors declare no competing interests.
Figure Legends
Figure 1 Genome-related analysis. (A) Sequencing depth of reads remapped to Pangolin-CoV. (B) Similarity plot based on the full-length genome sequence of Pangolin-CoV. Full-length genome sequences of SARS-CoV-2 (Beta-CoV/Wuhan-Hu-1), BatCoV RaTG13, bat SARSr-CoV ZXC21, bat SARSr-CoV ZC45, bat SARSr-CoV WIV1, and SARS-CoV BJ01 were used as reference sequences. (C) Comparison of common genome organization similarity among SARS-CoV-2, Pangolin-CoV and BatCoV RaTG13. Related to Table S2.
Figure 2 Phylogenetic relationship of CoVs based on the whole genome and RdRp gene nucleotide sequences. Red text denotes the Malayan Pangolin-CoV. Pink text denotes SARS-CoV-2. Green text denotes a bat CoV with 96% similarity at the genome level to SARS-CoV-2. Blue text denotes the reference CoVs used in Figure 1B. Detailed information can be found in the STAR Methods. Related to Figures S1 to S3.
Figure 3 Amino acid sequence alignment of the S1 protein and its phylogeny. The receptor-binding motif of SARS-CoV and the homologous region of other CoVs are indicated by the grey box. The key amino acid residues involved in the interaction with human ACE2 are marked with the orange box. Bat SARS-CoV-like CoVs have been reported not to use ACE2 and have amino acid deletions at two motifs marked by the yellow box. Detailed information can be found in the STAR Methods.
Figure 4 CoV S protein S1/S2 cleavage sites. The four-amino-acid insertion (SPRR) unique to SARS-CoV-2 is marked in yellow. Conserved S1/S2 cleavage sites are marked in green.
STAR Methods
KEY RESOURCES TABLE
LEAD CONTACT AND MATERIALS AVAILABILITY
Requests for further information and data resources should be directed to and will be fulfilled by the Lead Contact, Zhigang Zhang ([email protected]). This study did not generate new unique reagents.
METHOD DETAILS
Data collection and preprocessing
We downloaded raw data for the lung08 and lung07 samples published in Liu's study[1] from the NCBI SRA under BioProject PRJNA573298. Raw reads were first adaptor- and quality-trimmed using Trimmomatic (version 0.39)[20]. To remove host contamination, Bowtie2 (version 2.3.4.3)[21] was used to map clean reads to the Manis javanica host reference genome (NCBI Project ID: PRJNA256023). Only reads that did not map to the host genome were then mapped to the SARS-CoV-2 reference genome (GenBank accession MN908947) to identify virus reads.
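The host-removal step keeps only reads that fail to align to the Manis javanica genome. In SAM output this corresponds to records whose FLAG has bit 0x4 (segment unmapped) set, which is what `samtools view -f 4` selects. The sketch below applies the same test to SAM text lines; it is an illustration, not necessarily how the authors extracted unmapped reads (Bowtie2's `--un`/`--un-conc` options can also write them out directly). For paired-end data, also requiring bit 0x8 (mate unmapped) keeps only pairs in which neither mate aligned. The SAM records are hypothetical.

```python
UNMAPPED = 0x4       # SAM FLAG bit: segment unmapped
MATE_UNMAPPED = 0x8  # SAM FLAG bit: next segment in the template unmapped


def unmapped_reads(sam_lines, require_mate_unmapped=False):
    """Yield (read_name, sequence) for SAM records flagged as unmapped."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if not flag & UNMAPPED:
            continue
        if require_mate_unmapped and not flag & MATE_UNMAPPED:
            continue
        yield name, seq


sam = [
    "@HD\tVN:1.6",
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",          # unmapped single read
    "r2\t0\tchr1\t100\t42\t4M\t*\t0\t0\tGGCC\tIIII",   # mapped to host
    "r3\t69\t*\t0\t0\t*\t*\t0\t0\tTTAA\tIIII",         # unmapped, mate mapped
    "r4\t77\t*\t0\t0\t*\t*\t0\t0\tCCGG\tIIII",         # read and mate unmapped
]
print(list(unmapped_reads(sam)))
print(list(unmapped_reads(sam, require_mate_unmapped=True)))
```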
Genome assembly and gene prediction
Virus-mapped reads were assembled de novo using MEGAHIT (version 1.1.3)[22]. Reads were remapped to the assembled contigs with Bowtie2[21], and mapping coverage and depth were determined using SAMtools (version 1.9)[23]. Contigs were taxonomically annotated using BLAST 2.9.0+[24] against 2845 CoV reference genomes (Table S1). The BatCoV RaTG13 genome was downloaded from the NGDC database (https://bigd.big.ac.cn/) (accession no. GWHABKP00000000)[3]. The SARS-CoV-2 reference genome was downloaded from NCBI (accession no. MN908947)[6]. Other CoV genomes were downloaded from the ViPR database (https://www.viprbrc.org/brc/home.spg?decorator=corona) on 6 February 2020. We then used a reference-guided strategy to construct a draft genome from contigs taxonomically annotated to SARS-CoV-2s, SARS-CoV, and bat SARS-CoV-like CoVs. Each contig was aligned against the SARS-CoV-2 reference genome with MUSCLE (version 3.8.31)[25]. Aligned contigs were merged into consensus scaffolds with BioEdit version 7.2.5 (http://www.mybiosoftware.com/bioedit-7-0-9-biological-sequence-alignment-editor.html) after manual quality checking. Small fragments less than 25 bp in length were discarded if they were not covered by any larger fragment. The potential ORFs of the final draft genome were annotated by alignment to the SARS-CoV-2 reference genome (accession no. MN908947). SimPlot 3.5.1[9] was used to analyze whole-genome nucleotide identity.
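SimPlot reports nucleotide identity in sliding windows along an alignment. The sketch below reproduces only the basic idea for one query/reference pair and is not a reimplementation of SimPlot: identity is computed over columns where neither sequence has a gap, and the window and step defaults of 200 and 20 columns are assumptions rather than the settings used in this study. The toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Identity (%) over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def sliding_identity(query, reference, window=200, step=20):
    """Return (midpoint, identity) pairs along an alignment.

    Midpoints are 1-based alignment columns; a trailing partial window
    shorter than `window` is not reported.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        midpoint = start + (window + 1) // 2
        points.append((midpoint, percent_identity(query[start:end], reference[start:end])))
    return points


# Hypothetical ten-column alignment: identical first half, divergent second half.
print(sliding_identity("AAAAAAAAAA", "AAAAATTTTT", window=5, step=5))
```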
Phylogeny
Sequence alignment was carried out using MUSCLE[25], and alignment accuracy was checked manually base by base. Gblocks[26] was used to remove gapped and ambiguously aligned positions from the aligned sequences. All maximum likelihood (ML) phylogenetic trees were inferred using MEGA X (version 10.1.7)[27].
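Gblocks removes divergent and ambiguously aligned blocks using several conservation and block-length parameters. As a much simpler stand-in, explicitly not the Gblocks algorithm, the sketch below drops alignment columns whose fraction of gaps exceeds a threshold; the alignment is a hypothetical toy example.

```python
def strip_gappy_columns(alignment, max_gap_fraction=0.0):
    """Drop columns whose gap fraction exceeds `max_gap_fraction`.

    `alignment` maps sequence names to equal-length aligned strings.
    With the default of 0.0, every column containing a gap is removed.
    """
    if not alignment:
        return {}
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    keep = [col for col in range(length)
            if sum(s[col] == "-" for s in seqs) / len(seqs) <= max_gap_fraction]
    return {name: "".join(s[c] for c in keep) for name, s in alignment.items()}


aln = {"a": "AC-GT", "b": "ACTG-", "c": "ACTGT"}
print(strip_gappy_columns(aln))                        # strict: no gapped columns
print(strip_gappy_columns(aln, max_gap_fraction=0.34)) # tolerate one gap in three
```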
QUANTIFICATION AND STATISTICAL ANALYSIS
Using MEGA X[27], we constructed all ML phylogenetic trees under the best-fit DNA or amino acid substitution model with 1000 bootstrap replicates. Phylogenetic analyses were performed using the nucleotide sequences of several CoV gene datasets: the whole genome, ORF1a, ORF1b, and the membrane (M), S and RdRp genes. The best-fit model for the M gene was GTR+G, and that for all other datasets was GTR+G+I. Two additional protein-based trees were constructed under WAG+G (S1 subunit of the S protein) and JTT+G (N protein). Branches with bootstrap values < 70% were hidden in all phylogenetic trees.
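Choosing a best-fit substitution model, as done here before tree inference, typically means ranking candidate models by an information criterion. A minimal sketch, assuming the Bayesian information criterion BIC = k ln(n) − 2 ln L, where k is the number of free parameters, n the number of alignment sites and L the maximized likelihood (lower is better). The log-likelihoods and parameter counts are hypothetical, and MEGA X's own parameter counting, which also includes branch lengths, may differ.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k*ln(n) - 2*lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates, n_sites):
    """Return (name, bic) for the lowest-BIC model in {name: (lnL, k)}."""
    scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


# Hypothetical log-likelihoods and free-parameter counts for one alignment.
candidates = {
    "GTR+G": (-10450.0, 10),
    "GTR+G+I": (-10430.0, 11),
    "HKY+G": (-10520.0, 6),
}
print(best_model(candidates, n_sites=1200))
```

The extra parameter of GTR+G+I is accepted only if it raises 2 ln L by more than ln(n), which is the trade-off BIC encodes.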
DATA AND CODE AVAILABILITY
The dataset used in this study is provided as supplementary material (Tables S1 and S2). This study did not generate code.
Supplemental Information
Table S1 Contigs taxonomically annotated using BLASTx against 2845 CoV reference genomes. Related to STAR Methods.
Table S2 Comparison of nucleotide and amino acid sequence identity of ten genes among Pangolin-CoV, SARS-CoV-2, and RaTG13. Related to Figure 1C.
References
1. Liu, P., Chen, W., and Chen, J.-P. (2019). Viral metagenomics revealed Sendai virus and coronavirus infection of Malayan pangolins (Manis javanica). Viruses 11, 979.
2. Li, W., Shi, Z., Yu, M., Ren, W., Smith, C., Epstein, J.H., Wang, H., Crameri, G., Hu, Z., Zhang, H., et al. (2005). Bats are natural reservoirs of SARS-like coronaviruses. Science 310, 676-679.
3. Zhou, P., Yang, X.-L., Wang, X.-G., Hu, B., Zhang, L., Zhang, W., Si, H.-R., Zhu, Y., Li, B., Huang, C.-L., et al. (2020). A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. doi: https://doi.org/10.1038/s41586-020-2012-7.
4. Cui, J., Li, F., and Shi, Z.-L. (2019). Origin and evolution of pathogenic coronaviruses. Nat. Rev. Microbiol. 17, 181-192.
5. Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y., Zhang, L., Fan, G., Xu, J., Gu, X., et al. (2020). Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395, 497-506.
6. Wu, F., Zhao, S., Yu, B., Chen, Y.-M., Wang, W., Song, Z.-G., Hu, Y., Tao, Z.-W., Tian, J.-H., Pei, Y.-Y., et al. (2020). A new coronavirus associated with human respiratory disease in China. Nature. doi: https://doi.org/10.1038/s41586-020-2008-3.
7. Durbin, R.M., Altshuler, D., Durbin, R.M., Abecasis, G.R., Bentley, D.R., Chakravarti, A., Clark, A.G., Collins, F.S., De La Vega, F.M., Donnelly, P., et al. (2010). A map of human genome variation from population-scale sequencing. Nature 467, 1061-1073.
8. Albertsen, M., Hugenholtz, P., Skarshewski, A., Nielsen, K.L., Tyson, G.W., and Nielsen, P.H. (2013). Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533-538.
9. Lole, K.S., Bollinger, R.C., Paranjape, R.S., Gadkari, D., Kulkarni, S.S., Novak, N.G., Ingersoll, R., Sheppard, H.W., and Ray, S.C. (1999). Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J. Virol. 73, 152-160.
10. Xiao, K., Zhai, J., Feng, Y., Zhou, N., Zhang, X., Zou, J.-J., Li, N., Guo, Y., Li, X., Shen, X., et al. (2020). Isolation and characterization of 2019-nCoV-like coronavirus from Malayan pangolins. bioRxiv, 2020.02.17.951335.
11. Lam, T.T.-Y., Shum, M.H.-H., Zhu, H.-C., Tong, Y.-G., Ni, X.-B., Liao, Y.-S., Wei, W., Cheung, W.Y.-M., Li, W.-J., Li, L.-F., et al. (2020). Identification of 2019-nCoV related coronaviruses in Malayan pangolins in southern China. bioRxiv, 2020.02.13.945485.
12. Tortorici, M.A., and Veesler, D. (2019). Structural insights into coronavirus entry. In Advances in Virus Research, F.A. Rey, ed. (Academic Press), pp. 93-116.
13. Ge, X.-Y., Li, J.-L., Yang, X.-L., Chmura, A.A., Zhu, G., Epstein, J.H., Mazet, J.K., Hu, B., Zhang, W., Peng, C., et al. (2013). Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature 503, 535-538.
14. Wong, S.K., Li, W., Moore, M.J., Choe, H., and Farzan, M. (2004). A 193-amino acid fragment of the SARS coronavirus S protein efficiently binds angiotensin-converting enzyme 2. J. Biol. Chem. 279, 3197-3201.
15. Yan, R., Zhang, Y., Li, Y., Xia, L., Guo, Y., and Zhou, Q. (2020). Structural basis for the recognition of the SARS-CoV-2 by full-length human ACE2. Science, eabb2762.
16. Millet, J.K., and Whittaker, G.R. (2014). Host cell entry of Middle East respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein. Proc. Natl. Acad. Sci. USA 111, 15214-15219.
17. Coutard, B., Valle, C., de Lamballerie, X., Canard, B., Seidah, N.G., and Decroly, E. (2020). The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade. Antiviral Res. 176, 104742.
18. Hoffmann, M., Kleine-Weber, H., Schroeder, S., Krüger, N., Herrler, T., Erichsen, S., Schiergens, T.S., Herrler, G., Wu, N.-H., Nitsche, A., et al. (2020). SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor. Cell. doi: 10.1016/j.cell.2020.02.052.
19. Millet, J.K., and Whittaker, G.R. (2015). Host cell proteases: Critical determinants of coronavirus tropism and pathogenesis. Virus Res. 202, 120-134.
20. Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.
21. Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357-359.
22. Li, D., Liu, C.-M., Luo, R., Sadakane, K., and Lam, T.-W. (2015). MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674-1676.
23. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.
24. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., and Madden, T.L. (2009). BLAST+: architecture and applications. BMC Bioinformatics 10, 421.
25. Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792-1797.
26. Talavera, G., and Castresana, J. (2007). Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 56, 564-577.
27. Kumar, S., Stecher, G., Li, M., Knyaz, C., and Tamura, K. (2018). MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547-1549.
KEY RESOURCES TABLE

| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Deposited Data | | |
| Raw and analyzed data | [1] | PRJNA573298 |
| Manis javanica reference genome | NCBI Sequence Read Archive (SRA) | PRJNA256023 |
| SARS-CoV-2 reference genome | GenBank | MN908947 |
| BatCoV-RaTG13 genome | NGDC (https://bigd.big.ac.cn/) | GWHABKP00000000 |
| 2845 coronavirus reference genomes set | ViPR | https://www.viprbrc.org/brc/home.spg?decorator=corona |
| Software and Algorithms | | |
| Simplot | [9] | https://www.mybiosoftware.com/simplot-3-5-1-sequence-similarity-plotting.html |
| Trimmomatic | [20] | http://www.usadellab.org/cms/index.php?page=trimmomatic |
| Bowtie2 | [21] | http://bowtie-bio.sourceforge.net/bowtie2 |
| MEGAHIT | [22] | https://github.com/voutcn/megahit |
| BLAST+ | [24] | ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST |
| SAMtools | [23] | http://samtools.sourceforge.net/ |
| MUSCLE | [25] | http://drive5.com/muscle/ |
| BioEdit | San Diego Supercomputer Center | http://www.mybiosoftware.com/bioedit-7-0-9-biological-sequence-alignment-editor.html |
| Gblocks | [26] | http://molevol.cmima.csic.es/castresana/Gblocks.html |
| MEGA X | [27] | https://www.megasoftware.net/ |
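Simplot, listed in the table above, plots nucleotide identity between a query genome and reference genomes along the genome (Figure 1B). As a rough illustration only, the sketch below computes uncorrected percent identity in sliding windows over two sequences that are assumed to be already aligned to equal length. The function name and the window and step sizes are assumptions made for this example; Simplot's own settings are not reproduced.

```python
from typing import List, Tuple


def window_identity(ref: str, query: str, window: int = 200,
                    step: int = 20) -> List[Tuple[int, float]]:
    """Uncorrected percent identity of two pre-aligned sequences in sliding windows.

    Columns where either sequence has a gap ('-') are skipped. window=200 and
    step=20 are illustrative defaults (assumptions), not the Figure 1B settings.
    Returns (window midpoint, percent identity) pairs; midpoints are 0-based.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(ref) - window + 1, step):
        matches = compared = 0
        for a, b in zip(ref[start:start + window], query[start:start + window]):
            if a == "-" or b == "-":
                continue  # skip gapped columns
            compared += 1
            if a.upper() == b.upper():
                matches += 1
        if compared:  # windows made entirely of gaps are left out
            points.append((start + window // 2, 100.0 * matches / compared))
    return points


if __name__ == "__main__":
    # Toy aligned pair (hypothetical data) with a tiny window so the output is readable.
    for midpoint, pct in window_identity("ACGTACGTAC", "ACGTTCGT-C", window=4, step=2):
        print(midpoint, pct)
```

Plotting identity against window midpoint for each reference genome gives a Simplot-style curve; skipping gapped columns keeps indels from being counted as mismatches.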
[Figure: (A) Sequencing depth along the Pangolin CoV genome (x-axis: genome nucleotide position; y-axis: sequencing depth); genome coverage 99.99% (19,585/19,587 nt), mean depth 7.71. (B) Similarity plot with Pangolin CoV as query (x-axis: genome nucleotide position; y-axis: percentage nucleotide identity): Beta-CoV/Wuhan-Hu-1 (91.02%), BatCoV RaTG13 (90.55%), Bat SARSr-CoV ZXC21 (85.65%), Bat SARSr-CoV ZC45 (85.01%), Bat SARSr-CoV WIV1 (73.94%) and SARS-CoV BJ01 (73.62%). (C) Genome organization of Pangolin CoV (pangolin, 19,587 bp), BatCoV RaTG13 (bat, 29,855 bp) and SARS-CoV-2 (human, 29,903 bp), with genes 1a, 1b, S, 3a, E, M, 6, 7a, 8, N and 10 annotated with mean DNA/AA identity (%).]
Figure 1
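The summary numbers in Figure 1A are simple ratios: breadth of coverage is covered sites divided by genome length (19,585/19,587, about 99.99%), and mean depth is an average of per-site read depth. A minimal sketch, assuming per-position depths in the three-column text format printed by `samtools depth -a` (reference name, position, depth, with zero-depth sites included). It also assumes that a site counts as covered at depth 1 or more and that mean depth is averaged over all sites; the function name is hypothetical and the authors' exact definitions are not stated here.

```python
from typing import Iterable, Tuple


def coverage_stats(depth_lines: Iterable[str],
                   min_depth: int = 1) -> Tuple[int, int, float, float]:
    """Breadth of coverage and mean depth from `samtools depth -a`-style lines.

    Each line: reference name, 1-based position, depth (tab-separated).
    If several BAMs were given to samtools, only the first depth column is used.
    Returns (covered_sites, total_sites, coverage_percent, mean_depth); mean
    depth is averaged over all listed sites (an assumption).
    """
    covered = total = depth_sum = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depth = int(fields[2])
        total += 1
        depth_sum += depth
        if depth >= min_depth:
            covered += 1
    if total == 0:
        raise ValueError("no depth records found")
    return covered, total, 100.0 * covered / total, depth_sum / total


if __name__ == "__main__":
    toy = ["contig_1\t1\t0", "contig_1\t2\t5", "contig_1\t3\t7", "contig_1\t4\t8"]
    print(coverage_stats(toy))             # (3, 4, 75.0, 5.0)
    print(f"{100 * 19585 / 19587:.2f}%")   # 99.99%, the ratio reported in Figure 1A
```

Without `-a`, zero-depth positions are omitted from the output, which would inflate the coverage ratio, so including them matters for this calculation.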
[Figure: phylogenetic trees of CoVs based on RdRp and whole-genome sequences (scale bars 0.20 and 0.50). Beta-CoV and Alpha-CoV clades are labeled, the SARS-CoV-2 group is marked, and node support values are shown. Taxa: Pangolin-CoV; Bat CoV RaTG13; Beta-CoV/Wuhan-Hu-1 and Beta-CoV/Wuhan/IPBCAMS-WH-01 to -05; SARS-CoV SZ3 and BJ01; bat SARSr-CoVs YNLF31C, WIV1, WIV16, SHC014, Rp3, Rs4231, GX2013, SC2018, Rf1, SX2013, HuB2013, ZXC21, ZC45, HKU3-1, Longquan-140 and BM48-31; Bat Hp-BetaCoV Zhejiang2013; MERS-CoV; Tylonycteris bat CoV HKU4; Pipistrellus bat CoV HKU5; Rousettus bat CoV HKU9; Human CoV HKU1, OC43, NL63 and 229E; Mus musculus MHV-1; TGEV; Mink-CoV; PEDV; Scotophilus bat CoV 512; Rhinolophus bat CoV HKU2; Miniopterus bat CoV HKU8; Miniopterus bat CoV1.]
Figure 2
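Node support values in trees like these are typically bootstrap percentages. The sketch below shows only the resampling step, as a generic illustration: one pseudo-replicate alignment is built by drawing alignment columns with replacement, and support for a clade is the percentage of replicate trees that recover it. Tree inference on each replicate is not shown, and the helper names are hypothetical rather than functions of MEGA X or any other program.

```python
import random
from typing import List, Sequence


def bootstrap_replicate(alignment: Sequence[str], rng: random.Random) -> List[str]:
    """One bootstrap pseudo-alignment: sample columns with replacement.

    The same column indices are used for every sequence, so each resampled
    column is an intact column of the original alignment.
    """
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def support_percent(clade_found: Sequence[bool]) -> float:
    """Bootstrap support: share of replicate trees containing the clade, in %."""
    if not clade_found:
        raise ValueError("no replicates")
    return 100.0 * sum(clade_found) / len(clade_found)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are reproducible
    toy = ["ACGTAC", "ACGTTC", "TCGATC"]  # hypothetical aligned sequences
    print(bootstrap_replicate(toy, rng))
    # A clade recovered in 81 of 100 replicate trees gets support 81.
    print(support_percent([True] * 81 + [False] * 19))  # 81.0
```

Resampling whole columns, rather than individual characters, preserves the alignment's site patterns, which is what makes the replicates comparable to the original data.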
[Figure: amino acid alignment of part of the spike (S) protein receptor-binding region, shown in two blocks (ruler positions 340-430 and 440-520; the Beta-CoV/Wuhan-Hu-1 row ends at residues 432 and 522). Rows: Beta-CoV/Wuhan-Hu-1, Beta-CoV/Wuhan/IPBCAMS-WH-01 to -05, Pangolin-CoV, Bat CoV RaTG13, SARS-CoV SZ3 and BJ01, and bat SARSr-CoVs WIV1, WIV16, ZXC21, ZC45, YNLF31C, SX2013, Rf1, Rp3, GX2013, HKU3-1, Longquan-140, SC2018 and HuB2013. Dots denote residues identical to Beta-CoV/Wuhan-Hu-1 and dashes denote gaps.]
Figure 3
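Each row in this alignment is written relative to Beta-CoV/Wuhan-Hu-1: a dot means the same residue as the reference, a dash means a gap, and a letter is the residue actually observed. A minimal sketch of expanding such a row and computing its percent identity to the reference over columns where neither sequence is gapped; the function names and toy sequences are hypothetical, and this is an illustration rather than the authors' procedure.

```python
def decode_row(reference: str, row: str) -> str:
    """Expand a dot-notation row against its reference.

    '.' copies the reference residue, '-' is a gap, and any other character is
    the observed residue. Spaces, as in the printed figure, are ignored.
    """
    ref = reference.replace(" ", "")
    obs = row.replace(" ", "")
    if len(ref) != len(obs):
        raise ValueError("row and reference must cover the same columns")
    return "".join(r if o == "." else o for r, o in zip(ref, obs))


def identity_to_reference(reference: str, row: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    ref = reference.replace(" ", "")
    seq = decode_row(reference, row)
    pairs = [(a, b) for a, b in zip(ref, seq) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    ref = "A C D E F G H I K L"   # hypothetical reference segment
    row = ". . - . W . . . . ."   # same layout as the figure rows
    print(decode_row(ref, row))                        # AC-EWGHIKL
    print(round(identity_to_reference(ref, row), 2))   # 88.89
```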
[Figure: amino acid alignment around the S1/S2 cleavage site (ruler positions 680-700). Beta-CoV/Wuhan-Hu-1 reads YQTQTNSPRRARSVASQSIIAYTMSLG; other rows: Beta-CoV/Wuhan/IPBCAMS-WH-01 to -05, Bat CoV RaTG13, Pangolin-CoV, SARS-CoV SZ3 and BJ01, and bat SARSr-CoVs ZXC21, ZC45, WIV1, SHC014, Rs4231, WIV16, YNLF31C, SX2013, Rf1, Rp3, GX2013, HKU3-1, Longquan-140, SC2018 and HuB2013. Dots denote residues identical to Beta-CoV/Wuhan-Hu-1; a four-residue segment (aligned under SPRR) is present in all six SARS-CoV-2 sequences and gapped in every other sequence.]
Figure 4
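The key feature of this alignment is a run of columns that is filled in the SARS-CoV-2 rows and gapped in every other row. A minimal sketch of finding such runs from dot-notation rows, using the Beta-CoV/Wuhan-Hu-1 segment from the figure, one all-dot SARS-CoV-2 row and three gapped rows transcribed from it (the code does not depend on which virus each gapped row belongs to). The function names are hypothetical. The residues reported (SPRR here) reflect where the aligner placed the gap, so a different alignment could shift the same four-residue insertion by one position.

```python
from typing import List, Sequence, Tuple


def is_gap(obs: str, ref: str) -> bool:
    """A dot inherits the reference character, so '.' under a reference gap is a gap."""
    return obs == "-" or (obs == "." and ref == "-")


def shared_insertions(reference: str, ingroup: Sequence[str],
                      outgroup: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Runs of columns present in the reference and all ingroup rows but gapped in all outgroup rows.

    Rows use dot notation relative to `reference`; spaces are ignored.
    Returns (first_column, last_column, reference_residues), 1-based inclusive.
    """
    ref = reference.replace(" ", "")
    ins = [r.replace(" ", "") for r in ingroup]
    outs = [r.replace(" ", "") for r in outgroup]
    if not outs or any(len(r) != len(ref) for r in ins + outs):
        raise ValueError("need outgroup rows, all the same length as the reference")
    flags = [
        ref[i] != "-"
        and all(not is_gap(r[i], ref[i]) for r in ins)
        and all(is_gap(r[i], ref[i]) for r in outs)
        for i in range(len(ref))
    ]
    runs, i = [], 0
    while i < len(flags):
        if flags[i]:
            j = i
            while j + 1 < len(flags) and flags[j + 1]:
                j += 1  # extend the run of flagged columns
            runs.append((i + 1, j + 1, ref[i:j + 1]))
            i = j + 1
        else:
            i += 1
    return runs


if __name__ == "__main__":
    ref = "Y Q T Q T N S P R R A R S V A S Q S I I A Y T M S L G"
    ingroup = ["." * 27]  # a SARS-CoV-2 row identical to the reference
    others = [
        "......----S" + "." * 16,
        "......----S...S..A" + "." * 9,
        ".H.ASI----L..TGQKA.V" + "." * 7,
    ]
    print(shared_insertions(ref, ingroup, others))  # [(7, 10, 'SPRR')]
```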
Supplemental Data
[Figure: phylogenetic trees of CoVs based on nucleotide sequences, panels labeled S and M (scale bars 0.50); Beta-CoV and Alpha-CoV clades are labeled, the SARS-CoV-2 group is marked and node support values are shown; taxa as in Figure 2.]
Figure S1. Phylogenetic relationship of CoVs based on the ORF1a gene (A) and ORF1b gene (B) nucleotide sequences. Related to Figure 2.
[Figure: phylogenetic trees of CoVs, panels labeled orf1a and orf1b (scale bars 0.20); Beta-CoV and Alpha-CoV clades are labeled, the SARS-CoV-2 group is marked and node support values are shown; taxa as in Figure 2.]
Figure S2. Phylogenetic relationship of CoVs based on the S gene (A) and M gene (B) nucleotide sequences. Related to Figure 2.
[Figure: amino acid alignment of the nucleocapsid (N) protein (ruler positions 10-420; the Beta-CoV/Wuhan-Hu-1 sequence begins MSDNGPQ and ends SADSTQA) for Beta-CoV/Wuhan-Hu-1, Beta-CoV/Wuhan/IPBCAMS-WH-01 to -05, Bat CoV RaTG13, Pangolin-CoV, SARS-CoV SZ3 and BJ01, and bat SARSr-CoVs ZXC21, ZC45, WIV1, SHC014, WIV16, YNLF31C, Rs4231, Longquan-140, HKU3-1, GX2013, Rf1, SX2013, HuB2013, Rp3 and SC2018. Dots denote residues identical to Beta-CoV/Wuhan-Hu-1, dashes denote gaps and asterisks mark the stop codon.]
............I.....................................H..........Q...N..........................T..A.P........P.........M....R...H...GA.......*..............V................................Q......................D.T.........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*..............V................TS..............Q............D........................................S..........I.H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*..............V.................S..............Q............D.....................................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*..............V................................S..................................................................H..........Q...N..........................T..A.P....-...P.........M....R...N...GA.......*..............V..RS............................Q..................................................................H..........Q...N............I.............T..A.P........P.........M....R...N...GA.......*..............V................................Q..................................................................H..........Q...N.M..........A.............T..A.P........P.........M....R...N...GA.......*
100
100
90
52
65
100100
96
93
Figure S3. Amino acid sequence alignment of the N protein and its phylogeny. Related to Figure 2. Highly conserved amino acid residues in the N protein, marked by colours, have diagnostic potential.
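The notion of "highly conserved residues" in this caption can be made concrete with a small sketch. It assumes the dot convention common in alignment figures: the top row is the reference sequence, a `.` in any other row means "identical to the reference at this position", and `-` marks a gap. A column is counted as conserved only when every sequence carries the reference residue there. The function names and the toy alignment are hypothetical illustrations. The authors' actual conservation criterion and colour coding are not specified here.

```python
def expand_dot_alignment(reference: str, rows: list[str]) -> list[str]:
    """Replace each '.' in a row with the reference residue at that column."""
    expanded = []
    for row in rows:
        if len(row) != len(reference):
            raise ValueError("all rows must match the reference length")
        expanded.append(
            "".join(ref if ch == "." else ch for ref, ch in zip(reference, row))
        )
    return expanded


def conserved_columns(reference: str, rows: list[str]) -> list[int]:
    """Return 0-based columns where every sequence carries the reference residue.

    Columns that are gaps in the reference are skipped, and a gap in any
    other sequence breaks conservation for that column.
    """
    sequences = [reference] + expand_dot_alignment(reference, rows)
    conserved = []
    for col, ref_residue in enumerate(reference):
        if ref_residue == "-":
            continue
        if all(seq[col] == ref_residue for seq in sequences):
            conserved.append(col)
    return conserved


# Hypothetical toy alignment, not data taken from Figure S3.
reference = "ALLLLDRLNQ"
rows = [
    "..........",  # identical to the reference
    ".....N.I..",  # substitutions at columns 5 and 7
    "..-......S",  # gap at column 2, substitution at column 9
]
print(conserved_columns(reference, rows))  # → [0, 1, 3, 4, 6, 8]
```

Positions that remain invariant across all sampled coronaviruses are the natural candidates for the kind of diagnostic targets the caption mentions; a real analysis would also weigh residues that are conserved within, but differ between, lineages.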