Virus Hunting in French Guiana
Nacho Caballero
Rodents
Bats
Leishmania
Capture → Isolate viral particles → Extract RNA → Sequence
[Figure: % of reads with coverage smaller than x, plotted against estimated read coverage, for the rodent and bat datasets]
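The coverage plot is a cumulative distribution. A minimal Python sketch of how such a curve can be computed, assuming you already have one estimated coverage value per read (for example, the median k-mer count described in the next slides). The function name `coverage_cdf` and its inputs are hypothetical:

```python
from bisect import bisect_left


def coverage_cdf(coverages: list[float], thresholds: list[float]) -> list[float]:
    """For each threshold x, return the % of reads whose estimated coverage is < x.

    `coverages` holds one estimated coverage value per read (hypothetical input).
    """
    if not coverages:
        return [0.0 for _ in thresholds]
    ordered = sorted(coverages)
    n = len(ordered)
    # bisect_left on a sorted list returns how many values are strictly smaller than x
    return [100.0 * bisect_left(ordered, x) / n for x in thresholds]


print(coverage_cdf([1, 1, 2, 5, 10], [1, 2, 6, 11]))  # [0.0, 40.0, 80.0, 100.0]
```

Sorting once and using binary search keeps each threshold lookup logarithmic, which matters when the x-axis is sampled densely over millions of reads.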
How can we estimate the coverage without a reference genome?
Break each read into overlapping k-mers (substrings of length k) and count how often each k-mer appears across all reads.
Median k-mer count ≈ read coverage
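A minimal Python sketch of the median k-mer estimate, assuming all reads fit in memory and an exact `Counter` is acceptable (large datasets usually call for a probabilistic counter instead). The helper names `kmers` and `estimate_read_coverage` are hypothetical, and the reads are toy data:

```python
from collections import Counter
from statistics import median


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def estimate_read_coverage(reads: list[str], k: int) -> list[float]:
    """Estimate each read's coverage as the median count of its k-mers."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    estimates = []
    for read in reads:
        read_kmers = kmers(read, k)
        # Reads shorter than k have no k-mers; report 0 rather than fail.
        estimates.append(median(counts[km] for km in read_kmers) if read_kmers else 0.0)
    return estimates


# Four identical reads plus one copy carrying a single substitution (toy data).
reads = ["ACGTTGCAAGGCTTAC"] * 4 + ["ACGTTGCATGGCTTAC"]
print(estimate_read_coverage(reads, k=5))  # [5.0, 5.0, 5.0, 5.0, 5.0]
```

The median is used rather than the mean because it ignores a minority of outlier k-mers: the read with the error still gets an estimate of 5, since only 5 of its 12 k-mers are erroneous.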
k-mers make it possible to estimate coverage without aligning reads to a reference
Problem: each sequencing error introduces up to k erroneous k-mers
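A small Python check of this claim on a toy read: a substitution in the middle of a read changes every k-mer that overlaps it (k of them), while one near either end of the read touches fewer. The helper names are hypothetical:

```python
def kmers(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def novel_kmers_from_substitution(read: str, pos: int, base: str, k: int) -> int:
    """Count k-mers in the mutated read that do not occur in the original read."""
    mutated = read[:pos] + base + read[pos + 1:]
    original = set(kmers(read, k))
    return sum(1 for km in kmers(mutated, k) if km not in original)


read = "ACGTTGCAAGGCTTAC"  # toy read with no repeated 5-mers
print(novel_kmers_from_substitution(read, 8, "T", 5))  # 5  (middle: k erroneous k-mers)
print(novel_kmers_from_substitution(read, 0, "G", 5))  # 1  (first base: only one k-mer covers it)
```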
Once coverage exceeds a threshold, additional reads are redundant
Solution: digital normalization reduces redundancy and errors
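A minimal Python sketch of digital normalization as a median k-mer filter: a read is kept only if the median count of its k-mers, counted over the reads kept so far, is below a cutoff. This version uses an exact in-memory `Counter`; the khmer package's `normalize-by-median.py` implements the same idea with a probabilistic count table. The function name, parameters, and reads below are hypothetical:

```python
from collections import Counter
from statistics import median
from typing import Iterable, Iterator


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads: Iterable[str], k: int, cutoff: int) -> Iterator[str]:
    """Yield only reads whose estimated coverage (median k-mer count) is below cutoff."""
    counts: Counter = Counter()  # k-mer counts from kept reads only
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # too short to contain a k-mer
        if median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)  # only kept reads contribute to the counts
            yield read
        # otherwise the read is redundant: its estimated coverage is already >= cutoff


reads = ["ACGTTGCAAGGCTTAC"] * 10 + ["ACGTTGCATGGCTTAC", "TTTTTGGGGGCCCCCAAAAA"]
kept = list(digital_normalization(reads, k=5, cutoff=3))
print(len(kept))  # 4
```

Only 3 of the 10 identical reads survive, and the copy carrying a sequencing error is discarded as well: once a region is covered, redundant reads and the erroneous k-mers they carry never enter the table, which is how normalization reduces errors along with redundancy.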
Assembly (SPAdes) → Alignment (BLAST) → Taxonomy (NCBI)
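Percentages such as those on the following slides can be obtained by tallying, for each contig, the taxon of its best-scoring alignment hit. A minimal Python sketch of that tally, assuming BLAST hits have already been parsed into (contig, bitscore, taxon) tuples, for example from tabular output (`-outfmt 6`) with a taxonomy column added. The input format and the function name are hypothetical, and the values below are toy data:

```python
from collections import Counter
from typing import Iterable


def summarize_top_hits(
    contigs: list[str], hits: Iterable[tuple[str, float, str]]
) -> dict[str, float]:
    """Return the % of contigs whose best-scoring hit falls in each taxon.

    Contigs with no hit at all are reported under "no hit".
    """
    if not contigs:
        return {}
    best: dict[str, tuple[float, str]] = {}
    for contig, bitscore, taxon in hits:
        # Keep only the highest-bitscore hit per contig.
        if contig not in best or bitscore > best[contig][0]:
            best[contig] = (bitscore, taxon)
    tally = Counter(taxon for _, taxon in best.values())
    tally["no hit"] = sum(1 for c in contigs if c not in best)
    return {taxon: 100.0 * n / len(contigs) for taxon, n in tally.items()}


contigs = ["c1", "c2", "c3", "c4"]
hits = [
    ("c1", 200.0, "Homo sapiens"),
    ("c1", 50.0, "Mus musculus"),
    ("c2", 120.0, "Homo sapiens"),
    ("c3", 80.0, "some virus"),
]
print(summarize_top_hits(contigs, hits))
# {'Homo sapiens': 50.0, 'some virus': 25.0, 'no hit': 25.0}
```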
Problem: 67% of contigs in rodent dataset (serum) align to human sequences
Night-heron coronavirus HKU19 (1 kb); Simian hemorrhagic fever virus (300 bp); Equine arteritis virus (3.7 kb); Possum nidovirus; Rodent hepacivirus; Chipmunk parvovirus; Theiler's disease-associated virus; Reticuloendotheliosis virus; Mosquito VEM Anellovirus SDBVL A; Porcine reproductive and respiratory syndrome virus; Dragonfly-associated circular virus 1; Gemycircularvirus 3; Rodent pegivirus; Cyclovirus PK5510; Hypericum japonicum associated circular DNA virus
Pig stool associated circular ssDNA virus (1 kb); Avian gyrovirus 2; Torque teno sus virus 1a; Mosquito VEM virus SDBVL G; Turdivirus 3
Problem: 92% of contigs in bat dataset (droppings) don’t align to anything in NCBI
Lymphocytic choriomeningitis virus (7 kb); Hepatitis C virus; Amphotropic murine leukemia virus; Murid herpesvirus 1; Mosquito VEM Anellovirus SDBVL A; Rat retrovirus SC1; Mason-Pfizer monkey virus (retrovirus); Eidolon helvum parvovirus 2; Periplaneta fuliginosa densovirus (also a parvovirus); Moloney murine sarcoma virus; Sclerotinia sclerotiorum hypovirulence associated DNA virus 1
Problem: 95% of contigs in rodent dataset 2 (serum, spleen) align to mouse sequences
7 out of 10 samples contained more than 1 kb of Leishmania RNA virus sequence (94% identity; 5 kb genome)
Lessons
Assume that 50% of your samples are going to fail
Design a small experiment, then iterate
Come up with excuses to learn