Enhancing capacity for next generation
sequencing (NGS) and genomics in health,
agricultural, ecological and environmental
applications in Kazakhstan
A Newton-Al-Farabi Partnership Programme Researcher links
workshop
September 20th-23rd 2016
Hotel Grand Voyage, Almaty, Kazakhstan
Organised by School of Biology, University of Leeds, UK &
Institute of Microbiology & Virology, Almaty, Kazakhstan
Workshop Overview
Next generation sequencing and genomics are technologies that have developed at an
astonishing rate in recent years. They have become fundamental to health research and
diagnostics, including genetic disease and cancer, infectious disease, bacterial drug resistance
and personalised drug treatments for patients. They are also central to many areas of agricultural,
ecological and environmental research and diagnostics. This workshop will bring together UK
researchers using NGS and genomics in different fields and Kazakh scientists, with the aim of
fostering links that will help enhance their capability to apply this new technology in their own
work, to identify new opportunities for collaborative research projects between the UK and
Kazakhstan and to promote career development of young scientists. Overall we hope this will
promote the ability of Kazakh scientists to apply cutting edge NGS/genomics techniques in
health care, agricultural and environmental endeavours and support new biotechnology
enterprises.
https://goodmanlab.org/research/workshops-meetings/next-generation-sequencing-researcher-links-workshop-almaty-kazakhstan-september-18th-24th-2016/
Workshop #tag: #AlmatyNGS2016
Dr Simon Goodman School of Biology University of Leeds Woodhouse Lane Leeds LS2 9JT UK
Dr Kobey Karamendin Institute of Microbiology and Virology 103 Bogenbai batyr str. Almaty, 050010 Kazakhstan
Programme
Mon 19th September
Registration from 17.00 to 19.00
Tues 20th September
09.00-10.00 Registration
10.00-10.10 Opening remarks
10.10-11.00 Plenary Dr Ainur Akilzhanova (Nazarbayev University) Genomic research in Kazakhstan: Challenges and opportunities for clinical applications
11.00-11.30 Coffee break
11.30-12.20 Plenary Dr Ian Carr (St James’s Hospital, University of Leeds) Diagnostic mutation detection using Next Generation Sequencing for healthcare in the UK
12.20-12.50 Ms Morag Taylor (St James’s Hospital, University of Leeds) The Role of Next Generation Sequencing in colorectal cancer research
12.50-14.00 Lunch
14.00-14.20 Dr Saule Rakhimova (Nazarbayev University) Transcriptome profiling of oesophageal cancer: from sampling to sequencing on HiSeq2000
14.20-14.40 Dr Ulykbek Kairov (Nazarbayev University) Meta-analysis of cancer transcriptome profiles using an independent components method
14.40-15.00 Dr Ulykbek Kairov (Nazarbayev University) Analysis of human whole-transcriptome sequencing data from Illumina HiSeq2000 platform
15.00-15.20 Dr Niamh Forde (Faculty of Medicine, University of Leeds) Using ‘Omics’ to understand successful early pregnancy events in cattle: The perspective of a reproductive biologist
15.20-16.00 Coffee break
16.00-17.00 Vladislav Govorkovskiy (Illumina, CIS) Application of Illumina NGS-technologies in healthcare, research and agriculture
Overview of NGS technologies and panel discussion on NGS equipment/sequencing platforms Chaired by Ian Carr (St James’s Hospital, University of Leeds)
17.30-19.00 Poster session and speed networking
Weds 21st September
09.30-10.20 Plenary Dr Mary O’Connell (School of Biology, University of Leeds) Comparative genomics and Mechanisms of protein evolution
10.20-10.40 Dr Antonia Ford (School of Biological Sciences, University of Bangor) Genomic characterisation of wild tilapia populations
10.40-11.00 Ms Jennifer Stockdale (School of Biosciences, University of Cardiff) Hungry for more: Utilising Next Generation Sequencing to determine the dietary range of different species
11.00-11.30 Coffee break
11.30-11.50 Dr Elizabeth Duncan (School of Biology, University of Leeds) Understanding the molecular mechanisms of gene-environment interactions in insects
11.50-12.10 Dr Helen Hipperson (NERC Biomolecular Analysis Facility, University of Sheffield) Identifying genes affecting both adaptive divergence and reproductive isolation in Howea palms from Lord Howe Island using RNA-Seq
12.10-12.30 Dr Askhat Molkenov (Nazarbayev University) Peculiarities of bioinformatics processing and data conversion from Illumina HiSeq2000
12.30-12.50 Dr Deborah Dawson (NERC Biomolecular Analysis Facility, University of Sheffield) Support for biomolecular studies of the natural environment in the UK
12.50-14.00 Lunch
14.00-15.30 Discussion panel – designing and troubleshooting NGS projects Chaired by Morag Taylor (St James’s Hospital, University of Leeds)
15.30-16.00 Coffee break
16.00-17.00 Discussion panel – designing and troubleshooting NGS projects continued
Thurs 22nd September
09.30-10.20 Plenary Dr Chris Knight (Faculty of Life Sciences, University of Manchester) Testing evolutionary mechanisms: mutation in microbes and more
10.20-10.40 Dr Saule Daugalieva (Institute of Microbiology and Virology) NGS 16S sequencing for microbial identification
10.40-11.00 Raushan Nugmanova (National Center for Biotechnology) Study of mutation clusters in bacteria using Ion Torrent sequencing
11.00-11.30 Coffee break
11.30-11.50 Dr Jenny Dunn (Royal Society for Protection of Birds) Using next-generation sequencing to examine co-infection and environmental parasite transmission
11.50-12.10 Dr Alexander Shevtsov (National Center for Biotechnology) NGS sequencing of veterinary pathogens
12.10-12.30 Dr Aizhan Turmagambetova (Institute of Microbiology and Virology) NGS for ecological research applications: macrophages profiling in Kazakhstan lakes
12.30-12.50 Dr Kobey Karamendin (Institute of Microbiology and Virology) NGS 16S sequencing of necropsy material from Saiga antelope after a mass die-off in Spring 2015
12.50-14.00 Lunch
14.00-15.30 NGS bioinformatics & data analysis resources and pipelines: Overview, demonstrations and panel discussion Chaired by Helen Hipperson (NERC Biomolecular Analysis Facility, University of Sheffield)
15.30-16.00 Coffee break
16.00-17.00 Bioinformatics discussion continued
19.00-23.00 Conference dinner
Fri 23rd September
09.30-10.00 Rowan Kennedy (Newton-Al Farabi Partnership Programme) UK-Kazakhstan research funding opportunities
10.00-10.30 Dr Simon Goodman (School of Biology, University of Leeds) Overview of research structure, funding and career development in the UK
10.30-11.00 Coffee break
11.00-12.00 Break out groups - Identification of research priorities and collaboration opportunities for UK-Kazakh researchers
12.00-12.30 Report of break out groups and closing remarks
12.30-14.00 Lunch
Departures
Posters
Author Title
Ulan Kozhamkulov Laboratory of Genomic and Personalized Medicine, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
Whole genome sequencing of clinical isolates of M.tuberculosis with a different drug sensitivity profile on the Roche 454 GS FLX + platform
Ainur Akhmetova Laboratory of Genomic and Personalized Medicine, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
Creating a HaloPlex cardiogenetic panel and preparation of DNA libraries for the targeted sequencing of patients with arrhythmias
Nurlan Torokeldiev Medical School of the International Ala-Too University in Bishkek
Pattern of genetic variation, fine-scale genetic structure and footprints of natural selection in populations of Juglans regia L. in the southern Kyrgyz Republic
Vladislav Govorkovskiy Illumina representative, Belarus
Poster about NGS technology
Vladislav Govorkovskiy Illumina representative, Belarus
Poster about production
Abstracts
Dr Ian M. Carr, St James’s University Hospital, University of Leeds, UK
Diagnostic mutation detection using Next Generation Sequencing
Next generation sequencing (NGS) is a relatively new technology that can quickly and cheaply
generate huge amounts of sequence data. Consequently, it has rapidly found a wide range of
applications in both basic and translational research. These applications range from de novo genome
assembly of large eukaryotic genomes to amplicon sequencing of huge cohorts. NGS also promises to
revolutionise diagnostic testing where it may prove cheaper than current testing methodologies, allow
the testing of low quality samples or allow the development of completely novel diagnostic tests.
In the UK, NGS technologies are seen as the future of many DNA based testing methodologies. The
Yorkshire Regional DNA Laboratory, in Leeds, was one of the first to offer NGS-based diagnoses and
has reported on over 6,000 cases. Currently, the Yorkshire Regional DNA Laboratory uses a range of
methodologies to identify mutations ranging from single base substitutions to large structural
rearrangements. I will discuss these advances in light of the population demographics in the Yorkshire
region and how the new tests are implemented alongside current best practices, which they may either
replace or augment.
Dr Deborah Dawson, NERC NBAF Centre, University of Sheffield, UK
Support for biomolecular studies of the natural environment in the UK
In the UK, support is provided for molecular studies of the natural environment by the NERC
Biomolecular Analysis Facility. The Facility provides access to high-level genomics, metabolomics and
bioinformatics through its four nodes at Sheffield, Edinburgh, Liverpool and Birmingham.
The Facility offers the very latest, class-leading technologies, including next-generation sequencing
(Illumina and Pacific Biosciences), SNP genotyping, and high resolution MS and NMR metabolomic
platforms. Applications include de novo sequencing, metagenomics, epigenetics, sequence-capture,
sequencing-based genotyping and expression profiling (RNAseq, oligoarrays and NanoString). The
Facility also supports metabolomics, medium-scale genotyping, bioinformatics and advanced data
analysis techniques (genome and transcriptome assembly and annotation, expression analysis, etc.).
Each node takes the lead in providing support in one area. At Sheffield, access is provided to laboratory
facilities, equipment, training and expertise. The main call is for the development and application of
genetic markers for use in population genetics and behavioural ecology. We also support various other
techniques, including metabarcoding for genetic studies of diet. The service at Sheffield is based on a
well-proven arrangement, in which researchers visit the laboratory to complete their own analyses
under the supervision of someone experienced in the required technology. In most cases, the majority
of the bench work will be carried out by visitors to the Facility under the supervision of Facility staff.
Training is provided, as appropriate.
The Facility has supported over 200 projects and 150 PhD students. Our users have published over
300 papers from Facility-supported studies, including large numbers in high-ranking journals
such as Nature and Science.
Dr Elizabeth J. Duncan, School of Biology, Faculty of Biological Sciences, University of Leeds, UK
Understanding the molecular mechanisms of gene-environment interactions in insects.
The phenotype of a plant or animal is dependent on interactions between their genes and the
environment. Some plants and animals are even able to generate markedly different phenotypes in
response to a change in the environment, a phenomenon known as phenotypic plasticity.
Using the honeybee (Apis mellifera) and the pea aphid (Acyrthosiphon pisum) we have developed
an analysis pipeline to begin to understand the molecular basis of how these gene-environment
interactions occur.
The honeybee and pea aphid both change the way they reproduce in response to changes in the
environment. In the honeybee hive only one female, the queen, usually reproduces. If the queen and
her pheromone are lost from the hive this triggers the normally sterile worker bees to become
reproductively active. Using a combination of techniques including RNA-seq to measure gene
expression and immunohistochemistry to determine which cell types in the ovary are affected we
have identified a conserved signalling pathway, Notch signalling, as key to this process. Among other
roles, Notch signalling has a key function in forming and maintaining stem cell niches and I propose
that these niches are key to gene-environment responses.
Epigenetic mechanisms, such as DNA methylation and histone modifications, also play a role in altering
the way animals respond to their environment and may also regulate stem-cell niches. To investigate
the role of epigenetic mechanisms in regulating the gene-environment interactions seen in the
honeybee I have used chromatin immunoprecipitation-sequencing (to investigate a particular histone
modification) and whole genome bisulphite sequencing to determine methylation patterns across the
genome.
Ultimately I aim to determine if there are conserved signalling pathways or regulatory networks that
control plasticity amongst diverse animals. Using these relatively simple and tractable systems to
understand the mechanisms of plasticity will allow us to understand, at a whole-organism level, how
animals are responding to their environment.
Jenny C. Dunn1, Rebecca C. Thomas2, Helen Hipperson3, Keith C. Hamer2 & Simon J. Goodman2
1 RSPB Centre for Conservation Science, Royal Society for the Protection of Birds, The Lodge, Potton
Road, Sandy, Bedfordshire, SG19 2DL, UK
2 School of Biology, Irene Manton Building, University of Leeds, Leeds. LS2 9JT, UK
3 NERC Biomolecular Analysis Facility, Department of Animal and Plant Sciences, University of
Sheffield, Western Bank, Sheffield, S10 2TN, UK
Using next-generation sequencing to examine co-infection and environmental parasite transmission
Co-infection with different parasites or multiple strains of the same parasite species is common in
natural systems and has implications for disease ecology and epidemiology. Traditional methods using
PCR either detect the dominant strain or return convoluted results from Sanger sequencing. Next-
generation sequencing (NGS) provides the opportunity to detect multiple strains of parasite
simultaneously from single samples, either from individuals or the environment. Here, I will describe
the application of NGS for parasite strain identification in a declining species of migratory bird, the
European Turtle Dove Streptopelia turtur. We screened blood samples and oral swabs for
haemoparasites and Trichomonas gallinae respectively, examining a single gene region (cytochrome
b) for haemoparasites, and two gene regions (ITS and FeDH) for Trichomonas gallinae. I will discuss
the laboratory methods and the bioinformatics analysis used, and discuss the applications of the
results in the context of ecology and conservation.
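As a rough illustration of how multiple strains might be called from the reads of a single sample, the sketch below counts reads per haplotype and keeps those above a noise threshold. It is not the authors' pipeline: the function names, the haplotype labels and both thresholds (min_reads, min_fraction) are hypothetical, and it assumes each read has already been assigned to a haplotype by upstream bioinformatics steps.

```python
from collections import Counter

def call_strains(read_haplotypes, min_reads=10, min_fraction=0.05):
    """Return the haplotypes supported by enough reads in one sample.

    read_haplotypes: iterable of haplotype labels, one per read
    (assumed to come from an upstream assignment step).
    min_reads / min_fraction: hypothetical noise thresholds.
    """
    counts = Counter(read_haplotypes)
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(
        hap for hap, n in counts.items()
        if n >= min_reads and n / total >= min_fraction
    )

def is_coinfected(read_haplotypes, **thresholds):
    """A sample is flagged as co-infected if two or more strains pass."""
    return len(call_strains(read_haplotypes, **thresholds)) >= 2

# Made-up sample: two well-supported haplotypes plus low-level noise.
sample = ["hapA"] * 80 + ["hapB"] * 18 + ["hapC"] * 2
print(call_strains(sample), is_coinfected(sample))
```

Filtering on both an absolute read count and a fraction of the sample is one simple way to avoid calling sequencing errors as extra strains; appropriate thresholds depend on read depth and error rates.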
Dr Antonia G P Ford, School of Biological Sciences, Bangor University, Bangor, Gwynedd, LL57 2UW,
UK
Genomic characterisation of wild tilapia populations
Tilapia cichlid fish, and particularly the genus Oreochromis, are a mainstay of tropical aquaculture.
While most focus has been on strains of Nile tilapia (Oreochromis niloticus), several aquaculture
populations make use of hybrid lines and the ready hybridization of Oreochromis species. Future strain
enhancement may further benefit from the availability of additional wild genetic resources, which
have previously been used to enhance growth and environmental tolerance, control sex ratios, and
introduce genetic resistance to disease. However, existing native wild populations are frequently
poorly characterised and threatened by invasive tilapia species. Here, I will discuss an ongoing project
aiming to characterise wild populations of Oreochromis tilapia across a region of high cichlid
biodiversity, Tanzania, East Africa. Several introduced aquaculture tilapia species are found in wild
populations throughout Tanzania, where they are thought to compete with and hybridise with native
species. The project uses next generation sequencing (Illumina HiSeq) and SNP genotyping (Agena) to
survey wild populations to examine the extent and nature of introgression.
Dr Niamh Forde, Division of Reproduction and Early Development, Leeds Institute of Cardiovascular
and Metabolic Medicine, School of Medicine, University of Leeds, UK
Using ‘Omics’ to understand successful early pregnancy events in cattle: The perspective of a
reproductive biologist.
In most mammalian species studied, the majority of pregnancy loss occurs in the first three weeks of
pregnancy. A large proportion of this loss can be attributed to asynchrony between the embryo and
the endometrium and/or dysregulation of the uterine environment. A number of key events are
required to support successful early pregnancy in cattle: an adequate post-ovulatory rise
in the hormone progesterone (P4) in circulation to alter the endometrial transcriptome; an
appropriate uterine environment with the secretions required to drive embryo development; and
appropriate pregnancy recognition signalling by the conceptus to the endometrium to maintain P4
concentrations in circulation and to establish uterine receptivity to implantation. The focus of my talk
will be on how we utilised ‘omic’ technologies to understand how the hormone progesterone alters
the ability of the uterus to support successful early pregnancy. In addition, I will demonstrate how
using RNA sequencing technologies helped us to identify an earlier pregnancy recognition response to
an embryo in the endometrium and to propose some biomarkers of early pregnancy in cattle. I will also
demonstrate how we have used RNA sequencing to look at how the metabolic environment of the
mother can have an impact on the transcriptome of many different reproductive tissues. Finally, I will
sum up the limitations and the pitfalls of using these types of technologies to address your biological
question.
Helen Hipperson, LT Dunning, WJ Baker, RK Butlin, C Devaux, I Hutton, J Igea, AST Papadopulos, X Quan, CM Smadja, CGN Turnbull, TC Wilson, VS Savolainen
NERC NBAF Centre, University of Sheffield, UK
Identifying genes affecting both adaptive divergence and reproductive isolation in Howea palms from Lord Howe Island using RNA-Seq
Howea belmoreana and Howea forsteriana are sister species of palm, both endemic to Lord Howe Island (LHI; located in the Tasman Sea between Australia and New Zealand), where they have diverged in sympatry. The island was originally composed solely of volcanic substrate, and the deposition of calcareous soil on LHI is thought to have led to ecological speciation. Currently, H. belmoreana adults are restricted to volcanic soils whilst H. forsteriana is also found on the younger calcarenite soil. There are several ecological differences between these habitats; the calcareous soils are drier, have higher pH, and have increased salinity compared to volcanic soil. The species are largely reproductively isolated, with a five-week difference in peak flowering time between them, both in the wild and when cultivated in a common garden. Differences in the peak flowering times are also maintained regardless of the soil type that H. forsteriana occurs on. Genes that have a dual role in controlling ecological adaptation and flowering time may have played a direct role in Howea speciation. To characterise such pleiotropic genes we first used RNA-Seq to identify differentially expressed genes between the Howea species using three tissue types (floral, leaf and root) sampled from 36 trees distributed across LHI. We also examined loci with divergent coding sequences. From both analyses we identified 16 candidate genes that were associated with ecological differences between the species and/or flowering time divergence, and examined the effect that eight of these genes have on flowering time in Arabidopsis knockout mutants. Finally, we put forward six plausible ecological speciation loci, providing support for the hypothesis that pleiotropy could help to overcome the antagonism between selection and recombination during speciation with gene flow.
Dr Mary O’Connell, School of Biology, Faculty of Biological Sciences, University of Leeds, UK
Comparative genomics and Mechanisms of protein evolution
The relationship between sequence and function has proven difficult to fully elucidate but it is key to
understanding what makes a species unique. Here I will describe how we can help to bridge the gaps
in our understanding of the relationships (i) between species and (ii) between genotype and
phenotype, by adequately modelling major patterns in genomic sequence data. I will present results
from a small selection of large-scale comparative genomics studies and I will describe our approach
for identifying the evolution of species-specific proteins/protein functions using genome-scale data
and computational evolutionary models. Taking an applied evolutionary approach to modelling may
provide us with an increased understanding of species-specific response to disease/drugs at the
molecular level.
Dr Chris Knight, Faculty of Life Sciences, University of Manchester
Testing evolutionary mechanisms: mutation in microbes and more
Tackling major global challenges, such as the rise in antimicrobial resistance, requires a focus on the
fundamental evolutionary processes that underlie them. We are experimentally testing the
spontaneous evolution of antibiotic resistance in different microbes. Mutation rates have been
measured using phenotypic markers, including antibiotic resistances, for over 70 years. We find
patterns in this data suggesting that dense populations may evolve resistance at a lower rate than
sparse populations. Manipulating population densities in the laboratory, in either bacteria or yeast,
we can modify the mutation rate to several different antibiotic resistances by over an order of
magnitude. We find that this ‘density associated mutation rate plasticity’ (DAMP) requires an
evolutionarily ancient mutation avoidance mechanism, but is modified or mediated in particular
lineages, including by cell-cell interactions with the surrounding community. The next level is
therefore to consider the evolution of mixed microbial communities. We are considering both
experimental (mouse gut) and broader microbial meta-genetic data (soil communities), where we are
using novel approaches to distinguish the biologically interesting signals from a range of technical
confounders. Through a combination of modelling approaches and next generation sequence data we
are gaining a closer connection between our understandings of genotypic change, phenotype and
ecology. This will contribute to addressing major issues, including antimicrobial resistance, but at the
same time help shed new, molecular, light on classically understood evolutionary processes.
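For readers unfamiliar with how a mutation rate can be estimated from a phenotypic marker such as antibiotic resistance, the sketch below uses the classical p0 method of fluctuation analysis: if a fraction p0 of parallel cultures contains no resistant mutants, the expected number of mutational events per culture is m = -ln(p0), and the rate per cell is m divided by the final number of cells. This is a textbook estimator shown for illustration, not necessarily the one used in the work described above, and all culture counts and cell numbers are made up.

```python
import math

def p0_mutation_rate(cultures_without_mutants, total_cultures, cells_per_culture):
    """Estimate the mutation rate per cell with the p0 method.

    Assumes mutational events per culture are Poisson distributed, so the
    probability of a culture with no mutants is exp(-m).
    """
    if not 0 < cultures_without_mutants < total_cultures:
        raise ValueError("need some, but not all, cultures free of mutants")
    p0 = cultures_without_mutants / total_cultures
    m = -math.log(p0)  # expected mutational events per culture
    return m / cells_per_culture

# Hypothetical fluctuation assays at two final population densities.
sparse = p0_mutation_rate(10, 40, 1e7)
dense = p0_mutation_rate(30, 40, 1e9)
print(f"sparse: {sparse:.2e} per cell, dense: {dense:.2e} per cell")
```

The p0 method only works when a useful share of cultures has no mutants, which is why the function rejects inputs where none or all of the cultures are mutant-free.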
Jennifer Stockdale1, Jenny Dunn1,2, Joanna Redihough1, Helen Hipperson3, William Symondson1
1Cardiff University School of Biosciences, Sir Martin Evans Building, Museum Avenue, Cardiff, CF10
3AX.
2RSPB Centre for Conservation Science, The Lodge, Sandy, Bedfordshire. SG19 2DL
3 NERC Biomolecular Analysis Facility, Department of Animal and Plant Sciences, University of
Sheffield, Western Bank, Sheffield, S10 2TN, UK
Hungry for more: Utilising next generation sequencing to determine the dietary range of different
species.
Next generation sequencing (NGS) is increasingly being used to look at the complete dietary range of
species. To date, there have been ecological analyses using molecular scatology to study the diets of
killer whales and leopards at one end of the spectrum, with specialist termite-eating spiders at the
other. Molecular analyses consequently tend to be replacing more traditional techniques of
morphological analysis of faecal samples, stomach flushing, nest cameras and direct observation to
identify dietary components. At Cardiff University we are using NGS to determine the dietary ranges
of invertebrates and vertebrates in both temperate and tropical habitats. I will discuss three
ongoing projects examining diets of the Common Crane (which eats invertebrates and plants),
thrushes in farmland (which eat invertebrates), and the European Turtle Dove (which eats plant
seeds). Work on the Common Crane, recently reintroduced to the Somerset Levels, will provide new
insight into Eurasian Crane diet, which may aid any potential future reintroductions and will help to
sustain these birds in Britain. NGS is also being used to monitor the diets of thrushes in farmland
landscapes of different complexity enabling us to link the prey found in their faeces with the use of
landscape elements (arable field, woodland etc.). Finally, we have been able to determine implications
of diet for Turtle Dove body condition, consider changes in diet over time and usage of bespoke habitat
management options with implications for conservation management.
Morag Taylor, Susan Richman, Tim Palmer, Henry Wood, Caroline Young, Phil Quirke
St James’s University Hospital, University of Leeds, UK
The Role of Next Generation Sequencing in Colorectal Cancer Research
In the United Kingdom (UK), colorectal cancer (CRC) is the fourth most common cancer, and the
second most common cause of cancer related deaths. It’s important to understand both the
prognostic and predictive markers of CRC to improve these statistics. The advent of next generation
sequencing (NGS) is allowing us to explore tumour profiling and mutation screening in new ways,
advancing the role of molecular pathology.
We are part of several multicentre clinical trials investigating CRC biomarkers to determine patient
treatment. Until now, we have favoured the lower cost sequencing alternatives, but with the
continuing reduction in sequencing costs, and the need to adapt assays quickly whilst keeping
technical time down, we are developing a pipeline to move our clinical trials to NGS.
We have investigated genomic heterogeneity in CRC. Using copy number variation data from NGS, we
have analysed the primary tumours and all distant metastases from eight patients who died of
advanced CRC. We generated phylogenetic trees for each patient to follow the evolution of the
disease.
The human microbiome has been studied for many years, but NGS has made this study area more
accessible. It has allowed for the identification of bacteria that were previously un-culturable. Studies
have shown the gut microbiome plays a role in CRC, but as yet, this isn’t fully understood. In order to
investigate this, we are developing a pipeline to study the human microbiome isolated from guaiac
faecal occult blood test cards, used worldwide to screen for CRC. We will follow this by doing a UK
study to investigate if the microbiome can be used as a future tool to predict CRC using 16S rRNA
sequencing on an Illumina MiSeq.
Ainur Akilzhanova
Laboratory of Genomic and Personalized Medicine, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
Genomic research in Kazakhstan: challenges and opportunities for clinical applications
Technological advancements are rapidly propelling the field of genome research forward. Advances in genetics and genomics such as the sequence of the human genome, the human haplotype map, open access databases, cheaper genotyping and chemical genomics have already transformed basic and translational biomedical research. At the National Laboratory Astana (NLA), Center for Life Sciences, Nazarbayev University, several projects in the field of genomic and personalized medicine are being conducted. The prioritized areas of research include genomics of multifactorial diseases, cancer genomics, bioinformatics, genetics of infectious diseases and population genomics. At present, DNA-based risk assessment for common complex diseases, application of molecular signatures for cancer diagnosis and prognosis, genome-guided therapy, and dose selection of therapeutic drugs are the important issues in personalized medicine.
Kazakhstan is a unique country located in the middle of Central Asia, lying on the ancient Great Silk Road. Kazakh populations have been strongly influenced by the nomadic lifestyle, and a long history of migration has led to admixture of western and Asian populations, which has molded the genetic architecture. Thus it is crucial to understand the genetic background of ethnic Kazakhs to properly investigate the genetic basis of common diseases or traits in Kazakh populations. To develop a personalized medicine program for Kazakhstan, we first need to acquire personal genomic data for Kazakhs. To do so, we need a core of scientists who can: (1) design proper studies; (2) diagnose accurately; (3) sequence efficiently (using multiomic technologies); (4) analyze and maintain massive sequence data; (5) analyze the relations between genetic variants and phenotypes (i.e., disease status or biomarkers).
To further develop genomic and biomedical projects at NLA and in Kazakhstan the development of bioinformatics research and infrastructure is essential, as well as establishment of new collaborations in this field.
Widespread use of genetic tools will allow the identification of diseases before the onset of clinical symptoms, the individualization of drug treatment, and could induce individual behavioral changes on the basis of calculated disease risk. However, many challenges remain for the successful translation of genomic knowledge and technologies into health advances, such as medicines and diagnostics.
It is important to integrate research and education in the fields of genomics, personalized medicine and bioinformatics, which will be possible with the opening of the new Medical Faculty at Nazarbayev University. Educating both those in practice and those in training about key concepts of genomics and, importantly, engaging them in the design of how this knowledge will be applied most effectively will rapidly bring the era of genomic medicine to patient care, resulting in improved health. All of this must be based on a good research and scientific platform, which requires the development of well-equipped modern laboratories, bioinformatics, qualified and trained physicians and laboratory staff, and public understanding of genomic medicine across the country.
Genomic research in Kazakhstan: challenges and opportunities for clinical applications (Russian-language version)
Technological advances of recent decades have moved the field of genomic and multiomic research forward. Advances in genetics and genomics, such as the sequencing of the human genome, the human haplotype map, open access databases, and cheaper genotyping and chemical genomics, have already transformed basic and translational biomedical research. At the National Laboratory Astana (NLA), Center for Life Sciences, Nazarbayev University, projects in the field of genomic and personalized medicine are under way. The priority areas of research are genomics of multifactorial diseases, cancer genomics, bioinformatics and computational systems biology, genetics of infectious diseases and population genomics, multiomic studies, and the development and introduction of NGS methods into practice. At present, DNA-based risk assessment for common complex diseases, application of molecular signatures for cancer diagnosis and prognosis, genome-guided therapy, and dose selection of therapeutic drugs are important issues in personalized medicine.
Kazakhstan is a unique country located in the center of Central Asia, lying on the ancient Great Silk Road. The population of Kazakhstan was strongly influenced by the nomadic way of life and has a long history of migration, which led to an admixture of western and Asian populations that shaped the genetic architecture of the people. It is therefore crucial to understand the genetic background of ethnic Kazakhs in order to properly investigate the genetic basis of common diseases or traits in the Kazakh population. To develop a personalized medicine program for Kazakhstan, we first need to obtain personal genomic data for the people of Kazakhstan. This requires a pool of scientists who can: (1) design and conduct proper studies; (2) diagnose accurately; (3) sequence efficiently using multiomic technologies; (4) analyze and maintain large volumes of sequence data; (5) analyze the relations between genetic variants and phenotypes (i.e., disease status or biomarkers), among other tasks.
To further develop genomic and biomedical research at NLA and in Kazakhstan, it is important to develop infrastructure and bioinformatics, and to establish collaboration with international laboratories and consortia in this field, as well as with clinics, universities and research groups.
Widespread use of genetic tools will allow the identification of diseases before the onset of clinical symptoms and the individualization of drug therapy, and may prompt individual behavioral changes on the basis of calculated disease risk. Nevertheless, many challenges remain for the successful translation of genomic knowledge, technologies and advances into healthcare practice.
It is important to integrate research and education in the fields of genomics, personalized medicine and bioinformatics, which becomes possible with the opening of the new medical faculty at Nazarbayev University. Teaching the key concepts of genomic and personalized medicine, both in practice and in the educational process, and, importantly, involving people in the design of how this knowledge will be applied most effectively in clinical practice can speed the arrival of the era of genomic medicine in Kazakhstan, which may lead to improved public health. All of this must be based on a good research and scientific platform, which requires well-equipped modern laboratories, the development of bioinformatics, qualified and trained physicians and laboratory staff, and an informed population.
Ulykbek Kairov
Laboratory of Bioinformatics and Computational Systems Biology, Center for Life Sciences,
National Laboratory Astana, Nazarbayev University
Analysis of human whole-transcriptome sequencing data from Illumina HiSeq2000 platform
High-throughput genomic technologies, and the Illumina HiSeq2000 next-generation sequencing
platform in particular, have had a major impact on the study of cancer. The HiSeq2000 generates
up to 600 Gb of sequencing data per run. Such large volumes of sequencing data require
reproducible bioinformatics methods and mathematical and statistical approaches for their
analysis. Transcriptomic profiling of cancer specimens on the HiSeq2000 provides an opportunity
for in-depth investigation of gene expression and the affected molecular pathways. In our study
we aimed to perform a comprehensive analysis of HiSeq2000 sequencing data to identify affected
molecular pathways and extract meaningful molecular signals from oesophageal cancer specimens
from Kazakhstani patients.
Analysis of whole-transcriptome data from the Illumina HiSeq2000 next-generation sequencing
platform.
High-throughput genomic technologies, in particular the Illumina HiSeq2000 next-generation
sequencing platform, play a significant role in the modern study of cancer. The Illumina
HiSeq2000 platform generates up to 600 Gb of data per run. The huge datasets generated require
reproducible bioinformatics methods and non-standard mathematical and statistical approaches to
analysis. Transcriptomic profiling of tumour samples on the Illumina HiSeq2000 NGS platform
opens new possibilities for large-scale study of gene expression and the search for key
molecular networks. Our study is aimed at a comprehensive analysis of gene expression to find
molecular signals in the transcriptomic profiles of Kazakhstani patients diagnosed with
oesophageal cancer.
Ulykbek Kairov
Laboratory of Bioinformatics and Computational Systems Biology, Center for Life Sciences, National
Laboratory Astana, Nazarbayev University
Meta-analysis of cancer transcriptome profiles using Independent Component Analysis
High-throughput genomic technologies such as microarrays and next-generation sequencing have
had a major impact on the study of cancer. The huge amount of genomic data they produce requires
reproducible analytical approaches. In our study we applied Independent Component Analysis (ICA)
to a meta-analysis of breast cancer gene expression data. We identified from 7 to 8 reproducible
components in all four breast cancer datasets and developed a graph-based approach to the
meta-analysis and interpretation of these independent components, such that each of them was
associated with a small gene network. By analysing these networks, we provided a tentative
interpretation of the stably reproducible components. We found that factors such as
proliferation, immune response, and contamination of tumor samples with lymphocytes and normal
tissue affect gene expression in breast cancer.
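The abstract does not say how components were matched between datasets. As a minimal sketch of one common way to build such a graph, the code below links two components from different datasets when each is the other's best match by absolute Pearson correlation of their gene weights. The reciprocal-best-match rule, the `min_abs_r` threshold and the toy weights are all assumptions made for illustration, not the authors' method. The absolute value is used because the sign of an independent component is arbitrary.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def best_match(weights, candidates):
    """Name of the candidate component with the highest |r| to `weights`."""
    return max(candidates, key=lambda name: abs(pearson(weights, candidates[name])))


def reciprocal_edges(datasets, min_abs_r=0.3):
    """Edges between components that are each other's best match.

    `datasets` maps dataset name -> {component name -> gene weights};
    every weight vector must be ordered over the same gene list.
    """
    edges = []
    for (da, comps_a), (db, comps_b) in combinations(datasets.items(), 2):
        for ca, wa in comps_a.items():
            cb = best_match(wa, comps_b)
            reciprocal = best_match(comps_b[cb], comps_a) == ca
            if reciprocal and abs(pearson(wa, comps_b[cb])) >= min_abs_r:
                edges.append(((da, ca), (db, cb)))
    return edges


# Toy gene weights over four shared genes (hypothetical values).
toy = {
    "A": {"IC1": [5, 0, 0, 1], "IC2": [0, 4, 1, 0]},
    "B": {"IC1": [0, 5, 1, 0], "IC2": [-6, 0, 0, -1]},
}
edges = reciprocal_edges(toy)
```

In this toy case component IC1 of dataset A pairs with IC2 of dataset B despite the opposite sign, and IC2 of A pairs with IC1 of B. Components that end up connected across many datasets would be the candidates for "reproducible" components.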
Meta-analysis of cancer transcriptomes using Independent Component Analysis.
High-throughput genomic technologies, such as high-density microarrays and Illumina HiSeq2000
next-generation sequencing, contribute significantly to the modern study of cancer. Huge genomic
datasets require reproducible analytical methods. In our study we demonstrated how Independent
Component Analysis can be applied to the meta-analysis of breast cancer datasets. From 7 to 8
reproducible components were found in all breast cancer datasets, and we developed a
graph-theory-based approach to the meta-analysis and interpretation of the independent
components.
Saule Rakhimova
National Laboratory Astana, Nazarbayev University
Transcriptome profiling of oesophageal cancer: from biomaterial sampling to sequencing on
HiSeq2000.
The report presents a study of the transcriptome profile of esophageal squamous cell carcinoma
using NGS technology. The description of the work covers the following steps: sampling of
biological material, nucleic acid isolation, library preparation, and the library validation
methods used in the laboratory.
Esophageal cancer is the sixth most common cancer in Kazakhstan and is usually not detected
until it has progressed to an advanced, incurable stage. More than 80% of cases and deaths occur
in developing countries and in Central and East Asia. Aim of the study: to identify the genetic
basis of esophageal cancer by whole human transcriptome sequencing in Kazakhstan.
Patients were recruited at the Thoracic Surgery Department, Oncology Center, Astana. We included
only patients who had given informed consent, had a confirmed diagnosis of esophageal squamous
cell carcinoma, had undergone radical surgery (Ivor Lewis esophagectomy), and for whom blood
analysis, biochemical data, CT, X-ray and histopathological data were available.
Materials: pairs of fresh-frozen (after RNAlater solution) esophageal cancer tissue and normal
tissue specimens.
Methods: RNA isolation, library preparation, library validation, hybridization on the flow cell,
sequencing on the HiSeq 2000.
Qiagen kits were used for RNA isolation and purification, and the TruSeq RNA Sample Preparation
Kit was used for library preparation; all procedures were performed according to Illumina
protocols.
Saule Rakhimova (Private Institution National Laboratory Astana, Nazarbayev University):
Transcriptome profile of esophageal cancer, from sample collection to sequencing on the HiSeq2000.
The report presents the use of NGS technology in a research project studying the transcriptome
profile of esophageal squamous cell carcinoma. The description of the work covers the following
stages: collection of biomaterial, nucleic acid isolation, library preparation, and the library
validation methods used in the laboratory.
Esophageal cancer ranks sixth among cancers in Kazakhstan and, as a rule, is not detected until
the disease has progressed to advanced stages. More than 80% of cases and deaths occur in
developing countries and the countries of Central and East Asia. Aim of the study: to identify
the genetic basis of esophageal cancer through a whole-transcriptome study in Kazakhstan.
Patients were recruited at the thoracic surgery department of the oncology center in Astana. The
study included patients with signed informed consent forms and a confirmed clinical diagnosis of
esophageal squamous cell carcinoma who had undergone radical surgery (Ivor Lewis esophagectomy)
and had results of blood analysis, biochemical data, CT, X-ray examination and a histological
report.
Materials: a pair of fresh-frozen tissue samples (or tissue samples in RNA-stabilizing solution)
from a normal area of the esophagus and from the center of the tumor.
The work used RNA isolation and purification, library preparation, various library validation
methods, hybridization of libraries to the flow cell, and sequencing on the HiSeq 2000.
Qiagen kits were used for RNA isolation and purification and the TruSeq RNA Sample Preparation
Kit for library preparation; all procedures were carried out according to Illumina protocols.
Vladislav Govorovskiy
Illumina representative, Belarus
Application of Illumina NGS-technologies in healthcare, research and agriculture
Next-generation sequencing (NGS) technologies are transforming biological and medical research.
Researchers around the world use next-generation sequencing systems to carry out genetic
analysis at a higher rate.
Ongoing development of Sequencing by Synthesis (SBS) technology makes it possible to pursue
research in many areas of interest: science, healthcare, agriculture, forensics and reproductive
medicine. Modern instruments such as the MiniSeq, HiSeq 4000 and HiSeq X have given customers
new possibilities in NGS. The computational power of these machines, coupled with the ongoing
design of kits and panels, has opened up great prospects for modern research.
The Illumina platform also allows alternative sample preparation methods to be used, which
extends the potential uses of the system. The variety of methods discussed, and their potential
combinations, considerably widens the scope of modern science and medicine.
Next-generation sequencing has transformed research in biology and medicine. Researchers around
the world use NGS systems to take genetic analysis to a previously unattainable level.
Continuous improvement of Illumina's Sequencing by Synthesis (SBS) technology makes it possible
to conduct ever more varied studies in research, healthcare, agriculture, forensics and
reproductive medicine.
The arrival of modern instruments such as the MiniSeq, HiSeq 4000 and HiSeq X has given users
new possibilities in NGS and, together with continually improving kits and panels, has opened up
great prospects for modern research.
The Illumina platform also allows alternative sample preparation methods to be used, which
extends the potential uses of the system.
The variety of these methods and their potential combinations significantly expands the
boundaries of modern science and medicine.
Askhat Molkenov
Nazarbayev University, Kazakhstan
Peculiarities of Bioinformatics Processing and Data Conversion from Illumina HiSeq2000
High-throughput next-generation sequencing platforms have given scientists new opportunities in
genomic research. Large-scale genomic studies of different organisms, including humans, animals,
plants and bacteria, are now carried out using next-generation sequencing technologies. Modern
bioinformatics is a synthesis of biological, information and technical disciplines aimed at
solving scientific problems. In this report, I will present some of the methods and examples
used in the daily analytical protocols of the Laboratory of Bioinformatics and Computational
Systems Biology for the analysis of genomic data from the Illumina HiSeq 2000.
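The abstract does not list the specific conversion steps used in the laboratory. As an illustration of the kind of data handling involved, the sketch below reads FASTQ records (the usual text format for sequencer reads) and converts the quality string into numeric Phred scores. It assumes the Phred+33 encoding and strict four-line records; older Illumina pipelines used an offset of 64, so the `offset` parameter is left adjustable. The function names and the example read are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        sequence = handle.readline().rstrip()
        handle.readline()  # the '+' separator line carries no data we need
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality


def phred_scores(quality_string, offset=33):
    """Convert ASCII-encoded qualities to integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


# A single made-up record held in memory instead of a file on disk.
example = io.StringIO("@read1\nACGT\n+\nII5#\n")
records = [(rid, seq, phred_scores(q)) for rid, seq, q in read_fastq(example)]
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so a score of 40 means roughly one error in 10,000 base calls.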
Peculiarities of bioinformatics processing and data conversion from the HiSeq 2000 platform.
High-throughput next-generation sequencing platforms have opened new opportunities for genomic
research. Large-scale genomic studies of various organisms, including humans, animals, plants
and bacteria, are currently being carried out using next-generation sequencing technologies.
Modern bioinformatics is a synthesis of biological, information and technical disciplines aimed
at solving scientific problems. In my report I will present some of the methods and examples
used in the daily analytical protocols of the Laboratory of Bioinformatics and Computational
Systems Biology for the analysis of genomic data from the Illumina HiSeq2000.
Saule Daugalieva
Institute of Microbiology and Virology, Kazakhstan
NGS 16S sequencing for microbial identification
The shared-use laboratory of the Institute of Microbiology and Virology was established in 2014.
The laboratory performs molecular genetic studies for research projects carried out in our
institute. It is equipped with modern equipment and everything necessary for up-to-date
research. The laboratory has an Eppendorf PCR cycler, an Applied Biosystems 7500 real-time PCR
system, an 8-capillary Applied Biosystems 3500 sequencer and an Illumina MiSeq next-generation
sequencer. Supporting equipment includes a Qubit fluorometer, an Agilent 2100 Bioanalyzer, a
Vilber Lourmat ECX-F15.M gel documentation system and a Shimadzu LCMS-860 chromatography mass
spectrometer.
The Department of Microbiology of our Institute conducts molecular studies of the following
types of microorganisms: oil-degrading, cellulose-degrading, nitrogen-fixing and lactic acid
bacteria, plant-pathogenic bacteria, fungi and yeasts. The main areas of research are the
identification of microorganisms by PCR analysis and sequencing, whole-genome analysis, and the
identification of specific genes. In the near future we plan to carry out metagenomic analysis
of soil and water from different regions of Kazakhstan and of other environmental samples.
In 2014 we performed whole-genome analysis of 14 species of bacteria on the Illumina MiSeq NGS
sequencer. On the same sequencer we performed 16S metagenomic analysis of 120 strains of
bacterial cultures. Following the acquisition of the capillary sequencer, we used it to identify
12 species of fungi and 8 species of yeast, as well as 110 species of bacteria.
For whole-genome analysis on the MiSeq instrument we used the Nextera XT sample preparation kit
and the MiSeq Kit v2 for sequencing.
Libraries for 16S metagenomic analysis were prepared using indexes from the Nextera XT Index Kit
(24 Indexes, 96 Samples; Illumina) and KAPA HiFi HotStart ReadyMix. Purification was carried out
with AMPure XP beads on a magnetic stand. The quantity and quality of the libraries were
determined with a Qubit 2.0 fluorometer, an Agilent 2100 Bioanalyzer and horizontal gel
electrophoresis. The libraries were then normalized and pooled. PhiX Control v3 was added as a
control. Sequencing was performed using the MiSeq Kit v2 (500 cycles) and the MiSeq Kit v3 (600
cycles). The results were processed using the MiSeq Reporter program and the 16S metagenomic
program on the Illumina website.
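The normalization step is described only in outline. A minimal sketch of the arithmetic usually involved is given below: a double-stranded DNA concentration in ng/µl is converted to nM using an average mass of 660 g/mol per base pair, and each library is then diluted to a common target concentration before equal volumes are pooled. The 4 nM target, the 10 µl final volume, the sample names and the readings are hypothetical values chosen for illustration, not figures from the abstract.

```python
def library_molarity_nm(conc_ng_per_ul, mean_size_bp):
    """Convert a dsDNA library concentration to nM, taking 660 g/mol per base pair."""
    return conc_ng_per_ul / (660.0 * mean_size_bp) * 1e6


def dilution_volumes(molarity_nm, target_nm, final_volume_ul):
    """Return (library, diluent) volumes in µl that bring a library to target_nm."""
    if molarity_nm < target_nm:
        raise ValueError("library is already below the target concentration")
    library_ul = target_nm * final_volume_ul / molarity_nm
    return library_ul, final_volume_ul - library_ul


# Hypothetical readings: (concentration in ng/µl, mean fragment size in bp).
libraries = {"sample_A": (6.6, 500), "sample_B": (3.3, 500)}

plan = {}
for name, (conc, size) in libraries.items():
    molarity = library_molarity_nm(conc, size)
    plan[name] = dilution_volumes(molarity, target_nm=4.0, final_volume_ul=10.0)
```

With these made-up inputs, sample_A is 20 nM and needs 2 µl of library plus 8 µl of diluent, while sample_B is 10 nM and needs 4 µl plus 6 µl. Once every library is at the same molarity, pooling equal volumes gives each sample a similar share of the reads.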
The shared-use laboratory of the Institute of Microbiology and Virology was set up in 2014. The
laboratory carries out molecular genetic studies for the research projects conducted in our
institute. The laboratory is fitted with modern equipment and everything needed to conduct
research at a modern level. The laboratory has an Eppendorf PCR thermal cycler, an Applied
Biosystems 7500 real-time PCR system, an 8-capillary Applied Biosystems 3500 sequencer and an
Illumina MiSeq next-generation sequencer. In addition, there is auxiliary equipment: a Qubit
fluorometer, an Agilent 2100 Bioanalyzer, a Vilber Lourmat ECX-F15.M gel documentation system
and a Shimadzu LCMS-860 chromatography mass spectrometry system.
The institute's Department of Microbiology conducts molecular studies of the following types of
microorganisms: oil-oxidizing, cellulolytic, nitrogen-fixing and lactic acid microorganisms,
phytopathogenic bacteria, fungi and yeasts. The main research directions are identification of
microbial strains by PCR analysis and sequencing, whole-genome analysis, and identification of
particular genes. In the near future we plan metagenomic analysis of soil and water from various
regions of Kazakhstan and of environmental objects.
In 2014 we carried out whole-genome analysis of 14 species of bacteria on the Illumina MiSeq NGS
sequencer. On this sequencer we carried out 16S metagenomic analysis of about 120 strains of
bacterial cultures. After acquiring the capillary sequencer, we used it to identify 12 species
of fungi and 8 species of yeast, as well as 110 species of bacteria.
For whole-genome analysis on the MiSeq instrument we used the Nextera XT sample preparation kit
and the MiSeq Kit v2 sequencing kit.
Libraries for 16S metagenomic analysis were prepared with indexes from the Nextera XT Index Kit
(24 Indexes, 96 Samples; Illumina) using KAPA HiFi HotStart ReadyMix. Clean-up was done with
AMPure XP beads on a magnetic stand. Library quantity and quality were determined on a Qubit 2.0
fluorometer, an Agilent 2100 Bioanalyzer and by horizontal gel electrophoresis. The resulting
libraries were normalized and pooled. PhiX Control v3 was added as a control. Sequencing was
carried out with the MiSeq Kit v2 (500 cycles) and the MiSeq Kit v3 (600 cycles). The results
were processed with the MiSeq Reporter program and the 16S metagenomic program on the Illumina
website.
Raushan Nugmanova
National Center for Biotechnology, Kazakhstan
Study of mutation clusters using Ion Torrent
A non-uniform distribution of mutations across the genome has been observed for many years.
Recent studies have shown the presence of mutation clusters in cancer genomes, yeast cells, Big
Blue mice and retroelements, as well as in bacteria exposed to DNA-damaging agents. Such
clusters were detected in particular regions of the genome and accumulated over a number of
generations. A deep understanding of mutagenesis became possible with the development of
next-generation sequencing, which provides deep and sensitive analysis of a broad range of
mutations at high speed and generates high-quality data. This approach is therefore widely used
in genome-wide studies. The study of mutagenesis in bacteria is important because it may suggest
potential antibacterial treatments or reveal new aspects of bacterial genome organization.
Because a previous study revealed mutation clusters in several E. coli genomes after mutagenesis
with ethyl methanesulphonate (EMS), it is important to see whether this phenomenon is unique to
Gram-negative E. coli or can also be found in Gram-positive species such as B. subtilis after
EMS treatment. Ion Torrent next-generation sequencing technology allows several bacterial
genomes to be analysed in one run. The results of the current study showed the presence of
mutation clusters in the genome of B. subtilis. Further work is required to understand the
molecular basis of mutation clusters in ΔAda and ΔMutS E. coli strains.
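The abstract does not state how a mutation cluster was defined. One simple working definition, used here purely as an assumption, is a run of mutations in which neighbouring positions lie no more than a fixed distance apart. The sketch below groups sorted genome coordinates on that basis; the `max_gap` and `min_size` thresholds and the example positions are hypothetical.

```python
def find_clusters(positions, max_gap=1000, min_size=3):
    """Group mutation positions into clusters.

    Consecutive positions (after sorting) stay in the same cluster while the
    distance between them is at most `max_gap`; groups with fewer than
    `min_size` mutations are discarded.
    """
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Made-up mutation coordinates on a single chromosome.
mutations = [120400, 100, 350, 900, 50000, 120000, 120450, 121000]
clusters = find_clusters(mutations)
```

Here two clusters are reported and the isolated mutation at position 50000 is ignored. A real analysis would also need to test whether such runs are denser than expected by chance given the total number of mutations and the genome length.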
The phenomenon of a non-uniform distribution of mutations in the genome has been observed for
many years. Recent studies have shown the presence of certain mutation clusters in cancer
genomes, yeast, Big Blue mice and retroelements, and also in bacteria under the action of
DNA-damaging agents. Such clusters were found in particular regions of the genome, accumulating
over several generations. A deep understanding of the effect of mutagenesis became possible with
the development of next-generation sequencing, which provides detailed and accurate analysis of
a wide range of mutations at high speed, generating high-quality data. This method is therefore
widely used in genomic studies. Studying mutagenesis in bacteria is necessary because it may
lead to a potential antibacterial treatment or reveal new aspects of bacterial genome
organization. Since a previous study showed clusters in the E. coli genome after mutagenesis
with ethyl methanesulphonate, it is important to find out whether this phenomenon is unique to
Gram-negative E. coli or is also present in Gram-positive B. subtilis. Ion Torrent
next-generation sequencing technology makes it possible to analyse several bacterial genomes in
a single run. The results of this study showed the presence of mutation clusters in the genome
of B. subtilis. Further work is also needed to understand the molecular basis of mutation
clusters in ΔAda and ΔMutS E. coli strains.
Alexander Shevtsov
National Center for Biotechnology, Kazakhstan
NGS sequencing of veterinary pathogens
Over the previous two decades saiga populations declined by 95%, which is linked to uncontrolled
hunting in the period 1994-2003. Various measures helped to reverse the situation, and by 2013
the saiga population in Kazakhstan had increased fivefold to 110 thousand. However, despite this
growth, saiga in Kazakhstan are still in danger of extinction from infectious diseases. The main
cause of mass mortality of saiga in Kazakhstan has been recognized as pasteurellosis, a zoonotic
disease of vertebrate animals whose etiological agent is P. multocida. Despite the high
ecological damage done by pasteurellosis, there is little information about the genetic factors
behind the high pathogenicity of the causative agent isolated from saiga. In this research,
whole-genome sequencing of three strains of P. multocida was carried out. Strain P. multocida
Z-1 was isolated from a saiga of the Ural population that died during the 2010 outbreak, which
killed a third of the population (11,920 individuals). Strains P. multocida Z-3 and P. multocida
K-1 were isolated from saiga of the Betpak-Dala population during the outbreaks of 2012 and
2013.
Whole-genome sequencing on the Ion Torrent yielded 2,184,434 reads for strain P. multocida Z-1,
2,212,653 reads for strain P. multocida Z-3 and 1,893,014 reads for strain P. multocida K-1,
with an average read length of about 160 bp. The assembled genomes were 2,288,383, 2,336,270 and
2,303,903 bp respectively. Although the two saiga populations do not overlap in the wild, the P.
multocida strains isolated from them share a large set of identical genes (2025), comprising
92.6%-95.8% of the predicted proteins, which exceeds the size of the core gene set of previously
analysed Pasteurella spp. The strains from the Betpak-Dala saiga population share the largest
pool of common genes when compared with the strain from the Ural population. A comparative
analysis of the genomes of the saiga strains with 11 genomes of strains isolated from mammals
and 3 genomes of strains from birds revealed unique genes in strains Z1 (24 genes), Z3 (35
genes) and K1 (21 genes). Most of these genes are identical to bacteriophage genes or encode
small predicted proteins of unknown function (s1). Forty genes were characteristic of all three
analysed strains.
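The read counts and genome sizes quoted above allow a rough estimate of sequencing depth, computed as reads times mean read length divided by genome length. The sketch below does that arithmetic with the figures from the abstract, treating the "about 160 bp" mean read length as exact, which is an approximation. The second part illustrates the set logic behind "shared" and "unique" genes using invented gene identifiers; it is not the authors' pipeline, which also compared against other mammalian and avian genomes.

```python
def mean_coverage(read_count, mean_read_length_bp, genome_length_bp):
    """Average number of times each base is expected to be sequenced."""
    return read_count * mean_read_length_bp / genome_length_bp


# (number of reads, assembled genome length in bp) as reported in the abstract.
runs = {
    "Z-1": (2_184_434, 2_288_383),
    "Z-3": (2_212_653, 2_336_270),
    "K-1": (1_893_014, 2_303_903),
}
coverage = {strain: mean_coverage(reads, 160, size) for strain, (reads, size) in runs.items()}

# Hypothetical gene identifiers, only to show the set operations.
genes = {
    "Z-1": {"g1", "g2", "g3", "g4"},
    "Z-3": {"g1", "g2", "g3", "g5"},
    "K-1": {"g1", "g2", "g3", "g5", "g6"},
}
shared = set.intersection(*genes.values())
unique = {
    strain: own - set.union(*(other for name, other in genes.items() if name != strain))
    for strain, own in genes.items()
}
```

Under these assumptions all three genomes were sequenced to a depth on the order of 130 to 150-fold.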
The preceding two decades saw saiga populations fall by 95%, which is linked to uncontrolled
hunting in 1994-2003. Various measures turned the situation around, and by 2013 the number of
saiga in Kazakhstan had grown fivefold to 110 thousand. However, despite the growth of saiga
populations in Kazakhstan, they remain at risk of extinction from infectious diseases. The main
cause of mass deaths of saiga in Kazakhstan was recognized to be pasteurellosis, a zoonotic
disease of vertebrates whose etiological agent is P. multocida. Despite the high ecological
damage from this disease, there is little information on the genetic factors behind the high
pathogenicity of the pathogen isolated from saiga. In this study, whole-genome sequencing of
three P. multocida strains was carried out. Strain P. multocida Z-1 was isolated from a saiga of
the Ural population that died during the 2010 outbreak, which took a third of the population
(11,920 individuals). Strains P. multocida Z-3 and P. multocida K-1 were isolated from saiga of
the Betpak-Dala population during the outbreaks of 2012 and 2013.
Whole-genome sequencing using the Ion Torrent gave 2,184,434 reads for strain P. multocida Z-1,
2,212,653 reads for strain P. multocida Z-3 and 1,893,014 reads for strain P. multocida K-1,
with a mean read length of about 160 bp. The assembled genomes were 2,288,383, 2,336,270 and
2,303,903 bp respectively. Although the two saiga populations do not overlap under natural
conditions, the P. multocida strains isolated from them have a large set of identical genes
(2025), amounting to 92.6%-95.8% of the predicted proteins, which exceeds the number of core
genes of previously analysed Pasteurella. The strains from saiga of the Betpak-Dala population
have the largest pool of common genes in comparison with the Ural population. Comparative
analysis of the genomes of the saiga strains with 11 genomes of strains isolated from mammals
and 3 genomes from birds revealed unique genes for strain Z1 (24 genes), Z3 (35 genes) and K1
(21 genes). Most of these genes are identical to bacteriophage genes or encode short proteins of
unknown function. Forty genes were characteristic of all three analysed strains.
Aizhan Turmagambetova
Institute of Microbiology and Virology, Kazakhstan
Detection of viruses in environmental samples using NGS
With the advent of NGS, the diagnostics of viral infections is on the verge of new theories,
hypotheses and discoveries. There are several reasons for this, the most important being the
manifold increase in data on the presence of viruses in the environment, including soil, water,
feces and air, and the ability to analyse viruses without cultivating them.
In our research, we studied the biodiversity of viruses in water reservoirs of the Almaty
region. Sequencing was carried out by the double-barrelled shotgun method, in which useful
information is obtained by sequencing both ends of a DNA fragment (paired-end sequencing). The
two reads are oriented in opposite directions and may be separated from each other along the
length of the fragment, and they can be used for genome assembly with various software. We used
the HiSeq sequencing system and the Edena software. In total, 447,000 contigs with lengths of
200 to 80,000 bp were obtained. The Metavir2 program selected 184,431 of these contigs and
identified 249,780 viral gene sequences in them, of which 157,000 are previously unknown viral
sequences.
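The geometry of paired-end reads described above can be sketched in a few lines. Read 1 is taken from the start of one strand of the fragment, and read 2 from the start of the opposite strand, so it is the reverse complement of the fragment's far end. The fragment sequence and read length below are made up, and real reads would also carry base qualities and sequencing errors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment, read_length):
    """Return (read1, read2) as a sequencer would report them for one fragment."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


def unsequenced_gap(fragment_length, read_length):
    """Bases between the two reads; negative when the reads overlap."""
    return fragment_length - 2 * read_length


fragment = "ATGCCGTAAGCTTGA"  # hypothetical 15-base fragment
read1, read2 = paired_end_reads(fragment, read_length=5)
gap = unsequenced_gap(len(fragment), 5)
```

Because the approximate fragment length is known from library preparation, an assembler can use the expected distance and opposite orientation of the two reads to order and orient contigs.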
Bacteriophages, algal viruses and viruses of protozoa made up 97% of all viruses in this water
sample. The other 3% were viruses capable of causing disease in animals, higher plants and
humans. Among them we detected 2 families of retro-transcribing viruses (Retroviridae,
Caulimoviridae), 2 families of single-stranded RNA viruses (ssRNA viruses), 1 family of
single-stranded DNA viruses (ssDNA viruses: Inoviridae), 1 family of double-stranded RNA viruses
(dsRNA viruses: Endornaviridae) and 20 families of double-stranded DNA viruses (dsDNA viruses,
including Herpesviridae).
Thus, NGS is opening a new era in the monitoring of viral infections, allowing a different look
at the ecology of viruses.
Detection of viruses in environmental samples using massively parallel sequencing
With the arrival of massively parallel sequencing (NGS, next generation sequencing), the
diagnostics of viral infections stands on the threshold of new theories, hypotheses and
discoveries. This is due to a number of reasons, the main ones being the manifold increase in
data on the presence of viruses in the environment, including soil, water, excrement, air and so
on, and the ability to analyse the presence of viruses without prior cultivation.
In our studies we examined the biodiversity of viruses in water bodies of the Almaty region.
Sequencing was carried out by the double-barrelled shotgun method. In this case useful
information can be obtained by sequencing the paired ends of a DNA fragment. These two sequences
are oriented in opposite directions and may be separated from each other along the length of the
fragment, but can be used for genome assembly with various software. In our studies we used the
HiSeq sequencer and the Edena program.
We obtained 447,000 contigs with lengths from 200 to 80,000 base pairs, of which the Metavir2
program selected 184,431 and identified in them the sequences of 249,780 viral genes, 157,000 of
which are previously unknown viral sequences.
Bacteriophages (5 families), viruses of algae (1 family) and of protozoa (1 family) made up 97%
of the viruses in this water sample. The remaining 3% were viruses capable of causing disease in
animals, higher plants and humans. We found 2 families of retro-transcribing viruses
(Retroviridae, Caulimoviridae), 2 families of single-stranded RNA viruses (ssRNA viruses), 1
family of single-stranded DNA viruses (ssDNA viruses: Inoviridae), 1 family of double-stranded
RNA viruses (dsRNA viruses: Endornaviridae) and 20 families of double-stranded DNA viruses
(dsDNA viruses, including Herpesviridae and others).
Thus, NGS has opened a new stage in the development of monitoring of viral infections, allowing
a different view of the ecology of viruses.
Kobey Karamendin
Institute of Microbiology and Virology, Kazakhstan
NGS 16S sequencing of necropsy material from Saiga antelope after a mass die-off in Spring 2015
In metagenomic studies using the MiSeq sequencer to identify bacterial pathogens in saiga, 89.05%
of all short reads were found to belong to bacteria of the genus Pasteurella, among which the
species Pasteurella multocida accounted for 48.32%. Other categories were: Pasteurella eae,
10.75%; Pasteurella pneumotropica, 4.06%; unclassified at species level, 34.91%.
In metagenomic studies on the MiSeq sequencer aimed at detecting the agents of all bacterial
infections, 89.05% of all short reads were found to be from bacteria of the genus Pasteurella,
among which the species Pasteurella multocida predominated at 48.32%. Other species found were:
Pasteurella eae, 10.75%; Pasteurella pneumotropica, 4.06%; unidentified species, 34.91%.
Participants
Name Area of research Country Institute
Abdikerim, Saltanat Molecular genetics KZ IGGC
Akhmetova, Ainur Genetics of Human Diseases KZ NU
Akilzhanova, Ainur Genomic and Personalized medicine KZ NU
Alexyuk, Madina antiviral protection research, metagenomics KZ IMV
Amirbekov, Aday Immunogenetic aspects of cancer screening KZ MU
Amirgazin, Asylulan Bacterial genomics KZ NCB
Bogoyavlenskiy, Andrey antiviral protection research, metagenomics KZ IMV
Daugaliyeva, Saule Microbiology, metagenomics KZ IMV
Jantayeva, Kira Population genetics KZ MU
Jarmukhanov, Zharkyn Human genetics KZ NCB
Kachieva, Zulfiya Human diseases KZ MU
Kahbatkyzy, Nurzhibek Population genetics KZ IGGC
Kairov, Ulykbek Bioinformatics & KZ NU
Kamalova, Dinara Bacterial genomics KZ NCB
Karamendin, Kobey Viral ecology, evolution KZ IMV
Kozhamkulov, Ulan Microbiology, molecular epidemiology KZ NU
Kulnazarov, Batyr Microbiology, metagenomics KZ IMV
Kuzovleva, Elena Population genetics KZ IGGC
Kydyrmanov, Aidyn Viral ecology, evolution KZ IMV
Moldakozhayev, Alibek Viral ecology, evolution KZ IMV
Nugmanova, Raushan Bacterial genomics KZ NCB
Nurmoldin, Shalkar Thyroid cancer research KZ MU
Perfilyeva, Anastasiya Molecular genetics KZ IGGC
Rakhimova, Saule Genetic studies of multifactorial diseases KZ NU
Shevtsov, Alexandr Bacterial genomics KZ NCB
Turmagambetova, Aizhan antiviral protection research, metagenomics KZ IMV
Zholdybayeva, Elena Viral genetics KZ NCB
Zhunussova, Gulnur Molecular genetics KZ IGGC
Torokeldiev, Nurlan Population genetics KRG IAUB
Egizbayev, Zhanibek Illumina representative KZ ILLM
Govorovskiy, Vladislav Illumina representative BLR ILLM
Carr, Ian Bioinformatics & health UK UoL
Dawson, Deborah Population & ecological genetics UK UoSh
Duncan, Elizabeth Genomics & evolutionary biology UK UoL
Dunn, Jennifer Disease ecology, conservation UK RSPB
Ford, Antonia Population & ecological genetics, genomics UK UoB
Forde, Niamh Reproductive biology UK UoL
Goodman, Simon Population genetics, disease ecology, conservation UK UoL
Hipperson, Helen Bioinformatics & population genetics UK UoSh
Knight, Christopher Microbial systems biology UK UoM
O'Connell, Mary Computational biology UK UoL
Stockdale, Jennifer Disease ecology, conservation UK UoC
Taylor, Morag Cancer genetics UK UoL
KZ – Kazakhstan
UK – United Kingdom
KRG – Kyrgyzstan
BLR - Belarus
UoL - University of Leeds
UoM - University of Manchester
UoSh - University of Sheffield
UoC - University of Cardiff
UoB - University of Bangor
RSPB - Royal Society for the Protection of Birds
ILLM – Illumina Corp.
IMV - Institute of Microbiology and Virology
MU – Medical University
NCB - National Center for Biotechnology
IGGC – Institute of General Genetics and Cytology
NU - Nazarbayev University
IAUB - International Ala-Too University in Bishkek