Enhancing capacity for next generation

sequencing (NGS) and genomics in health,

agricultural, ecological and environmental

applications in Kazakhstan

A Newton-Al-Farabi Partnership Programme Researcher links

workshop

September 20th-23rd 2016

Hotel Grand Voyage, Almaty, Kazakhstan

Organised by the School of Biology, University of Leeds, UK &

Institute of Microbiology & Virology, Almaty, Kazakhstan

Workshop Overview

Next generation sequencing and genomics are technologies that have developed at an

astonishing rate in recent years. They have become fundamental to health research and

diagnostics, including genetic disease and cancer, infectious disease, bacterial drug resistance

and personalised drug treatments for patients. They are also central to many areas of agricultural,

ecological and environmental research and diagnostics. This workshop will bring together UK

researchers using NGS and genomics in different fields and Kazakh scientists, with the aim of

fostering links that will help enhance their capability to apply this new technology in their own

work, to identify new opportunities for collaborative research projects between the UK and

Kazakhstan and to promote career development of young scientists. Overall we hope this will

promote the ability of Kazakh scientists to apply cutting edge NGS/genomics techniques in

health care, agricultural and environmental endeavours and support new biotechnology

enterprises.

https://goodmanlab.org/research/workshops-meetings/next-generation-sequencing-researcher-links-workshop-almaty-kazakhstan-september-18th-24th-2016/

Workshop hashtag: #AlmatyNGS2016

Dr Simon Goodman, School of Biology, University of Leeds, Woodhouse Lane, Leeds LS2 9JT, UK

Dr Kobey Karamendin, Institute of Microbiology and Virology, 103 Bogenbai batyr str., Almaty, 050010, Kazakhstan

Programme

Mon 19th September

Registration from 17.00 to 19.00

Tues 20th September

09.00-10.00 Registration

10.00-10.10 Opening remarks

10.10-11.00 Plenary Dr Ainur Akilzhanova (Nazarbayev University) Genomic research in Kazakhstan: Challenges and opportunities for clinical applications

11.00-11.30 Coffee break

11.30-12.20 Plenary Dr Ian Carr (St James’s Hospital, University of Leeds) Diagnostic mutation detection using Next Generation Sequencing for healthcare in the UK

12.20-12.50 Ms Morag Taylor (St James’s Hospital, University of Leeds) The Role of Next Generation Sequencing in colorectal cancer research

12.50-14.00 Lunch

14.00-14.20 Dr Saule Rakhimova (Nazarbayev University) Transcriptome profiling of oesophageal cancer: from sampling to sequencing on HiSeq2000

14.20-14.40 Dr Ulykbek Kairov (Nazarbayev University) Meta-analysis of cancer transcriptome profiles using an independent components method

14.40-15.00 Dr Ulykbek Kairov (Nazarbayev University) Analysis of human whole-transcriptome sequencing data from Illumina HiSeq2000 platform

15.00-15.20 Dr Niamh Forde (Faculty of Medicine, University of Leeds) Using ‘Omics’ to understand successful early pregnancy events in cattle: The perspective of a reproductive biologist


16.00-17.00 Vladislav Govorkovskiy (Illumina, CIS) Application of Illumina NGS-technologies in healthcare, research and agriculture

Overview of NGS technologies and panel discussion on NGS equipment/sequencing platforms Chaired by Ian Carr (St James’s Hospital, University of Leeds)

17.30-19.00 Poster session and speed networking

Weds 21st September

09.30-10.20 Plenary Dr Mary O’Connell (School of Biology, University of Leeds) Comparative genomics and Mechanisms of protein evolution

10.20-10.40 Dr Antonia Ford (School of Biological Sciences, University of Bangor) Genomic characterisation of wild tilapia populations

10.40-11.00 Ms Jennifer Stockdale (School of Biosciences, University of Cardiff) Hungry for more: Utilising Next Generation Sequencing to determine the dietary range of different species


11.30-11.50 Dr Elizabeth Duncan (School of Biology, University of Leeds) Understanding the molecular mechanisms of gene-environment interactions in insects

11.50-12.10 Dr Helen Hipperson (NERC Biomolecular Analysis Facility, University of Sheffield) Identifying genes affecting both adaptive divergence and reproductive isolation in Howea palms from Lord Howe Island using RNA-Seq

12.10-12.30 Dr Askhat Molkenov (Nazarbayev University) Peculiarities of bioinformatics processing and data conversion from Illumina HiSeq2000

12.30-12.50 Dr Deborah Dawson (NERC Biomolecular Analysis Facility, University of Sheffield) Support for biomolecular studies of the natural environment in the UK

12.50-14.00 Lunch

14.00-15.30 Discussion panel – designing and troubleshooting NGS projects Chaired by Morag Taylor (St James’s Hospital, University of Leeds)


16.00-17.00 Discussion panel – designing and troubleshooting NGS projects continued

Thurs 22nd September

09.30-10.20 Plenary Dr Chris Knight (Faculty of Life Sciences, University of Manchester) Testing evolutionary mechanisms: mutation in microbes and more

10.20-10.40 Dr Saule Daugalieva (Institute of Microbiology and Virology) NGS 16S sequencing for microbial identification

10.40-11.00 Raushan Nugmanova (National Center for Biotechnology) Study of mutation clusters in bacteria using Ion Torrent sequencing


11.30-11.50 Dr Jenny Dunn (Royal Society for the Protection of Birds) Using next-generation sequencing to examine co-infection and environmental parasite transmission

11.50-12.10 Dr Alexander Shevtsov (National Center for Biotechnology) NGS sequencing of veterinary pathogens

12.10-12.30 Dr Aizhan Turmagambetova (Institute of Microbiology and Virology) NGS for ecological research applications: macrophages profiling in Kazakhstan lakes

12.30-12.50 Dr Kobey Karamendin (Institute of Microbiology and Virology) NGS 16S sequencing of necropsy material from Saiga antelope after a mass die-off in Spring 2015

12.50-14.00 Lunch

14.00-15.30 NGS bioinformatics & data analysis resources and pipelines: Overview, demonstrations and panel discussion Chaired by Helen Hipperson (NERC Biomolecular Analysis Facility, University of Sheffield)


16.00-17.00 Bioinformatics discussion continued

19.00-23.00 Conference dinner

Fri 23rd September

09.30-10.00 Rowan Kennedy (Newton-Al-Farabi Partnership Programme) UK-Kazakhstan research funding opportunities

10.00-10.30 Dr Simon Goodman (School of Biology, University of Leeds) Overview of research structure, funding and career development in the UK


11.00-12.00 Break out groups - Identification of research priorities and collaboration opportunities for UK-Kazakh researchers

12.00-12.30 Report of break out groups and closing remarks

12.30-14.00 Lunch

Departures

Posters

Author Title

Ulan Kozhamkulov, Laboratory of Genomic and Personalized Medicine, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan

Whole genome sequencing of clinical isolates of M. tuberculosis with different drug sensitivity profiles on the Roche 454 GS FLX+ platform

Ainur Akhmetova, Laboratory of Genomic and Personalized Medicine, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan

Creating a HaloPlex cardiogenetic panel and preparation of DNA libraries for the targeted sequencing of patients with arrhythmias

Nurlan Torokeldiev, Medical School of the International Ala-Too University in Bishkek

Pattern of genetic variation, fine-scale genetic structure and footprints of natural selection in populations of Juglans regia L. in the southern Kyrgyz Republic

Vladislav Govorkovskiy, Illumina representative, Belarus

Poster about NGS technology

Vladislav Govorkovskiy, Illumina representative, Belarus

Poster about production

Abstracts

Dr Ian M. Carr, St James’s University Hospital, University of Leeds, UK

Diagnostic mutation detection using Next Generation Sequencing

Next generation sequencing (NGS) is a relatively new technology that can quickly and cheaply

generate huge amounts of sequence data. Consequently, it has rapidly found a wide range of

applications in both basic and translational research. These applications range from de novo genome

assembly of large eukaryotic genomes to amplicon sequencing of huge cohorts. NGS also promises to

revolutionise diagnostic testing where it may prove cheaper than current testing methodologies, allow

the testing of low quality samples or allow the development of completely novel diagnostic tests.

In the UK, NGS technologies are seen as the future of many DNA based testing methodologies. The

Yorkshire Regional DNA Laboratory, in Leeds, was one of the first to offer NGS-based diagnoses and

has reported on over 6,000 cases. Currently, the Yorkshire Regional DNA Laboratory uses a range of

methodologies to identify mutations ranging from single base substitutions to large structural

rearrangements. I will discuss these advances in light of the population demographics in the Yorkshire

region and how the new tests are implemented alongside current best practices that they may either

replace or augment.

Dr Deborah Dawson, NERC NBAF Centre, University of Sheffield, UK

Support for biomolecular studies of the natural environment in the UK

In the UK, support is provided for molecular studies of the natural environment by the NERC

Biomolecular Analysis Facility. The Facility provides access to high-level genomics, metabolomics and

bioinformatics through its four nodes at Sheffield, Edinburgh, Liverpool and Birmingham.

The Facility offers the very latest, class-leading technologies, including next-generation sequencing

(Illumina and Pacific Biosciences), SNP genotyping, and high resolution MS and NMR metabolomic

platforms. Applications include de novo sequencing, metagenomics, epigenetics, sequence-capture,

sequencing-based genotyping and expression profiling (RNAseq, oligoarrays and NanoString). The

Facility also supports metabolomics, medium-scale genotyping, bioinformatics and advanced data

analysis techniques (genome and transcriptome assembly and annotation, expression analysis, etc.).

Each node takes the lead in providing support in one area. At Sheffield, access is provided to laboratory

facilities, equipment, training and expertise. The main call is for the development and application of

genetic markers for use in population genetics and behavioural ecology. We also support various other

techniques, including metabarcoding for genetic studies of diet. The service at Sheffield is based on a

well-proven arrangement, in which researchers visit the laboratory to complete their own analyses

under the supervision of someone experienced in the required technology. In most cases, the majority

of the bench work will be carried out by visitors to the Facility under the supervision of Facility staff.

Training is provided, as appropriate.

The Facility has supported over 200 projects and 150 PhD students. Our users have published over

300 publications from Facility-supported studies, including large numbers in high-ranking journals

such as Nature and Science.

Dr Elizabeth J. Duncan, School of Biology, Faculty of Biological Sciences, University of Leeds, UK

Understanding the molecular mechanisms of gene-environment interactions in insects.

The phenotype of a plant or animal is dependent on interactions between their genes and the

environment. Some plants and animals are even able to generate markedly different phenotypes in

response to a change in the environment, a phenomenon known as phenotypic plasticity.

Using the honeybee (Apis mellifera) and the pea aphid (Acyrthosiphon pisum) we have developed

an analysis pipeline to begin to understand the molecular basis of how these gene-environment

interactions occur.

The honeybee and pea aphid both change the way they reproduce in response to changes in the

environment. In the honeybee hive only one female, the queen, usually reproduces. If the queen and

her pheromone are lost from the hive this triggers the normally sterile worker bees to become

reproductively active. Using a combination of techniques including RNA-seq to measure gene

expression and immunohistochemistry to determine which cell types in the ovary are affected we

have isolated a conserved signalling pathway as key to this process, Notch signalling. Among other

roles, Notch signalling has a key function in forming and maintaining stem cell niches and I propose

that these niches are key to gene-environment responses.

Epigenetic mechanisms, such as DNA methylation and histone modifications, also play a role in altering

the way animals respond to their environment and may also regulate stem-cell niches. To investigate

the role of epigenetic mechanisms in regulating the gene-environment interactions seen in the

honeybee I have used chromatin immunoprecipitation-sequencing (to investigate a particular histone

modification) and whole genome bisulphite sequencing to determine methylation patterns across the

genome.

Ultimately I aim to determine if there are conserved signalling pathways or regulatory networks that

control plasticity amongst diverse animals. Using these relatively simple and tractable systems to

understand the mechanisms of plasticity will allow us to understand, at a whole-organism level, how

animals are responding to their environment.

Jenny C. Dunn1, Rebecca C. Thomas2, Helen Hipperson3, Keith C. Hamer2 & Simon J. Goodman2

1 RSPB Centre for Conservation Science, Royal Society for the Protection of Birds, The Lodge, Potton

Road, Sandy, Bedfordshire, SG19 2DL, UK

2 School of Biology, Irene Manton Building, University of Leeds, Leeds. LS2 9JT, UK

3 NERC Biomolecular Analysis Facility, Department of Animal and Plant Sciences, University of

Sheffield, Western Bank, Sheffield, S10 2TN, UK

Using next-generation sequencing to examine co-infection and environmental parasite transmission

Co-infection with different parasites or multiple strains of the same parasite species is common in

natural systems and has implications for disease ecology and epidemiology. Traditional methods using

PCR either detect the dominant strain or return convoluted results from Sanger sequencing. Next-

generation sequencing (NGS) provides the opportunity to detect multiple strains of parasite

simultaneously from single samples, either from individuals or the environment. Here, I will describe

the application of NGS for parasite strain identification in a declining species of migratory bird, the

European Turtle Dove Streptopelia turtur. We screened blood samples and oral swabs for

haemoparasites and Trichomonas gallinae respectively, examining a single gene region (cytochrome

b) for haemoparasites, and two gene regions (ITS and FeDH) for Trichomonas gallinae. I will discuss

the laboratory methods and the bioinformatics analysis used, and discuss the applications of the

results in the context of ecology and conservation.

Dr Antonia G P Ford, School of Biological Sciences, Bangor University, Bangor, Gwynedd, LL57 2UW,

UK

Genomic characterisation of wild tilapia populations

Tilapia cichlid fish, and particularly the genus Oreochromis, are a mainstay of tropical aquaculture.

While most focus has been on strains of Nile tilapia (Oreochromis niloticus), several aquaculture

populations make use of hybrid lines and the ready hybridization of Oreochromis species. Future strain

enhancement may further benefit from the availability of additional wild genetic resources, which

have previously been used to enhance growth, environmental tolerance, control sex ratios, and

introduce genetic resistance to disease. However, existing native wild populations are frequently

poorly characterised and threatened by invasive tilapia species. Here, I will discuss an ongoing project

aiming to characterise wild populations of Oreochromis tilapia across a region of high cichlid

biodiversity, Tanzania, East Africa. Several introduced aquaculture tilapia species are found in wild

populations throughout Tanzania, where they are thought to compete with and hybridise with native

species. The project uses next generation sequencing (Illumina HiSeq) and SNP genotyping (Agena) to

survey wild populations to examine the extent and nature of introgression.

Dr Niamh Forde, Division of Reproduction and Early Development, Leeds Institute of Cardiovascular

and Metabolic Medicine, School of Medicine, University of Leeds, UK

Using ‘Omics’ to understand successful early pregnancy events in cattle: The perspective of a

reproductive biologist.

In most mammalian species studied, the majority of pregnancy loss occurs in the first three weeks of

pregnancy. A large proportion of this loss can be attributed to asynchrony between the embryo and

the endometrium and/or dysregulation of the uterine environment. A number of key events are

required to support successful early pregnancy in cattle. Specifically, an adequate post-ovulatory rise

in the hormone progesterone (P4) in circulation, which acts to alter the endometrial transcriptome, an

appropriate uterine environment with the secretions required to drive embryo development as well

as appropriate pregnancy recognition signalling by the conceptus to the endometrium to maintain P4

concentrations in circulation and to establish uterine receptivity to implantation. The focus of my talk

will be on how we utilised ‘omic’ technologies to understand how the hormone progesterone alters

the ability of the uterus to support successful early pregnancy. In addition, I will demonstrate how

using RNA sequencing technologies helped us to identify an earlier pregnancy recognition response to

an embryo in the endometrium and propose some biomarkers of early pregnancy in cattle. I will also

demonstrate how we have used RNA sequencing to look at how the metabolic environment of the

mother can have an impact on the transcriptome of lots of different reproductive tissues. Finally, I will

sum up the limitations and the pitfalls of using these types of technologies to address your biological

question.

Helen Hipperson, LT Dunning, WJ Baker, RK Butlin, C Devaux, I Hutton, J Igea, AST Papadopulos, X Quan, CM Smadja, CGN Turnbull, TC Wilson, VS Savolainen

NERC NBAF Centre, University of Sheffield, UK

Identifying genes affecting both adaptive divergence and reproductive isolation in Howea palms from Lord Howe Island using RNA-Seq

Howea belmoreana and Howea forsteriana are sister species of palm, both endemic to Lord Howe Island (LHI; located in the Tasman Sea between Australia and New Zealand) where they have diverged in sympatry. Originally composed solely of volcanic substrate, the deposition of calcareous soil on LHI is thought to have led to ecological speciation. Currently, H. belmoreana adults are restricted to volcanic soils whilst H. forsteriana is also found on the younger calcarenite soil. There are several ecological differences between these habitats; the calcareous soils are drier, have higher pH, and have increased salinity compared to volcanic soil. The species are largely reproductively isolated with a five-week difference in peak flowering time between them, both in the wild and when cultivated in a common garden. Differences in the peak flowering times are also maintained regardless of the soil type that H. forsteriana occurs on. Genes that have a dual role in controlling ecological adaptation and flowering time may have played a direct role in Howea speciation. To characterise such pleiotropic genes we first used RNA-Seq to identify differentially expressed genes between the Howea species using three tissue types (floral, leaf and root) sampled from 36 trees distributed across LHI. We also examined loci with divergent coding sequences. From both analyses we identified 16 candidate genes that were associated with ecological differences between the species and/or flowering time divergence, and examined the effect that eight of these genes have on flowering time in Arabidopsis knockout mutants. Finally, we put forward six plausible ecological speciation loci, providing support for the hypothesis that pleiotropy could help to overcome the antagonism between selection and recombination during speciation with gene flow.

Dr Mary O’Connell, School of Biology, Faculty of Biological Sciences, University of Leeds, UK

Comparative genomics and Mechanisms of protein evolution

The relationship between sequence and function has proven difficult to fully elucidate but it is key to

understanding what makes a species unique. Here I will describe how we can help to bridge the gaps

in our understanding of the relationships (i) between species and (ii) between genotype and

phenotype, by adequately modelling major patterns in genomic sequence data. I will present results

from a small selection of large-scale comparative genomics studies and I will describe our approach

for identifying the evolution of species-specific proteins/protein functions using genome-scale data

and computational evolutionary models. Taking an applied evolutionary approach to modelling may

provide us with an increased understanding of species-specific response to disease/drugs at the

molecular level.

Dr Chris Knight, Faculty of Life Sciences, University of Manchester

Testing evolutionary mechanisms: mutation in microbes and more

Tackling major global challenges, such as the rise in antimicrobial resistance, requires a focus on the

fundamental evolutionary processes that underlie them. We are experimentally testing the

spontaneous evolution of antibiotic resistance in different microbes. Mutation rates have been

measured using phenotypic markers, including antibiotic resistances, for over 70 years. We find

patterns in this data suggesting that dense populations may evolve resistance at a lower rate than

sparse populations. Manipulating population densities in the laboratory, in either bacteria or yeast,

we can modify the mutation rate to several different antibiotic resistances by over an order of

magnitude. We find that this ‘density associated mutation rate plasticity’ (DAMP) requires an

evolutionarily ancient mutation avoidance mechanism, but is modified or mediated in particular

lineages, including by cell-cell interactions with the surrounding community. The next level is

therefore to consider the evolution of mixed microbial communities. We are considering both

experimental (mouse gut) and broader microbial meta-genetic data (soil communities), where we are

using novel approaches to distinguish the biologically interesting signals from a range of technical

confounders. Through a combination of modelling approaches and next generation sequence data we

are gaining a closer connection between our understandings of genotypic change, phenotype and

ecology. This will contribute to addressing major issues, including antimicrobial resistance, but at the

same time help shed new, molecular, light on classically understood evolutionary processes.

Jennifer Stockdale1, Jenny Dunn1,2, Joanna Redihough1, Helen Hipperson3, William Symondson1

1Cardiff University School of Biosciences, Sir Martin Evans Building, Museum Avenue, Cardiff, CF10

3AX.

2RSPB Centre for Conservation Science, The Lodge, Sandy, Bedfordshire. SG19 2DL

3 NERC Biomolecular Analysis Facility, Department of Animal and Plant Sciences, University of

Sheffield, Western Bank, Sheffield, S10 2TN, UK

Hungry for more: Utilising next generation sequencing to determine the dietary range of different

species.

Next generation sequencing (NGS) is increasingly being used to look at the complete dietary range of

species. To date, there have been ecological analyses using molecular scatology to study the diets of

killer whales and leopards at one end of the spectrum, with specialist termite-eating spiders at the

other. Molecular analyses consequently tend to be replacing more traditional techniques of

morphological analysis of faecal samples, stomach flushing, nest cameras and direct observation to

identify dietary components. At Cardiff University we are using NGS to determine the dietary ranges

of invertebrates and vertebrates in both temperate and tropical habitats. I will discuss three

ongoing projects examining diets of the Common Crane (which eats invertebrates and plants),

thrushes in farmland (which eat invertebrates), and the European Turtle Dove (which eats plant

seeds). Work on the Common Crane, recently reintroduced to the Somerset Levels, will provide new

insight into Eurasian Crane diet, which may aid any potential future reintroductions and will help to

sustain these birds in Britain. NGS is also being used to monitor the diets of thrushes in farmland

landscapes of different complexity enabling us to link the prey found in their faeces with the use of

landscape elements (arable field, woodland etc.). Finally, we have been able to determine implications

of diet for Turtle Dove body condition, consider changes in diet over time and usage of bespoke habitat

management options with implications for conservation management.

Morag Taylor, Susan Richman, Tim Palmer, Henry Wood, Caroline Young, Phil Quirke

St James’s University Hospital, University of Leeds, UK

The Role of Next Generation Sequencing in Colorectal Cancer Research

In the United Kingdom (UK), colorectal cancer (CRC) is the fourth most common cancer, and the

second most common cause of cancer related deaths. It’s important to understand both the

prognostic and predictive markers of CRC to improve these statistics. The advent of next generation

sequencing (NGS) is allowing us to explore tumour profiling and mutation screening in new ways,

advancing the role of molecular pathology.

We are part of several multicentre clinical trials investigating CRC biomarkers to determine patient

treatment. Until now, we have favoured the lower cost sequencing alternatives, but with the

continuing reduction in sequencing costs, and the need to adapt assays quickly whilst keeping

technical time down, we are developing a pipeline to move our clinical trials to NGS.

We have investigated genomic heterogeneity in CRC. Using copy number variation data from NGS, we

have analysed the primary tumours and all distant metastases from eight patients who died of

advanced CRC. We generated phylogenetic trees for each patient to follow the evolution of the

disease.

The human microbiome has been studied for many years, but NGS has made this study area more

accessible. It has allowed for the identification of bacteria that were previously un-culturable. Studies

have shown the gut microbiome plays a role in CRC, but as yet, this isn’t fully understood. In order to

investigate this, we are developing a pipeline to study the human microbiome isolated from guaiac

faecal occult blood test cards, used worldwide to screen for CRC. We will follow this by doing a UK

study to investigate if the microbiome can be used as a future tool to predict CRC using 16S rRNA

sequencing on an Illumina MiSeq.

Ainur Akilzhanova

Laboratory of Genomic and Personalized Medicine, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan

Genomic research in Kazakhstan: challenges and opportunities for clinical applications

Technological advancements are rapidly propelling the field of genome research forward. Advances in genetics and genomics such as the sequence of the human genome, the human haplotype map, open access databases, cheaper genotyping and chemical genomics have already transformed basic and translational biomedical research. At the National Laboratory Astana (NLA), Center for Life Sciences, Nazarbayev University, several projects in the field of genomic and personalized medicine are being conducted. The prioritized areas of research include genomics of multifactorial diseases, cancer genomics, bioinformatics, genetics of infectious diseases and population genomics. At present, DNA-based risk assessment for common complex diseases, application of molecular signatures for cancer diagnosis and prognosis, genome-guided therapy, and dose selection of therapeutic drugs are the important issues in personalized medicine.

Kazakhstan is a unique country located in the middle of Central Asia, lying on the ancient Great Silk Road. Kazakh populations have been strongly influenced by the nomadic lifestyle, and a long history of migration has led to admixture of western and Asian populations, which has molded the genetic architecture. Thus it is crucial to understand the genetic background of ethnic Kazakhs to properly investigate the genetic basis of common diseases or traits in Kazakh populations. To develop a personalized medicine program for Kazakhstan, we first need to acquire personal genomic data for Kazakhs. To do so, we need a core of scientists who can: (1) design proper studies; (2) diagnose accurately; (3) sequence efficiently (using multiomic technologies); (4) analyze and maintain massive sequence data; (5) analyze the relations between genetic variants and phenotypes (i.e., disease status or biomarkers).

To further develop genomic and biomedical projects at NLA and in Kazakhstan the development of bioinformatics research and infrastructure is essential, as well as establishment of new collaborations in this field.

Widespread use of genetic tools will allow the identification of diseases before the onset of clinical symptoms, the individualization of drug treatment, and could induce individual behavioral changes on the basis of calculated disease risk. However, many challenges remain for the successful translation of genomic knowledge and technologies into health advances, such as medicines and diagnostics.

It is important to integrate research and education in the fields of genomics, personalized medicine and bioinformatics, which will be possible with the opening of the new Medical Faculty at Nazarbayev University. Educating both those in practice and those in training about key concepts of genomics and, importantly, engaging them in the design of how this knowledge will be applied most effectively will rapidly bring the era of genomic medicine to patient care, resulting in improved health. All of this must be based on a sound research and scientific platform, which requires the development of well-equipped modern laboratories, bioinformatics, qualified and trained physicians and laboratory staff, and an informed public.

Genomic research in Kazakhstan: challenges and opportunities for clinical applications

Technological advances of recent decades have moved genomic and multi-omics research forward. Advances in genetics and genomics, such as the sequencing of the human genome, human haplotype maps, open-access databases, cheaper genotyping and chemical genomics, have already transformed basic and translational biomedical research. National Laboratory Astana (NLA), Center for Life Sciences, Nazarbayev University, runs projects in genomic and personalized medicine. Priority research areas are the genomics of multifactorial diseases and cancer, bioinformatics and computational systems biology, the genetics of infectious diseases and population genomics, multi-omics studies, and the development and introduction of NGS methods into practice. DNA-based risk assessment for common complex diseases, the use of molecular signatures for cancer diagnosis and prognosis, genome-guided therapy and dose selection for therapeutic drugs are currently important issues in personalized medicine.

Kazakhstan is a unique country in the heart of Central Asia, lying on the ancient Great Silk Road. Its population was strongly shaped by a nomadic way of life and has a long history of migration, which produced an admixture of Western and Asian populations that formed the genetic architecture of the people. It is therefore essential to understand the genetic background of ethnic Kazakhs in order to investigate properly the genetic basis of common diseases or traits in the Kazakh population. To develop a personalized medicine program for Kazakhstan, we first need to obtain personal genome data from Kazakhstanis. This requires a pool of scientists who can: (1) design and conduct proper studies; (2) diagnose accurately; (3) sequence efficiently using multi-omics technologies; (4) analyze and maintain large sequence data sets; (5) analyze relationships between genetic variants and phenotypes (i.e. disease status or biomarkers), among other tasks.

To develop genomic and biomedical research further at NLA and in Kazakhstan, it is important to build infrastructure and bioinformatics capacity and to establish collaboration with international laboratories and consortia in this field, as well as with clinics, universities and research groups.

Widespread use of genetic tools will allow diseases to be identified before the onset of clinical symptoms, enable drug therapy to be individualized, and may induce individual behavioral changes on the basis of calculated disease risk. Nevertheless, many challenges remain for the successful translation of genomic knowledge, technologies and advances into health-care practice.

It is important to integrate research and education in genomics, personalized medicine and bioinformatics, which becomes possible with the opening of the new medical faculty at Nazarbayev University. Teaching the key concepts of genomic and personalized medicine both to practitioners and to those in training and, importantly, involving them in designing how this knowledge will be applied most effectively in clinical practice can speed the arrival of the era of genomic medicine in Kazakhstan, which may improve the health of the nation. All of this must rest on a sound research and scientific platform, which requires well-equipped modern laboratories, developed bioinformatics, qualified and trained physicians and laboratory staff, and an informed public.

Ulykbek Kairov

Laboratory of Bioinformatics and Computational Systems Biology, Center for Life Sciences,

National Laboratory Astana, Nazarbayev University

Analysis of human whole-transcriptome sequencing data from Illumina HiSeq2000 platform

High-throughput genomic technologies, in particular the Illumina HiSeq2000 next-generation sequencing platform, have had a major impact on cancer research. The HiSeq2000 generates up to 600 Gb (gigabases) of sequencing data per run. Data on this scale require reproducible bioinformatics methods and mathematical and statistical approaches for their analysis. Transcriptomic profiling of cancer specimens on the HiSeq2000 allows in-depth investigation of gene expression and of the molecular pathways affected. In our study we aimed to perform a comprehensive analysis of HiSeq2000 sequencing data in order to identify affected molecular pathways and extract meaningful molecular signals from oesophageal cancer specimens of Kazakhstani patients.
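As a rough illustration of the kind of comparison such an analysis involves, the sketch below normalizes raw read counts to counts per million (CPM) and computes a per-gene log2 fold change between a tumor sample and its matched normal sample. The gene names, the counts and the pseudocount of 1 are illustrative assumptions, not data or settings from the study.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so that the sample sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}

def log2_fold_change(tumor, normal, pseudocount=1.0):
    """Per-gene log2(tumor CPM / normal CPM); the pseudocount avoids log(0)."""
    t_cpm = counts_per_million(tumor)
    n_cpm = counts_per_million(normal)
    return {
        gene: math.log2((t_cpm[gene] + pseudocount) / (n_cpm[gene] + pseudocount))
        for gene in tumor
        if gene in normal
    }

# Hypothetical read counts for one tumor/normal pair (not real data).
tumor = {"GENE_A": 900, "GENE_B": 50, "GENE_C": 50}
normal = {"GENE_A": 300, "GENE_B": 350, "GENE_C": 350}

lfc = log2_fold_change(tumor, normal)
for gene, value in sorted(lfc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}\t{value:+.2f}")
```

A real analysis would use replicate-aware statistics rather than a single pair; the sketch only shows the normalization and ratio step.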

Analysis of whole-transcriptome data from the Illumina HiSeq2000 next-generation sequencing platform.

High-throughput genomic technologies, in particular the Illumina HiSeq2000 next-generation sequencing platform, play a significant role in modern cancer research. The Illumina HiSeq2000 generates up to 600 Gb of data per run. Data sets of this size require reproducible bioinformatics methods and non-standard mathematical and statistical approaches to analysis. Transcriptome profiling of tumour samples on the Illumina HiSeq2000 opens new opportunities for large-scale study of gene expression and the search for key molecular networks. Our study aims to carry out a comprehensive analysis of gene expression in order to find molecular signals in the transcriptome profiles of Kazakhstani patients diagnosed with oesophageal cancer.

Ulykbek Kairov

Laboratory of Bioinformatics and Computational Systems Biology, Center for Life Sciences, National

Laboratory Astana, Nazarbayev University

Meta-analysis of cancer transcriptome profiles using Independent Component Analysis

High-throughput genomic technologies such as microarrays and next-generation sequencing have had a major impact on cancer research. The large volume of genomic data requires reproducible analytical approaches. In our study we applied Independent Component Analysis to the meta-analysis of breast cancer gene expression data. We identified 7 to 8 reproducible components in all four breast cancer datasets and developed a graph-based approach to the meta-analysis and interpretation of these independent components, such that each of them was associated with a small gene network. By analysing these networks we provided a tentative interpretation of the stably reproducible components. We found that factors such as proliferation, immune response, and contamination of tumor cells by lymphocytes and normal tissue affect gene expression in breast cancer.
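One way to picture a graph-based comparison of components is sketched below. Each independent component is represented as a vector of gene weights, components from different datasets are compared by absolute Pearson correlation (the sign of an independent component is arbitrary), and an edge is drawn between reciprocal best matches. The toy vectors, the dataset names and the reciprocal-best-match rule are assumptions for illustration; the abstract does not specify the matching criterion that was used.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length weight vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def best_match(vector, candidates):
    """Name of the candidate component with the highest |r| to `vector`."""
    return max(candidates, key=lambda name: abs(pearson(vector, candidates[name])))

def reciprocal_edges(datasets):
    """Edges between components that are each other's best match."""
    edges = []
    for (name_a, comps_a), (name_b, comps_b) in combinations(datasets.items(), 2):
        for comp_a, vec_a in comps_a.items():
            comp_b = best_match(vec_a, comps_b)
            if best_match(comps_b[comp_b], comps_a) == comp_a:
                r = abs(pearson(vec_a, comps_b[comp_b]))
                edges.append(((name_a, comp_a), (name_b, comp_b), round(r, 3)))
    return edges

# Toy gene-weight vectors over the same five genes (hypothetical values).
datasets = {
    "dataset_1": {"IC1": [2.0, 1.8, 0.1, 0.0, -0.1], "IC2": [0.0, 0.1, 1.9, 2.1, 0.2]},
    "dataset_2": {"IC1": [0.1, -0.1, 2.2, 1.7, 0.0], "IC2": [-1.9, -2.1, 0.0, 0.2, 0.1]},
}

edges = reciprocal_edges(datasets)
for edge in edges:
    print(edge)
```

Connected groups of components in such a graph are candidates for "reproducible" components; the genes with the largest weights in each group can then be inspected as a small network.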

Meta-analysis of cancer transcriptomes using Independent Component Analysis.

High-throughput genomic technologies, such as high-density microarrays and next-generation sequencing on the Illumina HiSeq2000, contribute significantly to modern cancer research. Huge volumes of genomic data require reproducible analytical methods. In our study we demonstrated how Independent Component Analysis can be applied to the meta-analysis of breast cancer data sets. We found 7 to 8 reproducible components in all the breast cancer data sets and developed a graph-theory-based approach to the meta-analysis and interpretation of the independent components.

Saule Rakhimova

National Laboratory Astana, Nazarbayev University

Transcriptome profiling of oesophageal cancer: from biomaterial sampling to sequencing on

HiSeq2000.

This report presents a study of the transcriptome profile of esophageal squamous cell carcinoma using NGS technology. The work comprised the following steps: sampling of biological material, nucleic acid isolation, library preparation, and the library validation methods used in the laboratory. Esophageal cancer is the sixth most common cancer in Kazakhstan and is usually not detected until it has progressed to an advanced, incurable stage. More than 80% of cases and deaths occur in developing countries and in Central and East Asia. Aim of the study: to identify the genetic basis of esophageal cancer by whole-transcriptome sequencing of human samples in Kazakhstan.

Patients were recruited at the Thoracic Surgery Department, Oncology Center, Astana. We included only patients who had given informed consent, had a confirmed diagnosis of esophageal squamous cell carcinoma, had undergone radical surgery (Ivor Lewis esophagectomy), and for whom blood analysis, biochemical data, CT, X-ray and histopathological data were available.

Materials: paired fresh-frozen specimens (after RNAlater solution) of esophageal cancer tissue and normal tissue.

Methods: RNA isolation, library preparation, library validation, hybridization to the flow cell, and sequencing on the HiSeq 2000.

Qiagen kits were used for RNA isolation and purification, and the TruSeq RNA Sample Preparation Kit for library preparation. All procedures were performed according to Illumina protocols.

Saule Rakhimova (National Laboratory Astana, Nazarbayev University): Transcriptome profile of oesophageal cancer: from sample collection to sequencing on HiSeq2000.

The talk presents the use of NGS technology in a research project studying the transcriptome profile of esophageal squamous cell carcinoma. The work is described in the following stages: collection of biomaterial, isolation of nucleic acids, library preparation, and the library validation methods used in the laboratory.

Esophageal cancer ranks sixth among cancers in Kazakhstan and, as a rule, is not detected until the disease has progressed to advanced stages. More than 80% of cases and deaths occur in developing countries and the countries of Central and East Asia. Aim of the study: to identify the genetic basis of esophageal cancer through a whole-transcriptome study in Kazakhstan.

Patients were recruited at the Thoracic Surgery Department of the Oncology Center, Astana. The study included patients with signed informed consent forms and a confirmed clinical diagnosis of esophageal squamous cell carcinoma who had undergone radical surgery (Ivor Lewis esophagectomy) and for whom blood test results, biochemical data, CT, X-ray and histology reports were available.

Materials: paired fresh-frozen tissue samples (or tissue samples in RNA-stabilizing solution) from a normal area of the esophagus and from the center of the tumor.

Methods used: RNA isolation and purification, library preparation, various library validation methods, hybridization of libraries to the flow cell, and sequencing on the HiSeq 2000.

Qiagen kits were used for RNA isolation and purification and the TruSeq RNA Sample Preparation Kit for library preparation; all procedures followed Illumina protocols.

Vladislav Govorovskiy

Illumina representative, Belarus

Application of Illumina NGS-technologies in healthcare, research and agriculture

Next-generation sequencing (NGS) technologies are transforming biological and medical research. Researchers around the world use NGS systems to drive genetic analysis at a higher rate.

Ongoing development of sequencing-by-synthesis (SBS) technology makes it possible to pursue research in many areas of interest: science, healthcare, agriculture, forensics and reproductive medicine. Modern instruments such as the MiniSeq, HiSeq 4000 and HiSeq X have given users new possibilities in NGS. The computational power of these machines, coupled with the continuing design of new kits and panels, has opened great prospects for modern research.

The Illumina platform also allows alternative sample preparation methods to be used, which extends the potential uses of the system. The variety of methods discussed, and their possible combinations, considerably broadens the scope of modern science and medicine.

Next-generation sequencing has transformed research in biology and medicine. Researchers around the world use NGS systems to take genetic analysis to a previously unattainable level.

Continuous improvement of Illumina's sequencing-by-synthesis (SBS) technology makes it possible to carry out increasingly diverse studies in scientific research, healthcare, agriculture, forensics and reproductive medicine.

The arrival of modern instruments such as the MiniSeq, HiSeq 4000 and HiSeq X has given users new possibilities in NGS and, together with continually improving kits and panels, has opened up great prospects for modern research.

The Illumina platform also allows alternative sample preparation methods to be used, which extends the potential of the system.

The variety of these methods and their possible combinations significantly expands the boundaries of modern science and medicine.

Askhat Molkenov

Nazarbayev University, Kazakhstan

Peculiarities of Bioinformatics Processing and Data Conversion from Illumina HiSeq2000

High-throughput next-generation sequencing platforms have opened new opportunities for scientists in genomic research. Large-scale genomic studies of different organisms, including humans, animals, plants and bacteria, are now carried out with next-generation sequencing technologies. Modern bioinformatics is a synthesis of biological, information and technical disciplines aimed at solving scientific problems. In this report I will present some of the methods and examples used in the daily analytical protocols of the Laboratory of Bioinformatics and Computational Systems Biology for the analysis of genomic data from the Illumina HiSeq 2000.
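A common first step once sequencer output has been converted to FASTQ is a quality check of the reads. The sketch below parses four-line FASTQ records and computes the mean Phred quality of each read, assuming the Phred+33 encoding. The example records are made up, and the laboratory's actual protocols are not described in the abstract.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality

def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (offset 33 is an assumption)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)

# Two made-up reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example = io.StringIO(
    "@read_1\nACGTACGT\n+\nIIIIIIII\n"
    "@read_2\nACGTNNNN\n+\nIIII####\n"
)

records = list(read_fastq(example))
for read_id, sequence, quality in records:
    print(read_id, len(sequence), f"{mean_phred(quality):.1f}")
```

Reads whose mean quality falls below a chosen threshold can then be trimmed or discarded before alignment or assembly.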

Peculiarities of bioinformatics processing and data conversion from the HiSeq 2000 platform.

High-throughput next-generation sequencing platforms have opened new opportunities for genomic research. Large-scale genomic studies of various organisms, including humans, animals, plants and bacteria, are now being carried out with next-generation sequencing technologies. Modern bioinformatics is a synthesis of biological, information and technical disciplines aimed at solving scientific problems. In my talk I will present some methods and examples used in the daily analytical protocols of the Laboratory of Bioinformatics and Computational Systems Biology for the analysis of genomic data from the Illumina HiSeq2000.

Saule Daugalieva

Institute of Microbiology and Virology, Kazakhstan

NGS 16S sequencing for microbial identification

The shared-use laboratory of the Institute of Microbiology and Virology was established in 2014. It carries out molecular genetic studies for the research projects run at our institute. The laboratory has modern equipment and everything needed for up-to-date research. It houses an Eppendorf PCR cycler, an Applied Biosystems 7500 real-time PCR system, an 8-capillary Applied Biosystems 3500 sequencer and an Illumina MiSeq next-generation sequencer. Accessory equipment includes a Qubit fluorometer, an Agilent 2100 Bioanalyzer, a Vilber Lourmat ECX-F15.M gel documentation system and a Shimadzu LCMS-860 chromatography mass spectrometer.

The Department of Microbiology of our institute conducts molecular studies of the following groups of microorganisms: oil-degrading, cellulose-degrading, nitrogen-fixing and lactic acid bacteria, plant-pathogenic bacteria, fungi and yeasts. The main areas of research are the identification of microorganisms by PCR analysis and sequencing, whole-genome analysis, and the identification of specific genes. In the near future we plan to carry out metagenomic analysis of soil and water from different regions of Kazakhstan and of other environmental samples.

In 2014 we performed whole-genome analysis of 14 species of bacteria on the Illumina MiSeq NGS sequencer. On the same sequencer we performed 16S metagenomic analysis of 120 strains of bacterial cultures. After acquiring the capillary sequencer, we used it to identify 12 species of fungi, 8 species of yeast and 110 species of bacteria.

For whole-genome analysis on the MiSeq instrument we used the Nextera XT sample preparation kit and the MiSeq Reagent Kit v2 for sequencing.

Libraries for 16S metagenomic analysis were prepared with the Illumina Nextera XT Index Kit (24 Indexes, 96 Samples) and KAPA HiFi HotStart ReadyMix. Purification was carried out with AMPure XP beads on a magnetic stand. The quantity and quality of the libraries were determined with a Qubit 2.0 fluorometer, an Agilent 2100 Bioanalyzer and horizontal gel electrophoresis. The libraries were then normalized and pooled. PhiX Control v3 was added as a control. Sequencing was performed with the MiSeq Reagent Kit v2 (500 cycles) and the MiSeq Reagent Kit v3 (600 cycles). Results were processed with the MiSeq Reporter software and the 16S metagenomics application on the Illumina website.
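Normalizing libraries before pooling usually means converting each measured concentration to molarity and diluting every library to a common target. The sketch below uses the standard approximation of 660 g/mol per base pair of double-stranded DNA. The concentration, fragment size, 4 nM target and 10 µl final volume are hypothetical values, not those used in the laboratory.

```python
def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration (ng/µl) to molarity (nM).

    Uses ~660 g/mol per base pair: nM = ng/µl / (660 * bp) * 1e6.
    """
    return conc_ng_per_ul / (660.0 * mean_fragment_bp) * 1_000_000

def dilution_volumes(conc_nm, target_nm, final_volume_ul):
    """Return (library µl, buffer µl) needed to reach the target molarity."""
    if conc_nm < target_nm:
        raise ValueError("library is already below the target concentration")
    library_ul = target_nm * final_volume_ul / conc_nm
    return library_ul, final_volume_ul - library_ul

# Hypothetical library: 2.0 ng/µl by fluorometry, 600 bp mean size by Bioanalyzer.
molarity = library_molarity_nm(2.0, 600)
library_ul, buffer_ul = dilution_volumes(molarity, target_nm=4.0, final_volume_ul=10.0)
print(f"{molarity:.2f} nM -> {library_ul:.2f} µl library + {buffer_ul:.2f} µl buffer")
```

Equal volumes of the diluted libraries can then be combined so that each contributes a similar number of molecules to the pool.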

The shared-use laboratory of the Institute of Microbiology and Virology was established in 2014. It carries out molecular genetic studies for the research projects run at our institute. The laboratory has modern equipment and everything needed for research at a modern level. It houses an Eppendorf PCR thermal cycler, an Applied Biosystems 7500 real-time PCR system, an 8-capillary Applied Biosystems 3500 sequencer and an Illumina MiSeq next-generation sequencer. Auxiliary equipment includes a Qubit fluorometer, an Agilent 2100 Bioanalyzer, a Vilber Lourmat ECX-F15.M gel documentation system and a Shimadzu LCMS-860 chromatography mass spectrometry system.

The institute's Department of Microbiology conducts molecular studies of the following groups of microorganisms: oil-oxidizing, cellulolytic, nitrogen-fixing and lactic acid bacteria, plant-pathogenic bacteria, fungi and yeasts. The main research areas are the identification of microbial strains by PCR analysis and sequencing, whole-genome analysis, and the identification of specific genes. In the near future we plan metagenomic analysis of soil and water from different regions of Kazakhstan and of environmental samples.

In 2014 we carried out whole-genome analysis of 14 bacterial species on the Illumina MiSeq NGS sequencer. On the same instrument we performed 16S metagenomic analysis of about 120 strains of bacterial cultures. After acquiring the capillary sequencer, we used it to identify 12 species of fungi, 8 species of yeast and 110 species of bacteria.

For whole-genome analysis on the MiSeq we used the Nextera XT sample preparation kit and the MiSeq Reagent Kit v2 for sequencing.

Libraries for 16S metagenomic analysis were prepared with the Illumina Nextera XT Index Kit (24 Indexes, 96 Samples) and KAPA HiFi HotStart ReadyMix. Purification was carried out with AMPure XP beads on a magnetic stand. Library quantity and quality were determined on a Qubit 2.0 fluorometer, an Agilent 2100 Bioanalyzer and by horizontal gel electrophoresis. The libraries were normalized and pooled. PhiX Control v3 was added as a control. Sequencing was performed with the MiSeq Reagent Kit v2 (500 cycles) and the MiSeq Reagent Kit v3 (600 cycles). Results were processed with the MiSeq Reporter software and the 16S metagenomics application on the Illumina website.

Raushan Nugmanova

National Center for Biotechnology, Kazakhstan

Study of mutation clusters using Ion Torrent

The nonuniform distribution of mutations across the genome has been observed for many years. Recent studies have shown mutation clusters in cancer genomes, yeast cells, Big Blue mice and retroelements, as well as in bacteria exposed to DNA-damaging agents. Such clusters were detected in particular regions of the genome and accumulated over a number of generations. A deeper understanding of mutagenesis became possible with the development of next-generation sequencing, which provides deep, sensitive and rapid analysis of a broad range of mutations and generates high-quality data. The approach is therefore widely used in genome-wide studies. Studying mutagenesis in bacteria is important because it might point to potential antibacterial treatments or reveal new aspects of bacterial genome organization. Since a previous study revealed mutation clusters in several E. coli genomes after mutagenesis with ethyl methanesulphonate (EMS), it is important to establish whether this phenomenon is unique to Gram-negative E. coli or is also found in Gram-positive species such as B. subtilis after EMS treatment. Ion Torrent next-generation sequencing allows several bacterial genomes to be analysed in one run. The results of the current study showed mutation clusters in the genome of B. subtilis. Further work is required to understand the molecular basis of mutation clusters in ΔAda and ΔMutS E. coli strains.
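A simple way to flag candidate mutation clusters is to group variants whose neighbouring positions lie closer together than a chosen distance. The sketch below does this for a list of genome coordinates. The 1,000 bp gap, the minimum of three mutations per cluster and the example positions are illustrative assumptions, since the abstract does not state how clusters were defined.

```python
def find_clusters(positions, max_gap=1000, min_size=3):
    """Group mutation positions into clusters.

    Consecutive positions (after sorting) stay in the same group while the
    distance between neighbours is at most `max_gap`; groups with fewer than
    `min_size` mutations are discarded.
    """
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters

# Hypothetical mutation coordinates on a single chromosome.
positions = [10_500, 10_900, 11_300, 250_000,
             612_000, 612_400, 612_450, 613_100, 900_000]

clusters = find_clusters(positions)
for cluster in clusters:
    print(f"{cluster[0]}-{cluster[-1]}: {len(cluster)} mutations")
```

A fuller analysis would compare the observed spacing with what is expected if the same number of mutations were scattered at random over the genome.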

The phenomenon of nonuniform distribution of mutations in the genome has been observed for many years. Recent studies have shown certain mutation clusters in cancer genomes, yeast, Big Blue mice and retroelements, as well as in bacteria exposed to DNA-damaging agents. Such clusters were found in particular regions of the genome, accumulating over several generations. A deep understanding of the effect of mutagenesis became possible with the development of next-generation sequencing, which provides detailed and accurate analysis of a wide range of mutations at high speed and generates high-quality data. The method is therefore widely used in genomic studies. Studying mutagenesis in bacteria is necessary because it may lead to a potential antibacterial treatment or reveal new aspects of bacterial genome organization. Since a previous study showed clusters in the genome of E. coli after mutagenesis with ethyl methanesulphonate, it is important to determine whether this phenomenon is unique to Gram-negative E. coli or is also present in Gram-positive B. subtilis. Ion Torrent next-generation sequencing allows several bacterial genomes to be analysed in one run. The results of this study showed mutation clusters in the genome of B. subtilis. Further work is needed to understand the molecular basis of mutation clusters in the ΔAda and ΔMutS strains of E. coli.

Alexander Shevtsov

National Center for Biotechnology, Kazakhstan

NGS sequencing of veterinary pathogens

Over the previous two decades saiga populations declined by 95%, a fall linked to uncontrolled hunting in 1994-2003. Various measures helped to reverse the situation, and by 2013 the saiga population in Kazakhstan had increased fivefold to 110 thousand. Despite this growth, saiga are still in danger of extinction from infectious diseases. The main cause of mass saiga deaths in Kazakhstan was recognized as pasteurellosis, a zoonotic disease of vertebrates whose etiological agent is P. multocida. Despite the severe ecological damage done by pasteurellosis, little is known about the genetic factors behind the high pathogenicity of the causative agent isolated from saiga. In this study we carried out whole-genome sequencing of three strains of P. multocida. Strain P. multocida Z-1 was isolated from a saiga of the Ural population that died during the 2010 outbreak, which killed a third of the population (11,920 individuals). Strains P. multocida Z-3 and P. multocida K-1 were isolated from saiga of the Betpak-Dala population during the outbreaks of 2012 and 2013. Whole-genome sequencing on the Ion Torrent yielded 2,184,434 reads for strain Z-1, 2,212,653 reads for strain Z-3 and 1,893,014 reads for strain K-1, with an average read length of about 160 bp. The assembled genomes were 2,288,383, 2,336,270 and 2,303,903 bp, respectively. Although the two saiga populations do not overlap in the wild, the P. multocida strains isolated from them share a large set of identical genes (2,025), accounting for 92.6-95.8% of the predicted proteins, which exceeds the number of core genes in previously analyzed Pasteurella spp. The strains from the Betpak-Dala population share the largest pool of common genes, compared with the strain from the Ural population. Comparison of the genomes of the saiga strains with 11 genomes of strains isolated from mammals and 3 genomes of strains from birds revealed genes unique to strain Z-1 (24 genes), Z-3 (35 genes) and K-1 (21 genes). Most of these genes are identical to bacteriophage genes or encode small predicted proteins of unknown function (s1). Forty genes were characteristic of all three analyzed strains.
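The read counts and assembly sizes reported above give a rough idea of sequencing depth: expected coverage is approximately the number of reads multiplied by the mean read length and divided by the genome size. The calculation below uses the figures from the abstract and treats the 160 bp mean read length as exact, so the results are approximations only.

```python
# (number of reads, assembled genome size in bp), as reported in the abstract.
strains = {
    "Z-1": (2_184_434, 2_288_383),
    "Z-3": (2_212_653, 2_336_270),
    "K-1": (1_893_014, 2_303_903),
}

MEAN_READ_LENGTH_BP = 160  # "about 160 bp" in the abstract; treated as exact here

def estimated_coverage(n_reads, genome_bp, read_bp=MEAN_READ_LENGTH_BP):
    """Approximate fold coverage = reads * read length / genome size."""
    return n_reads * read_bp / genome_bp

coverage = {name: estimated_coverage(reads, size) for name, (reads, size) in strains.items()}
for name, depth in coverage.items():
    print(f"P. multocida {name}: ~{depth:.0f}x")
```

With these inputs the estimates are roughly 150-fold for Z-1 and Z-3 and roughly 130-fold for K-1.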

The previous two decades brought a 95% reduction in saiga populations, linked to uncontrolled hunting in 1994-2003. Various measures reversed the situation, and by 2013 the saiga population in Kazakhstan had grown fivefold to 110 thousand. However, despite the growth of saiga populations in Kazakhstan, they remain threatened with extinction from infectious diseases. The main cause of mass saiga deaths in Kazakhstan was recognized as pasteurellosis, a zoonotic disease of vertebrates whose etiological agent is P. multocida. Despite the heavy ecological damage caused by this disease, little is known about the genetic factors behind the high pathogenicity of the agent isolated from saiga. In this study whole-genome sequencing of three strains of P. multocida was carried out. Strain P. multocida Z-1 was isolated from a saiga of the Ural population that died during the 2010 outbreak, which killed a third of the population (11,920 individuals). Strains P. multocida Z-3 and P. multocida K-1 were isolated from saiga of the Betpak-Dala population during the outbreaks of 2012 and 2013.

Whole-genome sequencing on the Ion Torrent yielded 2,184,434 reads for strain P. multocida Z-1, 2,212,653 reads for strain P. multocida Z-3 and 1,893,014 reads for strain P. multocida K-1, with an average read length of about 160 bp. The assembled genomes were 2,288,383, 2,336,270 and 2,303,903 bp, respectively. Although the two saiga populations do not overlap under natural conditions, the P. multocida strains isolated from them share a large set of identical genes (2,025), which accounted for 92.6-95.8% of the predicted proteins and exceeds the number of core genes in previously analyzed Pasteurella. The strains from saiga of the Betpak-Dala population share the largest pool of common genes, compared with the Ural population. Comparison of the genomes of the saiga strains with 11 genomes of strains isolated from mammals and 3 genomes of strains from birds revealed unique genes for strain Z-1 (24 genes), Z-3 (35 genes) and K-1 (21 genes). Most of these genes are identical to bacteriophage genes or encode short proteins of unknown function. Forty genes were characteristic of all three analyzed strains.

Aizhan Turmagambetova


Detection of viruses in environmental samples using NGS

With the advent of NGS, the diagnosis of viral infections is on the verge of new theories, hypotheses and discoveries. There are several reasons for this, the most important being a many-fold increase in data on the presence of viruses in the environment, including soil, water, feces and air, and the ability to analyze viruses without cultivating them.

In our research we studied the biodiversity of viruses in the water reservoirs of Almaty region. Sequencing was carried out by the double-barrel shotgun method, in which useful information is obtained by sequencing both ends of a DNA fragment (paired-end sequencing). The two reads are oriented in opposite directions and are separated from each other along the length of the fragment; both can be used for genome assembly with various software. We used the HiSeq sequencing system and the Edena assembly software. A total of 447,000 contigs with lengths of 200 to 80,000 bp were obtained. The Metavir2 program selected 184,431 of these contigs and identified 249,780 viral gene sequences in them, of which 157,000 are previously unknown viral sequences.
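The geometry of a read pair can be illustrated with a short sketch: read 1 is taken from one end of the fragment, and read 2 is read from the other end on the opposite strand, so the two reads face each other across the unsequenced middle. The fragment sequence and the read length of 5 are toy values chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(fragment, read_length):
    """Simulate an error-free read pair from the two ends of a fragment."""
    read_1 = fragment[:read_length]                      # forward strand, left end
    read_2 = reverse_complement(fragment)[:read_length]  # reverse strand, right end
    return read_1, read_2

fragment = "AACCGGTTACGTAGCTAGGA"  # toy 20 bp fragment
read_1, read_2 = paired_end_reads(fragment, read_length=5)

# Bases between the two reads that are never sequenced directly.
inner_distance = len(fragment) - 2 * 5

print(read_1, read_2, inner_distance)
```

Because the approximate fragment length is known from library preparation, an assembler can use the expected distance between the two reads of a pair to order and orient contigs.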

Bacteriophages, algal viruses and viruses of protozoa made up 97% of the viruses in this water sample. The remaining 3% comprised viruses capable of causing disease in animals, higher plants and humans. Among them we detected 2 families of reverse-transcribing viruses (Retroviridae, Caulimoviridae), 2 families of single-stranded RNA (ssRNA) viruses, 1 family of single-stranded DNA (ssDNA) viruses (Inoviridae), 1 family of double-stranded RNA (dsRNA) viruses (Endornaviridae) and 20 families of double-stranded DNA (dsDNA) viruses, among them Herpesviridae.

Thus, NGS is opening a new era in the monitoring of viral infections, one that allows us to take a different look at the ecology of viruses.

Detection of viruses in environmental samples using massively parallel sequencing

With the advent of massively parallel sequencing (NGS, next generation sequencing), the diagnosis of viral infections stands on the threshold of new theories, hypotheses and discoveries. There are several reasons for this, chief among them a many-fold increase in data on the presence of viruses in the environment, including soil, water, excrement and air, and the ability to analyze the presence of viruses without first cultivating them.

In our studies we examined the biodiversity of viruses in water bodies of Almaty region. Sequencing was performed by the double-barrel shotgun method, in which useful information is obtained by sequencing the paired ends of a DNA fragment. The two sequences are oriented in opposite directions and may be separated from each other along the length of the fragment, but they can be used for genome assembly with various software. We used a HiSeq sequencer and the Edena program.

A total of 447,000 contigs of 200 to 80,000 base pairs were obtained, of which the Metavir2 program selected 184,431 and identified in them the sequences of 249,780 viral genes, 157,000 of which are previously unknown viral sequences.

Bacteriophages (5 families), algal viruses (1 family) and viruses of protozoa (1 family) made up 97% of the viruses in this water sample. The remaining 3% were viruses capable of causing disease in animals, higher plants and humans. We detected 2 families of reverse-transcribing viruses (Retroviridae, Caulimoviridae), 2 families of single-stranded RNA (ssRNA) viruses, 1 family of single-stranded DNA (ssDNA) viruses (Inoviridae), 1 family of double-stranded RNA (dsRNA) viruses (Endornaviridae) and 20 families of double-stranded DNA (dsDNA) viruses, among them Herpesviridae.

Thus, NGS has opened a new stage in the monitoring of viral infections, allowing a different look at the ecology of viruses.

Kobey Karamendin


NGS 16S sequencing of necropsy material from Saiga antelope after a mass die-off in Spring 2015

In metagenomic studies on the MiSeq sequencer to identify bacterial pathogens in saiga, 89.05% of all short reads were assigned to bacteria of the genus Pasteurella, among which Pasteurella multocida predominated at 48.32%. The other assignments were: Pasteurella eae, 10.75%; Pasteurella pneumotropica, 4.06%; unclassified at species level, 34.91%.
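Percentages of this kind are derived from per-read taxonomic assignments. The sketch below tallies hypothetical read labels into percentages, and then converts the reported share of P. multocida into a share of all reads under the assumption that the 48.32% figure refers to reads within the genus Pasteurella; the abstract does not state this explicitly.

```python
from collections import Counter

def percentages(labels):
    """Percentage of reads assigned to each taxon."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}

# Hypothetical per-read assignments (ten reads, not real data).
reads = (["Pasteurella multocida"] * 5
         + ["Pasteurella, unclassified species"] * 3
         + ["Other genus"] * 2)
shares = percentages(reads)
print(shares)

# Figures from the abstract; their interpretation is an assumption (see above).
genus_share_of_all_reads = 89.05
multocida_share_within_genus = 48.32
multocida_share_of_all_reads = genus_share_of_all_reads * multocida_share_within_genus / 100.0
print(f"P. multocida: ~{multocida_share_of_all_reads:.2f}% of all reads")
```

The four species-level figures in the abstract sum to 98.04%, which is consistent with their being fractions of the Pasteurella reads rather than of all reads.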

In metagenomic studies on the MiSeq sequencer to identify the agents of all bacterial infections, 89.05% of all short reads belonged to bacteria of the genus Pasteurella, among which the species Pasteurella multocida predominated at 48.32%. Other species found were Pasteurella eae, 10.75%; Pasteurella pneumotropica, 4.06%; and unidentified species, 34.91%.

Participants

Name | Area of research | Country | Institute
Abdikerim, Saltanat | Molecular genetics | KZ | IGGC
Akhmetova, Ainur | Genetics of Human Diseases | KZ | NU
Akilzhanova, Ainur | Genomic and Personalized medicine | KZ | NU
Alexyuk, Madina | Antiviral protection research, metagenomics | KZ | IMV
Amirbekov, Aday | Immunogenetic aspects of cancer screening | KZ | MU
Amirgazin, Asylulan | Bacterial genomics | KZ | NCB
Bogoyavlenskiy, Andrey | Antiviral protection research, metagenomics | KZ | IMV
Daugaliyeva, Saule | Microbiology, metagenomics | KZ | IMV
Jantayeva, Kira | Population genetics | KZ | MU
Jarmukhanov, Zharkyn | Human genetics | KZ | NCB
Kachieva, Zulfiya | Human diseases | KZ | MU
Kahbatkyzy, Nurzhibek | Population genetics | KZ | IGGC
Kairov, Ulykbek | Bioinformatics & | KZ | NU
Kamalova, Dinara | Bacterial genomics | KZ | NCB
Karamendin, Kobey | Viral ecology, evolution | KZ | IMV
Kozhamkulov, Ulan | Microbiology, molecular epidemiology | KZ | NU
Kulnazarov, Batyr | Microbiology, metagenomics | KZ | IMV
Kuzovleva, Elena | Population genetics | KZ | IGGC
Kydyrmanov, Aidyn | Viral ecology, evolution | KZ | IMV
Moldakozhayev, Alibek | Viral ecology, evolution | KZ | IMV
Nugmanova, Raushan | Bacterial genomics | KZ | NCB
Nurmoldin, Shalkar | Thyroid cancer research | KZ | MU
Perfilyeva, Anastasiya | Molecular genetics | KZ | IGGC
Rakhimova, Saule | Genetic studies of multifactorial diseases | KZ | NU
Shevtsov, Alexandr | Bacterial genomics | KZ | NCB
Turmagambetova, Aizhan | Antiviral protection research, metagenomics | KZ | IMV
Zholdybayeva, Elena | Viral genetics | KZ | NCB
Zhunussova, Gulnur | Molecular genetics | KZ | IGGC
Torokeldiev, Nurlan | Population genetics | KRG | IAUB
Egizbayev, Zhanibek | Illumina representative | KZ | ILLM
Govorovskiy, Vladislav | Illumina representative | BLR | ILLM
Carr, Ian | Bioinformatics & health | UK | UoL
Dawson, Deborah | Population & ecological genetics | UK | UoSh
Duncan, Elizabeth | Genomics & evolutionary biology | UK | UoL
Dunn, Jennifer | Disease ecology, conservation | UK | RSPB
Ford, Antonia | Population & ecological genetics, genomics | UK | UoB
Forde, Niamh | Reproductive biology | UK | UoL
Goodman, Simon | Population genetics, disease ecology, conservation | UK | UoL
Hipperson, Helen | Bioinformatics & population genetics | UK | UoSh
Knight, Christopher | Microbial systems biology | UK | UoM
O'Connell, Mary | Computational biology | UK | UoL
Stockdale, Jennifer | Disease ecology, conservation | UK | UoC
Taylor, Morag | Cancer genetics | UK | UoL

KZ – Kazakhstan

UK – United Kingdom

KRG – Kyrgyzstan

BLR - Belarus

UoL - University of Leeds

UoM - University of Manchester

UoSh - University of Sheffield

UoC - Cardiff University

UoB - Bangor University

RSPB - Royal Society for the Protection of Birds

ILLM – Illumina Corp.

IMV - Institute of Microbiology and Virology

MU – Medical University

NCB - National Center for Biotechnology

IGGC – Institute of General Genetics and Cytology

NU - Nazarbayev University

IAUB - International Ala-Too University in Bishkek
