Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing,...
![Page 1: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/1.jpg)
Basic Concepts of Bioinformatics
Mitcon Biopharma Chaitanya Velhal
![Page 2: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/2.jpg)
What is BIOINFORMATICS
All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts of biological information in databases: gene sequences, biological activity/function, pharmacological activity, biological structure, molecular structure, protein-protein interactions, and gene expression. Bioinformatics uses computers and statistical techniques to accomplish research objectives, for example, to discover a new pharmaceutical or herbicide.
![Page 3: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/3.jpg)
Figure: Bioinformatics at the overlap of Biology, Chemistry, Statistics, and Computer Science.
![Page 4: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/4.jpg)
Bioinformatics encompasses the use of tools and techniques from three separate disciplines:
• Molecular biology (the source of the data to be analyzed),
• Computer science (supplies the hardware for running analysis and the networks to communicate the results),
• Data analysis algorithms, which strictly define bioinformatics.
![Page 5: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/5.jpg)
Why it’s useful
• All of the information needed to build an organism is contained in its DNA. If we could understand it, we would know how life works.
◦ Preventing and curing diseases like cancer (which is caused by mutations in DNA) and inherited diseases.
◦ Curing infectious diseases (everything from AIDS and malaria to the common cold). If we understand how a microorganism works, we can figure out how to block it.
◦ Understanding genetic and evolutionary relationships between species.
◦ Understanding genetic relationships between humans. Projects exist to understand human genetic diversity. Also, sequencing the Neanderthal genome.
• Ancient DNA: currently it is thought that under ideal conditions (continuously kept frozen), there is a limit of about 1 million years for DNA survival. So, Jurassic Park will probably remain fiction.
![Page 6: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/6.jpg)
DNA
• DNA is just a long string of 4 letters (nucleotides, or bases): Adenine, Guanine, Cytosine, and Thymine.
◦ We will just refer to them as A, C, G, and T.
• Each DNA molecule has 2 strands, with the bases paired in the center.
◦ A on one strand always pairs with T on the other strand.
◦ G pairs with C.
◦ The strands run in opposite directions (like roads).
• Since the two DNA strands are complementary, there is no need to write down both strands.
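Because A always pairs with T and G with C, and the strands run in opposite directions, the second strand can be computed from the first. The following is a minimal Python sketch of that idea; the function name `reverse_complement` and the example sequence are illustrative choices, not part of the original slides.

```python
# Pairing rules from the slide: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, written in its own reading direction.

    The strands run in opposite directions, so we walk the input
    backwards and swap each base for its pairing partner.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original strand, which is why storing one strand is enough.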
![Page 7: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/7.jpg)
Chromosomes and Genes
• Each chromosome is a long piece of DNA.
◦ The B. megaterium genome is a circle (like most bacterial genomes) of about 5 million bases.
◦ Human chromosomes are 100-200 million bases long. We have 46 chromosomes (2 sets of 23, one set from each parent).
• Genes are just regions on that DNA. It is not obvious where genes are if you look at a DNA sequence.
◦ There is a lot of DNA that is not part of genes: in humans only 2% at most of the DNA is part of any gene.
◦ Bacteria use more of their DNA: 80% of the B. meg chromosome is genes.
• B. meg has about 1 gene per 1000 base pairs (bp) of DNA, so about 5000 genes.
• Humans have about 25,000 genes.
◦ We are far more complicated than bacteria: regulation of the genes is very complicated in humans.
◦ We use the same gene in different ways in different tissues.
![Page 8: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/8.jpg)
Genes and Proteins
• Most genes code for proteins: each gene contains the information necessary to make one protein.
• Proteins are the most important type of macromolecule.
◦ Structure: collagen in skin, keratin in hair, crystallin in eye.
◦ Enzymes: all metabolic transformations (building up, rearranging, and breaking down of organic compounds) are done by enzymes, which are proteins.
◦ Transport: oxygen in the blood is carried by hemoglobin; everything that goes in or out of a cell (except water and a few gases) is carried by proteins.
◦ Also: nutrition (egg yolk), hormones, defense, movement.
![Page 9: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/9.jpg)
The Genetic Code
• Proteins are long chains of amino acids.
• There are 20 different amino acids coded in DNA.
• There are only 4 DNA bases, so you need 3 DNA bases to code for the 20 amino acids (2 bases would give only 4 x 4 = 16 combinations).
◦ 4 x 4 x 4 = 64 possible 3-base combinations (codons)
◦ Each codon codes for one amino acid (or, for 3 of them, a stop signal)
◦ Most amino acids have more than one possible codon
• Genes start at a start codon and end at a stop codon.
• 3 codons are stop codons: all genes end at a stop codon.
• Start codons are a bit trickier, since they are used in the middle of genes as well as at the beginning.
◦ In eukaryotes, ATG is almost always the start codon, making Methionine (Met) the first amino acid in nearly all proteins (but in many proteins it is immediately removed).
◦ In prokaryotes, ATG, GTG, or TTG can be used as a start codon. B. meg prefers ATG, but about 30% of the genes start with GTG or TTG.
• In bioinformatics, we generally ignore the fact that RNA uses the base uracil (U) in place of T.
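The codon arithmetic above can be checked directly. The sketch below builds the standard genetic code as a lookup table in DNA letters (T rather than U) and translates a short made-up sequence. The compact 64-letter string follows the conventional T, C, A, G ordering of the standard code; the names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TABLE))                                      # 64
print(sorted(c for c, aa in CODON_TABLE.items() if aa == "*"))  # ['TAA', 'TAG', 'TGA']
print(translate("ATGGCCTAA"))                                # MA
```

The table confirms the counts on the slide: 64 codons, 3 of them stops, and 20 distinct amino acids shared among the remaining 61.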
![Page 10: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/10.jpg)
Brief history of bioinformatics: other important steps
• Development of sequence retrieval methods (1970-80s)
• Development of principles of sequence alignment (1980s)
• Prediction of RNA secondary structure (1980s)
• Prediction of protein secondary structure and 3D (1980-90s)
• The FASTA and BLAST methods for DB search (1980-90s)
• Prediction of genes (1990s)
• Studies of complete genome sequences (late 1990s-2000s)
![Page 11: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/11.jpg)
Organizing biological knowledge in databases
• Biological raw data are stored in public databanks (such as GenBank or EMBL for primary DNA sequences).
• The data can be submitted and accessed via the World Wide Web.
• Protein sequence databanks like TrEMBL provide the most likely translation of all coding sequences in the EMBL databank.
• Sequence data are prominent, but other data are also stored, e.g. yeast two-hybrid screens, expression arrays, systematic gene knock-out experiments, and metabolic pathways.
![Page 12: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/12.jpg)
Data Schema in Warehousing: A Gene Expression Example
Figure: schema of a Gene Expression Warehouse. Entities shown: Protein, Disease, SNP, Enzyme, Pathway, Known Gene, Sequence Cluster, Affy Fragment, Sequence, Metabolite. Data sources shown: LocusLink, MGD, ExPASy SwissProt, PDB, OMIM, NCBI dbSNP, ExPASy Enzyme, KEGG, SPAD, UniGene, Genbank, NMR.
![Page 13: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/13.jpg)
“Ten Important Bioinformatics Databases”
GenBank www.ncbi.nlm.nih.gov nucleotide sequences
Ensembl www.ensembl.org human/mouse genome (and others)
PubMed www.ncbi.nlm.nih.gov literature references
NR www.ncbi.nlm.nih.gov protein sequences
SWISS-PROT www.expasy.ch protein sequences
InterPro www.ebi.ac.uk protein domains
OMIM www.ncbi.nlm.nih.gov genetic diseases
Enzymes www.chem.qmul.ac.uk enzymes
PDB www.rcsb.org/pdb/ protein structures
KEGG www.genome.ad.jp metabolic pathways
Source: Bioinformatics for Dummies
![Page 14: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/14.jpg)
Central dogma of modern drug discovery
Figure labels: Genome; Gene = DNA; RNA; Protein; Primary Sequence; Gene therapy; DNA binding drugs; RNA binding drugs; Drugs (inhibitors/activators).
![Page 15: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/15.jpg)
Drug Design
• The information present in DNA is expressed via RNA molecules into proteins, which are responsible for carrying out various activities.
• This information flow is called the central dogma of molecular biology.
• Potential drugs can bind to DNA, RNA or proteins to suppress or enhance the action at any stage in the pathway.
![Page 16: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/16.jpg)
All organisms self-replicate owing to their genetic material, DNA, a polynucleotide built from four bases: Adenine (A), Thymine (T), Guanine (G) and Cytosine (C).
The entire DNA content of the cell is known as the genome. The segment of genome that is transcribed into RNA is called a gene.
Hereditary information is transferred in the form of genes containing the four bases. Understanding these genes is one of the modern day challenges.
![Page 17: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/17.jpg)
History of Bioinformatics

| Year | Subject name | Size (millions of base pairs) |
| --- | --- | --- |
| 1995 | Haemophilus influenzae | 1.8 |
| 1996 | Baker's yeast | 12.1 |
| 1997 | E. coli | 4.7 |
| 2000 | Pseudomonas aeruginosa | 6.3 |
| 2000 | A. thaliana | 100 |
| 2000 | D. melanogaster | 180 |
| 2001 | Human genome | 3,000 |
| 2002 | House mouse | 2,500 |
![Page 18: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/18.jpg)
Database Searches
• Many genes have already been sequenced and identified, so we know what they do.
• Their sequences are stored in databases.
• So if we find a new gene in the human genome, we compare it with the known genes stored in the databases.
• Since the databases are large and numerous, we cannot run a full sequence alignment against each and every stored sequence.
• So heuristics must be used again.
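One common kind of heuristic is to index short words (k-mers) from every stored sequence, then look up the words of the query and align only against the sequences that share some of them. The sketch below illustrates that seeding idea in Python; it is not the actual FASTA or BLAST algorithm, and the database contents, names, and the choice k = 4 are made up for illustration.

```python
from collections import defaultdict


def build_kmer_index(database: dict, k: int) -> dict:
    """Map every k-letter word to the set of sequence names containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_hits(query: str, index: dict, k: int) -> list:
    """Rank database sequences by how many query words they share."""
    counts = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            counts[name] += 1
    # Most shared words first; ties broken by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy database; real databases hold millions of sequences.
DATABASE = {
    "geneA": "ATGGCGTACGTTAG",
    "geneB": "TTTTCCCCGGGGAAAA",
    "geneC": "ATGGCGTTTTAGCA",
}
INDEX = build_kmer_index(DATABASE, k=4)
print(candidate_hits("GGCGTACG", INDEX, k=4))  # [('geneA', 5), ('geneC', 2)]
```

Only the top-ranked candidates would then be passed to a slower, exact alignment step; sequences that share no words with the query (geneB here) are never aligned at all.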
![Page 19: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/19.jpg)
Databases
• Sequence information is stored in databases so that it can be manipulated easily.
• The databases are located at different places.
• They exchange information on a daily basis so that they are up to date and in sync.
• Primary databases hold sequence data.
![Page 20: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/20.jpg)
Composite DB
• As there are many databases, which one should we search? Some are good in some aspects and weak in others.
• A composite database is the answer: it uses several databases as its base data.
• Searches on a composite database are indexed and streamlined so that the same stored sequence is not searched twice in different databases.
![Page 21: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/21.jpg)
Composite DB (continued)
• OWL has these as its primary databases:
◦ SWISS-PROT (top priority)
◦ PIR
◦ GenBank
◦ NRL-3D
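The merge behind a composite database can be sketched as follows: visit the source databases in priority order and keep each distinct sequence only once, crediting the highest-priority source that holds it. This is an illustrative Python sketch, not OWL's actual build procedure. The slide states only that SWISS-PROT has top priority, so the order of the remaining sources below is an assumption, and all records are invented.

```python
# Assumed priority order; only SWISS-PROT's top position comes from the slide.
PRIORITY = ["SWISS-PROT", "PIR", "GenBank", "NRL-3D"]


def build_composite(sources: dict) -> dict:
    """Merge source databases into a non-redundant set keyed by sequence.

    `sources` maps a database name to {entry_id: sequence}.
    Each sequence is kept once, attributed to the first (highest-priority)
    database in which it appears.
    """
    composite = {}
    for db_name in PRIORITY:
        for entry_id, seq in sources.get(db_name, {}).items():
            composite.setdefault(seq, (db_name, entry_id))
    return composite


# Hypothetical toy records.
SOURCES = {
    "PIR": {"P1": "MKV"},
    "SWISS-PROT": {"S1": "MKV", "S2": "MAA"},
    "GenBank": {"G1": "MTT"},
}
composite = build_composite(SOURCES)
print(len(composite))    # 3
print(composite["MKV"])  # ('SWISS-PROT', 'S1')
```

The sequence "MKV" appears in two sources but is stored, and therefore searched, only once.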
![Page 22: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/22.jpg)
Genomics
• In a multicellular organism, each cell type expresses its genes in a different way, although every cell has the same genetic content.
• That is, all the information a liver cell needs to be a liver cell is also present in a nose cell, so gene expression is the only thing that differentiates them.
![Page 23: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/23.jpg)
Genomics - Finding Genes
• Finding a gene in sequence data is like finding a needle in a haystack.
• However, whereas a needle is different from the hay, genes do not look different from the rest of the sequence data.
• Within a whole array of nucleotides (nt), we try to find a set of nt and mark its borders as a gene.
• This is one of the challenges of bioinformatics.
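A very simple way to mark candidate gene borders is to scan for open reading frames (ORFs): stretches that begin with a start codon and run, in steps of three bases, to the first stop codon. The Python sketch below does this on the forward strand only; real gene finders are far more elaborate. The function name `find_orfs`, the `min_codons` cutoff, and the example sequence are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3, start_codons=("ATG",)) -> list:
    """Return (start, end) index pairs of ORFs on the forward strand.

    `end` is exclusive and includes the stop codon, so dna[start:end]
    is the full ORF. All three reading frames are scanned.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon in start_codons:
                start = i                      # open a candidate gene
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it and keep scanning
    return orfs


dna = "GGATGAAACCCTAGTT"
print(find_orfs(dna))  # [(2, 14)]
print(dna[2:14])       # ATGAAACCCTAG
```

For prokaryotic sequences, the alternative start codons could be allowed by passing `start_codons=("ATG", "GTG", "TTG")`.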
![Page 24: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/24.jpg)
| Organism | Genome Size (Mb; 1 Mb = 1,000,000 bp) | Gene Number | Web Site |
| --- | --- | --- | --- |
| Yeast | 13.5 | 6,241 | http://genome-www.stanford.edu/Saccharomyces |
| Fruit Flies | 180 | 13,601 | http://flybase.bio.indiana.edu |
| Homo Sapiens | 3,000 | 45,000 | http://www.ncbi.nlm.nih.gov/genome/guide |
![Page 25: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/25.jpg)
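Proteomics
• The proteome is the sum total of an organism's proteins.
• Proteomics is more difficult than genomics:
◦ DNA has 4 building blocks (bases); proteins have 20 (amino acids)
◦ DNA has a simple chemical makeup; proteins are complex
◦ DNA can be duplicated; proteins can't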
![Page 26: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/26.jpg)
Protein Structure Prediction
• This is one of the biggest challenges of bioinformatics, and especially of biochemistry.
• There is as yet no algorithm that consistently predicts the structure of proteins.
![Page 27: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/27.jpg)
Structure Prediction methods
• Comparative Modeling
◦ The target protein's structure is compared with that of related proteins
◦ Proteins with similar sequences are searched for known structures
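The first step of comparative modeling, choosing a related protein of known structure, can be sketched as picking the template whose sequence is most similar to the target. The Python sketch below scores similarity as percent identity over sequences assumed to be already aligned to the same length (gaps written as "-"). The function names and the toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned, non-gap positions at which two sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def best_template(target: str, templates: dict) -> str:
    """Name of the template sequence most similar to the target."""
    return max(templates, key=lambda name: percent_identity(target, templates[name]))


# Hypothetical aligned sequences of proteins with known structures.
TEMPLATES = {"tmplA": "MKTAYLAK", "tmplB": "MRSAYIGQ"}
TARGET = "MKTAYIAK"
print(percent_identity(TARGET, TEMPLATES["tmplA"]))  # 87.5
print(percent_identity(TARGET, TEMPLATES["tmplB"]))  # 50.0
print(best_template(TARGET, TEMPLATES))              # tmplA
```

The structure of the chosen template would then serve as the starting model for the target.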
![Page 28: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/28.jpg)
Phylogenetics
• The taxonomic system reflects evolutionary relationships.
• Phylogenetic trees depict evolutionary relationships as a picture or graph.
• Rooted trees have a single common ancestor.
• Unrooted trees show only the relationships.
• Algorithms for reconstructing phylogenetic trees are also an area of research.
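As one example of a tree reconstruction algorithm, the sketch below implements UPGMA, a simple clustering method that repeatedly joins the two closest groups and yields a rooted tree. The slides do not name a specific algorithm, so UPGMA is chosen here purely for illustration. Distances are the fraction of differing positions between equal-length sequences, and the four toy sequences are invented.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict):
    """Build a rooted tree as nested tuples by average-linkage clustering."""
    clusters = {name: 1 for name in seqs}  # tree -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)       # closest pair of clusters
        a, b = sorted(pair, key=str)         # fixed order for reproducibility
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        for other in clusters:
            # Size-weighted average of the distances to the merged members.
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (
                d_a * size_a + d_b * size_b
            ) / (size_a + size_b)
        del dist[pair]
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical toy sequences.
SEQS = {
    "taxon1": "ACGTACGTAC",
    "taxon2": "ACGTACGTAT",
    "taxon3": "ACGAACCTGT",
    "taxon4": "TCGAAGCTGA",
}
print(upgma(SEQS))  # (('taxon1', 'taxon2'), ('taxon3', 'taxon4'))
```

The outermost tuple is the root; each nested tuple is an inferred common ancestor of the taxa inside it.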
![Page 29: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/29.jpg)
Medical Implications
• Pharmacogenomics
◦ Not all drugs work on all patients; some good drugs cause death in some patients
◦ So by doing a gene analysis before the treatment, the offending drugs can be avoided
◦ Also, drugs which cause death to most can be used on a minority to whose genes that drug is well suited
◦ Customized treatment
• Gene Therapy
◦ Replace or supply the defective or missing gene
◦ E.g. insulin, or Factor VIII for haemophilia
![Page 30: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/30.jpg)
Diagnosis of Disease
• Identifying the genes that cause a disease helps detect it at an early stage, e.g. Huntington disease:
◦ Symptoms: uncontrollable dance-like movements, mental disturbance, personality changes and intellectual impairment
◦ Death in 10-15 years
◦ The gene responsible for the disease has been identified
◦ It contains excessively repeated sections of CAG
◦ So once the gene has been analyzed, the couple can be counseled
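Detecting "excessively repeated sections of CAG" comes down to measuring the longest uninterrupted run of CAG in a sequence. The Python sketch below shows one way to do that with a regular expression. The function name and the example sequence are illustrative, and no clinical cutoff is implied, since the slides do not give one.

```python
import re


def longest_cag_run(dna: str) -> int:
    """Number of repeats in the longest uninterrupted run of CAG."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical sequence: a run of 5 CAGs, an interruption, then 2 more.
example = "ATG" + "CAG" * 5 + "CCG" + "CAG" * 2 + "TAA"
print(longest_cag_run(example))  # 5
```

A sequence with no CAG at all gives 0, so the function can be applied to any input without special-casing.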
![Page 31: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/31.jpg)
Drug Design
• Developing a drug can take up to 15 years and cost $700 million.
• One of the goals of bioinformatics is to reduce the time and cost involved.
• The process:
◦ Discovery: computational methods can improve this step
◦ Testing
![Page 32: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/32.jpg)
Discovery
• Target identification
◦ Identify the molecule on which the germ relies for its survival
◦ Then develop another molecule, i.e. a drug, which will bind to the target
◦ So the germ will not be able to interact with the target
◦ Proteins are the most common targets
![Page 33: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/33.jpg)
Discovery…
• For example, HIV produces HIV protease, a protein that in turn cuts up other proteins.
• HIV protease has an active site where it binds to other molecules.
• So an HIV drug will go and bind to that active site.
◦ Easier said than done!
![Page 34: Mitcon Biopharma Chaitanya Velhal. What is BIOINFORMATICS All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts.](https://reader035.fdocuments.us/reader035/viewer/2022062518/5697bf931a28abf838c8fe02/html5/thumbnails/34.jpg)
Discovery…
• Lead compounds are the molecules that go and bind to the target protein’s active site.
• Traditionally this has been a trial-and-error method.
• Now this is being moved into the realm of computers.