Drug Target Discovery by Genome Analysis
[Figure: genome, model, drug (AREXIS). Genome sequence status: 97% total, 67% finished.]
Species          # of genes       % known function
E. coli          4,289            62
S. cerevisiae    6,217            65
C. elegans       19,000           ?
M. musculus      30,000-50,000    ≈10
H. sapiens       30,000-50,000    ≈15
[Figure: plot against time (1990, 1995, 2000; vertical axis in units of x10^6, ticks at 0.5 and 1) with a widening gap. Caption: Link genes to biological functions.]
Bioinformatics?
• Bioinformatics - the research field that handles and analyses "bioinformation"
• Bioinformation - the information stored in:
– genome data (genes, gene expression, gene function, etc., in relation to the organism that harbours the genome in question)
– biological sequences, and
– relationships between biological sequences, with respect to the function of biological organisms (metabolism, health, etc.)
• Bioinformatics should generate ideas and proposals for new wet-lab experiments
• Researchers with bioinformatics as an experimental tool (in silico biology)
Animal models
Why animal models?
• Genetically homogeneous
• Controlled environmental influence
• Large family sizes give optimal statistical power
• Tools to define and characterise disease-causing genes and mechanisms
• In vivo validation and in vivo pharmacology
• Increase productivity
• Higher resolution
Marketing of new products
Clinical development
Drug discovery
Target validation
Target discovery
Genetic analysis
Disease models
Research and development strategy
Academic partners
Industrial partners
Arexis
Integrated biology-driven discovery
In vivo pharmacology
Medicinal chemistry
Human patient materials
Comparative biology
Clinical science
Biotechnology expertise
Bioinformatics
Functional genomics
Prioritised projects
Stages: Exploratory research, Pre-clinical development, Clinical development
Disease areas: Type 2 diabetes, Obesity, Inflammatory diseases, Metabolic diseases, Multiple sclerosis, Skin inflammation, Immunotherapy, Rheumatoid arthritis
Projects: SCCE, Muc. A, AMPK
R&D project overview
Research collaborations
Sub-contracts Partnerships
Targeted In-licensing
Drug development/commercialisation
Target and Drug discovery
Input to the Arexis pipeline and portfolio
Revenue sources
Commercialisation process Early Mid Late
Spin-off opportunities
Access fees
Research funding
Milestone payments
Royalties
Target and Drug discovery
Business model
Organisation build-up plan
                          2001  2002  2003  2004  2005  2006
Management & Administration
  Management                 3     3     4     5     5     5
  Administration             -     2     3     4     5     5
  Accumulated                3     5     7     9    10    10
R&D
  Bioinformatics             2     3     5     8    10    11
  Biology                    3    10    21    32    45    57
  Chemistry                  -     2     4     6     8    13
  Clinical development       -     1     2     3     4     6
  Accumulated                5    16    32    49    67    87
Total                        8    21    39    58    77    97
Board of Directors
• Anders Vedin, Chairman of the Board. Professor, Senior Advisor, InnovationsKapital AB
• Henry Geraedts, Deputy Chairman of the Board. PhD, Independent director, 3i
• Carl Christensson. CEO, SEB Företagsinvest
• Rikard Holmdahl. Professor of Medical Inflammation, founder
• Lennart Hansson. PhD, Chief Executive Officer
• Leif Andersson. Professor of Animal Genetics, founder
• Curt Lönnström. Chief Executive Officer of Ryda Bruk
Genetic approaches and in silico approaches: integration
[Figure labels: aGDB genetic/linkage data; pointers to disease loci; Ensembl auto-annotated genomes; curated gene structures; Expression profiling; Affymetrix experiment, and experimental data; database with annotated experiments; pointers to phenotype-related genes; relevant genes; Target database; phenotype-related pathways.]
Commercial partners
Academic partners
DAS
Arexis-users
IT System Architecture
DAS
Arexis-users
LDAP
documents
Arexis intranet
VPN
economy
business dev
GIM
aGDB
Research System Architecture
tools for sequence analysis
tools for expression data analysis
[Figure: comparative trees, each rooted in a common ancestor. Project B: mouse, Homo. Project C: Homo, mouse, rat. AMPK: Homo, mouse, pig (??).]
Tissue section of skeletal muscle fiber from Hampshire pigs
Normal: rn+/rn+. Mutant: RN-/rn+ or RN-/RN-
AMP-activated kinase (AMPK) - a heterotrimeric enzyme
Tissue distribution of AMPK chains
A skeletal muscle-specific variant of AMPK
[Figure: AMPK chain isoforms 1, 2 and 3 shown across heart, brain, placenta, lung, liver, muscle, kidney, pancreas, spleen, thyroid gland, prostate, testis, ovary, small intestine, colon and peripheral blood.]
Modified from Shepherd et al. NEJM 1999
Pathways regulating glucose transport in muscle cells
chr. 5 mouse
chr. 7 human
Link to pathophysiology?
Pathway analysis!
Experimental validation
genetic mapping
[Figure: AMPK regulation and downstream effects. Labels: AMP; AMPKK; Protein Phosphatase 2C; Protein Phosphatase 2A; AMPK; Acetyl-CoA Carboxylase and Malonyl-CoA Decarboxylase (active and inactive forms, P = phosphate); Acetyl CoA; Malonyl CoA; Fatty acid.]
• Increased glucose uptake
• Decreased glycogen degradation
• Increased amount of GLUT4
Pristane-induced arthritis in the rat
Susceptible: DA rat
Resistant: E3 rat
[Figure: human (2.4 Mbp) and mouse (1 Mbp) regions, showing duplicated genomic segments and the position of the mouse gene.]
Genomics data Expression data
integrate / analyse / visualise
Reconstruction of Pathway
NOVEL Drug Target
Database resources
• DNA: EMBL, GenBank
• protein: PIR, Swissprot
• families: Blocks, Pfam
• motifs: ProSite
• genomes: GDB, MGI, AceDB, MIPS
• bibliography: PubMed
• NCBI - National Center for Biotechnology Information
• EBI - European Bioinformatics Institute
• DDBJ - DNA Databank of Japan
Where do sequences come from?
• mRNA: cDNA sequence
– Directed / small-scale
– Random / large-scale: Expressed Sequence Tag [EST]
• DNA: genomic sequence
– Directed / small-scale
– Large-scale: BACs, YACs
• protein: protein sequence
– Directed, very little
Sequence databases: Nucleotide databases
GenBank, EMBL, DDBJ
International Nucleotide Sequence Database Collaboration
Sequence databases: Primary vs secondary databases
• Primary database = sequence database
– e.g. EMBL, GenBank, SWISSPROT
– Each record describes an individual sequence
– Can contain either nucleotide or protein sequences
Seq 1: ACGTTT
Seq 2: CTAGAC
Seq 3: TTCTGA
Sequence databases: Primary vs secondary databases
• Secondary database = pattern database
– e.g. PROSITE, PRINTS, BLOCKS, Pfam
– Each record describes a set of sequences
– Set can be expressed as a motif, multiple sequence alignment or probabilistic model
Pattern 1: accagtgtacgactct
Pattern 2: tacgtagctacctacctaggtagc
Pattern 3: ttcgatgtcattcgatcgcatccgatcgtc
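As a rough illustration of how a set of sequences can be summarised as a motif or as a simple probabilistic model, the sketch below collapses a small alignment into a bracketed motif and into per-column residue frequencies. The function names and the three example sequences are hypothetical and are not taken from any of the databases named above.

```python
from collections import Counter


def motif_from_alignment(aligned):
    """Collapse equal-length aligned sequences into a bracketed motif string."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    parts = []
    for column in zip(*aligned):
        residues = sorted(set(column))
        # A conserved column is written as-is, a variable one as [..].
        if len(residues) == 1:
            parts.append(residues[0])
        else:
            parts.append("[" + "".join(residues) + "]")
    return "".join(parts)


def column_frequencies(aligned):
    """Return one {residue: frequency} dict per column (a very simple profile)."""
    return [
        {residue: count / len(aligned) for residue, count in Counter(column).items()}
        for column in zip(*aligned)
    ]


family = ["ACGT", "ACGA", "ATGT"]  # hypothetical aligned sequences
print(motif_from_alignment(family))
print(column_frequencies(family)[1])
```

The motif keeps only which residues occur in each column, while the frequency table also keeps how often they occur, which is the starting point for profile and HMM style models.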
Sequence databases: Nucleotide databases
• How do the databases compare?
– The three databases are 99.99% identical
– Annotations can be slightly different
• How often are they updated?
– New release of databases every 3 months
– Interim releases - EMBL-new
• Can the annotations be trusted?
– Not always - some estimates suggest 25% are incorrect
Sequence databases: Nucleotide databases
• EMBL is subdivided into EST and non-EST sequences
– Divisions shown: hum, vrt, rod, mam
Sequence databases: Protein databases
GenBank EMBL
GenPept TrEMBL
PIR SWISSPROT
Sequence databases: Protein databases
EMBL
• 13,700,000 entries
TrEMBL
• 558,150 entries
• Coding sequences automatically translated
• TrEMBL split into:
– SP-TrEMBL (SP) - Sequences destined for SWISSPROT
– REM-TrEMBL (REM) - Remaining sequences
SWISSPROT
• 106,602 entries
• Sequences manually moved to SWISSPROT
• Because it is manually curated, annotations are reliable!
Sequence databases: Summary
• EMBL is the main nucleotide sequence database (Europe)
• TrEMBL is an automated translation of EMBL
• SWISSPROT is the main curated protein database
• Between main releases, interim releases are made
– e.g. EMBL-new, TrEMBL-new, SWISSPROT-new
• EMBL is subdivided into EST / non-EST, then by species
• Annotations can be trusted in SWISSPROT, not in EMBL
• Accession numbers uniquely identify a sequence and remain constant when entries are updated
Basics of sequence searching: Methods
Method           Accuracy    Duration    Example
Rigorous         +++++       +++++       Smith-Waterman
Heuristic        ++          +           BLAST, FASTA
Probabilistic    ++++        +++         HMM
• Probabilistic methods are best, but can be slow and difficult to use
• Rigorous methods are good when used on a small subset of sequences, but too slow to search a large sequence database
• Heuristic methods are the best place to start
Basics of sequence searching: Terminology
• Sensitivity vs Selectivity
– A sensitive search will find weaker hits
– A selective search is less likely to find unrelated hits
– Increased sensitivity means more true positives
– Increased selectivity means fewer false positives
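A small worked example may make the trade-off concrete. The sketch below assumes sensitivity means TP / (TP + FN) and reads "selectivity" as TP / (TP + FP), i.e. the fraction of reported hits that are truly related. That second reading is an assumption, since the slide does not give formulas, and all counts are invented.

```python
def search_metrics(true_pos, false_pos, false_neg):
    """Return (sensitivity, selectivity) for one database search.

    sensitivity = TP / (TP + FN): share of truly related sequences that were found.
    selectivity = TP / (TP + FP): share of reported hits that are truly related
    (assumed definition; elsewhere this quantity is called precision).
    """
    sensitivity = true_pos / (true_pos + false_neg)
    selectivity = true_pos / (true_pos + false_pos)
    return sensitivity, selectivity


# Hypothetical counts for two settings of the same search (100 true relatives).
strict = search_metrics(true_pos=40, false_pos=2, false_neg=60)
loose = search_metrics(true_pos=90, false_pos=45, false_neg=10)

print("strict: sensitivity %.2f, selectivity %.2f" % strict)
print("loose:  sensitivity %.2f, selectivity %.2f" % loose)
```

With these made-up numbers the loose search finds more of the true relatives but a larger share of its hits are unrelated, which is the tension the bullets describe.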
Searching with BLAST: How it works
• Find identical stretches of nucleotides in two sequences (query sequence vs sequence in database)
• Extend regions of similarity as far as possible, giving a high-scoring segment pair (HSP)
• Identify all regions of similarity (HSP 1, HSP 2, ...)
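The three steps can be sketched in a few lines of Python. This is a deliberately simplified, ungapped seed-and-extend toy, not the BLAST implementation: the word size, scoring values, drop-off rule and function names are all assumptions chosen for illustration.

```python
def find_seeds(query, subject, word_size=4):
    """Step 1: yield (query_pos, subject_pos) for every identical shared word."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            yield i, j


def extend_seed(query, subject, i, j, word_size, match=1, mismatch=-2, drop_off=3):
    """Step 2: extend a seed without gaps; return (score, q_start, s_start, length)."""
    def walk(qi, sj, step):
        # Walk in one direction, remembering the best-scoring extension so far.
        score = best = 0
        kept = n = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            n += 1
            if score > best:
                best, kept = score, n
            elif best - score > drop_off:
                break  # score fell too far below the best: stop extending
            qi += step
            sj += step
        return best, kept

    right_score, right_len = walk(i + word_size, j + word_size, 1)
    left_score, left_len = walk(i - 1, j - 1, -1)
    score = word_size * match + left_score + right_score
    return score, i - left_len, j - left_len, left_len + word_size + right_len


def find_hsps(query, subject, word_size=4, min_score=6):
    """Step 3: collect all distinct segment pairs scoring at least min_score."""
    hsps = set()
    for i, j in find_seeds(query, subject, word_size):
        hsp = extend_seed(query, subject, i, j, word_size)
        if hsp[0] >= min_score:
            hsps.add(hsp)
    return sorted(hsps, reverse=True)


for score, q_start, s_start, length in find_hsps("TTACGTACGGA", "CCACGTACGGT"):
    print(score, "TTACGTACGGA"[q_start:q_start + length])
```

Several seeds on the same diagonal extend to the same segment, so the results are kept in a set to report each HSP once.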
Local vs global comparisons: The nature of proteins
• Proteins consist of functional and structural units - domains
Local vs global comparisons: What is a local and global comparison?
Global comparison attempts to match all of one sequence against another
Local comparison attempts to match short stretches of one sequence with another
Local vs global comparisons: When should each technique be used?
• Global comparisons
– Closely related sequences
– Same general structure of sequence
– Roughly equal lengths
• Local comparisons
– Sequences not closely related
– Sequence fragments
– Interested in identifying common domains
Local vs global comparisons: When should each technique be used?
• Global comparison will attempt to match all of one sequence against another even when the sequences share only one common domain
• Global comparison should only be used if the sequences being compared have a common domain structure
[Figure labels: common domain; non-matching domains; domain unique to one sequence.]
Local vs global comparisons: Summary
• Proteins are organised into domains
• Local comparisons find short stretches of similarity
• Global comparisons match the whole length of one sequence against another
• Local comparisons should be used unless sequences are closely related and have identical domain structures.
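The difference can be seen by scoring the same pair of sequences both ways. The sketch below uses the textbook dynamic-programming recurrences (Needleman-Wunsch for global, Smith-Waterman for local) with a linear gap penalty. The scoring values and the two example sequences, which share a single "domain" GATTACA, are assumptions made for illustration.

```python
def alignment_score(a, b, local, match=2, mismatch=-1, gap=-2):
    """Best alignment score by dynamic programming with a linear gap penalty.

    local=False: global comparison (Needleman-Wunsch), both sequences end to end.
    local=True:  local comparison (Smith-Waterman), best-matching stretch only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


# Hypothetical sequences sharing one common stretch, GATTACA, and nothing else.
seq_a = "AAAAAAGATTACA"
seq_b = "GATTACACCCCCC"
print("local :", alignment_score(seq_a, seq_b, local=True))
print("global:", alignment_score(seq_a, seq_b, local=False))
```

The local score reflects only the shared stretch, while the global score is dragged down by the non-matching flanks, which is why local comparison is the safer default when domain structures differ.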
Searching with BLAST: Search with DNA or protein?
• Use DNA if
– There are frameshifts - common in ESTs
– Interested in evolution (3rd base in codon hidden in translation)
• Otherwise, use protein sequence. Why?
– Two DNA sequences can be aligned in six ways
– Each alignment can give scores, therefore more partial matches
– Therefore there is more noise associated with the comparison
– Statistical significance of good hits is thus reduced.
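One plausible reading of "six ways" is the six reading frames of a DNA sequence: three offsets on the forward strand and three on the reverse complement. That interpretation is an assumption, not something the slide states. The sketch below enumerates the six translations using the standard genetic code; the function names and the example sequence are made up.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case ACGT string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return translations for frames +1, +2, +3, -1, -2, -3."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frames("ATGGCCATTGTAATGGGCCGC").items():
    print(name, protein)
```

A frameshift in an EST moves the true protein from one of these frames to another part-way through the sequence, which is one reason DNA-level searches are suggested for ESTs.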
Searching with FASTA: BLAST vs FASTA
• Advantages of BLAST
– Faster than FASTA
– Reports all high-scoring local alignments
• Advantages of FASTA
– More sensitive - approaches that of rigorous methods
– Faster than rigorous methods
– E-values are more accurate
– Better handling of frameshifts - important for ESTs.
Basics of sequence searching: Summary
• Sequence searching is complicated because we want to find partial matches
• Search method should be sensitive and selective
• Rigorous methods are much more sensitive than heuristic methods, but are too slow
Secondary databases: Databases available - Prosite
• 1492 regular expressions
• Each entry consists of two files
– Text file with information on family
– A regular expression and matching sequences
ID PROTEIN_KINASE_TYR; PATTERN.
AC PS00109;
DT APR-1990 (CREATED); DEC-1992 (DATA UPDATE); JUL-1998 (INF UPDATE).
DE Tyrosine protein kinases specific active-site signature.
PA [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3).
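The PA line above maps almost directly onto an ordinary regular expression. The sketch below is a minimal converter covering the common PROSITE elements (x, [..], {..}, repeat counts and the < > anchors). It is not an official PROSITE tool, the function name is hypothetical, and the two test peptides are invented.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE PA pattern into Python regular-expression syntax."""
    regex = pattern.rstrip(".")           # trailing period ends the pattern
    regex = regex.replace("-", "")        # '-' only separates positions
    regex = regex.replace("x", ".")       # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")  # {..} = excluded residues
    regex = regex.replace("(", "{").replace(")", "}")   # (n) or (n,m) = repeat
    regex = regex.replace("<", "^").replace(">", "$")   # N- and C-terminal anchors
    return regex


pa_line = "[LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3)."
regex = prosite_to_regex(pa_line)
print(regex)

# Invented peptides: the first contains a stretch that fits the signature.
hit = re.search(regex, "GGLKHRDLAARNVLVGG")
miss = re.search(regex, "LKHRDLKPENILV")
print(hit.group() if hit else None, miss)
```

The curly-brace replacement has to run before the parenthesis replacement, otherwise the repeat counts produced from (n) would be rewritten as excluded-residue classes.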
Secondary databases: Databases available - Pfam
• Split into two sections
– Pfam-A 3,071 HMMs (Curated)
– Pfam-B 36,700 HMMs (Not curated)
• Each entry consists of description and alignment
ID IL7
AC PF01415
DE Interleukin 7/9 family
AU Ponting CP, Schultz J, Bork P
AL Clustalw
BM hmmbuild HMM SEED
BM hmmcalibrate --seed 0 HMM
DR PROSITE; PDOC00228;
CC IL-7 is a cytokine that acts as a growth factor for early
CC lymphoid cells of both B- and T-cell lineages. IL-9 is a
CC multi-functional cytokine.
IL7_BOVIN/28-172 DISGKDGGAYQNVLMVNIDD-LDNMINFDSNCLNNEPNFFKKHSCDDNKEASFLNRASRK
IL7_HUMAN/28-173 DIEGKDGKQYESVLMVSIDQLLDSMKEIGSNCLNNEFNFFKRHICDANKEGMFLFRAARK
IL7_MOUSE/28-152 HIKDKEGKAYESVLMISIDE-LDKMTGTDSNCPNNEPNFFRKHVCDDTKEAAFLNRAARK
Secondary databases: Databases available - InterPro
Biotechhuset: model
Biotechhuset: view towards the south-west
Biotechhuset, Annedal
http://www.arexis.com