Drug Target Discovery by Genome Analysis

AREXIS

[Figure: genome, model, drug]

97% total, 67% finished

Species          Number of genes    % with known function
E. coli          4,289              62
S. cerevisiae    6,217              65
C. elegans       19,000             ?
M. musculus      30,000-50,000      ≈10
H. sapiens       30,000-50,000      ≈15

[Figure: plot against time (1990, 1995, 2000), vertical axis in units of 10^6 (ticks at 0.5 and 1), with a region labelled "gap"]

Link genes to biological functions

Bioinformatics?

• Bioinformatics - the research field that handles and analyses "bioinformation"
• Bioinformation - the information stored in:
– genome data (genes, gene expression, gene function, etc., in relation to the organism that harbours the genome in question)
– biological sequences, and
– relationships between biological sequences, with respect to the function of biological organisms (metabolism, health, etc.)
• Bioinformatics should generate ideas and proposals for new wet-lab experiments
• Researchers with bioinformatics as an experimental tool (in silico biology)

Animal models

Why animal models?

• Genetically homogeneous
• Controlled environmental influence
• Large family sizes give optimal statistical power
• Tools to define and characterise disease-causing genes and mechanisms
• In vivo validation and in vivo pharmacology
• Increase productivity
• Higher resolution

Marketing of new products
Clinical development
Drug discovery
Target validation
Target discovery
Genetic analysis
Disease models

Research and development strategy

Academic partners

Industrial partners

Arexis

Integrated biology-driven discovery

In vivo pharmacology

Medicinal chemistry
Human patient materials
Comparative biology
Clinical science
Biotechnology expertise
Bioinformatics

Functional genomics

Arexis

[Figure: project chart with the stages exploratory research, pre-clinical development and clinical development; each project's stage is marked with an X]

Prioritised projects
• Type 2 diabetes
• Obesity
• Inflammatory diseases
• Metabolic diseases
• Multiple sclerosis
• Skin inflammation
• Immunotherapy
• Rheumatoid arthritis

Project labels: SCCE, Muc. A, AMPK

R&D project overview

Research collaborations

Sub-contracts Partnerships

Targeted In-licensing

Drug development/commercialisation

Target and Drug discovery

Input to the Arexis pipeline and portfolio

Revenue sources

Commercialisation process: Early, Mid, Late

Spin-off opportunities

Access fees

Research funding

Milestone payments

Royalties

Target and Drug discovery

Business model

Organisation build-up plan

                               2001  2002  2003  2004  2005  2006
Management & Administration
  Management                      3     3     4     5     5     5
  Administration                  -     2     3     4     5     5
  Accumulated                     3     5     7     9    10    10
R&D
  Bioinformatics                  2     3     5     8    10    11
  Biology                         3    10    21    32    45    57
  Chemistry                       -     2     4     6     8    13
  Clinical development            -     1     2     3     4     6
  Accumulated                     5    16    32    49    67    87
Total                             8    21    39    58    77    97

Anders Vedin, Chairman of the Board: Professor, Senior Advisor, InnovationsKapital AB

Henry Geraedts, Deputy Chairman of the Board: PhD, Independent director, 3i

Carl Christensson: CEO, SEB Företagsinvest

Rikard Holmdahl: Professor of Medical Inflammation, founder

Lennart Hansson: PhD, Chief Executive Officer

Leif Andersson: Professor of Animal Genetics, founder

Curt Lönnström: Chief Executive Officer of Ryda Bruk

Board of Directors

[Figure: integration of genetic approaches and in silico approaches. Labels:]
• database with annotated experiments
• Ensembl auto-annotated genomes
• curated gene structures
• Target database: relevant genes
• pointers to phenotype-related genes
• Expression profiling: Affymetrix experiment and experimental data
• pointers to disease loci
• aGDB: genetic/linkage data
• integration
• phenotype-related pathways
• Commercial partners, Academic partners
• DAS, Arexis users

IT System Architecture

[Figure labels: DAS, Arexis users, LDAP, mail, documents, Arexis intranet, VPN, economy, business dev, GIM, aGDB]

Research System Architecture

• tools for sequence analysis
• tools for expression data analysis

[Figure: species trees per project, each rooted at a common ancestor. Project B: mouse, homo. Project C: homo, mouse, rat. AMPK: homo, mouse, pig, and an unresolved branch (??)]

Tissue section of skeletal muscle fiber from Hampshire pigs

Normal: rn+/rn+. Mutant: RN-/rn+ or RN-/RN-

Tissue distribution of AMPK chains

[Figure: expression across tissues: heart, brain, placenta, lung, liver, muscle, kidney, pancreas, spleen, thyroid gland, prostate, testis, ovary, small intestine, colon, peripheral blood]

AMP-activated kinase (AMPK) - a heterotrimeric enzyme

[Figure: AMPK subunit isoforms, numbered 1 to 3]

A skeletal muscle-specific variant of AMPK

Pathways regulating glucose transport in muscle cells

Modified from Shepherd et al. NEJM 1999

chr. 5 mouse

chr. 7 human

Link to pathophysiology?

Pathway analysis!

Experimental validation

genetic mapping

[Figure: AMPK regulation and downstream targets. Labels: AMP, AMPKK, protein phosphatase 2C, protein phosphatase 2A, AMPK (inactive / active, phosphorylated), acetyl-CoA carboxylase, malonyl-CoA decarboxylase, acetyl-CoA, malonyl-CoA, fatty acid]

• Increased glucose uptake
• Decreased glycogen degradation
• Increased amount of GLUT4

Susceptible DA rat

Resistant E3 rat

Pristane-induced arthritis in the rat

[Figure: comparison of a human region (2.4 Mbp) with a mouse region (1 Mbp), showing duplicated genomic segments and the position of the mouse gene]

Genomics data Expression data

integrate / analyse / visualise

Reconstruction of Pathway

Novel drug target


Database resources

• DNA: EMBL, GenBank
• Protein: PIR, Swissprot
• Families: Blocks, Pfam
• Motifs: ProSite
• Genomes: GDB, MGI, AceDB, MIPS
• Bibliography: PubMed

Centres
• NCBI - National Center for Biotechnology Information
• EBI - European Bioinformatics Institute
• DDBJ - DNA Databank of Japan

Where do sequences come from?

[Figure: DNA, mRNA, protein]

cDNA sequence
• Directed / small-scale
• Random / large-scale
– Expressed Sequence Tag [EST]

genomic sequence
• Directed / small-scale
• Large-scale: BACs, YACs

protein sequence
• Directed, very little

Sequence databases: Nucleotide databases

GenBank, EMBL, DDBJ

International Nucleotide Sequence Database Collaboration

Sequence databases: Primary vs secondary databases

• Primary database = sequence database
– eg EMBL, GenBank, SWISSPROT
– Each record describes an individual sequence
– Can contain either nucleotide or protein sequences

Seq 1: ACGTTT
Seq 2: CTAGAC
Seq 3: TTCTGA

Sequence databases: Primary vs secondary databases

• Secondary database = pattern database
– eg PROSITE, PRINTS, BLOCKS, Pfam
– Each record describes a set of sequences
– Set can be expressed as a motif, multiple sequence alignment or probabilistic model

Pattern 1: accagtgtacgactct
Pattern 2: tacgtagctacctacctaggtagc
Pattern 3: ttcgatgtcattcgatcgcatccgatcgtc


• How do the databases compare?
– The three databases are 99.99% identical
– Annotations can be slightly different
• How often are they updated?
– New release of the databases every 3 months
– Interim releases, eg EMBL-new
• Can the annotations be trusted?
– Not always: some estimates suggest 25% are incorrect

• EMBL is subdivided into EST and non-EST sequences

[Figure: EST and non-EST sections, with division labels hum, vrt, rod, mam]

Sequence databases: Protein databases

Nucleotide database    Translated protein database
GenBank                GenPept
EMBL                   TrEMBL

Protein databases: PIR, SWISSPROT

Sequence databases: Protein databases

EMBL
• 13,700,000 entries

TrEMBL
• 558,150 entries
• Coding sequences automatically translated
• TrEMBL split into:
– SP-TrEMBL - Sequences destined for SWISSPROT
– REM-TrEMBL - Remaining sequences

SWISSPROT
• 106,602 entries
• Sequences manually moved to SWISSPROT
• Because it is manually curated, annotations are reliable!

Sequence databases: Summary

• EMBL is the main nucleotide sequence database (Europe)
• TrEMBL is an automated translation of EMBL
• SWISSPROT is the main curated protein database
• Between main releases, interim releases are made
– eg EMBL-new, TrEMBL-new, SWISSPROT-new
• EMBL is subdivided into EST / non-EST, then by species
• Annotations can be trusted in SWISSPROT, not in EMBL
• Accession numbers uniquely identify a sequence and remain constant when entries are updated

Basics of sequence searching: Methods

Method          Accuracy   Duration   Example
Rigorous        +++++      +++++      Smith-Waterman
Heuristic       ++         +          BLAST, FASTA
Probabilistic   ++++       +++        HMM

• Probabilistic methods are best, but can be slow and difficult to use
• Rigorous methods are good when used on a small subset of sequences, but too slow to search a large sequence database
• Heuristic methods are the best place to start

Basics of sequence searching: Terminology

• Sensitivity vs Selectivity
– A sensitive search will find weaker hits
– A selective search is less likely to find unrelated hits
– Increased sensitivity means more true positives
– Increased selectivity means fewer false positives
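
The two terms can be made concrete by counting true and false hits. The sketch below is illustrative only: the slides give no formulas, so it assumes sensitivity = TP / (TP + FN) and takes selectivity to mean TP / (TP + FP). The function names and the example hit lists are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly related sequences that the search reported."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def selectivity(true_pos, false_pos):
    """Fraction of reported hits that are truly related (assumed definition)."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


def evaluate(reported, truly_related):
    """Compare a list of reported hits with the set of known relatives."""
    reported, truly_related = set(reported), set(truly_related)
    tp = len(reported & truly_related)   # related and reported
    fp = len(reported - truly_related)   # reported but unrelated
    fn = len(truly_related - reported)   # related but missed
    return sensitivity(tp, fn), selectivity(tp, fp)


# Hypothetical search: 3 hits reported, 4 sequences are truly related.
sens, sel = evaluate(["a", "b", "x"], ["a", "b", "c", "d"])
print(f"sensitivity={sens:.2f} selectivity={sel:.2f}")
```

Loosening a score cut-off typically raises the first number and lowers the second, which is the trade-off the terminology describes.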

Searching with BLAST: How it works

• Find identical stretches of nucleotides in two sequences (the query sequence and a sequence in the database)
• Extend regions of similarity as far as possible, giving a high-scoring segment pair (HSP)
• Identify all regions of similarity (HSP 1, HSP 2, ...)
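
The three steps can be sketched as a toy seed-and-extend search. This is a simplified illustration, not BLAST itself: it uses exact word matches, ungapped extension and a plain match/mismatch score, and it omits the statistics that real BLAST reports. The word size, scores, drop-off threshold and all function names are assumptions chosen for the example.

```python
def find_seeds(query, subject, word=4):
    """Step 1: yield (query_pos, subject_pos) for every shared exact word."""
    index = {}
    for i in range(len(query) - word + 1):
        index.setdefault(query[i:i + word], []).append(i)
    for j in range(len(subject) - word + 1):
        for i in index.get(subject[j:j + word], []):
            yield i, j


def extend(query, subject, i, j, word=4, match=1, mismatch=-2, drop=3):
    """Step 2: extend a seed without gaps until the score drops too far."""
    score = best = word * match
    right, k = 0, 0
    while i + word + k < len(query) and j + word + k < len(subject):
        same = query[i + word + k] == subject[j + word + k]
        score += match if same else mismatch
        k += 1
        if score > best:
            best, right = score, k
        elif best - score >= drop:
            break
    score, left, k = best, 0, 1
    while i - k >= 0 and j - k >= 0:
        score += match if query[i - k] == subject[j - k] else mismatch
        if score > best:
            best, left = score, k
        elif best - score >= drop:
            break
        k += 1
    return best, i - left, j - left, word + left + right


def hsps(query, subject, word=4):
    """Step 3: collect every distinct segment pair, best score first."""
    found = {}
    for i, j in find_seeds(query, subject, word):
        score, q_start, s_start, length = extend(query, subject, i, j, word)
        found[(q_start, s_start, length)] = score
    return sorted(found.items(), key=lambda item: -item[1])


for (q_start, s_start, length), score in hsps("CCCCACGTGCATTTT", "GGGGACGTGCAGGGG"):
    print(q_start, s_start, length, score)
```

Several seeds inside the same similar region extend to the same segment pair, which is why the results are keyed by start positions and length.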

Local vs global comparisons: The nature of proteins

• Proteins consist of functional and structural units - domains

Local vs global comparisons: What is a local and global comparison?

Global comparison attempts to match all of one sequence against another

Local comparison attempts to match short stretches of one sequence with another

Local vs global comparisons: When should each technique be used?

• Global comparisons
– Closely related sequences
– Same general structure of sequence
– Roughly equal lengths
• Local comparisons
– Sequences not closely related
– Sequence fragments
– Interested in identifying common domains
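
The difference shows up clearly in the alignment scores. The sketch below computes a global score (Needleman-Wunsch style) and a local score (Smith-Waterman style) with one dynamic-programming routine. The scoring values (match 2, mismatch -1, linear gap -2), the function name and the example sequences are assumptions for illustration; real searches use substitution matrices and affine gap penalties.

```python
def align_score(a, b, local=False, match=2, mismatch=-1, gap=-2):
    """Best alignment score of a and b; local=True allows free ends."""
    # First row: zeros for local alignment, accumulated gaps for global.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                cell = max(cell, 0)       # a local alignment may restart anywhere
                best = max(best, cell)    # and may end anywhere
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]


# Two sequences that share a single "domain" (GATTACA) and nothing else.
seq1 = "CCCCCCGATTACA"
seq2 = "GATTACAGGGGGG"
print("local :", align_score(seq1, seq2, local=True))
print("global:", align_score(seq1, seq2))
```

The local score rewards the shared stretch alone, while the global score is pulled down by the unrelated flanking regions that it is forced to align.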

Local vs global comparisons: When should each technique be used?

Global comparison will attempt to match all of one sequence against another, even when the sequences share only one common domain.

[Figure: two sequences with one common domain and otherwise non-matching domains]

Global comparison should only be used if the sequences being compared have a common domain structure.

[Figure: two sequences sharing common domains, with one domain unique to one sequence]

Local vs global comparisons: Summary

• Proteins are organised into domains
• Local comparisons find short stretches of similarity
• Global comparisons match the whole length of one sequence against another
• Local comparisons should be used unless sequences are closely related and have identical domain structures

Searching with BLAST: Search with DNA or protein?

• Use DNA if
– There are frameshifts (common in ESTs)
– You are interested in evolution (the 3rd base in a codon is hidden in translation)
• Otherwise, use protein sequence. Why?
– Two DNA sequences can be aligned in six ways
– Each alignment can give scores, therefore more partial matches
– Therefore there is more noise associated with the comparison
– The statistical significance of good hits is thus reduced
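
One way to read the "six ways" is as the six reading frames of a DNA sequence: three offsets on each strand. Under that interpretation, which is an assumption about the slide, the sketch below translates a sequence in all six frames using the standard genetic code. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become X, stops become *."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Translate both strands at offsets 0, 1 and 2."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

Each frame gives a different candidate protein, so a translated comparison has six chances to produce a chance match, which is the source of the extra noise.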

Searching with FASTA: BLAST vs FASTA

• Advantages of BLAST
– Faster than FASTA
– Reports all high-scoring local alignments
• Advantages of FASTA
– More sensitive: approaches that of rigorous methods
– Faster than rigorous methods
– E-values are more accurate
– Better handling of frameshifts, which is important for ESTs

Basics of sequence searching: Summary

• Sequence searching is complicated because we want to find partial matches
• Search method should be sensitive and selective
• Rigorous methods are much more sensitive than heuristic methods, but are too slow

Secondary databases: Databases available - Prosite

• 1492 regular expressions
• Each entry consists of two files
– Text file with information on family
– A regular expression and matching sequences

ID   PROTEIN_KINASE_TYR; PATTERN.
AC   PS00109;
DT   APR-1990 (CREATED); DEC-1992 (DATA UPDATE); JUL-1998 (INF UPDATE).
DE   Tyrosine protein kinases specific active-site signature.
PA   [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3).
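
A pattern of this kind maps almost directly onto an ordinary regular expression. The sketch below converts the PA line above into Python `re` syntax and scans a sequence with it. It handles only a subset of the pattern notation (single residues, x, bracketed sets, braced exclusions and repeat counts); the function name and the test sequence are made up for illustration.

```python
import re

# One pattern element: a residue, x, [allowed] or {excluded}, plus optional (n) or (n,m).
ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern string into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {..} lists excluded residues
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


regex = prosite_to_regex("[LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3).")
print(regex)

# Constructed example sequence, not a real database entry.
hit = re.search(regex, "AAALVHRDLAARNVLVAAA")
print(hit.start(), hit.group())
```

Scanning every sequence in a primary database with such an expression is the basic operation behind a pattern search.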

Secondary databases: Databases available - Pfam

• Split into two sections
– Pfam-A 3,071 HMMs (Curated)
– Pfam-B 36,700 HMMs (Not curated)
• Each entry consists of description and alignment

ID   IL7
AC   PF01415
DE   Interleukin 7/9 family
AU   Ponting CP, Schultz J, Bork P
AL   Clustalw
BM   hmmbuild HMM SEED
BM   hmmcalibrate --seed 0 HMM
DR   PROSITE; PDOC00228;
CC   IL-7 is a cytokine that acts as a growth factor for early
CC   lymphoid cells of both B- and T-cell lineages. IL-9 is a
CC   multi-functional cytokine.

IL7_BOVIN/28-172  DISGKDGGAYQNVLMVNIDD-LDNMINFDSNCLNNEPNFFKKHSCDDNKEASFLNRASRK
IL7_HUMAN/28-173  DIEGKDGKQYESVLMVSIDQLLDSMKEIGSNCLNNEFNFFKRHICDANKEGMFLFRAARK
IL7_MOUSE/28-152  HIKDKEGKAYESVLMISIDE-LDKMTGTDSNCPNNEPNFFRKHVCDDTKEAAFLNRAARK

Secondary databases: Databases available - InterPro

Biotechhuset: model

Biotechhuset: view towards the south-west

Biotechhuset, Annedal

http://www.arexis.com