cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ......

21
CBS* META CATA- LOG * CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS

Transcript of cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ......

Page 1: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

cbs*metacata-log * center for biological sequence analysis

Page 2: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

con-tents Center for BiologiCal SequenCe analySiS 2 CanCer SyStemS Biology 4

Cellular Signal integration 6

Comparative miCroBial genomiCS 8

Computational ChemiCal Biology 12

FunCtional human variation 14

immunologiCal BioinFormatiCS 16

integrative SyStemS Biology 18

metagenomiCS 22

moleCular evolution 24

regulatory genomiCS 26

SyStemS Biology oF immune regulation 28

Dtu multi-aSSay Core FaCility 30

aBout CBS 32

Page 3: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

2 3

number of computational methods and databases. These are made freely available for academic use and are also available on a commercial basis for industrial use. All the services are available for direct online use, and most are also available as software packages for download and as SOAP based web services.

partnerShipS CBS has strong partnerships both nationally and internationally. CBS is a part of the Department of Systems Biology at DTU, which re-presents the largest concentration of biotech research in Denmark. CBS works closely with the Novo Nordisk Foundation Center for Protein Re-search at the University of Copenhagen, and also with the Novo Nordisk Foundation Center for Biosustainability at DTU. The close relationship between the centers offers a wide range of opportunities for collaboration on projects and for exchange of staff.

Core FaCility CBS maintains an experimental DTU core facility for the high-throughput analysis of genomic-level events. The core facility is hosted by CBS, however, the core is a DTU wide facility contributing to the research of many other centers and departments at DTU as well as external cus-tomers from academia and industry across the country and abroad.

FunDing anD hiStory Based on bioinformatics efforts started in the late 1980s, the group was established formally as a center in 1993 by a large five-year grant from the Danish National Research Foundation. The original grant was extended for an additional five-year period, and since 2003 the Center for Biological Sequence Analysis has relied on an increasing number of grants from different sources as well as a substantial, permanent contri-bution from the Technical University of Denmark. Major funding agencies and foundations include (but are not limited to): The Danish Research Councils, EU, NIH, the Danish Center for Scientific Computing, the Villum Foundation, the Novo Nordisk Foundation and the Lundbeck Foundation.

Center for BiologiCal SequenCe analySiS The Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark (DTU), Department of Systems Biology, conducts ba-sic research in the general fields of bioinformatics and systems biology. CBS represents one of the largest, multi-disciplinary basic research groups within bioinformatics and systems biology in academia in Europe. Research at CBS is conducted in 11 specialist research groups spanning the broad spectrum of research at the interface between computational biology and wet lab experimentation. CBS is highly active across the spectrum from molecular level disease systems biology, chemical biology, regulatory genomics, evolu-tionary analysis, and immunological bioinformatics to protein post-transla-tional modifications and sorting. Activities span all kingdoms of life and also their interaction and co-existence, for example in projects on the bacterial flora in the human gut as it is studied using metagenomics and metatran-scriptomics techniques. The center is also interfacing to the general area of medical informatics analyzing data from the healthcare sector, including text mining of electronic patient records and biobank questionnaires.

exploiting Data Comprehensive public databases of DNA and protein sequences, genotyping, macromolecular structure, gene and protein expression levels, protein-protein interactions, pathway organization and cell signaling, have been established to optimize scientific exploitation of the explosion of data within biology. Among contemporary bioinformatics concerns are reliable computational interpretation of a wide range of experimental data, and the detailed understanding of the molecular apparatus behind cellular mecha-nisms of sequence information – not only in general terms but also at the level of human individuals. By exploiting available experimental data and evi-dence in the design of algorithms, sequence correlations and other features of biological significance can be inferred.

WeB ServiCeS A key activity at the Center for Biological Sequence Analysis is the maintenance and further expansion of our web services comprising a large

Page 4: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

4

Error free repair using sister chromatid

Mitosis Mitosis Mitosis

Non-homologous end joining,

radial formation

Cisplatin therapy

Failure to repair DNA damage

Cell death

Mitotic Recombination with homolog

Gain Loss Copy-neutral

CanCer SyStemS Biology group leaDer: proFeSSor Zoltan SZallaSi,[email protected]

The Cancer Systems Biology group is developing computational methods to analyze genome scale molecular profiles of cancer samples in order to improve therapeutic efficacy and find new therapeutic targets. We aim to understand certain key aspects of cancer biology by combining data from a wide variety of sources such as gene expression microarrays CGH arrays, high-throughput sequencing etc. Genome scale analysis of cancer helps researchers to expand their research focus from a few genes to a more complete view of the entire cancerous genetic network. The expression levels or DNA copy number of virtually all genes in a cell can be quantified simultaneously. This more complete analysis of human cancer cells will very likely hold the key to solving the difficulties associated with the fact that cancer cells comprise complex, robust and evolving systems. The group is focusing on several aspects of genome scale analysis of human cancer. Firstly, the group is developing bioinformatics methods that increase the accuracy of high-throughput measurements. Secondly, from the data sets of increased accuracy the aim is to extract quantita-tive measures of key biological processes in cancer. We are particularly in-terested in the various subtypes of genomic instability, their relative level in a given tumor and whether this could guide more effective therapeutic decisions. Thirdly, we are combining genome scale molecular profiling of chemotherapy resistant breast cancer cell lines with bioinformatics analy-sis in order to determine whether clinical response to chemotherapy in cancer can be predicted by gene expression signatures derived from cell lines. The most immediate outcome of this research is to develop tools that would predict with high accuracy which cancer patient will respond to a given chemotherapeutic agent.

Page 5: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

6 7

ERK Signaling (4)KSR, RAF, ERK,Erk7

Parkinson’s and other Disease Genes (7)Atg1, CG5483/Lrrk, CG6355/Fab1, CG9961, Doa, Pink1, Pk92B/Ask

Migration and Cytoskeleton (12)bent, Btk29A,FER, l(1)G0232,off-track, Ptp69D, Ptp99A, RET, sallimus, Spred,thickveins,Unc-89

PDK1

Gp150

CycA CycE RAFKsr

JAK

Ptp61FNipApolo

CycT

Shark

Cell Cycle Control and Cytokinesis (13)Cdc25B, Cdc25C,CDK2,CDK4, Cyclin A, Cyclin E, four-wheel drive, ial, Nipped-A, polo, Pp1-87B,Pitslre,Tao-1

Magi

Dlg1

JunFOS

JNKK

MLK

CG6355Fab1

Pka-C1

Smg1

RNA processing (3)Cyclin T, Smg1,mRNA capping enzyme

HippoWarts

GSK3

Akt Signaling (2)PDK1, GSK3

Novel (6)CG14714/PTP, CG1637, CG17746/PP2C, CG18854,CG31714, CG31711

Lipid Metabolism (5) CG3530,lazarous,Ipp, rdgA,wunen2

AMPK (3)alicorn, SAD, SNF1A

Metabolism (6)Argine kinase, CG5144, CG7362, CG8298, CG9961, Pgk

ERKCDK2

JNK

PucAKT

Canonical JNK Signaling (7)JNK, JNKK, mishappen, MLK, PAK, puckered, shark

Caki, Casein Kinase I alpha, discs large 1, Magi, ZO-1

Apicobasal Polarity (5)

JAK-STAT (3)Gp150,PTP61F, JAK

Pka-C3

Hippo Signaling (2)

AKTCDK2ERKGSK3HippoJNKPka-C1

Phosphorylation Sites

Cellular Signal integration group leaDer: Dr. rune linDing,[email protected]

The Cellular Signal Integration Group conducts computational and quantitative biology with the aim of understanding biological signal-ing systems. We explore biological systems by developing and deploying computational algorithms aimed to predict cell behavior similar to weather forecasts or aircraft models. Cellular signaling networks are the founda-tion of cell fate and behavior and their aberrant activity is a key mechanism underlying the pathological behavior of cells, such as during tumor devel-opment. However, signaling networks are highly complex, involving a vast ensemble of dynamic interactions that flux in space and time. To understand how normal signal integration and aberrant cell decisions arise requires a global view of cell signaling networks. We and others have demonstrated that predictive insight into a biological system can be obtained through a combination of experimental and computational exploration. Thus, a major aim of our group is to develop computational tools (such as our flagship algorithms networKIn) and to deploy these on quantitative mass-spectrometry, genetic and phenotypic data to under-stand at a systems level principles underlying the spatiotemporal assembly of mammalian signaling networks and how they process information (mo-lecular logic) in order to alter cell behavior (cellular logic). We have previ-ously deployed these algorithms on quantitative proteomics data to model JNK regulation DNA damage response, stem-cell differentiation, cell-cell communication, signaling evolution and to compare model organisms. A major current activity is the search for signaling networks that drive regulatory diseases such as cancer, diabetes, and neurological disor-ders. We hypothesize that these networks are powerful therapeutic targets, and we are motivated by the opportunity to perform systems level targeting of complex human diseases while at the same time gaining insight into the fundamental principles behind cellular information processing and deci-sion making.

Page 6: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

9

Comparative miCroBial genomiCSgroup leaDer: aSSoCiate proFeSSor DaviD Wayne uSSery,[email protected]

Several thousand bacterial genome sequences are available in the public databases and more genomes are being sequenced on a daily basis. The Comparative Microbial Genomics (CMG) group explores relationships between large numbers of sequenced bacterial genomes, with emphasis on minimal genomes and genome-based classification methods. The approach is “genome-centric” in that the DNA sequence is used as the starting material, for example to predict DNA structures, which can in turn be indicators of useful biology, such as localization of a promoter based on DNA curvature and melting profiles, and the prediction of poten-tial genomic island regions. Further, we are developing pipelines to go from genomic sequences to predicted RNAs and proteins in a standardized and consistent manner across all families of bacteria. Soon it will be routine to sequence a hundred or more bacterial genomes a week. This will gener-ate very large amounts of data and that is something we keep in mind when developing pipelines. The two major focus areas of the group are: 1) minimal genomes – that is, what set of gene families are common to all life; 2) using genomic sequences (and the core/pan-genomes) to predict taxonomic relationships, both at the broad level, such as phyla, or genus/species and also for strain typing and classification. Handling and maintaining this large amount of data for thousands of organisms sequenced requires a structured database system. For this purpose, the GenomeAtlas database was developed in 1999 and has been updated on a regular basis since then. A recent addition to the atlas pages is the zoomable atlases. The CMG group has published more than 100 papers since 2000, and the popular DTU course on Comparative Microbial Genomics has been run-ning for more than 10 years, and one-week workshops based on this course are held in North and South America, Europe, Asia, and Africa.

H)

G)

F)E)

D)

C)

B)

A)

0k

125k

250k

375k

500k

625k

750k

875k

Borrelia burgdorferi

strain ZS7 main chromosome (linear) 906,707 bp

Center for Biological Sequence Analysis

http://www.cbs.dtu.dk/

BASE ATLAS

A) G Content

dev

avg

0.04

0.24

B) A Content

dev

avg

0.20

0.51

C) T Content

dev

avg

0.21

0.51

D) C Content

dev

avg

0.04

0.25

E) Annotations: C

DS +

CDS -

rRNA

tRNA

F) AT Skew

fixavg

-0.25

0.25

G) GC Skew

fixavg

-0.10

0.10

H) Percent AT

fixavg

0.20

0.80

Resolution: 363

Page 7: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

0M 0.5M

1M1.5M

2M

2.5M3M3.5M

4M4 .

5M

5M

Terminus

Origin

Escherichia coli O157:H7 strain EC4115

Page 8: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

13

Computational ChemiCal Biologygroup CoorDinatorS: aSSoCiate proFeSSor irene kouSkoumvekaki, [email protected], anD aSSoCiate proFeSSor olivier taBoureau,[email protected]

Common human diseases result from the interplay of many genes and environmental factors. A drug – or a xenobiotic agent in general – rarely acts on one target alone but rather follows a complex interaction pattern with a number of proteins. This multi-targeting can be beneficial in drug discovery (polyphar-macology), but it may also lead to disturbance of the metabolism, adverse effects or toxicity effects. Designing safe and effective medicines require an in-depth understanding of their phenotypic effect on the human body. Although many questions still remain, systems biology has already led to great advances in our understanding of the biology of complex diseases and drug targets. Thus, the field has now reached a state that allows the integration of large-scale data across multiple levels of information to decipher the effect of small molecules on biological systems. Such efforts are gradually emerging under the name of “systems chemical biology”, with the aim to unravel the genetic background of diseases and to provide the best possible medicinal treatment. The Computational Chemical Biology (CCB) group has evolved from the former group of Chemoinformatics, with interest in exploring the interplay between small molecules and biological systems using computational methods. The group holds expertise in molecular modeling, machine learning methods (QSAR, Neural Networks, Support Vector Machines), chemogenomics, and sys-tems chemical biology. The overall aim of the research focus of CCB is to eluci-date the biological profile of small molecules regarding their targets, off-targets, phenotypic outcome, and side effects through an in-depth understanding of the biological networks that lead to disease. Research topics in the group lie in the areas of drug targets for CNS and metabolic diseases, natural compounds with medicinal properties, ADME-Tox and pharmacogenetics, biological networks for systems pharmacology and clinical effects. The data warehouse of CCB includes around 600,000 bioactive drugs and drug-like molecules with biologically related annotations and more than 20 mil-lion small molecules with structural information. The group also provides online services for the prediction of pH-dependent aqueous solubility of drug like small molecules, pHsol, and chemProt, an online resource of annotated and predicted chemical-protein interactions.

Page 9: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

14

funCtional human variation group leaDer: aSSoCiate proFeSSor ramneek gupta,[email protected]

Advances in genomic sequencing, both in speed and lower cost, have finally created an opportunity to understand the human genome at the individual level and not just as a composite reference. Variation between individuals can be at the single nucleotide level, stretches of nucleotides or in copy number of sections of the genome. Current esti-mates indicate that an individual differs by three million single nucleotide polymorphisms (SNPs) from the commonly accepted human reference. Additional sources of variation are alternative splicing of exons within a gene, post-translational modifications (PTMs) on proteins and last but certainly not least, the microbiota within the human being that contributes an important diet-related role. The mission of the Functional Human Variation group is to under-stand how this variation translates into function. What individual SNPs influence susceptibility to disease and sensitivity to drug response? Why do some children with leukemia respond well to chemotherapy, but others develop toxic side effects or relapse a few years later? Which mutations in a cancer are drivers and which are passengers? Which nucleotide changes influence protein structural changes or changes in PTMs? The group works closely with CBS’s Metagenomics group and collaborates with external clinical groups on diseases such as leukemia, breast cancer, male infertility, and childhood asthma. Genotype-phenotype database infrastructure development has been used for these clinical phe-notypes as well as non-clinical phenotypes on collaborations with ancient DNA studies. In partnership with DTU’s Multi-Assay Core Facility, the group has developed multiplexed targeted sequencing strategies for low-cost and high-throughput custom genotyping. The group also provides several online machine learning based prediction methods for signal peptides, protein structural features, and post-translational modifications.

Page 10: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

17

immunologiCal BioinformatiCSgroup leaDer: Deputy Center DireCtor, proFeSSor ole lunD,[email protected]

The immune system normally does a good job keeping us free from diseases, but sometimes it fails. One approach towards understand-ing why this happens is to produce advanced simulation models of the immune system and to understand the relationship between hosts and pathogens in this manner. Depending on the complexity of these models and the input given they can be used to simulate what happens when a host gets infected by a pathogen, thereby predicting the co-evolvement of pathogens and immune systems. One aim of the modeling is to identify the parts of proteins known as epitopes, which are recognized by the im-mune system, thereby inducing a protective response. This knowledge is very valuable for the development of better vaccines and provides impor-tant insights into the nature of cancer, allergy, and autoimmune diseases. The Immunological Bioinformatics Group is developing new tech-nologies for epitope discovery that can aid in the search for new vaccines and therapies for HIV, malaria, and tuberculosis, as well as for diseases such as influenza and pox, which may evolve to be a threat naturally or intentionally through bioterrorism. The group has built a simulation model of the human immune system and has constructed a database with all human pathogens. Using this database and a database of the human ge-nome, the group is working on using the prediction methods to simulate the co-evolvement of pathogens and immune systems and in particular to identify epitopes from the different arms of immune systems. In most of the projects the predicted epitopes are being validated through experi-mental collaborations with partners doing wet lab research. The group has developed methods for the three main types of epitopes: B cell epitopes which are used to recognize microorganisms outside cells; Helper T lymphocyte epitopes which are used to activate cells that have taken up foreign substances; and cytotoxic T lymphocyte epitopes, which are used to detect and kill infected cells.

Page 11: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

18

I11

M10

M14

G62

R27

G93

E51

H55 T9

0E

78 I25

G91 I61

R41 F0

4F1

8F6

2F0

6F0

7F0

1I6

4I6

9F5

0N

91 I44

I45

I48

I49

R32

E87

R00 L2

1L4

0L2

0L3

0T1

4T6

3M

06M

11M

17M

19E

11R

73E

14E

10E

16E

86H

10H

26H

40R

30N

30R

31A

41B

37R

39 L89

R33 I9

5R

61D

50E

61 F98

R11

B23

B24 F1

1F1

4F1

5K

73K

76R

17 F19

B15

B17

B18

K77 J40

J42

J45

J44

J18

J98

R09

R35 T7

3G

24R

25 I46

R02 I5

0R

18R

60A

46B

95K

59R

10R

50A

49B

34

B34 VIRAL INFECTION OF UNSPECIFIED SITEA49 BACTERIAL INFECTION OF UNSPECIFIED SITER50 FEVER OF OTHER AND UNKNOWN ORIGINR10 ABDOMINAL AND PELVIC PAINK59 OTHER FUNCTIONAL INTESTINAL DISORDERSB95 STREPTOCOCCUS AND STAPHYLOCOCCUS AS THE CAUSE OF DISEASES CLASSIFIED TO OTHER CHAPTERSA46 ERYSIPELASR60 OEDEMA, NOT ELSEWHERE CLASSIFIEDR18 ASCITESI50 HEART FAILURER02 GANGRENE, NOT ELSEWHERE CLASSIFIEDI46 CARDIAC ARRESTR25 ABNORMAL INVOLUNTARY MOVEMENTSG24 DYSTONIAT73 EFFECTS OF OTHER DEPRIVATIONR35 POLYURIAR09 OTHER SYMPTOMS AND SIGNS INVOLVING THE CIRCULATORY AND RESPIRATORY SYSTEMSJ98 OTHER RESPIRATORY DISORDERSJ18 PNEUMONIA, ORGANISM UNSPECIFIEDJ44 OTHER CHRONIC OBSTRUCTIVE PULMONARY DISEASEJ45 ASTHMAJ42 UNSPECIFIED CHRONIC BRONCHITISJ40 BRONCHITIS, NOT SPECIFIED AS ACUTE OR CHRONICK77 LIVER DISORDERS IN DISEASES CLASSIFIED ELSEWHEREB18 CHRONIC VIRAL HEPATITISB17 OTHER ACUTE VIRAL HEPATITISB15 ACUTE HEPATITIS AF19 MENTAL AND BEHAVIOURAL DISORDERS DUE TO MULTIPLE DRUG USE AND USE OF OTHER PSYCHOACTIVE SUBSTANCESR17 UNSPECIFIED JAUNDICEK76 OTHER DISEASES OF LIVERK73 CHRONIC HEPATITIS, NOT ELSEWHERE CLASSIFIEDF15 MENTAL AND BEHAVIOURAL DISORDERS DUE TO USE OF OTHER STIMULANTS, INCLUDING CAFFEINEF14 MENTAL AND BEHAVIOURAL DISORDERS DUE TO USE OF COCAINEF11 MENTAL AND BEHAVIOURAL DISORDERS DUE TO USE OF OPIOIDSB24 UNSPECIFIED HUMAN IMMUNODEFICIENCY VIRUS [HIV] DISEASEB23 HUMAN IMMUNODEFICIENCY VIRUS [HIV] DISEASE RESULTING IN OTHER CONDITIONSR11 NAUSEA AND VOMITINGF98 OTHER BEHAVIOURAL AND EMOTIONAL DISORDERS WITH ONSET USUALLY OCCURRING IN CHILDHOOD AND ADOLESCENCEE61 DEFICIENCY OF OTHER NUTRIENT ELEMENTSD50 IRON DEFICIENCY ANAEMIAR61 HYPERHIDROSISI95 HYPOTENSIONR33 RETENTION OF URINEL89 DECUBITUS ULCERR39 OTHER SYMPTOMS AND SIGNS INVOLVING THE URINARY SYSTEMB37 CANDIDIASISA41 OTHER SEPTICAEMIAR31 UNSPECIFIED HAEMATURIAN30 CYSTITISR30 PAIN ASSOCIATED WITH MICTURITIONH40 GLAUCOMAH26 OTHER CATARACTH10 CONJUNCTIVITISE86 VOLUME DEPLETIONE16 OTHER DISORDERS OF PANCREATIC INTERNAL SECRETIONE10 INSULIN−DEPENDENT DIABETES MELLITUSE14 UNSPECIFIED DIABETES MELLITUSR73 ELEVATED BLOOD GLUCOSE LEVELE11 NON−INSULIN−DEPENDENT DIABETES MELLITUSM19 OTHER ARTHROSISM17 GONARTHROSIS [ARTHROSIS OF KNEE]M11 OTHER CRYSTAL ARTHROPATHIESM06 OTHER RHEUMATOID ARTHRITIST63 TOXIC EFFECT OF CONTACT WITH VENOMOUS ANIMALST14 INJURY OF UNSPECIFIED BODY REGIONL30 OTHER DERMATITISL20 ATOPIC DERMATITISL40 PSORIASISL21 SEBORRHOEIC DERMATITISR00 ABNORMALITIES OF HEART BEATE87 OTHER DISORDERS OF FLUID, ELECTROLYTE AND ACID−BASE BALANCER32 UNSPECIFIED URINARY INCONTINENCEI49 OTHER CARDIAC ARRHYTHMIASI48 ATRIAL FIBRILLATION AND FLUTTERI45 OTHER CONDUCTION DISORDERSI44 ATRIOVENTRICULAR AND LEFT BUNDLE−BRANCH BLOCKN91 ABSENT, SCANTY AND RARE MENSTRUATIONF50 EATING DISORDERSI69 SEQUELAE OF CEREBROVASCULAR DISEASEI64 STROKE, NOT SPECIFIED AS HAEMORRHAGE OR INFARCTIONF01 VASCULAR DEMENTIAF07 PERSONALITY AND BEHAVIOURAL DISORDERS DUE TO BRAIN DISEASE, DAMAGE AND DYSFUNCTIONF06 OTHER MENTAL DISORDERS DUE TO BRAIN DAMAGE AND DYSFUNCTION AND TO PHYSICAL DISEASEF62 ENDURING PERSONALITY CHANGES, NOT ATTRIBUTABLE TO BRAIN DAMAGE AND DISEASEF18 MENTAL AND BEHAVIOURAL DISORDERS DUE TO USE OF VOLATILE SOLVENTSF04 ORGANIC AMNESIC SYNDROME, NOT INDUCED BY ALCOHOL AND OTHER PSYCHOACTIVE SUBSTANCESR41 OTHER SYMPTOMS AND SIGNS INVOLVING COGNITIVE FUNCTIONS AND AWARENESSI61 INTRACEREBRAL HAEMORRHAGEG91 HYDROCEPHALUSI25 CHRONIC ISCHAEMIC HEART DISEASEE78 DISORDERS OF LIPOPROTEIN METABOLISM AND OTHER LIPIDAEMIAST90 SEQUELAE OF INJURIES OF HEADH55 NYSTAGMUS AND OTHER IRREGULAR EYE MOVEMENTSE51 THIAMINE DEFICIENCYG93 OTHER DISORDERS OF BRAINR27 OTHER LACK OF COORDINATIONG62 OTHER POLYNEUROPATHIESM14 ARTHROPATHIES IN OTHER DISEASES CLASSIFIED ELSEWHEREM10 GOUT

I11 HYPERTENSIVE HEART DISEASE

−3 −1 0 1 3

Mental and behavioural disorders

Brain disorders

Dermatitis

Heart disorders

Arthritis

Diabetes

Urinary disor-ders

Drug abuseLiver diseaseHIV

Respiratory disorders

integrative SyStemS Biology group leaDer: Center DireCtor, proFeSSor Søren Brunak,[email protected]

The study of life at the cellular and molecular level has brought about insight and change beyond anyone’s imagination over the last thirty years. Until quite recently, this type of research has been carried out in a reductionist way in which a few components were studied at a time. Howev-er, the latest breakthroughs and advances in nano- and biotechnology have created new possibilities for cataloging and linking hundreds and thousands of biomolecules simultaneously and have thereby paved the way for a new systems-scale view of living cells and organisms. The new field is called systems biology. The Integrative Systems Biology Group is at the leading edge of these developments, focusing mainly on understanding how intracellular networks of genes, proteins, metabolites and other small molecules regu-late cellular behavior and how perturbations to these regulatory systems may lead to disease for the individual. Our research strategies typically rely on integration of massive amounts of experimental data rather than just on theoretical modeling. Pathways and protein complexes are key levels of analysis helping to un-derstand how genetic changes in many different molecular components lead to the same or similar phenotypes. The ongoing effort in the group is to construct such models that will aid the identification of new disease genes and in uncovering the mechanisms behind complex, multifactoral diseases. The group also works on combining molecular level systems biology data with medical informatics data from the healthcare sector, such as for example electronic patient records and biobank questionnaires. The aim is to combine and stratify patients not only from their genotypes, but also phe-notypically based on the clinical descriptions in the medical records which describe disease histories in detail.

Page 12: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

1% 10% 25%

VI

VII

X

VIII

IX

V

III

IIIXXI

XIX IV

XX

XIII

XVII

XIV

XIIXI

XVIII

XIII XXI

XIV

VIII

XII

XV

IX

V

III

VII

VI

IV

XXI

XVI

XVIII

II

XVII

XX

XIX

I

1% 10% 25%

VI

VII

X

VIII

IX

V

III

IIIXXI

XIX IV

XX

XIII

XVII

XIV

XIIXI

XVIII

XIII XXI

XIV

VIII

XII

XV

IX

V

III

VII

VI

IV

XXI

XVI

XVIII

II

XVII

XX

XIX

I

Page 13: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

22

Thermoanaerobacter sp. X514

Lactobacillus paracasei

Holdemania filiformis

Clostridium methylpentosum

Eubacterium dolichum

Ruminococcus gnavus

Streptococcus infantarius

Clostridium bolteae

Anaerostipes caccae

Clostridium phytofermentansFinegoldia magna

Clostridium difficile

7

Clostridium sp. 7_2_43FAA

Faecalibacterium prausnitzii

Anaerococcus hydrogenalis

2

19

30

Butyrivibrio fibrisolvens

11

29

14

21

8

24

31

27

28

3

4

25

5

12

1713

Desulfovibrio vulgaris

Desulfovibrio piger

Anaerofustis stercorihominis

Porphyromonas gingivalis

Lactobacillus delbrueckii

Blautia hydrogenotrophicaFusobacterium nucleatum

15

16

9

18

10

34

22

26

35

33

32

20

Bryantella formatexigens

Subdoligranulum variabile

Bacteroides capillosus

Clostridium hylemonaeCandidatus Sulcia muelleri

6

Klebsiella pneumoniae

1

Megamonas hypermegale

23

Proteus mirabilis

Mollicutes bacterium D7

Clostridium nexile

Clostridium symbiosumClostridium ramosum

Clostridium leptum

Clostridium asparagiformeLeuconostoc mesenteroides

Bacteroides sp. 9_1_42FAA

Tropheryma whipplei

Proteus penneri

Bacteroides doreiBacteroides sp. D4

Bacteroides ovatus

Bacteroides sp. 4_3_47FAA

Lactobacillus acidophilus

Enterococcus faecalis

Collinsella aerofaciens

Staphylococcus saprophyticus

Bifidobacterium adolescentisBifidobacterium breve

Bifidobacterium bifidum

Clostridium sp

Bifidobacterium longum

Bacteroides sp. 2_2_4

Collinsella stercorisBacteroides sp. D1

Bacteroides xylanisolvens

Bifidobacterium gallicum

Gordonibacter pamelaeaeBifidobacterium dentium

Bifidobacterium pseudocatenulatum

Bifidobacterium animalis

Anaerotruncus colihominis

Dorea longicatena

Ruminococcus sp.

Eubacterium hallii

Ruminococcus obeum

Coprococcus comes

Eubacterium ventriosum

Bifidobacterium catenulatum

Dorea formicigenerans

Mitsuokella multacida

Bacteroides fragilis

Clostridium bartlettiiLactococcus lactis

Lactobacillus casei

Bacteroides vulgatus

Bacteroides uniformis

metagenomiCSgroup leaDer: aSSoCiate proFeSSor thomaS SiCheritZ-pontén,[email protected]

Less than one percent of the known microorganisms can be grown and studied in the lab. With the advent of low-cost high-throughput DNA sequencing technologies it is possible to explore the genetic material from entire microbial communities directly without the need of culturing the organisms under study. This emerging field, known as metagenom-ics, utilizes a huge genetic reservoir of non-culturable organisms as a resource for biotechnological and medical products and processes. The Metagenomics group collects different samples from all over the world and develops new tools that will address many of the unique challenges of metagenomics data sets. The majority of our projects are based on at least one of the follow-ing three simple questions: Who is in there? What are they doing? How are they doing it? Our major focus is on metatranscriptomics of the human micro-biome – correlating changes in the human microbiome with changes in human health – where we are involved in the EU FP7 project MetaHIT on the metagenomics of the human intestinal tract. We are working in close collaboration with the Molecular and Cellular Evolution of Microbial Eu-karyotes group in Newcastle, studying the impact of lateral gene transfer on prokaryotic genome evolution and eukaryotic parasites. To facilitate the modeling of metabolic and signaling pathways in both prokaryotic and eukaryotic organisms, we are developing new protein features based pipelines based on different machine learning methods. Together with the Funtional Human Variation group at CBS, we try to understand the correlation between present day human genetic makeup and human associated bacteria by literally going back in time and study-ing ancient genomes. This work is carried out in collaboration with the GeoGenetic Center in Copenhagen. The sequencing of complete nuclear genomes of ancient human remains (mummies, skulls, hair tufts from permafrost) and environmental metagenomes from different evolutionary time periods lets us identify the most important milestones during human evolution with respect to changes in the human-microbe interplay. Even-tually this may give us insight into present day increased risk of metabolic disorders and other digestive system related diseases.

Page 14: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

25

moleCular evolutiongroup leaDer: proFeSSor mSo anDerS gorm peDerSen,[email protected]

Evolutionary theory is the conceptual foundation of the life sci-ences. The famous geneticist Theodosius Dobzhansky expressed this very well when he said, “Nothing in biology makes sense, except in the light of evolution”. In the post-genomic era this insight is more relevant than ever, and only by taking the theory of evolution into account is it possible to get a handle on organizing and analyzing the massive amount of biological data now available. In the Molecular Evolution Group we are interested in applying phylogenetic methods to analyze specific biological systems, but also in using the flood of sequence data to learn about the evolutionary process itself. A focus of much of the research done in the Molecular Evolution Group is the evolution of pathogenic organisms such as HIV, influenza, and the malaria parasite Plasmodium falciparum. In particular, we are interested in how knowledge of the evolutionary processes that occur during infection and transmission of a pathogen can be used as the basis for deriving strategies for fighting the disease. As an example, patterns of sequence variation across a population of influenza viruses close to the time when the virus has jumped from, say, an avian to a human host, will contain information about the selective pressures acting at that point. This knowledge can be directly useful for monitoring when new species jumps are imminent but may also form the basis for designing interventions aimed at hindering such jumps. Our work on pathogen evolution is done in close collaboration with experimental groups at a number of universi-ties and hospitals. Other projects in the Molecular Evolution Group include investiga-tions into the evolution of resistance to antibiotics, evolution and origin of introns, de novo evolution of genes, and evolution of evolvability (the ability of biological systems to evolve). Generally, we are interested in all aspects of evolution, and while we are very interested in developing and applying state-of-the-art computational tools in our work (especially in the frame-work of probabilistic model selection and multimodel inference), the focus is always on analyzing problems that are interesting from a biological point of view.

Page 15: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

26 27

regulatory genomiCSgroup leaDer: aSSoCiate proFeSSor ChriStopher Workman,[email protected]

Much of the relationship between genotype and phenotype is determined by gene regulation. The components of signaling and tran-scriptional regulatory pathways act together in gene regulatory networks (GRNs) to control which genes are expressed, at which times and under which conditions. GRNs are very adaptable within a given individual cell or organism and evolve rapidly to provide large phenotypic diversity over short evolutionary time periods. These networks play critical roles in response to environmental stress, control of normal cell division and proliferation, and when not functioning properly, they contribute to disease progression. Efforts to understand the regulatory networks in control of stress responses are of particular interest to the Regulatory Genomics Group. In particular, the responses to oxidative and DNA damage stress are being investigated. Although both of these stresses are present at low levels in healthy cells, a measured and rapid stress response must be mounted to ensure survival to acute exposure to these stresses. By perturbing specific regulatory genes, e.g. gene knock-outs or knock-downs, we can observe differences in the regulation of the response using genome-wide mRNA or protein profiles using DNA microarrays, RNA-Seq or mass spectrometry based proteomics analyses. The combination of environmental and genetic perturbations allows us to interrogate GRNs that control these vital cel-lular responses and provides powerful evidence for the functional role of known and novel regulatory genes.

Page 16: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

28 29

AktPI3K

Raf

MEK

Ras

ERK

IL-12

p38 MAPK

NF-κB

IL-23

JNK

TLR4

TLR2

c-Fos

1)

1) 1)

1)

1)

1)

2,8)

3)

4)

5)

1)

IKKi

DAI

dsDNA

(ZBP-1/DLM-1)

NF-κB

IRF3

6, 7)

6, 7)

6, 7)

6, 7)

6, 7)

IL-1b

8)

IRF19)

IRF3

9)

9)

9)

TRIF

MyD88

TLR49)

IRF79)

IFNAR

9)

type 1 IFNs9)

NOD2

9)

TLR4

Mtb

8) 8)

p38 MAPK

JNK

10)

10)

c-Fos

AP1

10)

10) IL-10

11)

TLR2

NF-κB11)

11)

DC-SIGN

Raf-1

11)

11)

TLR2+

SyStemS Biology of immune regulationgroup leaDer: aSSoCiate proFeSSor larS hellgren,[email protected]

The Systems Biology of Immune Regulation group studies the role of a deregulation of the innate immune response in initiation of life-style related diseases. Redundant local inflammation due to inappropriate acti-vation of the innate immune system plays a pivotal role in the development of many diseases of affluence, such as coronary heart disease, type 2 diabetes and atopic asthma. The unwanted inflammatory response is also manifested as a low-grade systemic inflammation. In the healthy state, tissue homeostasis is ensured by a tightly controlled balance between pro and anti-inflammatory phenotypes in the tissue-resident immune-cells, which is lost in the initiation of these diseases. We study how the development of inflammatory responses in specific tissues (e.g. adipose and lung) is dependent on the changes in the functional response pattern of the tissue-resident leukocytes and its modulation by dietary components and the mucosal microbiome. Our specific expertise is how dietary fatty acids and fibers interact in determin-ing the functional immunological response in the tissue through its effects on immune and tissue cell phenotypes as well as the composition of the mucusal microbiome. We have developed a unique multiplex approach for functional characterization of different immune cell phenotypes on a single cell basis using flow cytometry, which we combine with different omics technolo-gies as well as traditional cell biological and biochemical methods. We apply these methods to integrated studies of primary immune and tissue cell cultures, disease animal models and translational cohort studies in humans aiming at developing systems biology models describing the se-quence of events leading to dysregulation of tissue-specific innate immune responses for use in the development of new therapeutic strategies.

Page 17: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

31

Dtu multi-aSSay Core faCility (DmaC)manager: Senior reSearCher laurent gautier,[email protected]

Research projects that use a systems biology approach require comprehensive high-throughput analyses that characterize many or all of the components of the system being studied. The set of tools and techniques for analysing DNA and RNA is in constant evolution, involv-ing various trade-offs between high-volume of data, precision and cost. qPCR, microarrays and next-generation sequencing supplemented by flow cytometry are important tools for elucidating biological questions at the systems level offered by DMAC. The core facility for high-throughput analysis of genomic-level events was established in 2008 at DTU with a substantial grant from the Danish Research Councils and DTU. The core facility is hosted by the Center for Biological Sequence Analysis, but is a DTU wide facility contri-buting to the research of many other centers and departments at DTU as well as external customers from academia and industry. The facility provides access to the whole chain that generates knowledge and insight on biological data from genomic or cellular level measurements. Data generation and analysis are technologies, and the core facility is a one-stop shop for getting state-of-the-art work done. Thanks to its unique experience on a range of assays, the DMAC facility can provide advice on the optimal assays, experimental design and sub-sequent data analysis for each individual project. The aim is to provide the customers with the best in technology and data analysis schemes, while abstracting as many of the technical details as possible.

Page 18: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

32 33

site, with the same functionality. Ready-to-ship packages exist for the most common UNIX platforms. In addition, for many servers, programmatic access is provided in the form of SOAP-based webservices.

teaChing CBS has a strong, innovative teaching component with more than 20 highly popular courses in the areas of Bioinformatics, Systems Biology, Hu-man Health, Microbiology and Nutrigenomics – representing important dis-ciplines in the BSc and MSc educations under the Department of Systems Biology. In addition, CBS teaches several highly recognized PhD courses in bioinformatics and systems biology. The teaching strategies at CBS are constantly allowed to evolve in order to facilitate the learning process for all student categories. Since 2005 CBS has offered a variety of courses using real-time internet-transmitted teaching – not only in the form of lectures but also as online teacher-student(s) discussion sessions. Other untraditional teaching actions such as supplementary video modules directed at certain students with a particular need or desire for additional knowledge and in the form of podcasts are also made available from CBS.

external FunDing External funding plays an important role in the development and expansion of CBS. A strong focus on external funding does not only allow for growth, it also facilitates collaboration and partnerships with many new partners. External funding accounts for the majority of CBS’ budget; and funding sources include large institutional donors such as the Danish Na-tional Research Councils, EU and NIH, as well as numerous private Danish foundations, including but not limited to the Villum Foundation, the Novo Nordisk Foundation and the Lundbeck Foundation.

Computing inFraStruCture A strong compute, storage and database infrastructure has been de-veloped since the start of the center in 1993. The compute resources consist of four shared memory servers (SGI Altix, the largest with 8 TB RAM), two Linux clusters (SGI Altix ICE) and a number of dedicated smaller servers.

aBout CBS

puBliCation proFile CBS has a strong publication profile with more than 750 peer-re-viewed papers, many in high impact journals as well as many with very high citation levels. The publication list includes more than a dozen Science and Nature papers. Since 2007, the average, annual publication rate has been around 70 papers in journals with review. In addition to scientific papers, the CBS staff has authored or co-authored many text books and edited proceed-ings. In 2010, CBS contributed to two cover-story Nature papers: A human gut microbial gene catalogue established by metagenomic sequencing, Nature, 464:132-136, and, Ancient Human Genome Sequence of an Extinct Palaeo-Eskimo, Nature 463:757-762.

Citation proFile The papers from CBS are widely cited and receive currently more than 5,000 citations per year. The most cited CBS publication has more than 4,000 citations; Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites, H. Nielsen, J. Engelbrecht, S. Brunak and G. von Heijne, Protein Eng., 10:1-6, 1997, describing a method for prediction of signal peptides in prokaryotic and eukaryotic proteins. Four CBS papers have been included in the ISI Red Hot list for the most cited papers in biol-ogy in specific periods, and altogether 10 CBS papers have now more than 500 citations. Except for a single year, CBS has in the 1997- 2010 period each year produced a paper, which is among the ten most cited papers out of the approximately 10,000 papers each year co-authored by Danish scientists.

online ServiCeS CBS has established a highly popular service component. More than 50 servers are available as interactive input forms and access to all the ser-vers is free and unlimited for all academic users, while they are also avail-able on a commercial basis for industrial use. Most of the servers are also available as stand-alone software packages to install and run at the user’s

Page 19: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

34

The data storage exceeds 500 TB, with different types of media ranging from JBODs to close-to-compute fast fiber channel disks. The installation makes use of a 10Gbit/s LAN for efficient communication between the compute and the storage. The center maintains a data warehouse with more than 300 pub-lic databases particularly useful in the context of data integration within sys-tems biology. Furthermore, a very large collection of bioinformatics software is installed. The Danish Center for Scientific Computing, the Villum Founda-tion and the Novo Nordisk Foundation are among the principal funders of the hardware located at CBS.

aCaDemiC environment At any given time around 20 different nationalities from all over the world and many different scientific backgrounds are represented at CBS, making CBS a highly multi-disciplinary center with a strong international profile. Together, the CBS group has key competences within the fields of molecular biology, biochemistry, chemical engineering, chemistry, computer science, mathematics, medicine, pharmacology, physics and statistics. CBS is part of one of the largest departments at the Technical University of Denmark, the Department of Systems Biology, which represents the largest concentra-tion of biotechnological research in Denmark. Through numerous collabora-tive grants CBS also has strong ties to universities, hospitals and industry both nationally and internationally. CBS’ director, Søren Brunak is also affili-ated as professor and founding partner of the Novo Nordisk Foundation Center for Protein Research at the University of Copenhagen, as well as the Novo Nordisk Foundation Center for Biosustainability at DTU, which allows for syn-ergy effects between the centers, close collaboration on projects and allowing PhD students to alternate between the different research environments.

management CBS is a research center at the Department of Systems Biology, DTU. Since 1993 CBS has been led by Professor Søren Brunak. Professor Ole Lund acts as the deputy director. Monthly meetings between the group leaders, the secretariat and the systems administration support the overall management of CBS. On a day to day basis all practicalities pertaining to administrative support, finance, human resource and project coordination are managed by the secretariat in collaboration with the department staff.

Graphic Design: Sofie Meedom and Helle DaugbjergPrint: Narayana PressEditing: Louise Juul Hansen

© Center for Biological Sequence Analysis, Kgs. Lyngby 2011. All rights reserved.

Page 20: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when

www.cbs.dtu.dK

Center for Biological Sequence AnalysisDepartment of Systems BiologyTechnical University of DenmarkKemitorvet, Building 208DK-2800 Kgs. LyngbyDenmark

+45 45 25 24 77 (telephone)+45 45 93 15 85 (fax)[email protected]

Page 21: cbs* meta cata- · cbs* meta cata-log * center for ... Migration and Cytoskeleton (12) bent, ... ate very large amounts of data and that is something we keep in mind when