Experimental Approaches in Nutrigenetics #2 · 2017-05-06 · Bioinformatics Services Division...



Experimental Approaches in Nutrigenetics #2:

Studying the Effects of Phytochemicals on Human Health and Disease

Cory Brouwer, Ph.D.

Director Bioinformatics Services and Professor of Bioinformatics and Genomics

[email protected]


Outline

bioservices.uncc.edu

Nutrigenetic discoveries

How they were made

Bioinformatics in Nutrigenetics/Nutrigenomics

P2EP Knowledgebase

Visualization


1-Carbon Metabolism

Produce the building blocks of DNA and RNA

S-adenosylmethionine (SAM) – methyl group donor

Vitamins B12, B6 and folic acid co-enzymes

Amino acid methionine

Choline metabolism

Possible links to:

Cognitive function

CVD

Cancer

Etc.

Fredriksen et al. (2007) Hum. Mutat. 28(9), 856-865.


Jay JJ, Brouwer C. (2016) Lollipops in the clinic: information dense mutation plots for precision medicine. PLoS ONE 11(8).

1-Carbon Metabolism

Fredriksen et al. (2007) Hum. Mutat. 28(9) 856-865.


Finding Variation

>2,000 variants discovered through GWAS

Usually require a minor allele frequency > 5%

Typically produce Manhattan Plots

GWAS Manhattan plot showing differences in allele frequency between responders and nonresponders to n-3 PUFA supplementation.

Iwona Rudkowska et al. J. Lipid Res. 2014;55:1245-1253
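The filtering behind a Manhattan plot can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the cited study: the variant identifiers, allele frequencies and p-values are invented, and the 5 × 10⁻⁸ cutoff is the conventional genome-wide significance threshold rather than a value stated on this slide.

```python
import math

# Hypothetical association results: (variant id, chromosome, minor allele
# frequency, p-value). All values are invented for illustration only.
RESULTS = [
    ("rs_demo1", 1, 0.12, 3.0e-9),
    ("rs_demo2", 1, 0.03, 1.0e-10),  # rare variant, dropped by the MAF filter
    ("rs_demo3", 2, 0.25, 4.0e-4),
    ("rs_demo4", 7, 0.40, 5.0e-8),
]

MAF_MIN = 0.05        # "minor allele frequency > 5%" from the slide
GENOME_WIDE_P = 5e-8  # conventional genome-wide threshold (assumption here)


def manhattan_points(results, maf_min=MAF_MIN):
    """Return (variant, chromosome, -log10(p)) for variants passing the MAF filter."""
    return [
        (vid, chrom, -math.log10(p))
        for vid, chrom, maf, p in results
        if maf > maf_min
    ]


def significant(results, maf_min=MAF_MIN, p_max=GENOME_WIDE_P):
    """Return ids of common variants at or below the significance threshold."""
    return [
        vid
        for vid, _chrom, maf, p in results
        if maf > maf_min and p <= p_max
    ]


if __name__ == "__main__":
    for vid, chrom, score in manhattan_points(RESULTS):
        print(f"{vid}\tchr{chrom}\t{score:.2f}")
    print("significant:", significant(RESULTS))
```

The y-axis of a Manhattan plot is the -log10(p) value computed here, plotted against genomic position; a plotting library would only be needed for the final drawing step.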


NHGRI GWA Catalog

www.genome.gov/GWAStudies

www.ebi.ac.uk/fgpt/gwas/


Published Genome-Wide Associations through 12/2013

Published GWA at p ≤ 5 × 10⁻⁸ for 17 trait categories


However…


Even large GWAS typically only explain a fraction of heritability

Type 2 diabetes: >150,000 individuals – 11% of heritability (Morris et al. Nat. Genet. 2012;44:981–990)

Crohn's disease: >210,000 individuals – 23% of heritability (Franke et al. Nat. Genet. 2010;42:1118–1125)

So what other approaches can you take?


Candidate Gene Approach

[Figure: how to select candidate genes for an association study; slide annotation: "or RNA-seq"]

Source: http://www.koonec.com/k-blog/2010/06/22/how-to-select-candidate-genes-for-your-association-study/


UNCC Bioinformatics at NCRC

1. Bioinformatics Services

2. Bioinformatics Research

3. Student Training (Ph.D., Masters, Undergrad.)


Bioinformatics Services Division

Majority Ph.D. level research staff

Full time high performance computing expert

Full time office manager and administrative staff

Student interns

Specializations in:

NGS (DNA-seq, RNA-seq, reference and de novo)

biostatistics

gene expression

metabolomics

metagenomics

variant analysis

pathway analysis

data integration

text mining


Bioinformatics Services Division

High Performance Computing

1000+ core Linux cluster

8 GB RAM per core

1 PB Compellent Storage Solution

Multiple high-memory servers: 2 with 2 TB RAM, 1 with 4 TB RAM

Multiple Mac, Linux and Windows workstations


Software Resources

Ingenuity IPA & GeneGo Metacore, integrated knowledge databases and software suites for pathway analysis of experimental data and gene lists.

Linguamatics i2e, text mining software that extracts relevant facts and relationships from Medline and AGRICOLA.

CLC Bio, which provides recent algorithms for the analysis of genomic, transcriptomic and epigenomic NGS data.

Geneious Pro, a software platform for searching, organizing and analyzing genomic and protein information.

Umetrics SIMCA-P+, software for experimental design and multivariate data analysis.

SAS, a program package for statistical analyses.

Numerous open-source software packages as well as in-house developed applications, scripts and workflows


Omics

Genomics: GWAS, WGS, WES, etc.

Epigenomics: Methyl-seq

Transcriptomics: RNA-seq

Proteomics: Protein-Protein interactions

Metabolomics: Mass Spec, NMR


Bringing Information Together

[Diagram: data types that feed into knowledge of human health and nutrition]

'Omics: Transcriptional Profiling, Proteomics, Metabonomics

Functional Screens: RNAi

Genetics

Pathways

PPI Networks

Literature

Disease

Knowledge of human health and nutrition


Traditional Bioinformatics


BLAST Search

ClustalW Alignment

Protein Domains


Traditional Bioinformatics


Command-line Programming


Knowledge-Based Bioinformatics

Text Mining

Ontologies

Knowledge bases

Network analysis


Growth of PubMed 1986-2010

Zhiyong Lu. Database 2011;2011:baq036


Text mining concepts

Information Retrieval: Define relevant literature

Entity Recognition: Identifying agricultural and biomedical entities

Information Extraction: Formatting of typified information
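The three text mining steps can be illustrated with a small dictionary-based sketch in Python. This is an assumption-laden toy, not the Linguamatics i2e workflow: the lexicon, the example sentences and the function names (`retrieve`, `recognize`, `extract`) are all hypothetical.

```python
import re

# Hypothetical mini-lexicon mapping surface terms to entity types.
LEXICON = {
    "broccoli": "plant",
    "kaempferol": "phytochemical",
    "pik3ca": "gene",
    "breast cancer": "disease",
}

# Made-up sentences standing in for abstracts; not quotations from any paper.
DOCS = [
    "Kaempferol is found in broccoli. Samples were stored at room temperature.",
    "The weather was mild during the growing season.",
]


def retrieve(docs, keyword):
    """Information retrieval: keep documents that mention the keyword."""
    return [d for d in docs if keyword.lower() in d.lower()]


def recognize(text):
    """Entity recognition: whole-word dictionary lookup, case-insensitive."""
    lowered = text.lower()
    found = []
    for term, entity_type in LEXICON.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append((term, entity_type))
    return sorted(found)


def extract(text):
    """Information extraction: one typed record per sentence with 2+ entities."""
    records = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        entities = recognize(sentence)
        if len(entities) >= 2:
            records.append({"sentence": sentence, "entities": entities})
    return records


if __name__ == "__main__":
    for doc in retrieve(DOCS, "broccoli"):
        for record in extract(doc):
            print(record)
```

Production systems replace the lexicon with ontologies and add disambiguation, but the division of labor between the three steps stays the same.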


P2EP Knowledge Base – Hypothesis Generation

[Diagram: Data Sources and P2EP internal data feed the P2EP-KB; Queries against it return Results]

Data Sources

OMIM

Homologene

NCBI Taxonomy

AGRICOLA

PubMed

NCBI Gene

BioCyc

Allen Brain Atlas

Gene Expression (GEO)

Gene Ontology

Mammalian Phenotype Ontology

Human Phenotype Ontology

Reactome

ChEBI

P2EP Internal data

Sequence

Markers

Curated Gene Lists

Curated Literature

Curated Pathways

P2EP-KB

Queries

Results


P2EP Knowledge-base Statistics


Node Source              Unique Count   Description
ncbi.gene                14,314,272     All NCBI Gene identifiers
ncbi.taxonomy            1,113,614      Every entry in NCBI Taxonomy tree
ncbi.pubmed              1,075,534      PubMed records with Gene Annotations
chebi                    41,824         ChEBI nodes (full tree)
gene_ontology            40,959         Gene Ontology nodes (full tree)
tair                     33,334         TAIR Gene Identifiers
biocyc.gene              29,770         BioCyc Gene Identifiers
biocyc.enzyme_reaction   27,408         BioCyc Enzyme-specific Reaction Identifiers
ensembl                  20,056         Ensembl Identifiers referenced in NCBI
biocyc.protein           18,271         BioCyc Protein Identifiers
hpo                      10,491         Human Phenotype Ontology terms
mpo                      10,244         Mammalian Phenotype Ontology terms
biocyc.reaction          4,627          BioCyc Reaction Identifiers
mim                      3,995          OMIM Identifiers
biocyc.compound          3,444          BioCyc Compound Identifiers
biocyc.regulation        1,383          BioCyc Regulation Identifiers
plant_trait_ontology     1,327          Plant Trait Ontology Identifiers
enzyme_commission        1,191          EC Numbers
biocyc.pathway           563            BioCyc Pathway Identifiers
mirbase                  425            miRBase Identifiers
unit_ontology            330            Unit Ontology Identifiers
plant_ontology           193            Plant Ontology Identifiers

433,383,352 Linked Pairs


Knowledge-base Workflow


Ontologies: MeSH, NALT, ChEBI, Entrez Gene, KEGG

Text Mining/NLP: Ontology based, Query development, Co-occurrence (6,7)

Agricola

Graph Database

JSON (Mining Results)

CSV (Intermediate Files)

EXTRACT data (Python)
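One plausible reading of this workflow is that a Python extraction step turns JSON mining results into CSV node and edge files that a graph database can bulk-load. The sketch below follows that reading under stated assumptions: the JSON record shape, the identifiers and the function names are hypothetical, since the slide does not show the actual file formats.

```python
import csv
import io
import json

# Hypothetical shape for text-mining hits; the real mining output format is
# not shown on the slide. Source names follow the knowledge-base statistics.
MINING_JSON = json.dumps([
    {"pmid": "demo-1", "entities": [
        {"id": "CHEBI:demo", "source": "chebi"},
        {"id": "GENE:demo", "source": "ncbi.gene"},
    ]},
    {"pmid": "demo-2", "entities": [
        {"id": "GENE:demo", "source": "ncbi.gene"},
        {"id": "MIM:demo", "source": "mim"},
    ]},
])


def extract_rows(mining_json):
    """Turn mining hits into unique node rows and co-occurrence edge rows."""
    nodes = {}
    edges = set()
    for hit in json.loads(mining_json):
        for entity in hit["entities"]:
            nodes[entity["id"]] = entity["source"]
        ids = sorted(entity["id"] for entity in hit["entities"])
        # Every pair of entities in the same record becomes a linked pair.
        for i, first in enumerate(ids):
            for second in ids[i + 1:]:
                edges.add((first, second, hit["pmid"]))
    return sorted(nodes.items()), sorted(edges)


def to_csv(rows, header):
    """Serialize rows as CSV text (an intermediate file kept in memory here)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    node_rows, edge_rows = extract_rows(MINING_JSON)
    print(to_csv(node_rows, ["id", "source"]))
    print(to_csv(edge_rows, ["start", "end", "evidence"]))
```

Keeping nodes and edges in separate flat files is a common choice because most graph databases offer a bulk importer that takes exactly that pair of inputs.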


Literature Based Discovery

“Literature Based Discovery (LBD) refers to a particular type of text mining that seeks to identify nontrivial assertions that are implicit, and not explicitly stated, and that are detected by juxtaposing (generally a large body of) documents”

-Neil R. Smalheiser


ABC Model of Discovery

Concept A: Dietary Fish Oils

Concept B: Blood Viscosity

Concept B: Platelet Aggregation

Concept B: Vascular Reactivity

Concept C: Raynaud’s Syndrome

Figure adapted from Figure 1 of: Smalheiser, Neil R., Vetle I. Torvik, and Wei Zhou. "Arrowsmith two-node search interface: A tutorial on finding meaningful links between two disparate sets of articles in MEDLINE." Computer methods and programs in biomedicine 94.2 (2009): 190-197.
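The ABC model lends itself to a short worked example. The Python sketch below is a simplified illustration rather than the Arrowsmith implementation: it treats each A-B and B-C link from the slide as a co-occurrence pair and reports C concepts that are reachable from A only through intermediate B concepts. The function names are hypothetical.

```python
from collections import defaultdict

# Co-occurrence pairs mirroring the slide's example. In practice these would
# come from literature mining rather than a hand-written list.
PAIRS = [
    ("dietary fish oils", "blood viscosity"),
    ("dietary fish oils", "platelet aggregation"),
    ("dietary fish oils", "vascular reactivity"),
    ("blood viscosity", "raynaud's syndrome"),
    ("platelet aggregation", "raynaud's syndrome"),
    ("vascular reactivity", "raynaud's syndrome"),
]


def build_index(pairs):
    """Build an undirected concept -> co-occurring concepts index."""
    index = defaultdict(set)
    for left, right in pairs:
        index[left].add(right)
        index[right].add(left)
    return index


def abc_candidates(index, a):
    """Return {C: [B, ...]} for concepts C linked to A only through some B."""
    direct = set(index.get(a, ()))
    candidates = defaultdict(set)
    for b in direct:
        for c in index[b]:
            # Skip A itself and anything already directly linked to A.
            if c != a and c not in direct:
                candidates[c].add(b)
    return {c: sorted(bs) for c, bs in candidates.items()}


if __name__ == "__main__":
    index = build_index(PAIRS)
    for c, bridges in abc_candidates(index, "dietary fish oils").items():
        print(f"{c}: via {', '.join(bridges)}")
```

The more B concepts that bridge a given A and C, the stronger the case for treating the implicit A-C link as a hypothesis worth testing.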


Cancer Treatment


“We know of many food constituents that have anti-cancer properties,” says Dr. Steven Zeisel, director of the University of North Carolina’s Nutrition Research Institute … Garlic, broccoli, green tea and turmeric, for example, have been shown to fight cancer through extensive good research, he says. “But we do not know precisely which mixture of these constituents works best.”


Cancer pathways


PI3K–AKT–mTOR signaling network

Fruman & Rommel. Nature Reviews Drug Discovery 13, 140–156 (2014)


Broccoli to Cancer


[Network diagram: nodes grouped by the slide's key; edge label "has"]

Plant: Brassica oleracea

Phytochemical: Hydroxycinnamic acid

Drug target: PIK3CA, PIK3CB, PIK3CD, PIK3CG

Disease: Thyroid Cancer, Endometrial Cancer, Prostate Cancer, Cervical Cancer, Neuroblastoma, Breast Cancer, Colorectal Cancer


Broccoli to Cancer


[Network diagram: nodes grouped by the slide's key; edge label "has"]

Plant: Brassica oleracea

Phytochemical: Hydroxycinnamic acid, Kaempferol

Drug target: PIK3CA, PIK3CB, PIK3CD, PIK3CG

Disease: Thyroid Cancer, Endometrial Cancer, Prostate Cancer, Cervical Cancer, Neuroblastoma, Breast Cancer, Colorectal Cancer
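A plant-to-disease chain like this one can be found by walking a typed graph in the order plant, phytochemical, drug target, disease. The Python sketch below shows the idea only. The node names come from the slide, but the specific edges are placeholders invented for illustration and should not be read as links asserted by the knowledge base; the function name is also hypothetical.

```python
# Node types follow the slide's key.
NODE_TYPE = {
    "Brassica oleracea": "plant",
    "Hydroxycinnamic acid": "phytochemical",
    "Kaempferol": "phytochemical",
    "PIK3CA": "drug target",
    "PIK3CG": "drug target",
    "Breast Cancer": "disease",
    "Prostate Cancer": "disease",
}

# PLACEHOLDER edges for illustration only; not findings from the P2EP-KB.
EDGES = {
    "Brassica oleracea": ["Hydroxycinnamic acid", "Kaempferol"],
    "Hydroxycinnamic acid": ["PIK3CA"],
    "Kaempferol": ["PIK3CG"],
    "PIK3CA": ["Breast Cancer"],
    "PIK3CG": ["Prostate Cancer"],
}

TYPE_ORDER = ["plant", "phytochemical", "drug target", "disease"]


def typed_paths(start):
    """Enumerate paths from start whose node types follow TYPE_ORDER."""
    paths = []

    def walk(node, path):
        # The node must have the type expected at this depth of the chain.
        if NODE_TYPE.get(node) != TYPE_ORDER[len(path)]:
            return
        path = path + [node]
        if len(path) == len(TYPE_ORDER):
            paths.append(path)
            return
        for neighbor in EDGES.get(node, []):
            walk(neighbor, path)

    walk(start, [])
    return paths


if __name__ == "__main__":
    for path in typed_paths("Brassica oleracea"):
        print(" -> ".join(path))
```

Constraining the walk by node type keeps the search from wandering through unrelated parts of a large graph, which matters when the graph holds hundreds of millions of linked pairs.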


P2EP Knowledgebase – Web Interface


P2EP Knowledgebase


Plant Genetic Markers

Plant Genomic Sequence

Plant Genes

Plant Pathways

Plant Phytochemicals & nutrients

Human genes

Human Pathways

Human Health & Disease


Plant Pathway Elucidation Project (P2EP)

Scientific Discovery

What plants make

How they make them

What is the effect on human health?

Educational Opportunity

Students training students (Ph.D.s -> undergrads)

Research opportunities for undergrads

Knowledgebase Creation

Assemble what is known about plant metabolic pathways

Capture what is discovered


1-Carbon Metabolism

https://github.com/pbnjay/lollipops


Pathview – pathview.uncc.edu


RNA-seq results

Metabolomics

Proteomics

Anything mapped to genes or metabolites


Software Developed by BiSD

LinkageMapView

Exome Project Reports & Lollipops


Acknowledgements


Richard Linchangco

Jeremy Jay

Rob Reid

Weijun Luo

Bioinformatics Services Group

P2EP Students and PIs