ANALYSIS OF GENETIC NETWORKS USING ATTRIBUTED GRAPH MATCHING.

Post on 21-Dec-2015


ANALYSIS OF GENETIC NETWORKS USING ATTRIBUTED GRAPH MATCHING

BACKGROUND

• Completion of sequencing projects
• Need for functional discovery
• Emerging area of study: large-scale genomic analysis
• Similarity of living systems

GENETIC NETWORKS

• Modelling genetic networks
• Interaction of genes and proteins
• Relationship between topology and function

MOTIVATION

• Common biological processes
• Comparison of networks
• Discovering missing interactions
• Discovering missing genes

GRAPH MATCHING

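[Figure: two interaction graphs, G1 and G2, with nodes labelled by M. pneumoniae gene identifiers (mpn124, mpn132, mpn133, mpn134, mpn141, mpn145) and M. genitalium gene identifiers (mge234, mge235, mge236, mge310, mge312, mge313, mge314, mge336, mge337)]

• Search-based algorithm
• Pruning techniques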

ROADMAP

• Scale-Free Networks
• Modelling Genetic Networks
• Graph Matching
• Algorithm
• Results

SCALE-FREE NETWORKS

COMPLEX NETWORKS

• Small-world model
– WWW
– Human acquaintance networks
– Citation networks
– Biological networks

SMALL-WORLD

• Features:
– Characteristic path length
– Clustering coefficient
– Sparseness
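The first two features can be computed directly from an adjacency list. The sketch below is only an illustration: the function names and the toy graphs are assumptions, and the path length is averaged over connected node pairs only.

```python
from collections import deque


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives shortest hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average over nodes of the fraction of neighbour pairs that are linked."""
    coeffs = []
    for u, nbrs in adj.items():
        nbrs = list(nbrs)
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # fewer than two neighbours: no pairs to test
            continue
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbrs[j] in adj[nbrs[i]]
        )
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


# Toy graphs (hypothetical): a fully connected triangle and a 3-node path.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

A small-world graph would show a clustering coefficient close to that of a regular lattice together with a path length close to that of a random graph.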

SMALL-WORLD

• Somewhere in between regular & random graphs

SMALL-WORLD

• Highly clustered
• Short diameter

SCALE-FREE NETWORKS

• Complex networks: biological, social, www, power grid, citation etc.

• Power-law connectivity: $P(k) \sim k^{-\gamma}$

• Hubs and authorities

SCALE-FREE NETWORKS

• Application for testing scale-free behaviour
• Yeast
• Helicobacter pylori
• Mycoplasma pneumoniae
• Mycoplasma genitalium
• Linear log-log graph
• Slope = $-\gamma$ (the exponent of the power law)

SCALE-FREE NETWORKS

• Slope is calculated by the least-squares method
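A least-squares fit of the log-log degree distribution could look like the sketch below. The function names and the synthetic distribution are assumptions for illustration; no organism data from the slides is used.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return P(k): the fraction of all nodes having degree k, for k > 0."""
    n = len(degrees)
    counts = Counter(d for d in degrees if d > 0)
    return {k: c / n for k, c in counts.items()}


def loglog_slope(pk):
    """Ordinary least-squares line through the points (log10 k, log10 P(k))."""
    xs = [math.log10(k) for k in pk]
    ys = [math.log10(p) for p in pk.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, exactly power-law distributed values (not real data).
synthetic_pk = {k: k ** -2.5 for k in (1, 2, 4, 8, 16)}
slope, intercept = loglog_slope(synthetic_pk)
```

For a scale-free network the fitted slope is an estimate of the negative power-law exponent.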

TOPOLOGY & FUNCTIONALITY

• Small diameter
– Ease of dissemination of information
– Ease of restoring after disturbance

• Cliquishness
– Alternate paths are found

• Heterogeneity
– Random removal does not affect the network
– Hubs are vulnerable to attack

BIOLOGICAL ASPECTS

• Multifunctionality
– Grouped into functional units

• Stability
• Reason: most of the interactions are between hubs and authorities

MODELLING GENETIC NETWORKS

TYPES OF GENETIC NETWORKS

• Categorized by data sources
– Metabolic pathways
– Gene expression arrays
– Protein interactions
– Gene interactions

INTERACTION MAPS

• High-level perspective
– Nodes: genes or proteins
– Edges: presence of an interaction

• Data sources
– Two-hybrid analysis
– Fusion analysis
– Chromosomal proximity
– Phylogenetic analysis

GRAPH MATCHING

PROBLEM DEFINITION

Attributed Relational Graph (ARG)

G = { V, E, X}.

V = {v1, v2, …, vn} Nodes

E = {e1, e2, …, em} Edges

X = {x1, x2,…,xn} Attributes
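One way to hold an ARG in memory is sketched below. The class name, its methods and the attribute vectors are hypothetical; only the three gene identifiers come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ARG:
    """Attributed relational graph G = {V, E, X} with undirected edges."""

    nodes: list                                      # V
    edges: set = field(default_factory=set)          # E, stored as frozenset pairs
    attributes: dict = field(default_factory=dict)   # X, one entry per node

    def add_edge(self, u, v):
        self.edges.add(frozenset((u, v)))

    def neighbors(self, u):
        return {w for e in self.edges if u in e for w in e if w != u}

    def degree(self, u):
        return len(self.neighbors(u))


# Placeholder attribute vectors, not measured values.
g = ARG(
    nodes=["MG098", "MG099", "MG100"],
    attributes={"MG098": [0.2, 0.8], "MG099": [0.5, 0.5], "MG100": [0.9, 0.1]},
)
g.add_edge("MG098", "MG099")
g.add_edge("MG099", "MG100")
```

Storing each edge as a frozenset keeps the graph undirected, so an interaction entered in either direction is counted once.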

INEXACT SUBGRAPH MATCHING

Allow for:

• Mismatching attribute values

• Missing nodes

• Missing links

Also called error-correcting subgraph isomorphism

NP-Complete

SEARCH TECHNIQUES

• Cost function
• Pruning (structure constraints)
• Backtracking
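How the three ingredients fit together can be sketched as a small branch-and-bound search. This is a generic illustration, not the tool's actual algorithm: the scalar attributes, the penalty values and the function name `match` are all assumptions.

```python
def match(g1, g2, attr1, attr2, node_penalty=1.0, edge_penalty=0.5):
    """Map nodes of g1 onto nodes of g2 at minimum cost.

    g1, g2: dict node -> set of neighbours. attr1, attr2: dict node -> float.
    A g1 node may stay unmatched (mapped to None) at cost `node_penalty`.
    """
    order = sorted(g1, key=lambda u: -len(g1[u]))  # most connected first
    best = [float("inf"), {}]

    def extend(i, mapping, cost):
        if cost >= best[0]:
            return  # pruning: this branch cannot beat the best solution
        if i == len(order):
            best[0], best[1] = cost, dict(mapping)
            return
        u = order[i]
        used = set(mapping.values())
        for v in g2:
            if v in used:
                continue
            # Cost function: attribute mismatch plus missing links.
            step = abs(attr1[u] - attr2[v])
            step += edge_penalty * sum(
                1
                for w in g1[u]
                if mapping.get(w) is not None and mapping[w] not in g2[v]
            )
            mapping[u] = v
            extend(i + 1, mapping, cost + step)
            del mapping[u]  # backtracking
        mapping[u] = None  # treat u as a missing node
        extend(i + 1, mapping, cost + node_penalty)
        del mapping[u]

    extend(0, {}, 0.0)
    return best[0], best[1]


# Toy inputs (hypothetical): two 3-node paths, then a target missing one node.
g1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
attr1 = {"a": 0.1, "b": 0.5, "c": 0.9}
g2 = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
attr2 = {"x": 0.1, "y": 0.5, "z": 0.9}
cost, mapping = match(g1, g2, attr1, attr2)

g3 = {"x": {"y"}, "y": {"x"}}
attr3 = {"x": 0.1, "y": 0.5}
cost_missing, mapping_missing = match(g1, g3, attr1, attr3)
```

The search is exponential in the worst case, which is why the pruning step matters: a partial mapping is abandoned as soon as its cost reaches that of the best complete mapping found so far.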

ATTRIBUTED GRAPH MATCHING TOOL

ATTRIBUTE MATCHING

• Amino acid sequence content composition
– Array of 20: percentage of each amino acid
– Amino acids grouped into classes: array of 6
– Amino acid triples grouped into classes: array of 216

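Example sequence: MKVLNKNEL

Number of class triples: 6 x 6 x 6 = 216

$$S = \sum_{i=1}^{216} \left[ X_1(i) - X_2(i) \right]^2$$

[Formula: the attribute value X(a) of a class pattern a, defined from its occurrence count O(a) in a sequence of length n]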
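The 216-element attribute and the score S can be sketched as follows. The six-class grouping, the use of overlapping triples and the normalisation by the number of triples are assumptions made for illustration; the slides do not specify them.

```python
# Hypothetical six-class grouping of the 20 amino acids.
CLASSES = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}
AA_TO_CLASS = {
    aa: idx for idx, members in enumerate(CLASSES.values()) for aa in members
}


def triple_composition(seq):
    """Return a 216-long vector of class-triple frequencies for `seq`."""
    vec = [0.0] * 216
    classes = [AA_TO_CLASS[aa] for aa in seq if aa in AA_TO_CLASS]
    n_triples = len(classes) - 2
    if n_triples <= 0:
        return vec
    for i in range(n_triples):
        a, b, c = classes[i:i + 3]
        vec[36 * a + 6 * b + c] += 1  # index into the 6 x 6 x 6 array
    # Assumed normalisation: divide counts by the number of triples.
    return [x / n_triples for x in vec]


def score(x1, x2):
    """S: sum of squared differences between two composition vectors."""
    return sum((a - b) ** 2 for a, b in zip(x1, x2))


x_example = triple_composition("MKVLNKNEL")  # example sequence from the slide
```

A score of zero means identical triple composition, and larger values indicate more dissimilar sequences.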

ATTRIBUTE MATCHING

[Figure: difference in amino acid composition values of gene pairs for M. genitalium and M. pneumoniae; axes: score, observations]

STRUCTURAL CONSTRAINTS

• Effect of scale-free behaviour
– Connectivity information: highly heterogeneous, so start with the most connected node and work around it
– Pruning strategy: comparability is determined by the power law

$$\frac{y_2 - y_1}{x_2 - x_1} = \frac{\log P(k_2) - \log P(k_1)}{\log k_2 - \log k_1}$$

STRUCTURAL CONSTRAINTS

• Neighborhood connectivity
– Choose the neighbor at the next stage

• Backtracking
– Component by component
– Go back to the neighbor with the most connectivity within the component
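One reading of these constraints is a greedy, hub-first visiting order that finishes one connected component before moving to the next. The sketch below follows that reading; the function name, the tie-breaking by node name and the toy graph are assumptions.

```python
def visit_order(adj):
    """Order nodes hub-first, expanding through neighbours per component."""
    order, seen = [], set()
    # Candidate starting points, most connected first (ties broken by name).
    for start in sorted(adj, key=lambda u: (-len(adj[u]), u)):
        if start in seen:
            continue
        frontier = {start}
        while frontier:
            # Next stage: the most connected node reachable so far.
            u = min(frontier, key=lambda n: (-len(adj[n]), n))
            frontier.remove(u)
            seen.add(u)
            order.append(u)
            frontier |= {v for v in adj[u] if v not in seen}
    return order


# Toy graph (hypothetical): a hub "h", a short branch, and a separate pair.
toy = {
    "h": {"a", "b", "c"},
    "a": {"h", "d"},
    "b": {"h"},
    "c": {"h"},
    "d": {"a"},
    "x": {"y"},
    "y": {"x"},
}
order = visit_order(toy)
```

When the search has to backtrack, returning to the most connected unvisited neighbour within the current component corresponds to picking the next entry from the frontier.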

TEST CASE

• Mycoplasma genitalium:
– Smallest genome (470 ORFs)

• Mycoplasma pneumoniae:
– Very similar, superset (688 ORFs)

TEST CASE...

• Mycoplasma genitalium:
– 232 nodes
– 211 links

• Mycoplasma pneumoniae:
– 267 nodes
– 257 links

• Inputs:
– MGE links
– MPN links
– MGE synonyms
– MPN synonyms
– MGE amino acid sequence
– MPN amino acid sequence

RESULTS

[Figure: panels labelled MGE and MPN]

DISCOVERY OF MISSING DATA

• Missing link

• The link between MPN632 and MPN637 is missing in our data but exists in the literature

DISCOVERY OF MISSING DATA

• Missing node with known COG

MPN236 --- MPN237 --- MPN238 --- MPN678
MG098 --- MG099 --- MG100 --- MG459

MG459 is the ortholog of MPN678

DISCOVERY OF MISSING DATA

• Missing node without known ortholog

CONCLUSION

• Large-scale genomics
• Interaction data captures system structure and dynamics
• Graph matching exploits the scale-free characteristics
• Novel interactions and genes can be identified

ACKNOWLEDGEMENT

• YASEMİN TÜRKELİ