Algorithmic approach to computational biology using graphs


Submitted by

S P Sajjan

Research Guide

Dr. Ishwar Baidari, MCA, Ph.D.

Dept. of Computer Science

Karnatak University, Dharwad.

What is Computational Biology?

"Computational biology is not a 'field', but an 'approach' involving the use of computers to study biological processes, and hence it is an area as diverse as biology itself."

• Biological data

Biological data are data or measurements collected from biological sources, which are often stored or exchanged in digital form. Biological data are commonly stored in files or databases.

Examples: DNA sequences and population data used in ecology.

• Functional molecules

In organic chemistry, functional groups are specific groups of atoms or bonds within molecules that are responsible for the characteristic chemical reactions of those molecules.

• Mining in molecular biology

Text mining in molecular biology is defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents.

Related fields: information science, bioinformatics and computational linguistics.

• Defining Metabolism

The term 'metabolism' refers to the biochemical processes that happen within a person or other living organism.

Metabolism consists of both catabolism and anabolism, which are, respectively, the breakdown and the buildup of substances.

Cellular networks

• Sets of interacting molecules within cells.

• They mainly include protein-protein (p-p) interactions, metabolism, gene transcriptional regulatory networks and signal transduction pathways.

• All of them are different subsets of a single large-scale cellular network, since they are eventually cross-linked.

Purpose of Computational Biology

• Computational biology can be summarized as the field that uses high-throughput technology and computation to study the complex organizational patterns of biological systems and how they contribute to normal physiology and disease.

• Experimental systems biology uses various genomics/proteomics technologies to study large numbers of genes or proteins at a genome scale, which naturally yields a large volume of data to be interpreted and put into the context of real biology.

• There are several nationwide large-scale projects aiming to characterize the genome and proteome of different cells (e.g., cancer cells).

• Billions of dollars are being spent on this research, which spans many of the top institutions across the nation.

• Classical molecular biology has mainly focused on gene- or molecule-centric research.

• 30 to 40 years of this research led to our realization of the incredible complexity of biological systems.

• We need more global experimental approaches and, equally importantly, computational approaches to interpret the data they produce.

Relevance of the study and present status

Issues Related to Computational Biology

• ~22,000 annotated human genes in the genome sequence

• ~60,000 known protein-protein interactions in human

• Millions of indirect relationships between genes

• Typical genomic experiment: millions of data points
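The figures above indicate how sparse the human interaction network is. The short Python sketch below is a back-of-the-envelope calculation, not a result of this research: it assumes (hypothetically) one protein per gene, so that the ~22,000 genes and ~60,000 interactions can be read as the nodes and edges of a single undirected graph. The constant and function names are illustrative only.

```python
# Back-of-the-envelope sparsity estimate for the human protein-protein
# interaction (PPI) network, using the approximate figures quoted above.
# Assumption (hypothetical simplification): one protein per gene, so the
# ~22,000 genes are treated as ~22,000 network nodes.

N_NODES = 22_000   # ~22,000 human genes -> nodes
N_EDGES = 60_000   # ~60,000 known protein-protein interactions -> edges


def max_undirected_edges(n: int) -> int:
    """Number of possible edges in a simple undirected graph on n nodes."""
    return n * (n - 1) // 2


def density(n: int, m: int) -> float:
    """Fraction of the possible edges that are actually present."""
    return m / max_undirected_edges(n)


if __name__ == "__main__":
    print(f"possible edges: {max_undirected_edges(N_NODES):,}")
    print(f"density: {density(N_NODES, N_EDGES):.2e}")
    # Storage comparison: an adjacency matrix has n * n cells, while an
    # adjacency list stores each undirected edge twice (once per endpoint).
    print(f"adjacency-matrix cells: {N_NODES * N_NODES:,}")
    print(f"adjacency-list entries: {2 * N_EDGES:,}")
```

Under these assumptions there are 241,989,000 possible edges and a density of about 2.5 × 10⁻⁴, i.e. roughly 1 in 4,000 possible pairs interact. This sparsity is one reason adjacency lists, rather than adjacency matrices, are the usual representation for such networks.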

Statement of Research Problem

• The theory of complex networks plays an important role in a wide variety of disciplines, ranging from communication to molecular and population biology.

• The focus of this research is on graph theory methods for computational biology.

• We will survey methods and approaches in graph theory, along with current applications in biomedical informatics.

• Within the fields of biology and medicine, potential applications of network analysis using graph theory include identifying drug targets and determining the role of proteins or genes of unknown function.

• There are several biological domains where graph theory techniques are applied to extract knowledge from data. We have classified these problems as follows.

• Modeling methods for bio-molecular networks such as protein interaction networks, metabolic networks and transcriptional regulatory networks.

• Measurement of centrality and importance in bio-molecular networks. Identifying the most important nodes in a large complex network is of fundamental importance in computational biology.

• We will introduce several studies that applied centrality measures to identify structurally important genes or proteins.

• Mining new pathways from bio-molecular networks.

• Experimentally identifying and validating pathways in different organisms requires huge amounts of time and effort.

• Thus, there is a need for graph theory tools that help scientists predict pathways in bio-molecular networks.

• Our primary goal in the present research is to provide as broad a survey as possible of the major advances made in this field. We also highlight what has been achieved, as well as some of the most significant open issues that need to be addressed.

• Finally, we hope that this research will serve as a useful introduction to the field for those unfamiliar with the literature.
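To make the centrality and pathway problems concrete, here is a minimal Python sketch on a hypothetical six-node undirected network (node names A to F are made up). It computes degree centrality and closeness centrality, two standard measures chosen here for illustration; the slides do not say which measures the surveyed studies used. Treating a breadth-first-search (BFS) shortest path between two nodes as a candidate pathway is likewise only a simple illustrative heuristic, not the pathway-mining method of this research.

```python
from collections import deque

# Hypothetical toy undirected interaction network (node names are made up).
EDGES = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("C", "D"), ("D", "E"), ("E", "F"),
]


def build_adjacency(edges):
    """Undirected adjacency list: each edge is stored in both directions."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree divided by (n - 1), the largest possible degree."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(adj):
    """(r - 1) / sum of hop distances to the r - 1 other reachable nodes.

    For a connected graph this is the standard definition; disconnected
    graphs need extra care.
    """
    scores = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        total = sum(dist.values())
        scores[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def shortest_path(adj, source, target):
    """One shortest path from source to target via BFS parents, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:  # walk parents back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    print(degree_centrality(adj))
    print(closeness_centrality(adj))
    print(shortest_path(adj, "A", "F"))
```

In this toy network, node C has the highest degree centrality (3/5 = 0.6), while D, with only two neighbours, ties C on closeness (5/8 = 0.625) because it sits between the two ends of the network; different measures can rank the same nodes differently.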

The concept of Graph theory

• Graph: A graph G consists of a set of vertices V(G) and a set of edges E(G).

• Simple Graph: Two vertices 𝑉𝑖 ∈ V(G) and 𝑉𝑗 ∈ V(G) are linked (adjacent) if there exists an edge (𝑉𝑖, 𝑉𝑗) ∈ E(G) connecting them. A simple graph is one with no self-loops and no multiple edges between the same pair of vertices.

• Undirected Graph: An undirected graph is a graph, i.e., a set of objects (called vertices or nodes) that are connected together, in which all the edges are bidirectional. An undirected graph is sometimes called an undirected network.

• Directed Graph: A directed graph is a graph, i.e., a set of objects (called vertices or nodes) that are connected together, in which all the edges are directed from one vertex to another. A directed graph is sometimes called a digraph or a directed network.
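These definitions map directly onto an adjacency-list data structure. The following Python sketch uses a hypothetical `Graph` class (not from any library) that stores V(G) as dictionary keys and E(G) as neighbour sets. The `directed` flag switches between an undirected graph and a digraph, and self-loops and duplicate edges are rejected so that the undirected case stays a simple graph.

```python
from __future__ import annotations


class Graph:
    """Minimal graph G = (V(G), E(G)) stored as an adjacency dictionary.

    Hypothetical helper for illustration. directed=False gives an undirected
    graph (every edge is bidirectional); directed=True gives a digraph.
    Self-loops and repeated edges are rejected.
    """

    def __init__(self, directed: bool = False):
        self.directed = directed
        self.adj: dict[str, set[str]] = {}

    def add_vertex(self, v: str) -> None:
        self.adj.setdefault(v, set())

    def add_edge(self, u: str, v: str) -> None:
        if u == v:
            raise ValueError("self-loops are not allowed in a simple graph")
        self.add_vertex(u)
        self.add_vertex(v)
        if v in self.adj[u]:
            raise ValueError(f"edge ({u}, {v}) already exists")
        self.adj[u].add(v)
        if not self.directed:
            self.adj[v].add(u)  # undirected: store the reverse direction too

    def vertices(self) -> set[str]:
        """V(G)."""
        return set(self.adj)

    def edges(self) -> set[tuple[str, ...]]:
        """E(G): ordered pairs for a digraph, sorted unordered pairs otherwise."""
        result = set()
        for u, nbrs in self.adj.items():
            for v in nbrs:
                result.add((u, v) if self.directed else tuple(sorted((u, v))))
        return result

    def is_adjacent(self, u: str, v: str) -> bool:
        """True if the edge (u, v) is in E(G)."""
        return v in self.adj.get(u, set())


if __name__ == "__main__":
    g = Graph(directed=False)
    g.add_edge("V1", "V2")
    print(g.is_adjacent("V2", "V1"))  # True: undirected edges work both ways
    d = Graph(directed=True)
    d.add_edge("V1", "V2")
    print(d.is_adjacent("V2", "V1"))  # False: digraph edge runs V1 -> V2 only
```

In the undirected case an added edge is visible from both endpoints; in the digraph, adding (V1, V2) does not make V1 reachable from V2.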

Modeling of Bio-molecular networks in Graphs

• In biology, transcriptional regulatory networks and metabolic networks are usually modeled as directed graphs.

• For instance, in a transcriptional regulatory network, nodes represent genes and edges denote the transcriptional relationships between them.

• In recent years, attention has focused on the protein-protein interaction (PPI) networks of various simple organisms. These networks describe the direct physical interactions between the proteins in an organism’s proteome, and there is no direction associated with the interactions in such networks.

• Hence, PPI networks are typically modeled as undirected graphs, in which nodes represent proteins and edges represent interactions.
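The modeling choice changes how an edge list is stored. In this minimal Python sketch, the gene and protein names (g1 to g4, p1 to p4) and their edges are hypothetical placeholders, not real regulatory or interaction data. Regulatory edges are kept only in the regulator-to-target direction, PPI edges are stored in both directions, and in-degree/out-degree are computed only for the directed case, where they are meaningful.

```python
# Hypothetical toy data: names are placeholders, not real biological data.
# Transcriptional regulation: (regulator gene, target gene), direction matters.
GRN_EDGES = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4")]
# Protein-protein interactions: unordered pairs, no direction.
PPI_EDGES = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"), ("p3", "p4")]


def directed_adjacency(edges):
    """Digraph: store each edge only from source to target."""
    out_adj = {}
    for src, dst in edges:
        out_adj.setdefault(src, set()).add(dst)
        out_adj.setdefault(dst, set())  # keep target-only nodes (e.g. g4) in V(G)
    return out_adj


def undirected_adjacency(edges):
    """Undirected graph: store each interaction in both directions."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def in_out_degree(out_adj):
    """In-degree (number of regulators) and out-degree (number of targets)."""
    indeg = {node: 0 for node in out_adj}
    for targets in out_adj.values():
        for dst in targets:
            indeg[dst] += 1
    outdeg = {node: len(targets) for node, targets in out_adj.items()}
    return indeg, outdeg


if __name__ == "__main__":
    grn = directed_adjacency(GRN_EDGES)
    indeg, outdeg = in_out_degree(grn)
    print("GRN in-degree:", indeg)
    print("GRN out-degree:", outdeg)
    ppi = undirected_adjacency(PPI_EDGES)
    print("PPI degree:", {p: len(nbrs) for p, nbrs in ppi.items()})
```

Here g3 has in-degree 2 (regulated by g1 and g2) and out-degree 1, while g4 is a pure target with out-degree 0; in the PPI graph every interaction is visible from both proteins.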

Computational Limitations

• The challenges of computational biology are enormous and may exceed the expected increases in computing capability. Several years ago, the computational power of “state-of-the-art parallel supercomputers” allowed highly predictive calculations treating only hundreds of atoms for time scales of picoseconds, while molecular dynamics calculations of tens of thousands of atoms for nanoseconds were becoming common, although they were somewhat less predictive.

• A straightforward application of Moore’s Law would predict an increase of about three to four doublings in capability in the intervening five or six years.

• Using current methodologies, achieving the desired level of computation would require an increase of more than ~10⁹ times in computing power.

• It must be noted that even an increase of ~10⁹ in computing power would only provide the ability to simulate certain cellular systems, and may not provide a means to predictively model whole cells, organs or organisms.
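The Moore’s Law comparison can be checked with a few lines of arithmetic. This Python sketch assumes a doubling period of 18 months, which the slide does not state; under that assumption it converts a span of years into a number of doublings, and a target speed-up into the number of doublings required.

```python
import math

# Assumption: capability doubles roughly every 18 months (a common reading
# of Moore's Law; the slide does not state a doubling period).
DOUBLING_PERIOD_YEARS = 1.5


def doublings(years: float, period: float = DOUBLING_PERIOD_YEARS) -> float:
    """Number of capability doublings expected over `years`."""
    return years / period


def doublings_needed(factor: float) -> float:
    """Doublings required to multiply capability by `factor`."""
    return math.log2(factor)


if __name__ == "__main__":
    for years in (5, 6):
        d = doublings(years)
        print(f"{years} years -> {d:.2f} doublings -> x{2 ** d:.0f}")
    need = doublings_needed(1e9)
    print(f"x1e9 needs {need:.1f} doublings, "
          f"about {need * DOUBLING_PERIOD_YEARS:.0f} years at this rate")
```

With an 18-month period, five to six years gives about 3.3 to 4 doublings (roughly a 10× to 16× gain), whereas a 10⁹ increase needs log₂(10⁹) ≈ 30 doublings, about 45 years at the same pace.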
