Protein Sequencing Algorithms – A survey
Muhammad Usman (Author)
School of Science and Technology
University of Management and Technology
Lahore, Pakistan
Abstract— Protein sequencing is used in many fields. In this
technique, the sequence of amino acids in a protein is determined
by using an algorithm. This requires a good understanding of the
structures as well as the functions of proteins in a living
organism. In this paper, different algorithms for protein
sequencing are discussed. Ten major algorithms are presented
with their formulas and steps and are illustrated with graphs.
In the comparative analysis these algorithms are compared, and
the paper concludes with the best possible algorithm for protein
sequencing.
Keywords: (Adenine, Guanine, Thymine, Markov,
Oligonucleotides, Nucleotides, RNA, DNA)
I. INTRODUCTION
The word protein is derived from the Greek word “proteios”,
which means primary. Proteins are among the vital building
blocks of every living organism. They are composed of chains of
amino acids of various types (20 are commonly used), usually
referred to as the standard amino acids. Scientists have been
studying their structure, function and use for over 200 years.
Still, many questions in this domain remain unanswered, such as
how a basic linear primary structure (a chain of amino acids)
transforms into a useful 3D assembly. At its core this is a
biological problem, but it is rooted in multiple domains. In
computer science it can be mapped to an NP-hard problem, and in
both physics and geometry the same problem can be classified as
a “self-avoiding walk”. Solving such problems requires the
integration of various domains, which makes them very
interesting to address. An accurate prediction method may
benefit many fields in the coming era.
Proteins are of various types, such as functional, structural
and hormonal. Each protein is composed of a unique pattern of
amino acids (essential and non-essential). Essential amino acids
are those that our body does not produce, so we must take them
in from outside. DNA is the structural part of a gene and has a
double-helix form. Each DNA nucleotide consists of one of four
bases, one phosphate group and one sugar group. The four bases
are adenine, guanine, thymine and cytosine. Three consecutive
bases form a codon, and one amino acid is coded by 3 bases. For
instance, the bases adenine, thymine and guanine (ATG) together
code for the amino acid methionine. Creating a protein follows
two procedures. The first, called transcription, produces
messenger RNA (ribonucleic acid) from DNA. The second, called
translation, creates proteins from the nucleotide chain of the
messenger RNA. The overall procedure is shown in Figure 1.
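The two procedures described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `transcribe`, `translate` and `PARTIAL_CODON_TABLE` are hypothetical, the codon table is deliberately incomplete (a handful of entries from the standard genetic code), and the input is assumed to be the coding strand of the DNA.

```python
# Illustrative sketch only: a tiny subset of the standard genetic code.
PARTIAL_CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
    "UGG": "Trp",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def transcribe(coding_strand):
    """Turn a DNA coding strand into mRNA by replacing thymine (T) with uracil (U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read codons (3 bases each) from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        # Codons missing from the partial table are reported as "Xaa" (unknown).
        peptide.append(PARTIAL_CODON_TABLE.get(codon, "Xaa"))
    return peptide


if __name__ == "__main__":
    mrna = transcribe("ATGTTTGGCTAA")
    print(mrna, translate(mrna))
```

A dictionary lookup is used here because the mapping from codon to amino acid is fixed; a full implementation would simply extend the table to all 64 codons.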
Various algorithms have been designed around transcription and
translation. This paper compares such algorithms for different
types of proteins.
II. RELATED WORK
One of the common and traditional ways to predict the structure
(folding and formation) of proteins is the GA (Genetic
Algorithm). It has good computational power for predicting
protein structure. But when it comes to multiple optima
(multiple proteins), GA is not considered that efficient.
Michael Scott Brown, Tommy Bennet and James A. Coker [3] came up
with NGA (Niche Genetic Algorithm), an extension to GA that can
address problems with multiple optima. They also compared NGA
with DSGA (Dynamic Radius Species Conserving Genetic Algorithm)
and found promising results.
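To illustrate how a niching method can keep several optima alive, the sketch below implements classic fitness sharing, one common niching technique. It is not taken from [3] and is not claimed to be the NGA used there; the function name `shared_fitness`, the one-dimensional individuals and the parameters `sigma_share` and `alpha` are assumptions made for the example.

```python
def shared_fitness(population, raw_fitness, sigma_share=1.0, alpha=1.0):
    """Divide each raw fitness by a niche count (classic fitness sharing).

    Individuals are plain numbers here (an assumption for this sketch), and
    the distance between two individuals is their absolute difference.
    """
    shared = []
    for i, xi in enumerate(population):
        niche_count = 0.0
        for xj in population:
            distance = abs(xi - xj)
            if distance < sigma_share:
                # Close neighbours contribute more to the niche count.
                niche_count += 1.0 - (distance / sigma_share) ** alpha
        # niche_count is at least 1.0 because every individual counts itself.
        shared.append(raw_fitness[i] / niche_count)
    return shared


if __name__ == "__main__":
    # Two individuals crowd the optimum near 0, one sits alone near 5.
    print(shared_fitness([0.0, 0.1, 5.0], [1.0, 1.0, 1.0]))
```

Because crowded individuals have their fitness reduced, selection pressure is spread across several peaks instead of collapsing onto a single optimum, which is the behaviour a plain GA lacks.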
Alexander S. Krylov and Renad I. Zhdanov [5] worked on binding
proteins. They experimented with short (short-chained)
oligonucleotides and a micro-array of hydrogel cells, a biochip.
First they studied how a protein can recognize the shortest
single-strand oligonucleotide, which they achieved by binding
oligonucleotides of 2 to 12 bases. They tried different numbers
of bases in this range and constructed a microarray that
contains 16 dinucleotides. That array was then tested for
specific binding of proteins labeled with Texas Red or Bodipy.
Figure 1: Proteins Formation (DNA, transcription, messenger RNA, translation, amino acids, proteins)
Gaëlle LENGLET and Sabine DEPAUW [6] proposed a method to
recognize protein structure by involving glyceraldehyde (a
sugar). They used a benzo-b-acronycine compound that binds
guanine nucleobases of the DNA helix and is able to open the DNA
double helix locally, a property associated with its cytotoxic
(cell-destroying) activity. Since enzymes are required to
generate proteins, they worked on a procedure that took a
dehydrogenase enzyme and combined it with an alkyl group to
generate single- or double-stranded DNA coded by recorded
telomerase (pattern repetition) activity. They used the cyclic
amplification and selection of targets (CASTing) procedure to
identify DNA-binding selectivity. Furthermore, an increase in
the binding of GAPDH and its partner HMGB1 (high-mobility group
protein B1) to chromatin was observed at the cellular level.
Figure 2: CASTing Tests
Pooya Zakeri and Yves Moreau [11] proposed a method to recognize
proteins through geometric kernel data fusion. They break the
linearity of base-kernel combination by taking the geometric
mean of the individual kernels. Since computing the geometric
mean of such kernel matrices is computationally hard and
expensive, and may be considered infeasible, it can be avoided
by using the Log-Euclidean mean, which can be regarded as a
compromise between the arithmetic and geometric means. They
succeeded in providing a functional domain composition of
proteins through a kernel-based hybridization model.
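As a toy illustration of the Log-Euclidean mean mentioned above, the sketch below averages the matrix logarithms of the kernels and exponentiates the result. To stay self-contained it is restricted to symmetric positive-definite 2x2 matrices, for which the eigen-decomposition has a closed form; the helper names `sym2x2_apply` and `log_euclidean_mean` are hypothetical and this is not the implementation used in the cited work.

```python
import math


def sym2x2_apply(matrix, func):
    """Apply a scalar function to a symmetric 2x2 matrix through its eigenvalues."""
    (a, b), (_, c) = matrix
    mid = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    if radius == 0.0:
        # Repeated eigenvalue: the matrix is a multiple of the identity.
        value = func(mid)
        return [[value, 0.0], [0.0, value]]
    lam_hi, lam_lo = mid + radius, mid - radius
    f_hi, f_lo = func(lam_hi), func(lam_lo)
    gap = 2.0 * radius

    def entry(x, identity):
        # f(A) = f_hi * (A - lam_lo*I)/gap + f_lo * (lam_hi*I - A)/gap
        return (f_hi * (x - lam_lo * identity) + f_lo * (lam_hi * identity - x)) / gap

    return [[entry(a, 1.0), entry(b, 0.0)], [entry(b, 0.0), entry(c, 1.0)]]


def log_euclidean_mean(kernels):
    """exp(mean(log K_i)) for a list of SPD 2x2 kernel matrices."""
    logs = [sym2x2_apply(k, math.log) for k in kernels]
    n = len(logs)
    mean_log = [[sum(m[i][j] for m in logs) / n for j in range(2)] for i in range(2)]
    return sym2x2_apply(mean_log, math.exp)


if __name__ == "__main__":
    k1 = [[1.0, 0.0], [0.0, 4.0]]
    k2 = [[4.0, 0.0], [0.0, 1.0]]
    print(log_euclidean_mean([k1, k2]))
```

For the two diagonal kernels in the usage example the arithmetic mean would have 2.5 on the diagonal, whereas the Log-Euclidean mean gives approximately 2, the geometric mean of 1 and 4.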
Leonid Mirny [13] suggested a useful technique to study folding
and binding in protein-DNA interactions. Proteins can bind
target sites about 10^2 to 10^3 times faster than the diffusion
limit. A two-stage mechanism was proposed: the first stage
involves a search over the folding patterns of proteins, and the
second stage recognizes the pattern using various mathematical
models.
Figure 3: Coupling and Bonding
An effective landscape is built using the Random Energy Model to
obtain the kinetics (fraction of time in S states) and the
stability on the target site. This is a double-edged sword: the
S states no doubt speed up the process, but they also slow it
down because the site may be missed. They also used correlated
landscapes to study the coupling of binding and folding.
Quentin R. Johnson and Richard J. Lindsay [12] worked on protein
recognition via computer simulation. Their main focus was on the
use of computer simulations and biophysical models to evaluate
the specificity and strength of carbohydrate recognition. They
presented computational methods that assist the quantification
of sugar recognition, and showed that traditional problems such
as cooperativity, purification and specificity can be avoided by
using those methods. Additionally, many other methods used for
calculating the binding between protein and carbohydrate were
compared. Finally, successful examples of binding studies using
computer simulation were used to demonstrate the mature
techniques, rather than describing existing deficiencies.
Ilda D’Annessa and Cinzia Tesauro [4] worked on the role of
flexibility in protein-DNA-drug recognition. Their main
contribution concerns the covalent DNA-topoisomerase complex and
its reversible stabilization. They found that, compared with the
wild-type protein, the mutant showed a lower rate of religation
of the DNA substrate.
Figure 4: DNA and Protein Tests
The authors also showed that the double mutant is more sensitive
to CPT (camptothecin) than the wild type. Sensitivity to CPT is
inversely related to the rate of religation. This conclusion
shows that the linker domain has a critical role, and that
mutations in this domain affect the catalytic site even though
it lies in a region far from the mutations. The paper also
demonstrates the frequency of communication between domains that
are located far away from one another.
Protein binding based on chemical structure is one of the
domains receiving attention in this era. Alexander S. Krylov and
Renad I. Zhdanov [5] worked on protein recognition by chemical
composition. As their initial stage, they performed recognition
experiments to determine the shortest and longest single-strand
oligonucleotides. Furthermore, they tested the binding behavior
with oligonucleotides of 2 to 12 bases and found that
tetranucleotides are well suited for protein binding. This
results in the simplest protein-binding microarray.
[Chart for Figure 2, CASTing Algorithm Tests: x-axis Tests (Test1 to Test5); y-axis No of Samples; series Telomerase, Alkyl group, Dehydrogenase]
[Chart for Figure 4: Protein DNA Substrate]
Non-covalent interactions among DNA, carbohydrates, small
molecules, proteins and lipids are critical events in many
biological processes. Characterizing and discovering these
interactions is essential for a better understanding of
biochemical reactions and for designing therapeutic agents to
treat many diseases and infections. Over the last twenty years,
electrospray ionization mass spectrometry (ESI-MS) has become a
major in vitro tool for the identification and quantification of
protein–ligand interactions. In this paper, ESI-MS is discussed
as a way of determining the binding stoichiometry and affinity
of protein–ligand complexes. Common sources of error encountered
with these measurements and strategies for overcoming them are
also discussed, along with the challenges of implementing the
assay and future work.
Hoon Choi and Seungsoo Han [1] studied protein recognition by
SAPs (stress-associated proteins). Plants contain zinc finger
domains that help recognize ubiquitin, a regulatory signal in
the cell. However, it was not clear whether these domains
perform similar roles in plant and animal cells. They show a
unique set of features among these domains: the highly conserved
diaromatic patch is replaced by a dialiphatic patch. The results
show that AtSAP5 performs better for linear and K63-linked
polyubiquitin chains than for K48-linked ones.
The entire PGLYRP1 gene from Macaca thibetana and Rhinopithecus
roxellana was identified in order to explore the adaptive
evolution of the peptidoglycan (PGN)-recognition protein 1 gene
in primates and the function of this antibacterial protein.
Homology analysis shows that the identity of the nucleotide and
deduced amino acid sequences of PGLYRP1 among 10 primates ranged
from 82.0 to 99.0% and from 74.5 to 98.5%, respectively. Using
the Bayes empirical Bayes procedure, the authors also found two
positively selected codons (sites 121L and 141T) that are not
affected by the PGN-binding and PGLYRP-specific regions; these
two sites were suggested as potential key sites for the
functional effect of the PGLYRP1 protein.
Małgorzata Grabinska and Paweł Błazej [2] worked on Markov
chains (the most commonly used approach for protein sequencing).
They used matrices that describe the dependencies among
nucleotides in sequences, and then predicted genes using a
content-based measure. The algorithm used was PMC, which takes 6
different Markov chains and describes the transitions among
nucleotides separately for each DNA strand. They reported that
the PMC algorithm shows better precision than the other Markov
models.
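The idea of describing nucleotide dependencies with a transition matrix can be sketched as follows. This is a generic first-order Markov chain with a log-odds score, not the PMC algorithm of [2]; the function names, the pseudocount and the toy training sequences are assumptions, and sequences are assumed to contain only the letters A, C, G and T.

```python
import math

ALPHABET = "ACGT"


def train_transition_matrix(sequences, pseudocount=1.0):
    """Estimate P(next base | previous base) from training sequences."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    matrix = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        matrix[a] = {b: counts[a][b] / total for b in ALPHABET}
    return matrix


def log_likelihood(seq, matrix):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(matrix[p][n]) for p, n in zip(seq, seq[1:]))


def log_odds(seq, coding_matrix, noncoding_matrix):
    """Positive values mean the sequence fits the coding model better."""
    return log_likelihood(seq, coding_matrix) - log_likelihood(seq, noncoding_matrix)


if __name__ == "__main__":
    coding = train_transition_matrix(["ATGATGATG"])      # toy "coding" data
    noncoding = train_transition_matrix(["AAAAAAAAA"])   # toy "non-coding" data
    print(log_odds("ATGATG", coding, noncoding))
```

Using several such chains, for example one per codon position or per strand, follows the same pattern with one matrix per chain.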
W. Liu, Y.F. Yao, L. Zhou, Q.Y. Ni and H.L. Xu [10] analyzed the
peptidoglycan-recognition protein gene (PGLYRP1) in primates.
They discussed the evolution of this gene with reference to the
previous work done on it. The authors stated that the immune
system, or self-defense system, relies on many highly functional
recognition receptors to recognize micro-organisms. Work on this
recognition was started in 2002 by Hoffman and Reichhart. The
immune system of mammals was then discussed by Takeda and Akira
in 2005, who showed that mammals have many essential protein
constituents such as CD14. This work extended from insects to
mammals, and many other proteins such as PGRPs, PGLYRP-S and
PGN-lytic enzymes were discussed by many authors up to 2007.
From 2007 onwards, non-human primates have commonly been used
for experimentation and transplantation studies, and many
primates were also used for vaccination purposes. The authors
stated that PGLYRP1 is found in many parts of living organisms,
such as corneal tissue, bone marrow, kidney, lungs and liver. It
also helps in killing bacteria and in activating the
two-component protein-sensing system whenever skin comes into
contact with a complicated external environment. They carried
out their study using molecular evolutionary analysis. The roles
of insect PGRPs were also documented.
Oleg V. Kovalenko, Andrea Olland, Nicole Piché-Nicholas,
Adarsh Godbole, Daniel King, Kristine Svenson, Valerie
Calabro, Mischa R. Müller, Caroline J. Barelle, William
Somers, Davinder S. Gill, Lidia Mosyak and Lioudmila
Tchistiakova [11] discussed a category of immunoglobulins known
as new antigen receptors (IgNARs), which belong to the class of
Ig-like molecules. The authors carried out experiments following
some major steps for the recognition and processing of these
IgNARs. They discussed shark IgNARs in relation to human, rat
and mouse species. At the end of the paper, they presented
results according to the structure of each species and also
considered many other elements of the molecule.
III. TECHNICAL FRAMEWORK
pad is processed with droplets of aqueous solutions of
oligonucleotides (ON), and the ON are immobilized by reductive
coupling of their amino groups with the aldehyde groups of the
gel. Thus, the biochip is formed with single-stranded
oligonucleotides immobilized inside the gel pads. The authors
designed special software, based on the "LabVIEW virtual
instrument interface", for experimental control and data
processing.
Figure 5: Visual Image of Hybridization pattern through a
microchip
Gaëlle LENGLET and Sabine DEPAUW [6] used glyceraldehyde for
protein recognition. Initially, cell structure analysis and
protein extraction were done using chromatographic techniques,
electrophoresis and MS analysis.
Figure 6: Chromatographic isolation
For linear data they used chromatographic techniques, but for
proper and chained data electrophoresis was used and then
refined by MS analysis. The extracted data is then passed
through a specific protocol, EMSA (electrophoretic mobility
shift assay).
Figure 7: EMSA protocol for different protein patterns
The protocol is used to detect complexes of proteins with
nucleic acids and to analyze the quality and quantity of
multiple interacting systems. After electrophoresis, the
distribution of the protein-containing nucleic acid species is
obtained by autoradiography. The result is then fit for CASTing
(a cyclic process normally used for amplification and sequence
targeting). The procedure takes DNA as input, dissolves it in a
calculated buffer and amplifies it by PCR; finally, the
amplified PCR products are used to recognize protein patterns.
Figure 08: Protein Sequencing
Elena N. Kitova [8] used direct ESI-MS measurements for protein
sequencing. Initially the algorithm detects and quantifies free
and ligand-bound proteins. For this, the association constant Ca
for a given protein is obtained from a ratio (K) of the
abundances (Ab) of the ligand-bound and free protein ions. The
binding relation is
$$P + L \leftrightarrow PL$$
Ca is calculated by the following relation, in which [P]a and
[L]a denote the total concentrations of protein and ligand:
$$C_a = \frac{K}{[L]_a - \frac{K}{1 + K}[P]_a}$$
where K is determined by
$$K = \frac{[PL]}{[P]} = \frac{Ab(PL)}{Ab(P)}$$
K should include the abundances of all detected PL and P ions.
The relation is fine for linear data, but to break linearity in
the given data the above relation can be written as
$$\frac{K}{K + 1} = \frac{1 + C_a[P]_a + C_a[L]_a - \sqrt{(1 + C_a[P]_a - C_a[L]_a)^2 + 4C_a[L]_a}}{2C_a[P]_a}$$
The relation above is used for ESI-MS binding, and the values of
K normally range from 0.050 to 20. Moreover, P and L lie
between 0.10 and 1000 μM. The relation above assumes uniform
response factors for P and PL. For non-uniform data, the
relation below is suitable.
$$\frac{[PL]}{[P]} = 1 + CF_P - \frac{Ab(PL)}{CF_{PL}\,Ab(P)}$$
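The first two relations above, for uniform response factors, can be checked numerically with a short sketch. The function names and the example concentrations are assumptions made for illustration; the only requirement is that Ca is expressed in the inverse of the concentration unit used for [P]a and [L]a.

```python
import math


def association_constant(ratio_k, ligand_total, protein_total):
    """Ca = K / ([L]a - K/(1+K) * [P]a), with K = Ab(PL)/Ab(P)."""
    free_ligand = ligand_total - ratio_k / (1.0 + ratio_k) * protein_total
    return ratio_k / free_ligand


def bound_fraction(c_a, ligand_total, protein_total):
    """K/(K+1) predicted from Ca by the quadratic form of the same relation."""
    cp = c_a * protein_total
    cl = c_a * ligand_total
    root = math.sqrt((1.0 + cp - cl) ** 2 + 4.0 * cl)
    return (1.0 + cp + cl - root) / (2.0 * cp)


if __name__ == "__main__":
    # Hypothetical values: Ca in inverse concentration units, totals in the same unit.
    c_a, protein_total, ligand_total = 0.05, 10.0, 20.0
    fraction = bound_fraction(c_a, ligand_total, protein_total)
    ratio_k = fraction / (1.0 - fraction)
    # Feeding the predicted ratio back should recover the original Ca.
    print(ratio_k, association_constant(ratio_k, ligand_total, protein_total))
```

The round trip in the usage example is a consistency check: both functions express the same binding equilibrium, so the recovered constant should match the one that was put in.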
W. Liu, Y.F. Yao, L. Zhou, Q.Y. Ni and H.L. Xu [10] used the
peptidoglycan-recognition protein gene (PGLYRP1) of primates for
their analysis. They took DNA samples from the muscle tissue of
one monkey species and carried out their experiments at a
wildlife protection laboratory in China. They also downloaded
PGLYRP1 sequences of the crab-eating macaque and of human, using
the Ensembl Genome Database to collect samples from different
species. After collection, they amplified, cloned and sequenced
these genes. They used polymerase chain reaction (PCR) with
primers designed on the basis of the PGLYRP1 sequence, using the
Primer 5.0 software. PCR was performed in a thermal cycler
(Mastercycler gradient, Germany) with a total reaction volume of
approx. 50 μL, containing 1 μL 10 ng/μL genomic DNA, 0.5 μL of
each primer, 5 μL 2X buffer, 25 μL 2X mix, 18 μL double-distilled
water, and 5 μL mineral oil. Specific temperature and timing
conditions were also applied for better results. A PCR gel
extraction kit was then used to purify PGLYRP1. After
amplification, the product was cloned into a pMD 19-T Simple
vector. For sequencing of the clones, the authors used the
BigDye Terminator v3.1 cycle sequencing ready reaction kit. The
sequences were then assembled using the DNASTAR software to
obtain the complete coding sequence of PGLYRP1.
After amplification, cloning and sequencing of the gene, the
data was analyzed. The obtained sequence was first checked using
the Chromas 1.45 software, and any necessary corrections were
made before further processing. The authors considered many
parameters and many sites in the MegAlign program for their
analysis and used the molecular evolutionary genetic analysis
discussed earlier. They took different values for different
ratios. In the results, the authors used trees and tables to
compare the values of these ratios and then discussed the values
in detail according to the type of species.
IgNARs were discussed by Oleg V. Kovalenko, Andrea
Olland, Nicole Piché-Nicholas, Adarsh Godbole, Daniel King,
Kristine Svenson, Valerie Calabro, Mischa R. Müller, Caroline
J. Barelle, William Somers, Davinder S. Gill, Lidia Mosyak
and Lioudmila Tchistiakova [11]. They defined the recognition
process in nine major steps.
The first step was designing and cloning the variants of
humanized V-NAR. In this step, E06 variants were codon-optimized
for expression in mammalian cells and synthesized by GeneArt AG.
Control elements such as the murine CMV promoter were considered
during cloning.
The second step was the expression and purification of V-NAR
proteins. The fusion protein V-NAR-hFc was expressed in COS-1
cells, which were transfected with TransIT reagent according to
the manufacturer's recommendations. Monomeric V-NARs were
likewise expressed in COS-1 cells and purified by
chromatography, using reagents such as sodium phosphate, NaCl
and imidazole. Protein concentration was then determined from
the OD at 280 nm. For cells grown in serum-free medium,
FreeStyle293 expression was used.
The third step was the isolation of E06 proteins. E06 was
applied to Ni2+-NTA Superflow resin, and the resulting material
was washed with PBS supplemented with imidazole. Dialyzing E06
against PBS removed the excess imidazole and prepared it for the
next step. PBS containing lipid-free HSA was used for the
removal of oligomeric species. The sample was then incubated for
one hour and applied to Superdex 200 to remove excess E06.
Finally, the remaining fractions were pooled and prepared for
crystallization.
The fourth step was ELISA. Serum albumin proteins were used in
the binding experiments, and both direct and indirect ELISA were
performed. In the direct ELISA, V-NAR binding was detected on
Costar assay plates coated with PBS. Fusion proteins such as
VNAR-hFc were diluted in assay buffer, and for the sandwich
ELISA plates coated with anti-hFc pAb were used.
The fifth step was crystallization, where the major
consideration was fixing the temperature. E06 crystals were
obtained in hanging drops kept at 18 degrees Celsius. Different
quantities of solutions containing components such as the
protein complex, NaCl and sodium acetate were used.
Diamond-shaped crystals appeared overnight and continued growing
for approximately one week.
The sixth step was data collection and processing. Data was
collected at APS beamline 22-ID on a MAR-300 detector. The
programs Xia2 and autoProc were used for integration and scaling
of intensities.
The seventh step was phasing, model building and refinement of
E06. PHASER was used for molecular replacement of the E06
complex with HSA, with apo HSA (PDB ID: 1AO6) as the model.
Phenix was then used for refinement. Different programs and
models were used for different types of proteins in this step.
The eighth step was the measurement of E06 binding kinetics.
Kinetic constants of E06 were collected by surface plasmon
resonance (Biacore T100, GE Life Sciences).
The last step was assigning accession numbers. Structure factors
and coordinates were deposited with the Worldwide Protein Data
Bank under PDB ID: 4HGK (E06) and PDB ID: 4HGM (huE06 v1.1).
IV. COMPARATIVE ANALYSIS
One of the techniques discussed in this paper is the Niche
Genetic Algorithm of Michael Scott Brown [3]. Such algorithms
perform well for protein recognition, but they reduce the
dimension: proteins are 3D, yet the algorithm first converts
them to 2D and then processes them further. Doing so also
reduces the search space.
Another technique is the use of Markov chains for sequencing of
proteins. Małgorzata Grabinska and Paweł Błazej [2] compared
their work with the previously presented PMC algorithm.
Supervised learning was used for training, after which the
original data was tested. The Gene Mark algorithm was proposed
by Paweł Mackiewicz [2]. They also used Markov chains, but they
treated every protein sequence as having three unique Markov
chains. They also compared their approach with the PMC
algorithm, using ROC curves to assess efficiency. The true
positive rate of these algorithms showed little variation.
Figure 09: PMC & Three chained Algorithm Comparison
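For readers unfamiliar with ROC curves, the sketch below shows how the points of such a curve (sensitivity against 1 - specificity) and the area under it can be computed from classifier scores. The function names and the toy scores are assumptions; the sketch also assumes that all scores are distinct and that both classes are present.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per score threshold.

    labels are 1 for true genes and 0 otherwise; scores are assumed distinct.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for _, label in ranked:
        if label:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(pts, area_under_curve(pts))
```

An area close to 1 indicates that true genes are consistently scored above non-genes, while an area near 0.5 corresponds to random ranking.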
Protein sequencing using direct ESI-MS measurements is discussed
by Elena N. Kitova [8]. Initially the algorithm detects and
quantifies free and ligand-bound proteins, and the authors then
use different formulas for linear and non-linear data.
Comparatively, most of the authors used Markov chains for
sequencing of proteins, because Markov chains can be used for
data of any dimension. The deficiency of this technique is that
its computational cost differs between linear and non-linear
data. The least expensive technique was used by Gaëlle LENGLET
and Sabine DEPAUW [6]. Chromatography is widely used and
inexpensive. It also gives good results, but each stage of the
process uses a different kind of technique.
Another protein recognition technique was presented by Ilda
D’Annessa [4], who worked on the role of flexibility in
protein-DNA-drug recognition. The authors used specially
designed software for data processing. They ran many experiments
using the "LabVIEW virtual instrument interface" and showed that
the results are better than those of other algorithms.
Glyceraldehyde was used by Gaëlle LENGLET and Sabine DEPAUW [6]
for protein recognition. They used chromatographic techniques,
electrophoresis and MS analysis for different types of data:
chromatographic techniques for linear data, and electrophoresis
refined by MS analysis for proper and chained data. They used
the EMSA (electrophoretic mobility shift assay) protocol for
processing the extracted data.
Conclusions
The major purpose of this paper was to survey protein sequencing
and recognition, and to use this knowledge creatively to develop
a novel approach to forecasting protein-protein complexes. The
foundation of this study is a Niche Genetic Algorithm function
derived from a previously prepared Genetic Algorithm dataset. On
the basis of its results, it was used for computational scanning
to calculate changes in the binding of protein complexes.
Computed and tentative values showed good correlation and, thus,
a PMS algorithm was introduced to improve the predictive power.
Based on these findings, the PMS algorithm was developed, which
allows identifying scums in proteins. The results show that the
PMS algorithm is state of the art not only with respect to
predictive power but also in terms of computational speed.
Markov chains were also successfully evaluated by re-scoring six
different datasets that include bound and unbound protein
predictions. Furthermore, the chained algorithm is useful when
applied as an objective function in combination with different
Markov chains to predict the 3D structures of protein-protein
complexes. For this, model-based learning algorithms were used
to test protein sequencing. The direct ESI-MS measurements
approach showed average results for bound and restrained protein
complex predictions. Not many factors were found to influence
the success of the sequencing approach, such as the range of
probable conformational changes of a protein. Finally, a
large-scale validation study on the peptidoglycan-recognition
protein was performed. The results thereby obtained allow
identifying those protein-protein interfaces that are best
suited for molecular docking approaches.
[Chart: ROC Curve; x-axis: 1 - specificity; y-axis: sensitivity]
Figure 12: Linear Data Analysis
[Chart: Linear data, Chromatographic Technique; x-axis: Proteins Formation; y-axis: Samples; trend line y = 2554.x + 36508]
Figure 13: Non Linear Data Analysis
[Chart: Non Linear Data, Electrophoresis Technique; x-axis: Concentration; y-axis: Samples; trend line y = -4.516x^2 + 4342x - 29016]
REFERENCES
[1] Hoon Choi, Seungsoo Han, Donghyuk Shin, Sangho Lee. (2012), Polyubiquitin recognition by AtSAP5, an A20-type zinc finger containing protein from Arabidopsis thaliana.
[2] Małgorzata Grabinska, Paweł Błazej and Paweł Mackiewicz (Wrocław). (2013), Two Algorithms based on Markov Chains and their application to Recognition of Protein coding genes in Prokaryotic Genomes.
[3] Michael Scott Brown and James Coker. (2014), Niche Genetic Algorithms are better than traditional Genetic Algorithms for de novo Protein Folding.
[4] Ilda D’Annessa, Cinzia Tesauro, Paola Fiorani, Giovanni Chillemi, Silvia Castelli, Oscar Vassallo, Giovanni Capranico, and Alessandro Desideri. (2012), Role of Flexibility in Protein-DNA-Drug Recognition: The Case of Asp677Gly-Val703Ile Topoisomerase Mutant Hypersensitive to Camptothecin.
[5] Alexander S. Krylov and Renad I. Zhdanov. (2012), Nucleic acid – protein fingerprints. Novel protein classification based on nucleic acid – protein recognition.
[6] Gaëlle LENGLET, Sabine DEPAUW, Denise MENDY and Marie-Hélène DAVID-CORDONNIER. (2013), Protein recognition of the S23906-1–DNA adduct by nuclear proteins: direct involvement of glyceraldehyde-3 phosphate dehydrogenase (GAPDH).
[7] Alfred V.Aho. (2012), Algorithms for finding patterns in Strings.
[8] Elena N. Kitova, Amr El-Hawiet, Paul D. Schnier, John S. Klassen. (2012), Reliable Determinations of Protein–Ligand Interactions by Direct ESI-MS Measurements. Are We There Yet?
[9] Parwiz Abrahimi, William G. Chang, Martin S. Kluger, Yibing Qyang, George Tellides, W. Mark Saltzman, Jordan S. Pober. (2015), Efficient Gene Disruption in Cultured Primary Human Endothelial Cells by CRISPR/Cas9.
[10] W. Liu, Y.F. Yao, L. Zhou, Q.Y. Ni and H.L. Xu. (2013), Evolutionary analysis of the short-type peptidoglycan-recognition protein gene (PGLYRP1) in primates.
[11] Oleg V. Kovalenko, Andrea Olland, Nicole Piché-Nicholas, Adarsh Godbole, Daniel King, Kristine Svenson, Valerie Calabro, Mischa R. Müller, Caroline J. Barelle, William Somers, Davinder S. Gill, Lidia Mosyak and Lioudmila Tchistiakova. (2013), Atypical Antigen Recognition Mode of a Shark IgNAR Variable Domain Characterized by Humanization and Structural Analysis.
[12] Quentin R. Johnson, Richard J. Lindsay, Loukas Petridis and Tongye Shen. (2015), Investigation of Carbohydrate Recognition via Computer Simulation.
[13] Jiansheng Jiang, Bing-Rui Zhou, Rodolfo Ghirlando and Tsan Xiao. (2013), A conserved mechanism for centromeric nucleosome recognition by centromere protein CENP-C.
[14] Wei-Lun Hsu. (2013), Mechanisms of binding diversity in Protein Disorder: Molecular Recognition features mediating protein interaction Networks.
[15] Wells, J. A.; McClendon, C. L., Reaching for high-hanging fruit in drug discovery at protein-protein interfaces. Nature 2007, 450, (7172), 1001-9.
[16] Mulder, G. J., Ueber die Zusammensetzung einiger thierischen Substanzen. Journal für praktische Chemie 1839, 16, 129-151.
[17] Campbell, N. A., Biologie. Spektrum Akademischer Verlag: Heidelberg, Berlin, Oxford, 1997; p 80.
[18] Crick, F. H., The genetic code--yesterday, today, and tomorrow. Cold Spring Harb Symp Quant Biol 1966, 31, 1-9.
[19] Atkins, J. F.; Gesteland, R., Biochemistry. The 22nd amino acid. Science 2002, 296, (5572), 1409-10.
[20] Xu, X. M.; Carlson, B. A.; Mix, H.; Zhang, Y.; Saira, K.; Glass, R. S.; Berry, M. J.; Gladyshev, V. N.; Hatfield, D. L., Biosynthesis of selenocysteine on its tRNA in eukaryotes. PLoS Biol 2007, 5, (1), e4.