In Silico GalNAc-T1 - Hindawi Publishing...

16
Research Article First Comprehensive In Silico Analysis of the Functional and Structural Consequences of SNPs in Human GalNAc-T1 Gene Hussein Sheikh Ali Mohamoud, 1,2 Muhammad Ramzan Manwar Hussain, 2 Ashraf A. El-Harouni, 2,3 Noor Ahmad Shaik, 2,3 Zaheer Ulhaq Qasmi, 4 Amir Feisal Merican, 5 Mukhtiar Baig, 6 Yasir Anwar, 7 Hani Asfour, 2 Nabeel Bondagji, 2 and Jumana Yousuf Al-Aama 2,3 1 Human Genetics Research Centre, Division of Biomedical Sciences (BMS), Saint George’s University of London (SGUL), London, UK 2 Princess Al-Jawhara Al-Ibrahim Center of Excellence in Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia 3 Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia 4 Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi 75270, Pakistan 5 Institute of Biological Sciences and Centre of Research for Computational Sciences and Informatics for Biology, Bioindustry, Environment, Agriculture and Healthcare (CRYSTAL, UM), University of Malaya, Kuala Lumpur, Malaysia 6 Faculty of Medicine, King Abdulaziz University, Rabigh, Saudi Arabia 7 Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia Correspondence should be addressed to Hussein Sheikh Ali Mohamoud; [email protected] Received 26 September 2013; Revised 12 November 2013; Accepted 17 November 2013; Published 4 March 2014 Academic Editor: Emil Alexov Copyright © 2014 Hussein Sheikh Ali Mohamoud et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. GalNAc-T1, a key candidate of GalNac-transferases genes family that is involved in mucin-type O-linked glycosylation pathway, is expressed in most biological tissues and cell types. Despite the reported association of GalNAc-T1 gene mutations with human disease susceptibility, the comprehensive computational analysis of coding, noncoding and regulatory SNPs, and their functional impacts on protein level, still remains unknown. erefore, sequence- and structure-based computational tools were employed to screen the entire listed coding SNPs of GalNAc-T1 gene in order to identify and characterize them. Our concordant in silico analysis by SIFT, PolyPhen-2, PANTHER-cSNP, and SNPeffect tools, identified the potential nsSNPs (S143P, G258V, and Y414D variants) from 18 nsSNPs of GalNAc-T1. Additionally, 2 regulatory SNPs (rs72964406 and #x26; rs34304568) were also identified in GalNAc-T1 by using FastSNP tool. Using multiple computational approaches, we have systematically classified the functional mutations in regulatory and coding regions that can modify expression and function of GalNAc-T1 enzyme. ese genetic variants can further assist in better understanding the wide range of disease susceptibility associated with the mucin-based cell signalling and pathogenic binding, and may help to develop novel therapeutic elements for associated diseases. 1. Introduction Glycosylation represents the common form of posttrans- lational modification that is critical for the stability, sol- ubility, secretion, and interactions of proteins [1]. Of two glycosylation types (N- and O-linked), O-glycosylation is the most abundant glycosylation form of proteins expressed in a variety of secreted and membrane-bound mucins [2]. e O-linked glycosylation (also known as mucin type) is initi- ated in the Golgi compartment of human cells by the transfer of monosaccharide N-acetylgalactosamine (GalNAc) from UDP-GalNAc to the hydroxyl groups of Ser/r residues in core polypeptides by a large family (20) of GalNAc- transferases (ppGalNac-Ts or GalNAc-Ts; E.C. 2.4.1.41) [3, 4]. Hindawi Publishing Corporation Computational and Mathematical Methods in Medicine Volume 2014, Article ID 904052, 15 pages http://dx.doi.org/10.1155/2014/904052

Transcript of In Silico GalNAc-T1 - Hindawi Publishing...

Page 1: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

Research ArticleFirst Comprehensive In Silico Analysis of the Functional andStructural Consequences of SNPs in Human GalNAc-T1 Gene

Hussein Sheikh Ali Mohamoud12 Muhammad Ramzan Manwar Hussain2

Ashraf A El-Harouni23 Noor Ahmad Shaik23 Zaheer Ulhaq Qasmi4

Amir Feisal Merican5 Mukhtiar Baig6 Yasir Anwar7 Hani Asfour2

Nabeel Bondagji2 and Jumana Yousuf Al-Aama23

1 Human Genetics Research Centre Division of Biomedical Sciences (BMS) Saint Georgersquos University of London (SGUL) London UK2 Princess Al-Jawhara Al-Ibrahim Center of Excellence in Research of Hereditary Disorders King Abdulaziz UniversityJeddah Saudi Arabia

3 Department of Genetic Medicine Faculty of Medicine King Abdulaziz University Jeddah Saudi Arabia4Dr Panjwani Center for Molecular Medicine and Drug Research International Center for Chemical and Biological SciencesUniversity of Karachi Karachi 75270 Pakistan

5 Institute of Biological Sciences and Centre of Research for Computational Sciences and Informatics for Biology BioindustryEnvironment Agriculture and Healthcare (CRYSTAL UM) University of Malaya Kuala Lumpur Malaysia

6 Faculty of Medicine King Abdulaziz University Rabigh Saudi Arabia7Department of Biological Sciences Faculty of Science King Abdulaziz University Jeddah Saudi Arabia

Correspondence should be addressed to Hussein Sheikh Ali Mohamoud husseinsheikhlivecouk

Received 26 September 2013 Revised 12 November 2013 Accepted 17 November 2013 Published 4 March 2014

Academic Editor Emil Alexov

Copyright copy 2014 Hussein Sheikh Ali Mohamoud et al This is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

GalNAc-T1 a key candidate of GalNac-transferases genes family that is involved in mucin-type O-linked glycosylation pathwayis expressed in most biological tissues and cell types Despite the reported association of GalNAc-T1 gene mutations with humandisease susceptibility the comprehensive computational analysis of coding noncoding and regulatory SNPs and their functionalimpacts on protein level still remains unknown Therefore sequence- and structure-based computational tools were employedto screen the entire listed coding SNPs of GalNAc-T1 gene in order to identify and characterize them Our concordant in silicoanalysis by SIFT PolyPhen-2 PANTHER-cSNP and SNPeffect tools identified the potential nsSNPs (S143P G258V and Y414Dvariants) from 18 nsSNPs of GalNAc-T1 Additionally 2 regulatory SNPs (rs72964406 and x26 rs34304568) were also identifiedin GalNAc-T1 by using FastSNP tool Using multiple computational approaches we have systematically classified the functionalmutations in regulatory and coding regions that can modify expression and function of GalNAc-T1 enzymeThese genetic variantscan further assist in better understanding the wide range of disease susceptibility associated with the mucin-based cell signallingand pathogenic binding and may help to develop novel therapeutic elements for associated diseases

1 Introduction

Glycosylation represents the common form of posttrans-lational modification that is critical for the stability sol-ubility secretion and interactions of proteins [1] Of twoglycosylation types (N- andO-linked)O-glycosylation is themost abundant glycosylation form of proteins expressed in

a variety of secreted and membrane-bound mucins [2] TheO-linked glycosylation (also known as mucin type) is initi-ated in the Golgi compartment of human cells by the transferof monosaccharide N-acetylgalactosamine (GalNAc) fromUDP-GalNAc to the hydroxyl groups of SerThr residuesin core polypeptides by a large family (sim20) of GalNAc-transferases (ppGalNac-Ts or GalNAc-Ts EC 24141) [3 4]

Hindawi Publishing CorporationComputational and Mathematical Methods in MedicineVolume 2014 Article ID 904052 15 pageshttpdxdoiorg1011552014904052

2 Computational and Mathematical Methods in Medicine

These series of glycosyltransferases act together in elongat-ing the O-linked glycan to produce a variety of denselyO-glycosylated mucin glycoproteins Mucin constitutes theprinciple component of mucus that defends the epithelialcell surfaces against infectious and environmental agents [56] Additionally mucin-like glycans also serve as receptor-binding ligands during an inflammatory response [7]

The GalNAc-Ts (GALNTs) are classified into 27 familymembers based on their sequence and structural similaritiesIn total 20 human GalNAc-Ts gene entries are made up-to-date Most of the GalNAc-T genes such as GalNAc-T1GalNAc-T3 GalNAc-T4 GalNAc-T5 GalNAc-T6 GalNAc-T7 GalNAc-T9 GalNAc-T12 GalNAc-T13 and GalNac-T14are important in determining the peptide and glycopeptidesubstrate specificities Analysis of intronexon positioning inhuman fish fly and worm supports the model that GalNAc-Ts evolved from a common ancestral gene Furthermoreseveral studies have provided the evidence that predictedorthologs of GalNAc-Ts genes in human mouse and flyrepresent true functional orthologs [4 8 9]

Over the past few years there has been considerableinterest in linking the genetic variations of GalNAc-Ts todisease susceptibility in humans For example rs17647532in GalNAc-T1 is known to influence the risk of epithelialovarian cancers [10] rs142046356 in GalNAc-T2 is describedto play a role in elevating the HDLc levels in some families[11] rs6710518 in GalNAc-T3 is strongly associated with bonemineral density and fracture risk [12] and rs52822348 inGalNAc-T4 increases the risk of acute coronary disease [13]Additionally genome wide studies have also revealed thebiochemically inactivating germ line and somatic mutationsin GalNAc-T12 genes [14]

In recent years computational approaches have beenextensively used to identify the impact of deleterious nsSNPsin candidate genes by using information such as conservationof sequences across species [15] structural attributes [16]and physicochemical properties of polypeptides [17 18]By adopting computational algorithms some studies havesuccessfully classified highly functional SNPs out of a hugepool of disease susceptible SNPs of BRCA1 gene ATMgene [19] and BRAF gene [20] based on their structuraland functional consequences Despite the availability ofconvincing data indicating the wide involvement of GalNAc-T1 genemutations in human diseases computational analysisof coding SNPs (nsSNPs and sSNPs in exonic regions) andnoncoding SNPs (Intronic exonic 51015840 and 31015840 UTR SNPs) incandidate GalNAc-Ts genes still remains unexplored

Hence in order to characterize the deleterious mutationsin candidate GalNAc-Ts genes the GalNAc-T1 gene which isknown to be expressed at high levels and is found broadlyamong most tissues and cell types [21 22] was chosen for insilico analysis in the current investigation Our experimentalstrategy (Figure 1) involved (i) retrieval of SNPs in GalNAc-T1 gene from public databases (ii) classifying the deleteriousnsSNPs to that of phenotypes based on sequence andstructure-based homology analyses and characterising theregulatory nsSNPs that alter the splicing and gene expressionpatterns (iii) predicting the precise effects of amino acid sub-stitutions on secondary structures by means of stability and

solvent accessibility simulations studies and (iv) analysingthe impact of amino acid substitutions caused by nsSNPson GalNAc-T1 interactions with other proteins in a networkThe in silico approaches adopted in this investigation offeradvantages over the experimental based ones due to theirconvenience reliability speed and of lower cost in order tolocate the amino acid variants that regulate the function ofGalNAc-T1 protein

2 Methodology

21 SNP Data Mining The data on the human GalNAc-T1(GALNT1) gene was collected from web-based data sourcessuch as Online Mendelian Inheritance in Man (OMIMhttpwwwncbinlmnihgovomim) and the National Cen-ter for Biological Information (httpwwwncbinlmnihgov) Details of the SNPs (mRNA accession number refer-ence and assay IDrsquos) and protein sequence ofGalNAc-T1 genewere retrieved from the dbSNP-NCBI and SwissProt

22 Analysis of Functional Consequences of nsSNPs by SIFTSorting Intolerant from Tolerant (SIFT httpsiftjcviorg)predicts the tolerated and deleterious SNPs in order toidentify the impact of amino acid substitution on phenotypicand functional changes of protein molecules In the currentinvestigation the identification numbers (rs IDs) of each SNPof GalNAc-T1 gene obtained from NCBI were submitted as aquery sequence to SIFT for homology searching The SIFTvalue le 005 indicates the deleterious effect of nonsynony-mous variants on protein function [23 24]

23 Simulation of Functional Consequences of nsSNPs byPolyPhen-2 Polymorphism Phenotyping-2 or PolyPhen-2(httpgeneticsbwhharvardedupph2) is a probabilisticclassifier which calculates functional significance of an allele-change by Naıve Bayes a set of supervised learning algo-rithms A mutation is evaluated qualitatively as probablydamaging (probabilistic score gt 085) possibly damaging(probabilistic score gt 015) and benign (remaining) cor-responding to the pairs of false positive rate (FPR) andtrue positive rate (TPR) thresholds adjusted separately forHumDiv (10 amp 18 FPR) and HumVar (19 amp 40FPR for probably and possibly damaging mutations resp)This tool was used to study the possible impacts of aminoacid substitutions on the function of candidate GalNAc-T1 protein Input options for PolyPhen-2 are comprised ofUniProt accession numberFASTA sequence and detail ofamino acids substitution [25 26]

24 Characterization of Functional nsSNPs by PANTHERand Fathmm The Protein ANalysis THrough EvolutionaryRelationships (PANTHER httpwwwpantherdborg) clas-sification system was used to characterize the functionalnsSNPs in GalNAc-T1 gene using HMM-based statisticalmodelling and evolutionary relationships in the proteinfamily [27] PANTHER-cSNP tools estimate the functionof coding nsSNPs by calculating the subPSEC (substitution

Computational and Mathematical Methods in Medicine 3

GalNAc-T1 orGALNT1 gene

Database mining

dbSNP-NCBI

nsSNPs

rs ID Allel change Amino acidvariation

SIFT PolyPhen-2 PANTHER SNP effect

Fast SNP

NetsurfP Consurf Clustal Omega Project hope

STRING FT-site KEGG

I-TASSER

GROMOS 96 Raswin Chimera Swiss pdb viewer Patchdock Firedock Swissdock

UniProtFASTA

Figure 1 Schematic representation of computational tools for in silico analysis of GalNAc-T1 gene

position-specific evolutionary conservation) score The pos-sible inputs to run the cSNP tools of PANTHER are theamino acid substitution and protein sequence PANTHERsubPSEC score le minus3 classifies the amino acid substitution asdeleterious or intolerant whereas the score ge minus3 is predictedto be less deleterious [24]

The Functional Analysis throughHiddenMarkovModels(fathmm) was used to predict the functional molecularand phenotypic consequences of protein missense variantsby combining sequence conservation within hidden Markovmodels (HMMs) [28] User input for Fathmm server includesUniProtENSEMBLE ID with amino acid variation predic-tion algorithm and phenotypic associations

25 Analysing the Molecular Phenotypic Effects of nsSNPsby SNPeffect The SNPeffect database 40 (httpsnpeffectswitchlaborg)was used to predict the phenotypic impacts ofnsSNPs of GalNAc-T1 gene This database uses different toolssuch as TANGO(predict aggregation prone regions)WALTZ(simulate amyloidogenic regions) LIMBO (examine hsp70chaperone binding sites) and FoldX (analyze the possibleimpact on protein stability) to annotate both noncoding andcoding SNPs Input options usually consist of a gene list andUniProt ID which are critical for a biological pathway or

function SNPeffect databasemainly focuses on themolecularidentification categorization and explanation of diseasevariants in the human proteome [29]

26 Identification of Functional SNPs in Conserved Regionsby ConSurf Using an empirical Bayesian inference ConSurfweb-server calculates the evolutionary conservation of aminoacid substitution in proteins [30 31] After giving the FASTAsequence ofGalNAc-T1 the conserved regionswere predictedby means of conservation scores and colouring schemedivided into a distinct scale of nine grades starting from themost variable positions (grade 1) coloured turquoise throughintermediately conserved positions (grade 5) coloured whiteto the most conserved positions (grade 9) coloured maroon[32]

27 Characterization of Functional SNPs in Regulatory Regionsby FastSNP Function analysis and selection tool for singlenucleotide polymorphisms or FastSNP (httpFastSNPibmssinicaedutw) was used to analyse the SNPs located inregulatory regions of GalNAc-T1 gene This tool follows thedecision tree principle to extract the functional informationof SNPs from external web-servers and accordingly assign

4 Computational and Mathematical Methods in Medicine

the risk rankings for SNPs from 1 2 3 4 or 5 (which indicatesthe levels of very low low medium high and the very highfunctional impact resp) The input to query FastSNP isusually a gene symbol SNP reference cluster ID (rs ID) or achromosome position After getting query FastSNP providestwo main options (query by candidate gene and query bySNP) to further generate the SNP function report against eachSNP [33]

28 Analysis of SNP Effects on Surface and Solvent Accessibilityof Protein by NetSurfP Solvent accessibility or accessiblesurface area (ASA) of an amino acid helps in locating thepotential active sites in a three-dimensional structure and inan extended tripeptide conformation of proteins (ShandarAhmad et al 2004) Amino acid FASTA sequence ofGalNAc-T1 was submitted to the NetSurfP (httpwwwcbsdtudkservicesNetSurfP) server to predict its secondary structuresurface and solvent accessibility of amino acids This pre-diction method relies on the Z-score which can predict thesurfaces but not secondary structures of proteins There are3 subclasses defined for solvent accessibility of the aminoacids these include buried (low accessibility) partially buried(moderate accessibility) and exposed (high accessibility)[34] Simulating the secondary structure of proteins isessential to analyze the relationship between amino acidsequence and protein structure Hence we simulated thesecondary structure of GalNAc-T1 protein with I-TASSER(httpzhanglabccmbmedumicheduI-TASSER) by excis-ing continuous fragments from threading alignments andfurther refined them using replica-exchanged Monte-CarlosimulationsThe quality of predictionmodels was reflected inthe form of c-scores (minus5 to 2) Secondary structure predictionand solvent accessibility analysis were an intermediate stepprior to predicting the tertiary structure of GalNAc-T1protein

29 Modelling the Molecular Effects of nsSNP on ProteinStructures The nsSNPs can significantly change the stabil-ity of proteins Consequently in order to investigate thestructural deviations and stability differences between nativeand mutant forms of GalNAc-T1 structural analysis wasundertaken based on the results obtained from highest SIFTPolyPhen-2 SNPeffect and PANTHER scores Prediction ofthe 3 dimensional model of GalNAc-T1 protein structurewas done using the Universal Protein Resource (UniProthttpwwwuniprotorg ) I-TASSER (httpzhanglabccmbmedumicheduI-TASSER) andMultisource protein struc-ture threading (MUSTER httpzhanglabccmbmedumicheduMUSTER) web tools However structural visualizationwas performed by the SWISS-PDB viewer and Chimera [3536]

210 Analysis of Structural Specificity of Functional SNPsTo study the structural effects of mutations on GalNAc-T1 protein Project Have yOur Protein Explained (HOPEhttpwwwcmbirunlhopehome) a unique web serverwas used To simulate structural features of mutations onnative proteinmolecule ProjectHOPEuses the 3D structures

of the proteins that are available in Uniprot database and alsoin DAS prediction servers however the HOPE server canalso build homology models independently if necessary TheHOPE sever predicts the structural variation between nativeand mutant residues [37]

211 Prediction of Ligand Binding Sites on Unbound ProteinStructures by FTSite To solve the classic problem asso-ciated with the elucidation of protein structure-functionrelationship protein engineering and drug design proteinbinding sites are considered as hot spot regions especially forsmall ligand molecules FTSite is an accurate computationalmethod based on experimental evidence to determine theligand binding sites with experimental accuracy of 94The input options contain job name PDB ID or file PDBchain IDs and email (optional) to retrieve the bindingsitesresidues within the candidate unbound protein [38]

212 Predictions of Protein-Protein Interactions Protein-protein interaction networks are important to unveil andannotate all functional interaction among cell proteinsFor this current investigation the online database resourceldquoSearch Tool for the Retrieval of Interacting proteinsrdquo(STRING httpstring-dborg) was used This providedunique coverage and ease of access to both experimentaland theoretical interaction evidence of GalNac-T1 The inputoptions for STRING database include protein name pro-tein sequence and multiple sequences STRING databaseis presently equipped with 5214234 proteins belonging to1133 organisms [39] To predict the functional networkingof GalNAc-T1 enzyme we used KEGG (httpwwwgenomejpkegg) PATHWAY and LIGAND and curated the datafor molecular reaction and interaction networks includingmetabolic pathways regulatory pathways and molecularcomplexes for biological systems

213 Molecular Docking for Protein-Ligand Interaction byPatchDock and FireDock To get the structural insight ofunbound protein-ligand interaction PatchDock and Fire-Dock (Fast Interaction REfinement in molecular DOCK-ing) were considered PatchDock algorithm entails objectrecognition and image segmentation techniques to carry outrigid docking through a three-stage filtering process (a)molecular shape representation (b) surface patch matchingand (c) filteringscoring The FireDock method simulta-neously targets the problem of flexibility and scoring ofsolutions to address the refinement problem of protein-protein docking solutions After generating top 1000 modelsthe PatchDock output was redirected to FireDock (as aninput) to produce the top 10 refined structures of associatedGalNAcT protein-ligand [40]

3 Results

31 SNP Analysis The dbSNP-NCBI database search fornonsynonymous and synonymous SNPs located in exonicintronic and UTR regions ofGalNAc-T1 gene revealed a totalof 891 SNPs of which the exonic region is comprised of 18

Computational and Mathematical Methods in Medicine 5

Table 1 List of nsSNPs with rs IDs found to be functionally significant by SIFT Tool

rs IDs Allele change Amino acid change SIFT score SIFT prediction Medianrs143502430 c13GgtA A5T 055 TOLERATED 295rs141224997 c58TgtG L20V 1 TOLERATED 266rs138822379 c73CgtG L25V 007 TOLERATED 267rs199595808 c137AgtG D46G 03 TOLERATED 251rs72964406 c161CgtT P54L 028 TOLERATED 251rs139185162 c213TgtG D71E 1 TOLERATED 238rs150799030 c211GgtC D71H 007 TOLERATED 238rs113616262 c427TgtC S143P 0 DAMAGING 237rs201338228 c646GgtA V216M 014 TOLERATED 237rs199977475 c773GgtT G258V 004 DAMAGING 237rs144282744 c1135AgtC N379H 01 TOLERATED 237rs146565032 c1150AgtG I384V 052 TOLERATED 238rs142387342 c1202GgtA G401D 034 TOLERATED 237rs34304568 c1240TgtC Y414D 0 DAMAGING 237rs151329739 c1318AgtC N440H 008 TOLERATED 237rs142110831 c1528CgtA P510T 016 TOLERATED 237rs145884536 c1622AgtG N541S 045 TOLERATED 238rs200444543 c1657GgtA V553I 039 TOLERATED 266SIFT score le 005 damaging

(2) nsSNPs and 11 (12) sSNPs whereas intronic regionscontained 836 (93) SNPs and mRNAUTR region consistedof 26 (29) SNPs However search for nonsense frameshiftstopgain mutations did not show any results in NCBI-dbSNPWe selected nonsynonymous SNPs from the exonic regionand UTR SNPs (51015840 and 31015840) from the intronic region for ouranalysis

32 Identification of Functional SNPs in Coding RegionsIdentification of functional SNPs was done by predictingthose which substitute the amino acids that are critical forGalNAc-T1 functionThis in silico analysis was performed andvalidated using 4 different computational tools namely SIFTPolyPhen-2 PANTHER and SNPeffect

321 Analysis of Molecular Phenotypic Effects by SIFT Byusing sequence homology SIFT characterizes the effect ofamino acid substitution on protein function From SIFTresults (Table 1) a total of 3 (166) nsSNPs were predicted asdamaging (score of 000ndash004) by SIFT whereas the remain-ing 15 nsSNPs (8333) were simulated to be tolerated (scoreof 008ndash055) The 3 SNPs (rs113616262 rs34304568 andrs199977475) of GalNAc-T1 gene are filtered to be functionalby SIFT

322 Simulation of Functional Consequences by PolyPhen-2 For a given threshold of Naıve Bayes probabilistic scorePolyPhen-2 calculates the true positive rate as a fraction ofcorrectly predicted mutations From 18 nsSNPs of GalNAc-T1 gene only 4 (22) nsSNPs were predicted ldquoprobablydamagingrdquo (score of 096ndash100) whereas 14 (78) wereclassified as benign (score of 0411ndash000) The ranking ofSNPs on the basis of PolyPhen-2 scores enables us to assess

the potential-quantitative effect of SNPs on native protein(Table 2) PolyPhen-2 predicted functional 4 SNPs includingldquors113616262 rs34304568 and rs199977475rdquo and therebyvalidated the results predicted from SIFT

323 Functional Characterization by PANTHER andFathmm PANTHER characterizes likely functional effectof amino acid variation by means of HMM-based statisticalmodeling and evolutionary relationship We performedPANTHER analysis of GalNAc-T1 nsSNPs in order to addanother layer of refinement in SNPs characterization A totalof 4 SNPs possessed the subPSEC score less than minus3 and weretherefore classified as toleratedThe remaining 14 amino acidvariants (D46G P54L D71E D71H V216M G258V N379HI384V G401D Y414D N440H and P510T) were found tobe deleterious with subPSEC score in between minus3 and minus10(Table 2) PANTHER-cSNP tool simulated rs113616262 andrs34304568 with the highest subPEC score of minus74941 andminus917274 respectively

From Fathmm only rs34304568 showed the damagingeffect of amino acid substitution (Y414D) with minus182 score

324 Functional Analysis by SNPeffect SNPeffect databasepredicts the impact of SNPs on aggregation-prone regionsamyloid-forming regions Hsp70 chaperone binding sitesand structural stability of human proteins TANGO analysisshowed that L25V variant increases (dTANGO score is6927) and Y414D decreases (dTANGO score is minus15410)the aggregation tendency of the GalNAc-T1 protein How-ever WALTZ analysis revealed that Y414D (dWALTZ scoreis minus20948) also possessed the ability to decrease pro-tein amyloid propensity None of the variants were pre-dicted to alter the chaperone binding sites for Hsp70

6 Computational and Mathematical Methods in Medicine

Table 2 Characterization of SNPs by using PolyPhen-2 PANTHER and Fathmm based classification systems

Amino acid change PolyPhen-2 PANTHER FathmmA1A2 Score Prediction Specificitysensitivity subPSEC Score Prediction Score PredictionA5T 0411 Benign 090089 minus296529 Tolerated 059 ToleratedL20V 0000 Benign 000100 minus265438 Tolerated 069 ToleratedL25V 0960 Probably damaging 095078 minus345127 Deleterious 035 ToleratedD46G 0004 Benign 059097 minus334504 Deleterious 051 ToleratedP54L 0001 Benign 015099 minus427568 Deleterious 056 ToleratedD71E 0000 Benign 000100 minus340972 Deleterious 068 ToleratedD71H 0132 Benign 086093 minus524027 Deleterious 049 ToleratedS143P 0999 Probably damaging 099014 minus74941 Deleterious 015 ToleratedV216M 0411 Benign 090089 minus556762 Deleterious 026 ToleratedG258V 1000 Probably damaging 100000 minus440932 Deleterious 035 ToleratedN379H 0000 Benign 000100 minus341433 Deleterious minus035 ToleratedI384V 0000 Benign 000100 minus381526 Deleterious minus031 ToleratedG401D 0000 Benign 000100 minus323851 Deleterious 156 ToleratedY414D 1000 Probably damaging 100000 minus917274 Deleterious minus182 DamagingN440H 0048 Benign 083094 minus406922 Deleterious minus117 ToleratedP510T 0009 Benign 077096 minus346487 Deleterious minus109 ToleratedN541S 0000 Benign 000100 minus280107 Tolerated minus103 ToleratedV553I 0000 Benign 000100 minus2409 Tolerated minus098 ToleratedPolyPhen-2 probably damaging (probabilistic score gt 085) possibly damaging (probabilistic score gt 015) and benign (remaining)PANTHER deleterious or intolerant (subPSEC score le minus3) less deleterious (subPSEC score ge minus3)

chaperones When analyzed by FoldX severe reductionin stability was demonstrated by S143P (750Kcalmol)G258V (782Kcalmol) Y414D (603 Kcalmol) variants whileN440H (167 Kcalmol) P510T (225 Kcalmol) and N541S(055 Kcalmol) were accounted to slightly reduce the stabilityof GalNAc-T1 protein

325 Characterization of Functional SNPs by ConcordanceAnalysis The efficacy of functional SNP prediction can bemademore reliable by combining the results of empirical andsupport vector machine (SVM) based approaches Hencewe performed the concordance analysis to get the integratedpicture with SIFT PolyPhen-2 PANTHER and SNPeffecttools Out of the 18 nsSNPs 3 (1666) were predicted tobe ldquodeleteriousrdquo by SIFT whereas this prediction rate wasincreased to 4 (22) as ldquoprobably damagingrdquo when analyzedby PolyPhen-2 5 (2777) as ldquoseverely destabilizingrdquo by SNPeffect and 14 (78) as ldquodeleteriousrdquo by PANTHER Fromconcordance analysis 3 amino acid variants (S143P G258Vand Y414D) were commonly predicted by all 4 of the in silicotools (SIFT PolyPhen-2 PANTHER and SNPeffect)

326 Identification of Functional SNPs in Regulatory andCon-served Regions Using empirical Bayesian inference Con-Surf database characterizes the evolutionary conservation ofamino acids Our ConSurf results showed that only S143 waslocated in highly conserved region and predicted to havefunctional impact on GalNAc-T1 proteinThe remaining twonsSNPs responsible for G258V and Y414D are found in thevicinity of conserved residues and found to be buried in wildtype residues by NetSurfP (Table 3)

FastSNP assigns the risk ranking for SNPs after excisingfunctional effect information From FastSNP results wefound 2 SNPs with rs72964406 (ESE motif CCTCATGscore is 2897886) and rs34304568 (ESE motif GGACCTAGscore is 2088850) located in coding regions and altered ESE(exonic splicing enhancers) motifs indicating that all thesensSNPs may have the ability to affect the level position andtiming of gene expression or regulate the splicing of GalNAc-T1 gene transcript No functional role was predicted for theSNPs which are found be located in 31015840 UTR region

33 AMBER-Energy Minimization Solvent Accessibilityand Electrostatic Interaction Effects of Functional SNPson GalNAc-T1 Protein Based on multiple-threadingalignments I-TASSER builds high-quality 3D structuresof protein molecules from their amino acid sequenceAfter retrieving protein FASTA sequence of GalNAc-T1from UniProt (ID ldquoQ10472rdquo) database protein sequencewas given to I-TASSER as an input The I-TASSER toolcreated the 5 full-length models for GalNAc-T1 protein(with C-scores minus075 minus129 minus189 minus271 and minus283) byexcising top 10 structures with C-scores after targeting thePDB library hits (Table 4) Among the 5 predicted modelsmodel 1 carried the high-quality confidence in the formof C-score (minus075) TM-Score (062 plusmn 014) and RMSD(93plusmn 46A) hence it was selected for further analysis usingChimera Swiss Pdb viewer GROMOS96 and AMBERff99SB software to visualize and compute the total energybefore and after energy minimization The superimposedstructures of mutant and wild-type residues with theirsurface were visualized in Figure 2 The total energy of

Computational and Mathematical Methods in Medicine 7

Table 3 Surface accessibility of wild-type and mutant variants in GalNAc-T1

Amino acid change Class assignment Relative surfaceaccessibility (RSA)

Absolute SurfaceAccessibility prediction

minusminus

minusminus

S143P

G258V

Y414D

Z-fit score for RSA

(a) (b) (c)

Figure 2 Superimposed structures of GalNAc-T1 native (camel color) and mutant (blue color) models to visualize the stereochemicalconformation of wild type and mutant residues at 143 258 and 414 positions

Table 4 Top 10 templates used by I-TASSER to create the high-quality models for human GALNT1 secondary structure

Rank PDB Hit Iden1 Iden2 Cov Norm 119885-score1 2ffuA 045 041 088 3122 2d7iA 043 042 091 7153 2ffuA 045 041 088 8164 2ffuA 045 041 088 5975 2ffuA 045 041 088 4416 2d7iA 043 042 091 7117 2ffuA 045 041 088 7998 2d7rA 043 042 092 6429 1xhbA 099 079 080 30610 2ffuA 045 041 088 621Rank of templates represents the top ten threading templates used by I-TASSER ldquoIdent1rdquo is the percentage sequence identity of the templates in thethreading aligned region with the query sequence ldquoIdent2rdquo is the percentagesequence identity of the whole template chains with query sequence ldquoCovrdquorepresents the coverage of the threading alignment and is equal to thenumber of aligned residues divided by the length of query protein ldquoNorm119885-scorerdquo is the normalized 119885-score of the threading alignments Alignmentwith a Normalized 119885-score gt 1 mean a good alignment and vice versa Thetop 10 alignments reported above (in order of their ranking) are taken fromthe following threading programs 1 MUSTER 2 HHSEARCH 3 SP3 4PROSPECT2 5 PPA-I 6 HHSEARCH I 7 SPARKS 8 SAMT99 9 MUSTER10 HHSEARCH

native structure was found to be 13294691 kJmol before theenergyminimization stage and it was minus13882539 kJmol afterthe energy minimization step whereas mutant structures

Table 5 Total energy of native and mutant structures before andafter energy minimization

Amino acidvariants

Total energy beforeenergy minimization

(kJmol)

Total energy afterenergy minimization

(kJmol)Native 13140137 minus13882539S143P 13392757 minus13767374G258V 59777910 minus10727144Y414D 12995687 minus13918198

exhibited deviation in total energy value calculated beforeand after energy minimization (Table 5) Among 3 screenedmutations G258V showed the increase in total energy ofGalNAc-T1 before (597779100 kJmol) and after energyminimization (minus10727144 kJmol) Chimera and Swiss PDBviewer were used to visualize the structural features of aminoacids in native and mutant protein chains During structuralvisualization for all 3 mutations only mutant residue (valine)at 258 position showed a network of clashes with Tyr268(Figure 3) Additionally S143P G258V and Y414D variantswere analyzed for solvent accessibility (in form of Z-score)and stability and a decrease in both parameters was observedfor all three variants (Table 3)

34 The Structural Impacts of Functional GalNAc-T1Mutations Project HOPE simulates the structural featuresof amino acid substitution on native protein structure

8 Computational and Mathematical Methods in Medicine

(a)

(b)

(c)

Figure 3 H-bonding (green discontinuous line) interactions and clashes (pink discontinuous line) of wild type and mutant analogues withthe vicinal amino acid residues (a) Ser143 is examined with single H-bonding with Val139 and converted into clash as a result of Pro atthe same position (b) At 258 position one H-bond is observed with Arg266 in both native (Gly258) and mutant (Val258) structures but anetwork of clashes appeared between Val258 and Tyr268 (c) Tyr414 is visualized with five H-bonding interactions for Pro410 and Asn417and one H-bond is distorted due to appearance of mutant aspartic acid at the same position

Computational and Mathematical Methods in Medicine 9

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

lowast lowastlowast lowastlowastlowastlowastlowast lowastlowast

lowast lowast lowastlowastlowast lowastlowast

lowast

lowast

lowast lowast

lowastlowastlowastlowast lowast lowast lowastlowast

lowast lowast lowastlowastlowast lowast lowast lowastlowastlowastlowastlowast lowast lowastlowast lowast lowast lowastlowast lowast lowastlowast

lowast lowastlowastlowast lowast lowast lowastlowast lowast lowast143 Catalytic subdomain A

258 Catalytic subdomain B

414

Ricin B-type lectin

Figure 4 Multiple sequence alignments and evolutionary conservation behaviour among human GalNAc-T1 to GalNAc-T10 members ofGALNTs family S143 and Y414 residues are found conserved in the catalytic subdomain A and linker B domain (area between catalyticsubdomain B and Ricin B-type lectin) respectively G258 is observed in the vicinity of conservation groups with strongly similar properties(shown with ldquordquo)

The S143 residue is located within a stretch of residuesannotated in Uniprot as catalytic domain A (Figure 4 fromClustal Omega) The hydrophobicity and size differencebetween wild-type and mutant residue (Pro) can disrupt theH-bonding interactions with the neighbouring residues andhence the protein framework The wild-type residue formsa hydrogen bond with the valine on position 139 Due tohigh rigidity of proline moiety this mutation might abolishthe required flexibility of the protein at this position andcould affect the catalytic tendency Based on conservationbehaviour this mutation is probably damaging to the protein

For G258V the wild-type residue is resided in thelinker region between catalytic subdomains A and B andcan distract the required flexibility of core structure bysubstituting Gly258 with valine Additionally the structuralanalysis ofVal258 showed some clashes for Tyr268whichmaycontribute to the extra energy in the protein structure andhence the decrease in stability

In theY414Dvariant Tyr414 indicated fiveH-interactionswith Pro410 Phe411 and Asn417 whereas mutant Asp414exhibited the four H-bonding interactions in the core of theprotein (or protein complex) due to the differences in charge

10 Computational and Mathematical Methods in Medicine

Table 6 Ligand binding sites for wild type andmutant forms (S143PG258V and Y414D)

Wild type S143P G258V Y414DAsn82 Asn82 Asn82 Asn82Phe84 Val123 Phe84 Phe84Val123 Phe124 Val123 Val123Phe124 His125 Phe124 Phe124His125 Asn126 His125 His125Asn126 Glu127 Asn126 Asn126Glu127 Ala128 Glu127 Glu127Thr131 Lau132 Asp156 Asp156Asp156 Asp156 Gly188 Leu189Gly188 Leu189 Leu189 Asp209Leu189 Asp209 Asp209 Ala210Arg193 Ala210 Ala210 His211Asp209 His211 His211 Ile315Ala210 Ile315 Ile315 Trp316His211 Trp316 Trp316 His344Ile315 Arg347 His344 Val345Trp316 Thr350 Val345 Phe346His344 Pro351 Phe346 Arg347Val345 mdash Arg347 Lys348Phe346 mdash Lys348 Ala349mdash mdash Ala349 Thr350mdash mdash Thr350 Pro351mdash mdash Pro351 Tyr352

density and hydrophobicity between wild-type and mutantresidue

35 Simulation for GalNAc-T1 Molecular Binding Sites andProtein-Protein Interactions Including the structure-basedprediction of protein function FTSite identifies the ligandbinding sites for a particular protein Upon querying withFTSite a considerable difference in the number of ligandbinding sites was observed between wild type and mutantforms (S143P G258V and Y414D) of GalNAc-T1 protein(Table 6)

STRING database annotates the functional interactionsamong the proteins in a cell STRING results predictedthe functional association pattern of GalNAc-T1 protein(physical and functional) with MUC1 GBGT1 CHPFST6GALNAC1 B4GalNAc-T1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6 partners with high confidenceshown with bold lines in Figure 5 Based on the confidencescores of GalNAc-T1 protein interactions MUC1 a denselyO-glycosylated transmembrane protein critical to cellularintegrity was chosen for molecular docking analysis inorder to identify the plausible structural and functionalimplications of S143P G258V and Y414D variants TheH-bonding interaction existing between Gly258 (greenportion) and Arg266 residues of native GalNAc-T1 is in turncritical for its interaction with N-H ofThr located in GVTSA(tandem repeat region) of MUC1 by means of H-bonding

However due to the replacement of Gly at 258 position withVal in GalNAc-T1 the H-bonding interaction with MUC1was distorted (Figure 6)

After STRINGmucin biosynthetic pathwaywas retrievedfrom KEGG database to retrieve the molecular reactionand interaction behaviour for GalNAc-T1 and other partnerenzymes (ST6GALNAC1 B4GALNT1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6) to corroborate the STRING results

4 Discussion

The human GalNAc-T1 gene is located on chromosome18q121 region with 11 exons (5882 kb) [1] A 3847 bp lengthm-RNA encodes the functional GalNAc-T1 protein with559 amino acids (642 kDa pI 74) However alternativesplicing of mRNA yields different polynucleotide chainlengths that is 1770 bps (499 aa) and 1120 bps (105 aa)amino acids respectively So far approximately 900 variantslocated in noncoding coding and regulatory regions ofhuman GalNAc-T1 gene are described in dbSNP databaseto date With the advent of high throughput (whole exomeand genome) sequencing practices the number of geneticvariations is growing day by day in an efficient manner [1824 41 42] Hence an important task of human genetics liesin delineating those amino acid variants which can imposespecific structural and functional consequences on proteinfunction [41]

In the current investigation screening for functionalGalNAc-T1 genetic variants in coding region was performedusing sequence- and structure-based algorithms such as SIFTPolyPhen-2 PANTHER and SNP effect From functionalanalysis of SNPs SIFT predicted that 17 of total nsS-NPs are of deleterious type whereas PolyPhen-2 predictedsim22 of total nsSNPs to be highly deleterious But commonpredictions of SIFT and PolyPhen-2 showed that 16 ofvariants were functionalThe significant difference in outputsof SIFT and PolyPhen-2 algorithms is most likely due toutilization of different protein sequence alignments used tocharacterize the variants [42] However good coherence andaccuracy between PANTHER and SIFT (use of similar scor-ingmatrices) have predicted 4 (2222) nsSNPs ofGalNAc-T1to be functional SNPeffect predicted that 3 (166) nsSNPsof the variants to be highly destabilizingThe accuracy of thisprediction percentage remained the same even when com-bined with PolyPhen-2 analysis but increased to 6 (3333)nsSNPs with PANTHER By comparing the output of allthe 4 different in silico tools (SIFT PolyPhen-2 PANTHER-cSNP and SNPeffect) nsSNPs that encodes S143P G258Vand Y414D variants were found to be functionally significantThe differences in prediction capabilities can be attributed tothe fact that every method uses different sets of sequencesand alignments When compared sequence-based predic-tion analysis has a number of advantages over structure-based analysis due to the fact that it considers all types ofeffects at the protein level and is well suitable for proteinswith known relatives But sequence-based predictions areunable to explain the underlying mechanisms between geno-type and phenotype relationships for most of the proteins

Computational and Mathematical Methods in Medicine 11

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

(a) (b)

GALNT1 functional partnersGBGT1CHPFST6GALNAC1B4GALNT1MUC1C1GALT1C1C1GALT1GCNT1B3GNT6ZNF146

0970096809520926092509000900089908990680

Nei

ghbo

rhoo

dG

ene f

usio

nC

oocc

urre

nce

Coe

xpre

ssio

nEx

perim

ents

Dat

abas

esTe

xtm

inig

[Hom

olog

y]

Scor

e

Globoside 120572-13-N-acetylgalactosaminyltransferase 1 (347aa)Chondroitin polymerizing factor (775 aa)ST6(120572-N-acetyl-neuraminyl-23-120573-galactosyl-13)-N-acetylgalactosaminide 120572-26-sialy (600 aa)120573-14-N-acetyl-galactosaminyltransferase1 (533 aa)Mucin 1 cell surface associated (273 aa)C1GALT1-specific chaperone 1 (318 aa)Core1 synthase glycoprotein-N-acetylgalactosamine 3-120573-galactosyltransferase 1 (363 aa)Glucosaminyl (N-acetyl) transferase 1 core 2(120573-16-N-acetylglucosaminyltransferase) (428 aa)UDP-GlcNAc-120573 Gal 120573-13-N-acetylglucosaminyltransferase 6 (core 3 synthase) (384 aa)Zinc finger protein 146 (292 aa)

Figure 5 GalNAc-T1 protein-protein interactions with 09 partners One colour is given to each type of evidence in the predicted functionallinks (edges) among eight coloured lines (a) From experimental basis (with score 0925) onlyMUC1 is observed for interaction withGalNAc-T1 From text-mining data GalNAc-T1 interactions are observed for GBGT1 CHPF ST6GALNAC1 B4GALNT1 and MUC1 proteins with0970 0968 0952 0926 and 0925 scores respectively The remaining interactions with C1GALT1C1 CIGALT1 GCNT1 and B3GNT6proteins are simulated with STRING score ranging from 0900 to 0899 (b) Strong association pattern (thick blue lines) of GalNAc-T1 ispredicted for GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1 C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 partners with highconfidence For ST6GALNAC1-GBGT1 ST6GALNAC1-MUC1 GCNT1-CHPF B4GALNT1-GCNT1 CHPF-C1GALT1C1 andGALNT1-ZNF146pairs weak associations are examined in the form of thin blue lines

(a) (b)

Figure 6 3D visualization of G258V variant withMUC1 (a) Ligand binding of GalNAc-T1 native structure withMUC1 indicated single H-bonding interaction (green line) of Gly258 residue (green portion) with Arg266 which is further connected with the GVTSA (tandem repeatregion ofMUC1) by means of H-bonding interaction (b) Picturing of GalNAc-T1 mutant protein structure is observed with one H-bondingloss between Arg266 residue and theMUC1 tandem repeat motif

12 Computational and Mathematical Methods in Medicine

On the contrary the structure-based approach has limitationsin that it cannot be implemented for proteins with unknown3D structures In silico investigation tools that integrateboth sequence- and structure-based approaches will be ofadded advantage in providing reliable prediction results withwider coverage of different aspects of SNP analysis Howevervariability in prediction outputs of these algorithms reflectsboth advantages and disadvantages Therefore decisionsregarding the selection of a suitable tool for SNP analysismustbe subjective to the specific objectives of the investigationundertaken

Our ConSurf and Clustal Omega results indicated thatnsSNPs at positions S143P and Y414D were found in thehighly conserved region and predicted to have potentialimpact on GalNAc-T1 protein The third mutation (G258) isobserved in the vicinity of conservation groups with stronglysimilar properties shown by ldquordquo in Figure 4 Besides thatthe SNPs which encode functional polypeptides are thoseregulatory SNPs that control the gene expression FastSNPtool helped us to prioritize the deleterious SNPs based upontheir impact on determining protein structure deviance intranscriptional levels of the sequence modification in thepremature translation termination and aberrations in thepositions at promoter region for transcription factor bindingAltered O-glycosylation machinery due to aberrant GalNAc-T1 expression may result in aberrantly glycosylated proteinsand new antigenic targets thereby altering host immuno-genic response in ovarian cells [43 44] The two SNPs i-e rs72964406 (ESE motif CCTCATG score is 2897886)and rs34304568 (ESE motif GGACCTAG score 2088850)were located in coding regions and altered ESE (exonicsplicing enhancers) motifs indicating that all these nsSNPsmay have the ability to affect the level location and timingof gene expression or regulate the splicing of GalNAc-T1 genetranscript

The GalNAc-T1 protein structure carries an N-terminaltransmembrane domain a stem region a lumenal catalyticdomain (catalytic AGT1 domain and catalytic BGalNAc-T domain) and a C-terminal ricin-type lectin region(Figure 4) Although all GalNAc-transferases are reportedto share common structural features and the conservedmotifs described above the exact role of each domain incatalysis remains unknown Several biological roles have beendescribed for GalNAc-T1 for example loss of GalNAc-T1leads to reduced leukocyte recruitment and increased rollingvelocity in vivo suggesting the predominant role for GalNAc-T1 in attaching functionally relevant O-linked glycans toselectin ligands [45] Protein structural analysis is performedto fine-tune the picture drawn by the concordance analysisof SIFT PolyPhen-2 and PANTHER Calculation of sol-vent accessibility and force field energy have provided aninsight into the structural and functional impacts of aminoacid substitutions 3D models are constructed to visualizedeviation among the mutant and wild-type protein modelsIn S143P the required protein framework (hydrophobicityand H-bonding interactions) seems to be disturbed at thisposition and could be damaging for the native structureThe G258V substitution carries the tendency to disturb thelocal structure by reframing the backbone flexibility Y414D

variant is found to be associated with regulation of proteinmisfolding Due to high difference in total energy for nativeand mutant (carrying G258V variant) models the properfunctioning of GalNAc-T1 can be disturbed The analysis ofsequence-based motifs and conserved residues is a reliablemethod to identify the functional amino acid residues thatare important for binding catalysis and stability of proteinsUpon analysingwith FTSite a difference in the ligand bindingsites is observed between wild type (20 binding site) andmutant forms (S143P 18 binding sites G258V 23 bindingsites Y414D 23 binding sites) of GalNAc-T1 protein

Protein-protein interaction analysis is a comprehensiveway to understand the global organization of proteomes inthe context of a functional network Interaction networkdisplays biomolecules as nodes and interactions connectingthe two nodes as edges Currently functional network viewfor a single genome is widely used to improve the statis-tical power in human molecular genetics [39] to aid drugdiscovery to better understand metabolic pathways and toderive genotype-phenotype correlations [18 46] STRINGmaps have shown that GalNAc-T1 interacts with GBGT1CHPF ST6GALNAC1 B4GALNAC-T1 MUC1 C1GALT1C1C1GALT1 GCNT1 B3GNT6 and ZNF146 partners by 36interactions The Strong interaction patterns were observedfor GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 part-ners From KEGG-Pathway analysis biochemical (stereospe-cific) reactions for GalNAc-T1 ST6GALNAC1 B4GALNT1MUC1 C1GALT1C1 C1GALT1 and GCNT1 indicated theindividual and combined role of each enzyme in regulationof glycosylation phenomena to carry out mucin biosynthesispathway (Figure 7) Noticeably most of these enzymatic pro-teins are involved in cell adhesion and membrane transportTherefore aberrant glycosylations in the formofmutations inGalNAc-T1 gene could distort themucin andmany associatedpathways (ERK SRC andNF-kappa-B pathways) involved inmaintaining cellular response and integrity Of the 3GalNAc-T1 variants tested G258Vwas found to disturb the interactionof Arg266 ofGalNAc-T1withN-HofThr (inGVTSA tandemrepeat region) of MUC1 [47] thus it might impair thepossible transfer of GalNAc residue from UDP-GalNAc tohydroxyl group of SerThr during O-glycosylation reaction(Figure 7) Over expression aberrant intracellular localiza-tion and changes in glycosylation of MUC1 and MUC7 areoften reported in tumors of colon breast ovarian lung andpancreatic origin [24 48] Since MUC1 plays an essentialrole in cellular signalling and microbial pathogenicity it isconceivable to expect that G258V mutation is pathogenic tocellular integrity and susceptibility to certain human disease

5 Conclusion

This study represents the first comprehensive investigationthat identified the functional SNPs in GalNAc-T1 geneusing sequence- and structure-based homology algorithmsAlthough there were notable differences in the predictionbasis of selected in silico algorithms our concordance analysiscorroborated the characterization of suspected SNPs that

Computational and Mathematical Methods in Medicine 13

GalNAc1205721-O-Thr

C1GALT1C1GALT1C1

GCNT1

GALNT1

B3GNT6

GCNT1B4GALT5

Gal1205731-3GalNAc1205721-O-Thr

GlcNAc1205731-6GalNAc1205721-O-Thr

GalNAc1205721-3GalNAc1205721-O-Thr

GlcNAc1205731-3GalNAc1205721-O-Thr

Gal1205731-4GlcNAc1205731-6GalNAc1205721-O-Thr

Figure 7 Complex structure of GalNAc-T1 enzyme with UDPMUC1 (tandem repeat region ldquoGSTAPPAHGVTSAPrdquo) and GalNAc residueGalNAc sugar is attached at Thr of tandem repeat region inMUC1 His125 and Asp156 residues from catalytic A domain and Ile315 Trp316and Thr350 from catalytic B domain are found in contact with UDP (orange red color) Under the action of ST6GALNAC1 B4GALNT1C1GALT1C1 C1GALT1 GCNT1 andB3GNT6GalNAc1-O-Thr is stereospecifically converted into different glycan chains to regulate themucinbiosynthesis pathway

could play significant roles in cellular biology We definedthe structural consequences of S143P G258V and Y414Dvariants on GalNAc-T1 protein in the form of solventaccessibility electrostatic interaction energy calculation andmultiple alignment conservation From potential SNPs theG258V mutation is predicted to cause considerable changein total energy electrostatic intramolecular interactions andfunctional interaction behaviour of GalNAc-T1 protein withMUC1 protein and thereby may impact the possible transferof GalNAc residue from UDP-GalNAc to hydroxyl groupof SerThr during O-glycosylation of proteins Additionally2 regulatory SNPs that influence the splicing and expres-sion pattern of GalNAc-T1 gene have also been identifiedAltered GalNAc-T1 function due to genetic variations andmRNA expression might play a critical role in determiningsusceptibility to complex diseases Furthermore protein-protein interaction pathway and protein-ligand docking havehelped us to understand the stereochemical (anomery andlinkage) roles ofGalNAc-T1 and associated enzymes inmucinbiosynthetic pathway Finally the in silico based nsSNPpredictions would not just be of use in deriving genotype-phenotype relations but to some extent explain the molecularbasis for varied interindividual response to certain drugs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are thankful to the reviewers for their criticalreview and fruitful suggestions Additionally the authorsacknowledge the useful comments from Dr MuhammadNadeem Arshad of Chemistry Department King AbdulazizUniversity KSA

References

[1] E P Bennett D O Weghuis G Merkx A G van KesselH Eiberg and H Clausen ldquoGenomic organization andchromosomal localization of three members of the UDP-N-acetylgalactosaminepolypeptide N-acetylgalactosaminyltrans-ferase familyrdquo Glycobiology vol 8 no 6 pp 547ndash555 1998

[2] Y-Z Chen Y-R Tang Z-Y Sheng and Z Zhang ldquoPredictionof mucin-type O-glycosylation sites in mammalian proteinsusing the composition of k-spaced amino acid pairsrdquo BMCBioinformatics vol 9 article 101 2008

[3] H C Hang and C R Bertozzi ldquoThe chemistry and biology ofmucin-type O-linked glycosylationrdquo Bioorganic and MedicinalChemistry vol 13 no 17 pp 5021ndash5034 2005

[4] T A Gerken O Jamison C L Perrine et al ldquoEmerg-ing paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family ofglycosyltransferasesrdquo Journal of Biological Chemistry vol 286no 16 pp 14493ndash14507 2011

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 2: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

2 Computational and Mathematical Methods in Medicine

These series of glycosyltransferases act together in elongat-ing the O-linked glycan to produce a variety of denselyO-glycosylated mucin glycoproteins Mucin constitutes theprinciple component of mucus that defends the epithelialcell surfaces against infectious and environmental agents [56] Additionally mucin-like glycans also serve as receptor-binding ligands during an inflammatory response [7]

The GalNAc-Ts (GALNTs) are classified into 27 familymembers based on their sequence and structural similaritiesIn total 20 human GalNAc-Ts gene entries are made up-to-date Most of the GalNAc-T genes such as GalNAc-T1GalNAc-T3 GalNAc-T4 GalNAc-T5 GalNAc-T6 GalNAc-T7 GalNAc-T9 GalNAc-T12 GalNAc-T13 and GalNac-T14are important in determining the peptide and glycopeptidesubstrate specificities Analysis of intronexon positioning inhuman fish fly and worm supports the model that GalNAc-Ts evolved from a common ancestral gene Furthermoreseveral studies have provided the evidence that predictedorthologs of GalNAc-Ts genes in human mouse and flyrepresent true functional orthologs [4 8 9]

Over the past few years there has been considerableinterest in linking the genetic variations of GalNAc-Ts todisease susceptibility in humans For example rs17647532in GalNAc-T1 is known to influence the risk of epithelialovarian cancers [10] rs142046356 in GalNAc-T2 is describedto play a role in elevating the HDLc levels in some families[11] rs6710518 in GalNAc-T3 is strongly associated with bonemineral density and fracture risk [12] and rs52822348 inGalNAc-T4 increases the risk of acute coronary disease [13]Additionally genome wide studies have also revealed thebiochemically inactivating germ line and somatic mutationsin GalNAc-T12 genes [14]

In recent years computational approaches have beenextensively used to identify the impact of deleterious nsSNPsin candidate genes by using information such as conservationof sequences across species [15] structural attributes [16]and physicochemical properties of polypeptides [17 18]By adopting computational algorithms some studies havesuccessfully classified highly functional SNPs out of a hugepool of disease susceptible SNPs of BRCA1 gene ATMgene [19] and BRAF gene [20] based on their structuraland functional consequences Despite the availability ofconvincing data indicating the wide involvement of GalNAc-T1 genemutations in human diseases computational analysisof coding SNPs (nsSNPs and sSNPs in exonic regions) andnoncoding SNPs (Intronic exonic 51015840 and 31015840 UTR SNPs) incandidate GalNAc-Ts genes still remains unexplored

Hence in order to characterize the deleterious mutationsin candidate GalNAc-Ts genes the GalNAc-T1 gene which isknown to be expressed at high levels and is found broadlyamong most tissues and cell types [21 22] was chosen for insilico analysis in the current investigation Our experimentalstrategy (Figure 1) involved (i) retrieval of SNPs in GalNAc-T1 gene from public databases (ii) classifying the deleteriousnsSNPs to that of phenotypes based on sequence andstructure-based homology analyses and characterising theregulatory nsSNPs that alter the splicing and gene expressionpatterns (iii) predicting the precise effects of amino acid sub-stitutions on secondary structures by means of stability and

solvent accessibility simulations studies and (iv) analysingthe impact of amino acid substitutions caused by nsSNPson GalNAc-T1 interactions with other proteins in a networkThe in silico approaches adopted in this investigation offeradvantages over the experimental based ones due to theirconvenience reliability speed and of lower cost in order tolocate the amino acid variants that regulate the function ofGalNAc-T1 protein

2 Methodology

21 SNP Data Mining The data on the human GalNAc-T1(GALNT1) gene was collected from web-based data sourcessuch as Online Mendelian Inheritance in Man (OMIMhttpwwwncbinlmnihgovomim) and the National Cen-ter for Biological Information (httpwwwncbinlmnihgov) Details of the SNPs (mRNA accession number refer-ence and assay IDrsquos) and protein sequence ofGalNAc-T1 genewere retrieved from the dbSNP-NCBI and SwissProt

22 Analysis of Functional Consequences of nsSNPs by SIFTSorting Intolerant from Tolerant (SIFT httpsiftjcviorg)predicts the tolerated and deleterious SNPs in order toidentify the impact of amino acid substitution on phenotypicand functional changes of protein molecules In the currentinvestigation the identification numbers (rs IDs) of each SNPof GalNAc-T1 gene obtained from NCBI were submitted as aquery sequence to SIFT for homology searching The SIFTvalue le 005 indicates the deleterious effect of nonsynony-mous variants on protein function [23 24]

23 Simulation of Functional Consequences of nsSNPs byPolyPhen-2 Polymorphism Phenotyping-2 or PolyPhen-2(httpgeneticsbwhharvardedupph2) is a probabilisticclassifier which calculates functional significance of an allele-change by Naıve Bayes a set of supervised learning algo-rithms A mutation is evaluated qualitatively as probablydamaging (probabilistic score gt 085) possibly damaging(probabilistic score gt 015) and benign (remaining) cor-responding to the pairs of false positive rate (FPR) andtrue positive rate (TPR) thresholds adjusted separately forHumDiv (10 amp 18 FPR) and HumVar (19 amp 40FPR for probably and possibly damaging mutations resp)This tool was used to study the possible impacts of aminoacid substitutions on the function of candidate GalNAc-T1 protein Input options for PolyPhen-2 are comprised ofUniProt accession numberFASTA sequence and detail ofamino acids substitution [25 26]

24 Characterization of Functional nsSNPs by PANTHERand Fathmm The Protein ANalysis THrough EvolutionaryRelationships (PANTHER httpwwwpantherdborg) clas-sification system was used to characterize the functionalnsSNPs in GalNAc-T1 gene using HMM-based statisticalmodelling and evolutionary relationships in the proteinfamily [27] PANTHER-cSNP tools estimate the functionof coding nsSNPs by calculating the subPSEC (substitution

Computational and Mathematical Methods in Medicine 3

GalNAc-T1 orGALNT1 gene

Database mining

dbSNP-NCBI

nsSNPs

rs ID Allel change Amino acidvariation

SIFT PolyPhen-2 PANTHER SNP effect

Fast SNP

NetsurfP Consurf Clustal Omega Project hope

STRING FT-site KEGG

I-TASSER

GROMOS 96 Raswin Chimera Swiss pdb viewer Patchdock Firedock Swissdock

UniProtFASTA

Figure 1 Schematic representation of computational tools for in silico analysis of GalNAc-T1 gene

position-specific evolutionary conservation) score The pos-sible inputs to run the cSNP tools of PANTHER are theamino acid substitution and protein sequence PANTHERsubPSEC score le minus3 classifies the amino acid substitution asdeleterious or intolerant whereas the score ge minus3 is predictedto be less deleterious [24]

The Functional Analysis throughHiddenMarkovModels(fathmm) was used to predict the functional molecularand phenotypic consequences of protein missense variantsby combining sequence conservation within hidden Markovmodels (HMMs) [28] User input for Fathmm server includesUniProtENSEMBLE ID with amino acid variation predic-tion algorithm and phenotypic associations

25 Analysing the Molecular Phenotypic Effects of nsSNPsby SNPeffect The SNPeffect database 40 (httpsnpeffectswitchlaborg)was used to predict the phenotypic impacts ofnsSNPs of GalNAc-T1 gene This database uses different toolssuch as TANGO(predict aggregation prone regions)WALTZ(simulate amyloidogenic regions) LIMBO (examine hsp70chaperone binding sites) and FoldX (analyze the possibleimpact on protein stability) to annotate both noncoding andcoding SNPs Input options usually consist of a gene list andUniProt ID which are critical for a biological pathway or

function SNPeffect databasemainly focuses on themolecularidentification categorization and explanation of diseasevariants in the human proteome [29]

26 Identification of Functional SNPs in Conserved Regionsby ConSurf Using an empirical Bayesian inference ConSurfweb-server calculates the evolutionary conservation of aminoacid substitution in proteins [30 31] After giving the FASTAsequence ofGalNAc-T1 the conserved regionswere predictedby means of conservation scores and colouring schemedivided into a distinct scale of nine grades starting from themost variable positions (grade 1) coloured turquoise throughintermediately conserved positions (grade 5) coloured whiteto the most conserved positions (grade 9) coloured maroon[32]

27 Characterization of Functional SNPs in Regulatory Regionsby FastSNP Function analysis and selection tool for singlenucleotide polymorphisms or FastSNP (httpFastSNPibmssinicaedutw) was used to analyse the SNPs located inregulatory regions of GalNAc-T1 gene This tool follows thedecision tree principle to extract the functional informationof SNPs from external web-servers and accordingly assign

4 Computational and Mathematical Methods in Medicine

the risk rankings for SNPs from 1 2 3 4 or 5 (which indicatesthe levels of very low low medium high and the very highfunctional impact resp) The input to query FastSNP isusually a gene symbol SNP reference cluster ID (rs ID) or achromosome position After getting query FastSNP providestwo main options (query by candidate gene and query bySNP) to further generate the SNP function report against eachSNP [33]

28 Analysis of SNP Effects on Surface and Solvent Accessibilityof Protein by NetSurfP Solvent accessibility or accessiblesurface area (ASA) of an amino acid helps in locating thepotential active sites in a three-dimensional structure and inan extended tripeptide conformation of proteins (ShandarAhmad et al 2004) Amino acid FASTA sequence ofGalNAc-T1 was submitted to the NetSurfP (httpwwwcbsdtudkservicesNetSurfP) server to predict its secondary structuresurface and solvent accessibility of amino acids This pre-diction method relies on the Z-score which can predict thesurfaces but not secondary structures of proteins There are3 subclasses defined for solvent accessibility of the aminoacids these include buried (low accessibility) partially buried(moderate accessibility) and exposed (high accessibility)[34] Simulating the secondary structure of proteins isessential to analyze the relationship between amino acidsequence and protein structure Hence we simulated thesecondary structure of GalNAc-T1 protein with I-TASSER(httpzhanglabccmbmedumicheduI-TASSER) by excis-ing continuous fragments from threading alignments andfurther refined them using replica-exchanged Monte-CarlosimulationsThe quality of predictionmodels was reflected inthe form of c-scores (minus5 to 2) Secondary structure predictionand solvent accessibility analysis were an intermediate stepprior to predicting the tertiary structure of GalNAc-T1protein

29 Modelling the Molecular Effects of nsSNP on ProteinStructures The nsSNPs can significantly change the stabil-ity of proteins Consequently in order to investigate thestructural deviations and stability differences between nativeand mutant forms of GalNAc-T1 structural analysis wasundertaken based on the results obtained from highest SIFTPolyPhen-2 SNPeffect and PANTHER scores Prediction ofthe 3 dimensional model of GalNAc-T1 protein structurewas done using the Universal Protein Resource (UniProthttpwwwuniprotorg ) I-TASSER (httpzhanglabccmbmedumicheduI-TASSER) andMultisource protein struc-ture threading (MUSTER httpzhanglabccmbmedumicheduMUSTER) web tools However structural visualizationwas performed by the SWISS-PDB viewer and Chimera [3536]

210 Analysis of Structural Specificity of Functional SNPsTo study the structural effects of mutations on GalNAc-T1 protein Project Have yOur Protein Explained (HOPEhttpwwwcmbirunlhopehome) a unique web serverwas used To simulate structural features of mutations onnative proteinmolecule ProjectHOPEuses the 3D structures

of the proteins that are available in Uniprot database and alsoin DAS prediction servers however the HOPE server canalso build homology models independently if necessary TheHOPE sever predicts the structural variation between nativeand mutant residues [37]

211 Prediction of Ligand Binding Sites on Unbound ProteinStructures by FTSite To solve the classic problem asso-ciated with the elucidation of protein structure-functionrelationship protein engineering and drug design proteinbinding sites are considered as hot spot regions especially forsmall ligand molecules FTSite is an accurate computationalmethod based on experimental evidence to determine theligand binding sites with experimental accuracy of 94The input options contain job name PDB ID or file PDBchain IDs and email (optional) to retrieve the bindingsitesresidues within the candidate unbound protein [38]

212 Predictions of Protein-Protein Interactions Protein-protein interaction networks are important to unveil andannotate all functional interaction among cell proteinsFor this current investigation the online database resourceldquoSearch Tool for the Retrieval of Interacting proteinsrdquo(STRING httpstring-dborg) was used This providedunique coverage and ease of access to both experimentaland theoretical interaction evidence of GalNac-T1 The inputoptions for STRING database include protein name pro-tein sequence and multiple sequences STRING databaseis presently equipped with 5214234 proteins belonging to1133 organisms [39] To predict the functional networkingof GalNAc-T1 enzyme we used KEGG (httpwwwgenomejpkegg) PATHWAY and LIGAND and curated the datafor molecular reaction and interaction networks includingmetabolic pathways regulatory pathways and molecularcomplexes for biological systems

213 Molecular Docking for Protein-Ligand Interaction byPatchDock and FireDock To get the structural insight ofunbound protein-ligand interaction PatchDock and Fire-Dock (Fast Interaction REfinement in molecular DOCK-ing) were considered PatchDock algorithm entails objectrecognition and image segmentation techniques to carry outrigid docking through a three-stage filtering process (a)molecular shape representation (b) surface patch matchingand (c) filteringscoring The FireDock method simulta-neously targets the problem of flexibility and scoring ofsolutions to address the refinement problem of protein-protein docking solutions After generating top 1000 modelsthe PatchDock output was redirected to FireDock (as aninput) to produce the top 10 refined structures of associatedGalNAcT protein-ligand [40]

3 Results

31 SNP Analysis The dbSNP-NCBI database search fornonsynonymous and synonymous SNPs located in exonicintronic and UTR regions ofGalNAc-T1 gene revealed a totalof 891 SNPs of which the exonic region is comprised of 18

Computational and Mathematical Methods in Medicine 5

Table 1 List of nsSNPs with rs IDs found to be functionally significant by SIFT Tool

rs IDs Allele change Amino acid change SIFT score SIFT prediction Medianrs143502430 c13GgtA A5T 055 TOLERATED 295rs141224997 c58TgtG L20V 1 TOLERATED 266rs138822379 c73CgtG L25V 007 TOLERATED 267rs199595808 c137AgtG D46G 03 TOLERATED 251rs72964406 c161CgtT P54L 028 TOLERATED 251rs139185162 c213TgtG D71E 1 TOLERATED 238rs150799030 c211GgtC D71H 007 TOLERATED 238rs113616262 c427TgtC S143P 0 DAMAGING 237rs201338228 c646GgtA V216M 014 TOLERATED 237rs199977475 c773GgtT G258V 004 DAMAGING 237rs144282744 c1135AgtC N379H 01 TOLERATED 237rs146565032 c1150AgtG I384V 052 TOLERATED 238rs142387342 c1202GgtA G401D 034 TOLERATED 237rs34304568 c1240TgtC Y414D 0 DAMAGING 237rs151329739 c1318AgtC N440H 008 TOLERATED 237rs142110831 c1528CgtA P510T 016 TOLERATED 237rs145884536 c1622AgtG N541S 045 TOLERATED 238rs200444543 c1657GgtA V553I 039 TOLERATED 266SIFT score le 005 damaging

(2) nsSNPs and 11 (12) sSNPs whereas intronic regionscontained 836 (93) SNPs and mRNAUTR region consistedof 26 (29) SNPs However search for nonsense frameshiftstopgain mutations did not show any results in NCBI-dbSNPWe selected nonsynonymous SNPs from the exonic regionand UTR SNPs (51015840 and 31015840) from the intronic region for ouranalysis

32 Identification of Functional SNPs in Coding RegionsIdentification of functional SNPs was done by predictingthose which substitute the amino acids that are critical forGalNAc-T1 functionThis in silico analysis was performed andvalidated using 4 different computational tools namely SIFTPolyPhen-2 PANTHER and SNPeffect

321 Analysis of Molecular Phenotypic Effects by SIFT Byusing sequence homology SIFT characterizes the effect ofamino acid substitution on protein function From SIFTresults (Table 1) a total of 3 (166) nsSNPs were predicted asdamaging (score of 000ndash004) by SIFT whereas the remain-ing 15 nsSNPs (8333) were simulated to be tolerated (scoreof 008ndash055) The 3 SNPs (rs113616262 rs34304568 andrs199977475) of GalNAc-T1 gene are filtered to be functionalby SIFT

322 Simulation of Functional Consequences by PolyPhen-2 For a given threshold of Naıve Bayes probabilistic scorePolyPhen-2 calculates the true positive rate as a fraction ofcorrectly predicted mutations From 18 nsSNPs of GalNAc-T1 gene only 4 (22) nsSNPs were predicted ldquoprobablydamagingrdquo (score of 096ndash100) whereas 14 (78) wereclassified as benign (score of 0411ndash000) The ranking ofSNPs on the basis of PolyPhen-2 scores enables us to assess

the potential-quantitative effect of SNPs on native protein(Table 2) PolyPhen-2 predicted functional 4 SNPs includingldquors113616262 rs34304568 and rs199977475rdquo and therebyvalidated the results predicted from SIFT

323 Functional Characterization by PANTHER andFathmm PANTHER characterizes likely functional effectof amino acid variation by means of HMM-based statisticalmodeling and evolutionary relationship We performedPANTHER analysis of GalNAc-T1 nsSNPs in order to addanother layer of refinement in SNPs characterization A totalof 4 SNPs possessed the subPSEC score less than minus3 and weretherefore classified as toleratedThe remaining 14 amino acidvariants (D46G P54L D71E D71H V216M G258V N379HI384V G401D Y414D N440H and P510T) were found tobe deleterious with subPSEC score in between minus3 and minus10(Table 2) PANTHER-cSNP tool simulated rs113616262 andrs34304568 with the highest subPEC score of minus74941 andminus917274 respectively

From Fathmm only rs34304568 showed the damagingeffect of amino acid substitution (Y414D) with minus182 score

324 Functional Analysis by SNPeffect SNPeffect databasepredicts the impact of SNPs on aggregation-prone regionsamyloid-forming regions Hsp70 chaperone binding sitesand structural stability of human proteins TANGO analysisshowed that L25V variant increases (dTANGO score is6927) and Y414D decreases (dTANGO score is minus15410)the aggregation tendency of the GalNAc-T1 protein How-ever WALTZ analysis revealed that Y414D (dWALTZ scoreis minus20948) also possessed the ability to decrease pro-tein amyloid propensity None of the variants were pre-dicted to alter the chaperone binding sites for Hsp70

6 Computational and Mathematical Methods in Medicine

Table 2 Characterization of SNPs by using PolyPhen-2 PANTHER and Fathmm based classification systems

Amino acid change PolyPhen-2 PANTHER FathmmA1A2 Score Prediction Specificitysensitivity subPSEC Score Prediction Score PredictionA5T 0411 Benign 090089 minus296529 Tolerated 059 ToleratedL20V 0000 Benign 000100 minus265438 Tolerated 069 ToleratedL25V 0960 Probably damaging 095078 minus345127 Deleterious 035 ToleratedD46G 0004 Benign 059097 minus334504 Deleterious 051 ToleratedP54L 0001 Benign 015099 minus427568 Deleterious 056 ToleratedD71E 0000 Benign 000100 minus340972 Deleterious 068 ToleratedD71H 0132 Benign 086093 minus524027 Deleterious 049 ToleratedS143P 0999 Probably damaging 099014 minus74941 Deleterious 015 ToleratedV216M 0411 Benign 090089 minus556762 Deleterious 026 ToleratedG258V 1000 Probably damaging 100000 minus440932 Deleterious 035 ToleratedN379H 0000 Benign 000100 minus341433 Deleterious minus035 ToleratedI384V 0000 Benign 000100 minus381526 Deleterious minus031 ToleratedG401D 0000 Benign 000100 minus323851 Deleterious 156 ToleratedY414D 1000 Probably damaging 100000 minus917274 Deleterious minus182 DamagingN440H 0048 Benign 083094 minus406922 Deleterious minus117 ToleratedP510T 0009 Benign 077096 minus346487 Deleterious minus109 ToleratedN541S 0000 Benign 000100 minus280107 Tolerated minus103 ToleratedV553I 0000 Benign 000100 minus2409 Tolerated minus098 ToleratedPolyPhen-2 probably damaging (probabilistic score gt 085) possibly damaging (probabilistic score gt 015) and benign (remaining)PANTHER deleterious or intolerant (subPSEC score le minus3) less deleterious (subPSEC score ge minus3)

chaperones When analyzed by FoldX severe reductionin stability was demonstrated by S143P (750Kcalmol)G258V (782Kcalmol) Y414D (603 Kcalmol) variants whileN440H (167 Kcalmol) P510T (225 Kcalmol) and N541S(055 Kcalmol) were accounted to slightly reduce the stabilityof GalNAc-T1 protein

325 Characterization of Functional SNPs by ConcordanceAnalysis The efficacy of functional SNP prediction can bemademore reliable by combining the results of empirical andsupport vector machine (SVM) based approaches Hencewe performed the concordance analysis to get the integratedpicture with SIFT PolyPhen-2 PANTHER and SNPeffecttools Out of the 18 nsSNPs 3 (1666) were predicted tobe ldquodeleteriousrdquo by SIFT whereas this prediction rate wasincreased to 4 (22) as ldquoprobably damagingrdquo when analyzedby PolyPhen-2 5 (2777) as ldquoseverely destabilizingrdquo by SNPeffect and 14 (78) as ldquodeleteriousrdquo by PANTHER Fromconcordance analysis 3 amino acid variants (S143P G258Vand Y414D) were commonly predicted by all 4 of the in silicotools (SIFT PolyPhen-2 PANTHER and SNPeffect)

326 Identification of Functional SNPs in Regulatory andCon-served Regions Using empirical Bayesian inference Con-Surf database characterizes the evolutionary conservation ofamino acids Our ConSurf results showed that only S143 waslocated in highly conserved region and predicted to havefunctional impact on GalNAc-T1 proteinThe remaining twonsSNPs responsible for G258V and Y414D are found in thevicinity of conserved residues and found to be buried in wildtype residues by NetSurfP (Table 3)

FastSNP assigns the risk ranking for SNPs after excisingfunctional effect information From FastSNP results wefound 2 SNPs with rs72964406 (ESE motif CCTCATGscore is 2897886) and rs34304568 (ESE motif GGACCTAGscore is 2088850) located in coding regions and altered ESE(exonic splicing enhancers) motifs indicating that all thesensSNPs may have the ability to affect the level position andtiming of gene expression or regulate the splicing of GalNAc-T1 gene transcript No functional role was predicted for theSNPs which are found be located in 31015840 UTR region

33 AMBER-Energy Minimization Solvent Accessibilityand Electrostatic Interaction Effects of Functional SNPson GalNAc-T1 Protein Based on multiple-threadingalignments I-TASSER builds high-quality 3D structuresof protein molecules from their amino acid sequenceAfter retrieving protein FASTA sequence of GalNAc-T1from UniProt (ID ldquoQ10472rdquo) database protein sequencewas given to I-TASSER as an input The I-TASSER toolcreated the 5 full-length models for GalNAc-T1 protein(with C-scores minus075 minus129 minus189 minus271 and minus283) byexcising top 10 structures with C-scores after targeting thePDB library hits (Table 4) Among the 5 predicted modelsmodel 1 carried the high-quality confidence in the formof C-score (minus075) TM-Score (062 plusmn 014) and RMSD(93plusmn 46A) hence it was selected for further analysis usingChimera Swiss Pdb viewer GROMOS96 and AMBERff99SB software to visualize and compute the total energybefore and after energy minimization The superimposedstructures of mutant and wild-type residues with theirsurface were visualized in Figure 2 The total energy of

Computational and Mathematical Methods in Medicine 7

Table 3 Surface accessibility of wild-type and mutant variants in GalNAc-T1

Amino acid change Class assignment Relative surfaceaccessibility (RSA)

Absolute SurfaceAccessibility prediction

minusminus

minusminus

S143P

G258V

Y414D

Z-fit score for RSA

(a) (b) (c)

Figure 2 Superimposed structures of GalNAc-T1 native (camel color) and mutant (blue color) models to visualize the stereochemicalconformation of wild type and mutant residues at 143 258 and 414 positions

Table 4 Top 10 templates used by I-TASSER to create the high-quality models for human GALNT1 secondary structure

Rank PDB Hit Iden1 Iden2 Cov Norm 119885-score1 2ffuA 045 041 088 3122 2d7iA 043 042 091 7153 2ffuA 045 041 088 8164 2ffuA 045 041 088 5975 2ffuA 045 041 088 4416 2d7iA 043 042 091 7117 2ffuA 045 041 088 7998 2d7rA 043 042 092 6429 1xhbA 099 079 080 30610 2ffuA 045 041 088 621Rank of templates represents the top ten threading templates used by I-TASSER ldquoIdent1rdquo is the percentage sequence identity of the templates in thethreading aligned region with the query sequence ldquoIdent2rdquo is the percentagesequence identity of the whole template chains with query sequence ldquoCovrdquorepresents the coverage of the threading alignment and is equal to thenumber of aligned residues divided by the length of query protein ldquoNorm119885-scorerdquo is the normalized 119885-score of the threading alignments Alignmentwith a Normalized 119885-score gt 1 mean a good alignment and vice versa Thetop 10 alignments reported above (in order of their ranking) are taken fromthe following threading programs 1 MUSTER 2 HHSEARCH 3 SP3 4PROSPECT2 5 PPA-I 6 HHSEARCH I 7 SPARKS 8 SAMT99 9 MUSTER10 HHSEARCH

native structure was found to be 13294691 kJmol before theenergyminimization stage and it was minus13882539 kJmol afterthe energy minimization step whereas mutant structures

Table 5 Total energy of native and mutant structures before andafter energy minimization

Amino acidvariants

Total energy beforeenergy minimization

(kJmol)

Total energy afterenergy minimization

(kJmol)Native 13140137 minus13882539S143P 13392757 minus13767374G258V 59777910 minus10727144Y414D 12995687 minus13918198

exhibited deviation in total energy value calculated beforeand after energy minimization (Table 5) Among 3 screenedmutations G258V showed the increase in total energy ofGalNAc-T1 before (597779100 kJmol) and after energyminimization (minus10727144 kJmol) Chimera and Swiss PDBviewer were used to visualize the structural features of aminoacids in native and mutant protein chains During structuralvisualization for all 3 mutations only mutant residue (valine)at 258 position showed a network of clashes with Tyr268(Figure 3) Additionally S143P G258V and Y414D variantswere analyzed for solvent accessibility (in form of Z-score)and stability and a decrease in both parameters was observedfor all three variants (Table 3)

34 The Structural Impacts of Functional GalNAc-T1Mutations Project HOPE simulates the structural featuresof amino acid substitution on native protein structure

8 Computational and Mathematical Methods in Medicine

(a)

(b)

(c)

Figure 3 H-bonding (green discontinuous line) interactions and clashes (pink discontinuous line) of wild type and mutant analogues withthe vicinal amino acid residues (a) Ser143 is examined with single H-bonding with Val139 and converted into clash as a result of Pro atthe same position (b) At 258 position one H-bond is observed with Arg266 in both native (Gly258) and mutant (Val258) structures but anetwork of clashes appeared between Val258 and Tyr268 (c) Tyr414 is visualized with five H-bonding interactions for Pro410 and Asn417and one H-bond is distorted due to appearance of mutant aspartic acid at the same position

Computational and Mathematical Methods in Medicine 9

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

lowast lowastlowast lowastlowastlowastlowastlowast lowastlowast

lowast lowast lowastlowastlowast lowastlowast

lowast

lowast

lowast lowast

lowastlowastlowastlowast lowast lowast lowastlowast

lowast lowast lowastlowastlowast lowast lowast lowastlowastlowastlowastlowast lowast lowastlowast lowast lowast lowastlowast lowast lowastlowast

lowast lowastlowastlowast lowast lowast lowastlowast lowast lowast143 Catalytic subdomain A

258 Catalytic subdomain B

414

Ricin B-type lectin

Figure 4 Multiple sequence alignments and evolutionary conservation behaviour among human GalNAc-T1 to GalNAc-T10 members ofGALNTs family S143 and Y414 residues are found conserved in the catalytic subdomain A and linker B domain (area between catalyticsubdomain B and Ricin B-type lectin) respectively G258 is observed in the vicinity of conservation groups with strongly similar properties(shown with ldquordquo)

The S143 residue is located within a stretch of residuesannotated in Uniprot as catalytic domain A (Figure 4 fromClustal Omega) The hydrophobicity and size differencebetween wild-type and mutant residue (Pro) can disrupt theH-bonding interactions with the neighbouring residues andhence the protein framework The wild-type residue formsa hydrogen bond with the valine on position 139 Due tohigh rigidity of proline moiety this mutation might abolishthe required flexibility of the protein at this position andcould affect the catalytic tendency Based on conservationbehaviour this mutation is probably damaging to the protein

For G258V the wild-type residue is resided in thelinker region between catalytic subdomains A and B andcan distract the required flexibility of core structure bysubstituting Gly258 with valine Additionally the structuralanalysis ofVal258 showed some clashes for Tyr268whichmaycontribute to the extra energy in the protein structure andhence the decrease in stability

In theY414Dvariant Tyr414 indicated fiveH-interactionswith Pro410 Phe411 and Asn417 whereas mutant Asp414exhibited the four H-bonding interactions in the core of theprotein (or protein complex) due to the differences in charge

10 Computational and Mathematical Methods in Medicine

Table 6 Ligand binding sites for wild type andmutant forms (S143PG258V and Y414D)

Wild type S143P G258V Y414DAsn82 Asn82 Asn82 Asn82Phe84 Val123 Phe84 Phe84Val123 Phe124 Val123 Val123Phe124 His125 Phe124 Phe124His125 Asn126 His125 His125Asn126 Glu127 Asn126 Asn126Glu127 Ala128 Glu127 Glu127Thr131 Lau132 Asp156 Asp156Asp156 Asp156 Gly188 Leu189Gly188 Leu189 Leu189 Asp209Leu189 Asp209 Asp209 Ala210Arg193 Ala210 Ala210 His211Asp209 His211 His211 Ile315Ala210 Ile315 Ile315 Trp316His211 Trp316 Trp316 His344Ile315 Arg347 His344 Val345Trp316 Thr350 Val345 Phe346His344 Pro351 Phe346 Arg347Val345 mdash Arg347 Lys348Phe346 mdash Lys348 Ala349mdash mdash Ala349 Thr350mdash mdash Thr350 Pro351mdash mdash Pro351 Tyr352

density and hydrophobicity between wild-type and mutantresidue

35 Simulation for GalNAc-T1 Molecular Binding Sites andProtein-Protein Interactions Including the structure-basedprediction of protein function FTSite identifies the ligandbinding sites for a particular protein Upon querying withFTSite a considerable difference in the number of ligandbinding sites was observed between wild type and mutantforms (S143P G258V and Y414D) of GalNAc-T1 protein(Table 6)

STRING database annotates the functional interactionsamong the proteins in a cell STRING results predictedthe functional association pattern of GalNAc-T1 protein(physical and functional) with MUC1 GBGT1 CHPFST6GALNAC1 B4GalNAc-T1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6 partners with high confidenceshown with bold lines in Figure 5 Based on the confidencescores of GalNAc-T1 protein interactions MUC1 a denselyO-glycosylated transmembrane protein critical to cellularintegrity was chosen for molecular docking analysis inorder to identify the plausible structural and functionalimplications of S143P G258V and Y414D variants TheH-bonding interaction existing between Gly258 (greenportion) and Arg266 residues of native GalNAc-T1 is in turncritical for its interaction with N-H ofThr located in GVTSA(tandem repeat region) of MUC1 by means of H-bonding

However due to the replacement of Gly at 258 position withVal in GalNAc-T1 the H-bonding interaction with MUC1was distorted (Figure 6)

After STRINGmucin biosynthetic pathwaywas retrievedfrom KEGG database to retrieve the molecular reactionand interaction behaviour for GalNAc-T1 and other partnerenzymes (ST6GALNAC1 B4GALNT1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6) to corroborate the STRING results

4 Discussion

The human GalNAc-T1 gene is located on chromosome18q121 region with 11 exons (5882 kb) [1] A 3847 bp lengthm-RNA encodes the functional GalNAc-T1 protein with559 amino acids (642 kDa pI 74) However alternativesplicing of mRNA yields different polynucleotide chainlengths that is 1770 bps (499 aa) and 1120 bps (105 aa)amino acids respectively So far approximately 900 variantslocated in noncoding coding and regulatory regions ofhuman GalNAc-T1 gene are described in dbSNP databaseto date With the advent of high throughput (whole exomeand genome) sequencing practices the number of geneticvariations is growing day by day in an efficient manner [1824 41 42] Hence an important task of human genetics liesin delineating those amino acid variants which can imposespecific structural and functional consequences on proteinfunction [41]

In the current investigation screening for functionalGalNAc-T1 genetic variants in coding region was performedusing sequence- and structure-based algorithms such as SIFTPolyPhen-2 PANTHER and SNP effect From functionalanalysis of SNPs SIFT predicted that 17 of total nsS-NPs are of deleterious type whereas PolyPhen-2 predictedsim22 of total nsSNPs to be highly deleterious But commonpredictions of SIFT and PolyPhen-2 showed that 16 ofvariants were functionalThe significant difference in outputsof SIFT and PolyPhen-2 algorithms is most likely due toutilization of different protein sequence alignments used tocharacterize the variants [42] However good coherence andaccuracy between PANTHER and SIFT (use of similar scor-ingmatrices) have predicted 4 (2222) nsSNPs ofGalNAc-T1to be functional SNPeffect predicted that 3 (166) nsSNPsof the variants to be highly destabilizingThe accuracy of thisprediction percentage remained the same even when com-bined with PolyPhen-2 analysis but increased to 6 (3333)nsSNPs with PANTHER By comparing the output of allthe 4 different in silico tools (SIFT PolyPhen-2 PANTHER-cSNP and SNPeffect) nsSNPs that encodes S143P G258Vand Y414D variants were found to be functionally significantThe differences in prediction capabilities can be attributed tothe fact that every method uses different sets of sequencesand alignments When compared sequence-based predic-tion analysis has a number of advantages over structure-based analysis due to the fact that it considers all types ofeffects at the protein level and is well suitable for proteinswith known relatives But sequence-based predictions areunable to explain the underlying mechanisms between geno-type and phenotype relationships for most of the proteins

Computational and Mathematical Methods in Medicine 11

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

(a) (b)

GALNT1 functional partnersGBGT1CHPFST6GALNAC1B4GALNT1MUC1C1GALT1C1C1GALT1GCNT1B3GNT6ZNF146

0970096809520926092509000900089908990680

Nei

ghbo

rhoo

dG

ene f

usio

nC

oocc

urre

nce

Coe

xpre

ssio

nEx

perim

ents

Dat

abas

esTe

xtm

inig

[Hom

olog

y]

Scor

e

Globoside 120572-13-N-acetylgalactosaminyltransferase 1 (347aa)Chondroitin polymerizing factor (775 aa)ST6(120572-N-acetyl-neuraminyl-23-120573-galactosyl-13)-N-acetylgalactosaminide 120572-26-sialy (600 aa)120573-14-N-acetyl-galactosaminyltransferase1 (533 aa)Mucin 1 cell surface associated (273 aa)C1GALT1-specific chaperone 1 (318 aa)Core1 synthase glycoprotein-N-acetylgalactosamine 3-120573-galactosyltransferase 1 (363 aa)Glucosaminyl (N-acetyl) transferase 1 core 2(120573-16-N-acetylglucosaminyltransferase) (428 aa)UDP-GlcNAc-120573 Gal 120573-13-N-acetylglucosaminyltransferase 6 (core 3 synthase) (384 aa)Zinc finger protein 146 (292 aa)

Figure 5 GalNAc-T1 protein-protein interactions with 09 partners One colour is given to each type of evidence in the predicted functionallinks (edges) among eight coloured lines (a) From experimental basis (with score 0925) onlyMUC1 is observed for interaction withGalNAc-T1 From text-mining data GalNAc-T1 interactions are observed for GBGT1 CHPF ST6GALNAC1 B4GALNT1 and MUC1 proteins with0970 0968 0952 0926 and 0925 scores respectively The remaining interactions with C1GALT1C1 CIGALT1 GCNT1 and B3GNT6proteins are simulated with STRING score ranging from 0900 to 0899 (b) Strong association pattern (thick blue lines) of GalNAc-T1 ispredicted for GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1 C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 partners with highconfidence For ST6GALNAC1-GBGT1 ST6GALNAC1-MUC1 GCNT1-CHPF B4GALNT1-GCNT1 CHPF-C1GALT1C1 andGALNT1-ZNF146pairs weak associations are examined in the form of thin blue lines

(a) (b)

Figure 6 3D visualization of G258V variant withMUC1 (a) Ligand binding of GalNAc-T1 native structure withMUC1 indicated single H-bonding interaction (green line) of Gly258 residue (green portion) with Arg266 which is further connected with the GVTSA (tandem repeatregion ofMUC1) by means of H-bonding interaction (b) Picturing of GalNAc-T1 mutant protein structure is observed with one H-bondingloss between Arg266 residue and theMUC1 tandem repeat motif

12 Computational and Mathematical Methods in Medicine

On the contrary the structure-based approach has limitationsin that it cannot be implemented for proteins with unknown3D structures In silico investigation tools that integrateboth sequence- and structure-based approaches will be ofadded advantage in providing reliable prediction results withwider coverage of different aspects of SNP analysis Howevervariability in prediction outputs of these algorithms reflectsboth advantages and disadvantages Therefore decisionsregarding the selection of a suitable tool for SNP analysismustbe subjective to the specific objectives of the investigationundertaken

Our ConSurf and Clustal Omega results indicated thatnsSNPs at positions S143P and Y414D were found in thehighly conserved region and predicted to have potentialimpact on GalNAc-T1 protein The third mutation (G258) isobserved in the vicinity of conservation groups with stronglysimilar properties shown by ldquordquo in Figure 4 Besides thatthe SNPs which encode functional polypeptides are thoseregulatory SNPs that control the gene expression FastSNPtool helped us to prioritize the deleterious SNPs based upontheir impact on determining protein structure deviance intranscriptional levels of the sequence modification in thepremature translation termination and aberrations in thepositions at promoter region for transcription factor bindingAltered O-glycosylation machinery due to aberrant GalNAc-T1 expression may result in aberrantly glycosylated proteinsand new antigenic targets thereby altering host immuno-genic response in ovarian cells [43 44] The two SNPs i-e rs72964406 (ESE motif CCTCATG score is 2897886)and rs34304568 (ESE motif GGACCTAG score 2088850)were located in coding regions and altered ESE (exonicsplicing enhancers) motifs indicating that all these nsSNPsmay have the ability to affect the level location and timingof gene expression or regulate the splicing of GalNAc-T1 genetranscript

The GalNAc-T1 protein structure carries an N-terminaltransmembrane domain a stem region a lumenal catalyticdomain (catalytic AGT1 domain and catalytic BGalNAc-T domain) and a C-terminal ricin-type lectin region(Figure 4) Although all GalNAc-transferases are reportedto share common structural features and the conservedmotifs described above the exact role of each domain incatalysis remains unknown Several biological roles have beendescribed for GalNAc-T1 for example loss of GalNAc-T1leads to reduced leukocyte recruitment and increased rollingvelocity in vivo suggesting the predominant role for GalNAc-T1 in attaching functionally relevant O-linked glycans toselectin ligands [45] Protein structural analysis is performedto fine-tune the picture drawn by the concordance analysisof SIFT PolyPhen-2 and PANTHER Calculation of sol-vent accessibility and force field energy have provided aninsight into the structural and functional impacts of aminoacid substitutions 3D models are constructed to visualizedeviation among the mutant and wild-type protein modelsIn S143P the required protein framework (hydrophobicityand H-bonding interactions) seems to be disturbed at thisposition and could be damaging for the native structureThe G258V substitution carries the tendency to disturb thelocal structure by reframing the backbone flexibility Y414D

variant is found to be associated with regulation of proteinmisfolding Due to high difference in total energy for nativeand mutant (carrying G258V variant) models the properfunctioning of GalNAc-T1 can be disturbed The analysis ofsequence-based motifs and conserved residues is a reliablemethod to identify the functional amino acid residues thatare important for binding catalysis and stability of proteinsUpon analysingwith FTSite a difference in the ligand bindingsites is observed between wild type (20 binding site) andmutant forms (S143P 18 binding sites G258V 23 bindingsites Y414D 23 binding sites) of GalNAc-T1 protein

Protein-protein interaction analysis is a comprehensiveway to understand the global organization of proteomes inthe context of a functional network Interaction networkdisplays biomolecules as nodes and interactions connectingthe two nodes as edges Currently functional network viewfor a single genome is widely used to improve the statis-tical power in human molecular genetics [39] to aid drugdiscovery to better understand metabolic pathways and toderive genotype-phenotype correlations [18 46] STRINGmaps have shown that GalNAc-T1 interacts with GBGT1CHPF ST6GALNAC1 B4GALNAC-T1 MUC1 C1GALT1C1C1GALT1 GCNT1 B3GNT6 and ZNF146 partners by 36interactions The Strong interaction patterns were observedfor GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 part-ners From KEGG-Pathway analysis biochemical (stereospe-cific) reactions for GalNAc-T1 ST6GALNAC1 B4GALNT1MUC1 C1GALT1C1 C1GALT1 and GCNT1 indicated theindividual and combined role of each enzyme in regulationof glycosylation phenomena to carry out mucin biosynthesispathway (Figure 7) Noticeably most of these enzymatic pro-teins are involved in cell adhesion and membrane transportTherefore aberrant glycosylations in the formofmutations inGalNAc-T1 gene could distort themucin andmany associatedpathways (ERK SRC andNF-kappa-B pathways) involved inmaintaining cellular response and integrity Of the 3GalNAc-T1 variants tested G258Vwas found to disturb the interactionof Arg266 ofGalNAc-T1withN-HofThr (inGVTSA tandemrepeat region) of MUC1 [47] thus it might impair thepossible transfer of GalNAc residue from UDP-GalNAc tohydroxyl group of SerThr during O-glycosylation reaction(Figure 7) Over expression aberrant intracellular localiza-tion and changes in glycosylation of MUC1 and MUC7 areoften reported in tumors of colon breast ovarian lung andpancreatic origin [24 48] Since MUC1 plays an essentialrole in cellular signalling and microbial pathogenicity it isconceivable to expect that G258V mutation is pathogenic tocellular integrity and susceptibility to certain human disease

5 Conclusion

This study represents the first comprehensive investigationthat identified the functional SNPs in GalNAc-T1 geneusing sequence- and structure-based homology algorithmsAlthough there were notable differences in the predictionbasis of selected in silico algorithms our concordance analysiscorroborated the characterization of suspected SNPs that

Computational and Mathematical Methods in Medicine 13

GalNAc1205721-O-Thr

C1GALT1C1GALT1C1

GCNT1

GALNT1

B3GNT6

GCNT1B4GALT5

Gal1205731-3GalNAc1205721-O-Thr

GlcNAc1205731-6GalNAc1205721-O-Thr

GalNAc1205721-3GalNAc1205721-O-Thr

GlcNAc1205731-3GalNAc1205721-O-Thr

Gal1205731-4GlcNAc1205731-6GalNAc1205721-O-Thr

Figure 7 Complex structure of GalNAc-T1 enzyme with UDPMUC1 (tandem repeat region ldquoGSTAPPAHGVTSAPrdquo) and GalNAc residueGalNAc sugar is attached at Thr of tandem repeat region inMUC1 His125 and Asp156 residues from catalytic A domain and Ile315 Trp316and Thr350 from catalytic B domain are found in contact with UDP (orange red color) Under the action of ST6GALNAC1 B4GALNT1C1GALT1C1 C1GALT1 GCNT1 andB3GNT6GalNAc1-O-Thr is stereospecifically converted into different glycan chains to regulate themucinbiosynthesis pathway

could play significant roles in cellular biology We definedthe structural consequences of S143P G258V and Y414Dvariants on GalNAc-T1 protein in the form of solventaccessibility electrostatic interaction energy calculation andmultiple alignment conservation From potential SNPs theG258V mutation is predicted to cause considerable changein total energy electrostatic intramolecular interactions andfunctional interaction behaviour of GalNAc-T1 protein withMUC1 protein and thereby may impact the possible transferof GalNAc residue from UDP-GalNAc to hydroxyl groupof SerThr during O-glycosylation of proteins Additionally2 regulatory SNPs that influence the splicing and expres-sion pattern of GalNAc-T1 gene have also been identifiedAltered GalNAc-T1 function due to genetic variations andmRNA expression might play a critical role in determiningsusceptibility to complex diseases Furthermore protein-protein interaction pathway and protein-ligand docking havehelped us to understand the stereochemical (anomery andlinkage) roles ofGalNAc-T1 and associated enzymes inmucinbiosynthetic pathway Finally the in silico based nsSNPpredictions would not just be of use in deriving genotype-phenotype relations but to some extent explain the molecularbasis for varied interindividual response to certain drugs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are thankful to the reviewers for their criticalreview and fruitful suggestions Additionally the authorsacknowledge the useful comments from Dr MuhammadNadeem Arshad of Chemistry Department King AbdulazizUniversity KSA

References

[1] E P Bennett D O Weghuis G Merkx A G van KesselH Eiberg and H Clausen ldquoGenomic organization andchromosomal localization of three members of the UDP-N-acetylgalactosaminepolypeptide N-acetylgalactosaminyltrans-ferase familyrdquo Glycobiology vol 8 no 6 pp 547ndash555 1998

[2] Y-Z Chen Y-R Tang Z-Y Sheng and Z Zhang ldquoPredictionof mucin-type O-glycosylation sites in mammalian proteinsusing the composition of k-spaced amino acid pairsrdquo BMCBioinformatics vol 9 article 101 2008

[3] H C Hang and C R Bertozzi ldquoThe chemistry and biology ofmucin-type O-linked glycosylationrdquo Bioorganic and MedicinalChemistry vol 13 no 17 pp 5021ndash5034 2005

[4] T A Gerken O Jamison C L Perrine et al ldquoEmerg-ing paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family ofglycosyltransferasesrdquo Journal of Biological Chemistry vol 286no 16 pp 14493ndash14507 2011

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 3: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

Computational and Mathematical Methods in Medicine 3

GalNAc-T1 orGALNT1 gene

Database mining

dbSNP-NCBI

nsSNPs

rs ID Allel change Amino acidvariation

SIFT PolyPhen-2 PANTHER SNP effect

Fast SNP

NetsurfP Consurf Clustal Omega Project hope

STRING FT-site KEGG

I-TASSER

GROMOS 96 Raswin Chimera Swiss pdb viewer Patchdock Firedock Swissdock

UniProtFASTA

Figure 1 Schematic representation of computational tools for in silico analysis of GalNAc-T1 gene

position-specific evolutionary conservation) score The pos-sible inputs to run the cSNP tools of PANTHER are theamino acid substitution and protein sequence PANTHERsubPSEC score le minus3 classifies the amino acid substitution asdeleterious or intolerant whereas the score ge minus3 is predictedto be less deleterious [24]

The Functional Analysis throughHiddenMarkovModels(fathmm) was used to predict the functional molecularand phenotypic consequences of protein missense variantsby combining sequence conservation within hidden Markovmodels (HMMs) [28] User input for Fathmm server includesUniProtENSEMBLE ID with amino acid variation predic-tion algorithm and phenotypic associations

25 Analysing the Molecular Phenotypic Effects of nsSNPsby SNPeffect The SNPeffect database 40 (httpsnpeffectswitchlaborg)was used to predict the phenotypic impacts ofnsSNPs of GalNAc-T1 gene This database uses different toolssuch as TANGO(predict aggregation prone regions)WALTZ(simulate amyloidogenic regions) LIMBO (examine hsp70chaperone binding sites) and FoldX (analyze the possibleimpact on protein stability) to annotate both noncoding andcoding SNPs Input options usually consist of a gene list andUniProt ID which are critical for a biological pathway or

function SNPeffect databasemainly focuses on themolecularidentification categorization and explanation of diseasevariants in the human proteome [29]

26 Identification of Functional SNPs in Conserved Regionsby ConSurf Using an empirical Bayesian inference ConSurfweb-server calculates the evolutionary conservation of aminoacid substitution in proteins [30 31] After giving the FASTAsequence ofGalNAc-T1 the conserved regionswere predictedby means of conservation scores and colouring schemedivided into a distinct scale of nine grades starting from themost variable positions (grade 1) coloured turquoise throughintermediately conserved positions (grade 5) coloured whiteto the most conserved positions (grade 9) coloured maroon[32]

27 Characterization of Functional SNPs in Regulatory Regionsby FastSNP Function analysis and selection tool for singlenucleotide polymorphisms or FastSNP (httpFastSNPibmssinicaedutw) was used to analyse the SNPs located inregulatory regions of GalNAc-T1 gene This tool follows thedecision tree principle to extract the functional informationof SNPs from external web-servers and accordingly assign

4 Computational and Mathematical Methods in Medicine

the risk rankings for SNPs from 1 2 3 4 or 5 (which indicatesthe levels of very low low medium high and the very highfunctional impact resp) The input to query FastSNP isusually a gene symbol SNP reference cluster ID (rs ID) or achromosome position After getting query FastSNP providestwo main options (query by candidate gene and query bySNP) to further generate the SNP function report against eachSNP [33]

28 Analysis of SNP Effects on Surface and Solvent Accessibilityof Protein by NetSurfP Solvent accessibility or accessiblesurface area (ASA) of an amino acid helps in locating thepotential active sites in a three-dimensional structure and inan extended tripeptide conformation of proteins (ShandarAhmad et al 2004) Amino acid FASTA sequence ofGalNAc-T1 was submitted to the NetSurfP (httpwwwcbsdtudkservicesNetSurfP) server to predict its secondary structuresurface and solvent accessibility of amino acids This pre-diction method relies on the Z-score which can predict thesurfaces but not secondary structures of proteins There are3 subclasses defined for solvent accessibility of the aminoacids these include buried (low accessibility) partially buried(moderate accessibility) and exposed (high accessibility)[34] Simulating the secondary structure of proteins isessential to analyze the relationship between amino acidsequence and protein structure Hence we simulated thesecondary structure of GalNAc-T1 protein with I-TASSER(httpzhanglabccmbmedumicheduI-TASSER) by excis-ing continuous fragments from threading alignments andfurther refined them using replica-exchanged Monte-CarlosimulationsThe quality of predictionmodels was reflected inthe form of c-scores (minus5 to 2) Secondary structure predictionand solvent accessibility analysis were an intermediate stepprior to predicting the tertiary structure of GalNAc-T1protein

29 Modelling the Molecular Effects of nsSNP on ProteinStructures The nsSNPs can significantly change the stabil-ity of proteins Consequently in order to investigate thestructural deviations and stability differences between nativeand mutant forms of GalNAc-T1 structural analysis wasundertaken based on the results obtained from highest SIFTPolyPhen-2 SNPeffect and PANTHER scores Prediction ofthe 3 dimensional model of GalNAc-T1 protein structurewas done using the Universal Protein Resource (UniProthttpwwwuniprotorg ) I-TASSER (httpzhanglabccmbmedumicheduI-TASSER) andMultisource protein struc-ture threading (MUSTER httpzhanglabccmbmedumicheduMUSTER) web tools However structural visualizationwas performed by the SWISS-PDB viewer and Chimera [3536]

210 Analysis of Structural Specificity of Functional SNPsTo study the structural effects of mutations on GalNAc-T1 protein Project Have yOur Protein Explained (HOPEhttpwwwcmbirunlhopehome) a unique web serverwas used To simulate structural features of mutations onnative proteinmolecule ProjectHOPEuses the 3D structures

of the proteins that are available in Uniprot database and alsoin DAS prediction servers however the HOPE server canalso build homology models independently if necessary TheHOPE sever predicts the structural variation between nativeand mutant residues [37]

211 Prediction of Ligand Binding Sites on Unbound ProteinStructures by FTSite To solve the classic problem asso-ciated with the elucidation of protein structure-functionrelationship protein engineering and drug design proteinbinding sites are considered as hot spot regions especially forsmall ligand molecules FTSite is an accurate computationalmethod based on experimental evidence to determine theligand binding sites with experimental accuracy of 94The input options contain job name PDB ID or file PDBchain IDs and email (optional) to retrieve the bindingsitesresidues within the candidate unbound protein [38]

212 Predictions of Protein-Protein Interactions Protein-protein interaction networks are important to unveil andannotate all functional interaction among cell proteinsFor this current investigation the online database resourceldquoSearch Tool for the Retrieval of Interacting proteinsrdquo(STRING httpstring-dborg) was used This providedunique coverage and ease of access to both experimentaland theoretical interaction evidence of GalNac-T1 The inputoptions for STRING database include protein name pro-tein sequence and multiple sequences STRING databaseis presently equipped with 5214234 proteins belonging to1133 organisms [39] To predict the functional networkingof GalNAc-T1 enzyme we used KEGG (httpwwwgenomejpkegg) PATHWAY and LIGAND and curated the datafor molecular reaction and interaction networks includingmetabolic pathways regulatory pathways and molecularcomplexes for biological systems

213 Molecular Docking for Protein-Ligand Interaction byPatchDock and FireDock To get the structural insight ofunbound protein-ligand interaction PatchDock and Fire-Dock (Fast Interaction REfinement in molecular DOCK-ing) were considered PatchDock algorithm entails objectrecognition and image segmentation techniques to carry outrigid docking through a three-stage filtering process (a)molecular shape representation (b) surface patch matchingand (c) filteringscoring The FireDock method simulta-neously targets the problem of flexibility and scoring ofsolutions to address the refinement problem of protein-protein docking solutions After generating top 1000 modelsthe PatchDock output was redirected to FireDock (as aninput) to produce the top 10 refined structures of associatedGalNAcT protein-ligand [40]

3 Results

31 SNP Analysis The dbSNP-NCBI database search fornonsynonymous and synonymous SNPs located in exonicintronic and UTR regions ofGalNAc-T1 gene revealed a totalof 891 SNPs of which the exonic region is comprised of 18

Computational and Mathematical Methods in Medicine 5

Table 1 List of nsSNPs with rs IDs found to be functionally significant by SIFT Tool

rs IDs Allele change Amino acid change SIFT score SIFT prediction Medianrs143502430 c13GgtA A5T 055 TOLERATED 295rs141224997 c58TgtG L20V 1 TOLERATED 266rs138822379 c73CgtG L25V 007 TOLERATED 267rs199595808 c137AgtG D46G 03 TOLERATED 251rs72964406 c161CgtT P54L 028 TOLERATED 251rs139185162 c213TgtG D71E 1 TOLERATED 238rs150799030 c211GgtC D71H 007 TOLERATED 238rs113616262 c427TgtC S143P 0 DAMAGING 237rs201338228 c646GgtA V216M 014 TOLERATED 237rs199977475 c773GgtT G258V 004 DAMAGING 237rs144282744 c1135AgtC N379H 01 TOLERATED 237rs146565032 c1150AgtG I384V 052 TOLERATED 238rs142387342 c1202GgtA G401D 034 TOLERATED 237rs34304568 c1240TgtC Y414D 0 DAMAGING 237rs151329739 c1318AgtC N440H 008 TOLERATED 237rs142110831 c1528CgtA P510T 016 TOLERATED 237rs145884536 c1622AgtG N541S 045 TOLERATED 238rs200444543 c1657GgtA V553I 039 TOLERATED 266SIFT score le 005 damaging

(2) nsSNPs and 11 (12) sSNPs whereas intronic regionscontained 836 (93) SNPs and mRNAUTR region consistedof 26 (29) SNPs However search for nonsense frameshiftstopgain mutations did not show any results in NCBI-dbSNPWe selected nonsynonymous SNPs from the exonic regionand UTR SNPs (51015840 and 31015840) from the intronic region for ouranalysis

32 Identification of Functional SNPs in Coding RegionsIdentification of functional SNPs was done by predictingthose which substitute the amino acids that are critical forGalNAc-T1 functionThis in silico analysis was performed andvalidated using 4 different computational tools namely SIFTPolyPhen-2 PANTHER and SNPeffect

321 Analysis of Molecular Phenotypic Effects by SIFT Byusing sequence homology SIFT characterizes the effect ofamino acid substitution on protein function From SIFTresults (Table 1) a total of 3 (166) nsSNPs were predicted asdamaging (score of 000ndash004) by SIFT whereas the remain-ing 15 nsSNPs (8333) were simulated to be tolerated (scoreof 008ndash055) The 3 SNPs (rs113616262 rs34304568 andrs199977475) of GalNAc-T1 gene are filtered to be functionalby SIFT

322 Simulation of Functional Consequences by PolyPhen-2 For a given threshold of Naıve Bayes probabilistic scorePolyPhen-2 calculates the true positive rate as a fraction ofcorrectly predicted mutations From 18 nsSNPs of GalNAc-T1 gene only 4 (22) nsSNPs were predicted ldquoprobablydamagingrdquo (score of 096ndash100) whereas 14 (78) wereclassified as benign (score of 0411ndash000) The ranking ofSNPs on the basis of PolyPhen-2 scores enables us to assess

the potential-quantitative effect of SNPs on native protein(Table 2) PolyPhen-2 predicted functional 4 SNPs includingldquors113616262 rs34304568 and rs199977475rdquo and therebyvalidated the results predicted from SIFT

323 Functional Characterization by PANTHER andFathmm PANTHER characterizes likely functional effectof amino acid variation by means of HMM-based statisticalmodeling and evolutionary relationship We performedPANTHER analysis of GalNAc-T1 nsSNPs in order to addanother layer of refinement in SNPs characterization A totalof 4 SNPs possessed the subPSEC score less than minus3 and weretherefore classified as toleratedThe remaining 14 amino acidvariants (D46G P54L D71E D71H V216M G258V N379HI384V G401D Y414D N440H and P510T) were found tobe deleterious with subPSEC score in between minus3 and minus10(Table 2) PANTHER-cSNP tool simulated rs113616262 andrs34304568 with the highest subPEC score of minus74941 andminus917274 respectively

From Fathmm only rs34304568 showed the damagingeffect of amino acid substitution (Y414D) with minus182 score

324 Functional Analysis by SNPeffect SNPeffect databasepredicts the impact of SNPs on aggregation-prone regionsamyloid-forming regions Hsp70 chaperone binding sitesand structural stability of human proteins TANGO analysisshowed that L25V variant increases (dTANGO score is6927) and Y414D decreases (dTANGO score is minus15410)the aggregation tendency of the GalNAc-T1 protein How-ever WALTZ analysis revealed that Y414D (dWALTZ scoreis minus20948) also possessed the ability to decrease pro-tein amyloid propensity None of the variants were pre-dicted to alter the chaperone binding sites for Hsp70

6 Computational and Mathematical Methods in Medicine

Table 2 Characterization of SNPs by using PolyPhen-2 PANTHER and Fathmm based classification systems

Amino acid change PolyPhen-2 PANTHER FathmmA1A2 Score Prediction Specificitysensitivity subPSEC Score Prediction Score PredictionA5T 0411 Benign 090089 minus296529 Tolerated 059 ToleratedL20V 0000 Benign 000100 minus265438 Tolerated 069 ToleratedL25V 0960 Probably damaging 095078 minus345127 Deleterious 035 ToleratedD46G 0004 Benign 059097 minus334504 Deleterious 051 ToleratedP54L 0001 Benign 015099 minus427568 Deleterious 056 ToleratedD71E 0000 Benign 000100 minus340972 Deleterious 068 ToleratedD71H 0132 Benign 086093 minus524027 Deleterious 049 ToleratedS143P 0999 Probably damaging 099014 minus74941 Deleterious 015 ToleratedV216M 0411 Benign 090089 minus556762 Deleterious 026 ToleratedG258V 1000 Probably damaging 100000 minus440932 Deleterious 035 ToleratedN379H 0000 Benign 000100 minus341433 Deleterious minus035 ToleratedI384V 0000 Benign 000100 minus381526 Deleterious minus031 ToleratedG401D 0000 Benign 000100 minus323851 Deleterious 156 ToleratedY414D 1000 Probably damaging 100000 minus917274 Deleterious minus182 DamagingN440H 0048 Benign 083094 minus406922 Deleterious minus117 ToleratedP510T 0009 Benign 077096 minus346487 Deleterious minus109 ToleratedN541S 0000 Benign 000100 minus280107 Tolerated minus103 ToleratedV553I 0000 Benign 000100 minus2409 Tolerated minus098 ToleratedPolyPhen-2 probably damaging (probabilistic score gt 085) possibly damaging (probabilistic score gt 015) and benign (remaining)PANTHER deleterious or intolerant (subPSEC score le minus3) less deleterious (subPSEC score ge minus3)

chaperones When analyzed by FoldX severe reductionin stability was demonstrated by S143P (750Kcalmol)G258V (782Kcalmol) Y414D (603 Kcalmol) variants whileN440H (167 Kcalmol) P510T (225 Kcalmol) and N541S(055 Kcalmol) were accounted to slightly reduce the stabilityof GalNAc-T1 protein

325 Characterization of Functional SNPs by ConcordanceAnalysis The efficacy of functional SNP prediction can bemademore reliable by combining the results of empirical andsupport vector machine (SVM) based approaches Hencewe performed the concordance analysis to get the integratedpicture with SIFT PolyPhen-2 PANTHER and SNPeffecttools Out of the 18 nsSNPs 3 (1666) were predicted tobe ldquodeleteriousrdquo by SIFT whereas this prediction rate wasincreased to 4 (22) as ldquoprobably damagingrdquo when analyzedby PolyPhen-2 5 (2777) as ldquoseverely destabilizingrdquo by SNPeffect and 14 (78) as ldquodeleteriousrdquo by PANTHER Fromconcordance analysis 3 amino acid variants (S143P G258Vand Y414D) were commonly predicted by all 4 of the in silicotools (SIFT PolyPhen-2 PANTHER and SNPeffect)

326 Identification of Functional SNPs in Regulatory andCon-served Regions Using empirical Bayesian inference Con-Surf database characterizes the evolutionary conservation ofamino acids Our ConSurf results showed that only S143 waslocated in highly conserved region and predicted to havefunctional impact on GalNAc-T1 proteinThe remaining twonsSNPs responsible for G258V and Y414D are found in thevicinity of conserved residues and found to be buried in wildtype residues by NetSurfP (Table 3)

FastSNP assigns the risk ranking for SNPs after excisingfunctional effect information From FastSNP results wefound 2 SNPs with rs72964406 (ESE motif CCTCATGscore is 2897886) and rs34304568 (ESE motif GGACCTAGscore is 2088850) located in coding regions and altered ESE(exonic splicing enhancers) motifs indicating that all thesensSNPs may have the ability to affect the level position andtiming of gene expression or regulate the splicing of GalNAc-T1 gene transcript No functional role was predicted for theSNPs which are found be located in 31015840 UTR region

33 AMBER-Energy Minimization Solvent Accessibilityand Electrostatic Interaction Effects of Functional SNPson GalNAc-T1 Protein Based on multiple-threadingalignments I-TASSER builds high-quality 3D structuresof protein molecules from their amino acid sequenceAfter retrieving protein FASTA sequence of GalNAc-T1from UniProt (ID ldquoQ10472rdquo) database protein sequencewas given to I-TASSER as an input The I-TASSER toolcreated the 5 full-length models for GalNAc-T1 protein(with C-scores minus075 minus129 minus189 minus271 and minus283) byexcising top 10 structures with C-scores after targeting thePDB library hits (Table 4) Among the 5 predicted modelsmodel 1 carried the high-quality confidence in the formof C-score (minus075) TM-Score (062 plusmn 014) and RMSD(93plusmn 46A) hence it was selected for further analysis usingChimera Swiss Pdb viewer GROMOS96 and AMBERff99SB software to visualize and compute the total energybefore and after energy minimization The superimposedstructures of mutant and wild-type residues with theirsurface were visualized in Figure 2 The total energy of

Computational and Mathematical Methods in Medicine 7

Table 3 Surface accessibility of wild-type and mutant variants in GalNAc-T1

Amino acid change Class assignment Relative surfaceaccessibility (RSA)

Absolute SurfaceAccessibility prediction

minusminus

minusminus

S143P

G258V

Y414D

Z-fit score for RSA

(a) (b) (c)

Figure 2 Superimposed structures of GalNAc-T1 native (camel color) and mutant (blue color) models to visualize the stereochemicalconformation of wild type and mutant residues at 143 258 and 414 positions

Table 4 Top 10 templates used by I-TASSER to create the high-quality models for human GALNT1 secondary structure

Rank PDB Hit Iden1 Iden2 Cov Norm 119885-score1 2ffuA 045 041 088 3122 2d7iA 043 042 091 7153 2ffuA 045 041 088 8164 2ffuA 045 041 088 5975 2ffuA 045 041 088 4416 2d7iA 043 042 091 7117 2ffuA 045 041 088 7998 2d7rA 043 042 092 6429 1xhbA 099 079 080 30610 2ffuA 045 041 088 621Rank of templates represents the top ten threading templates used by I-TASSER ldquoIdent1rdquo is the percentage sequence identity of the templates in thethreading aligned region with the query sequence ldquoIdent2rdquo is the percentagesequence identity of the whole template chains with query sequence ldquoCovrdquorepresents the coverage of the threading alignment and is equal to thenumber of aligned residues divided by the length of query protein ldquoNorm119885-scorerdquo is the normalized 119885-score of the threading alignments Alignmentwith a Normalized 119885-score gt 1 mean a good alignment and vice versa Thetop 10 alignments reported above (in order of their ranking) are taken fromthe following threading programs 1 MUSTER 2 HHSEARCH 3 SP3 4PROSPECT2 5 PPA-I 6 HHSEARCH I 7 SPARKS 8 SAMT99 9 MUSTER10 HHSEARCH

native structure was found to be 13294691 kJmol before theenergyminimization stage and it was minus13882539 kJmol afterthe energy minimization step whereas mutant structures

Table 5 Total energy of native and mutant structures before andafter energy minimization

Amino acidvariants

Total energy beforeenergy minimization

(kJmol)

Total energy afterenergy minimization

(kJmol)Native 13140137 minus13882539S143P 13392757 minus13767374G258V 59777910 minus10727144Y414D 12995687 minus13918198

exhibited deviation in total energy value calculated beforeand after energy minimization (Table 5) Among 3 screenedmutations G258V showed the increase in total energy ofGalNAc-T1 before (597779100 kJmol) and after energyminimization (minus10727144 kJmol) Chimera and Swiss PDBviewer were used to visualize the structural features of aminoacids in native and mutant protein chains During structuralvisualization for all 3 mutations only mutant residue (valine)at 258 position showed a network of clashes with Tyr268(Figure 3) Additionally S143P G258V and Y414D variantswere analyzed for solvent accessibility (in form of Z-score)and stability and a decrease in both parameters was observedfor all three variants (Table 3)

34 The Structural Impacts of Functional GalNAc-T1Mutations Project HOPE simulates the structural featuresof amino acid substitution on native protein structure

8 Computational and Mathematical Methods in Medicine

(a)

(b)

(c)

Figure 3 H-bonding (green discontinuous line) interactions and clashes (pink discontinuous line) of wild type and mutant analogues withthe vicinal amino acid residues (a) Ser143 is examined with single H-bonding with Val139 and converted into clash as a result of Pro atthe same position (b) At 258 position one H-bond is observed with Arg266 in both native (Gly258) and mutant (Val258) structures but anetwork of clashes appeared between Val258 and Tyr268 (c) Tyr414 is visualized with five H-bonding interactions for Pro410 and Asn417and one H-bond is distorted due to appearance of mutant aspartic acid at the same position

Computational and Mathematical Methods in Medicine 9

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

lowast lowastlowast lowastlowastlowastlowastlowast lowastlowast

lowast lowast lowastlowastlowast lowastlowast

lowast

lowast

lowast lowast

lowastlowastlowastlowast lowast lowast lowastlowast

lowast lowast lowastlowastlowast lowast lowast lowastlowastlowastlowastlowast lowast lowastlowast lowast lowast lowastlowast lowast lowastlowast

lowast lowastlowastlowast lowast lowast lowastlowast lowast lowast143 Catalytic subdomain A

258 Catalytic subdomain B

414

Ricin B-type lectin

Figure 4 Multiple sequence alignments and evolutionary conservation behaviour among human GalNAc-T1 to GalNAc-T10 members ofGALNTs family S143 and Y414 residues are found conserved in the catalytic subdomain A and linker B domain (area between catalyticsubdomain B and Ricin B-type lectin) respectively G258 is observed in the vicinity of conservation groups with strongly similar properties(shown with ldquordquo)

The S143 residue is located within a stretch of residuesannotated in Uniprot as catalytic domain A (Figure 4 fromClustal Omega) The hydrophobicity and size differencebetween wild-type and mutant residue (Pro) can disrupt theH-bonding interactions with the neighbouring residues andhence the protein framework The wild-type residue formsa hydrogen bond with the valine on position 139 Due tohigh rigidity of proline moiety this mutation might abolishthe required flexibility of the protein at this position andcould affect the catalytic tendency Based on conservationbehaviour this mutation is probably damaging to the protein

For G258V the wild-type residue is resided in thelinker region between catalytic subdomains A and B andcan distract the required flexibility of core structure bysubstituting Gly258 with valine Additionally the structuralanalysis ofVal258 showed some clashes for Tyr268whichmaycontribute to the extra energy in the protein structure andhence the decrease in stability

In theY414Dvariant Tyr414 indicated fiveH-interactionswith Pro410 Phe411 and Asn417 whereas mutant Asp414exhibited the four H-bonding interactions in the core of theprotein (or protein complex) due to the differences in charge

10 Computational and Mathematical Methods in Medicine

Table 6 Ligand binding sites for wild type andmutant forms (S143PG258V and Y414D)

Wild type S143P G258V Y414DAsn82 Asn82 Asn82 Asn82Phe84 Val123 Phe84 Phe84Val123 Phe124 Val123 Val123Phe124 His125 Phe124 Phe124His125 Asn126 His125 His125Asn126 Glu127 Asn126 Asn126Glu127 Ala128 Glu127 Glu127Thr131 Lau132 Asp156 Asp156Asp156 Asp156 Gly188 Leu189Gly188 Leu189 Leu189 Asp209Leu189 Asp209 Asp209 Ala210Arg193 Ala210 Ala210 His211Asp209 His211 His211 Ile315Ala210 Ile315 Ile315 Trp316His211 Trp316 Trp316 His344Ile315 Arg347 His344 Val345Trp316 Thr350 Val345 Phe346His344 Pro351 Phe346 Arg347Val345 mdash Arg347 Lys348Phe346 mdash Lys348 Ala349mdash mdash Ala349 Thr350mdash mdash Thr350 Pro351mdash mdash Pro351 Tyr352

density and hydrophobicity between wild-type and mutantresidue

35 Simulation for GalNAc-T1 Molecular Binding Sites andProtein-Protein Interactions Including the structure-basedprediction of protein function FTSite identifies the ligandbinding sites for a particular protein Upon querying withFTSite a considerable difference in the number of ligandbinding sites was observed between wild type and mutantforms (S143P G258V and Y414D) of GalNAc-T1 protein(Table 6)

STRING database annotates the functional interactionsamong the proteins in a cell STRING results predictedthe functional association pattern of GalNAc-T1 protein(physical and functional) with MUC1 GBGT1 CHPFST6GALNAC1 B4GalNAc-T1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6 partners with high confidenceshown with bold lines in Figure 5 Based on the confidencescores of GalNAc-T1 protein interactions MUC1 a denselyO-glycosylated transmembrane protein critical to cellularintegrity was chosen for molecular docking analysis inorder to identify the plausible structural and functionalimplications of S143P G258V and Y414D variants TheH-bonding interaction existing between Gly258 (greenportion) and Arg266 residues of native GalNAc-T1 is in turncritical for its interaction with N-H ofThr located in GVTSA(tandem repeat region) of MUC1 by means of H-bonding

However due to the replacement of Gly at 258 position withVal in GalNAc-T1 the H-bonding interaction with MUC1was distorted (Figure 6)

After STRINGmucin biosynthetic pathwaywas retrievedfrom KEGG database to retrieve the molecular reactionand interaction behaviour for GalNAc-T1 and other partnerenzymes (ST6GALNAC1 B4GALNT1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6) to corroborate the STRING results

4 Discussion

The human GalNAc-T1 gene is located on chromosome18q121 region with 11 exons (5882 kb) [1] A 3847 bp lengthm-RNA encodes the functional GalNAc-T1 protein with559 amino acids (642 kDa pI 74) However alternativesplicing of mRNA yields different polynucleotide chainlengths that is 1770 bps (499 aa) and 1120 bps (105 aa)amino acids respectively So far approximately 900 variantslocated in noncoding coding and regulatory regions ofhuman GalNAc-T1 gene are described in dbSNP databaseto date With the advent of high throughput (whole exomeand genome) sequencing practices the number of geneticvariations is growing day by day in an efficient manner [1824 41 42] Hence an important task of human genetics liesin delineating those amino acid variants which can imposespecific structural and functional consequences on proteinfunction [41]

In the current investigation screening for functionalGalNAc-T1 genetic variants in coding region was performedusing sequence- and structure-based algorithms such as SIFTPolyPhen-2 PANTHER and SNP effect From functionalanalysis of SNPs SIFT predicted that 17 of total nsS-NPs are of deleterious type whereas PolyPhen-2 predictedsim22 of total nsSNPs to be highly deleterious But commonpredictions of SIFT and PolyPhen-2 showed that 16 ofvariants were functionalThe significant difference in outputsof SIFT and PolyPhen-2 algorithms is most likely due toutilization of different protein sequence alignments used tocharacterize the variants [42] However good coherence andaccuracy between PANTHER and SIFT (use of similar scor-ingmatrices) have predicted 4 (2222) nsSNPs ofGalNAc-T1to be functional SNPeffect predicted that 3 (166) nsSNPsof the variants to be highly destabilizingThe accuracy of thisprediction percentage remained the same even when com-bined with PolyPhen-2 analysis but increased to 6 (3333)nsSNPs with PANTHER By comparing the output of allthe 4 different in silico tools (SIFT PolyPhen-2 PANTHER-cSNP and SNPeffect) nsSNPs that encodes S143P G258Vand Y414D variants were found to be functionally significantThe differences in prediction capabilities can be attributed tothe fact that every method uses different sets of sequencesand alignments When compared sequence-based predic-tion analysis has a number of advantages over structure-based analysis due to the fact that it considers all types ofeffects at the protein level and is well suitable for proteinswith known relatives But sequence-based predictions areunable to explain the underlying mechanisms between geno-type and phenotype relationships for most of the proteins

Computational and Mathematical Methods in Medicine 11

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

(a) (b)

GALNT1 functional partnersGBGT1CHPFST6GALNAC1B4GALNT1MUC1C1GALT1C1C1GALT1GCNT1B3GNT6ZNF146

0970096809520926092509000900089908990680

Nei

ghbo

rhoo

dG

ene f

usio

nC

oocc

urre

nce

Coe

xpre

ssio

nEx

perim

ents

Dat

abas

esTe

xtm

inig

[Hom

olog

y]

Scor

e

Globoside 120572-13-N-acetylgalactosaminyltransferase 1 (347aa)Chondroitin polymerizing factor (775 aa)ST6(120572-N-acetyl-neuraminyl-23-120573-galactosyl-13)-N-acetylgalactosaminide 120572-26-sialy (600 aa)120573-14-N-acetyl-galactosaminyltransferase1 (533 aa)Mucin 1 cell surface associated (273 aa)C1GALT1-specific chaperone 1 (318 aa)Core1 synthase glycoprotein-N-acetylgalactosamine 3-120573-galactosyltransferase 1 (363 aa)Glucosaminyl (N-acetyl) transferase 1 core 2(120573-16-N-acetylglucosaminyltransferase) (428 aa)UDP-GlcNAc-120573 Gal 120573-13-N-acetylglucosaminyltransferase 6 (core 3 synthase) (384 aa)Zinc finger protein 146 (292 aa)

Figure 5 GalNAc-T1 protein-protein interactions with 09 partners One colour is given to each type of evidence in the predicted functionallinks (edges) among eight coloured lines (a) From experimental basis (with score 0925) onlyMUC1 is observed for interaction withGalNAc-T1 From text-mining data GalNAc-T1 interactions are observed for GBGT1 CHPF ST6GALNAC1 B4GALNT1 and MUC1 proteins with0970 0968 0952 0926 and 0925 scores respectively The remaining interactions with C1GALT1C1 CIGALT1 GCNT1 and B3GNT6proteins are simulated with STRING score ranging from 0900 to 0899 (b) Strong association pattern (thick blue lines) of GalNAc-T1 ispredicted for GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1 C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 partners with highconfidence For ST6GALNAC1-GBGT1 ST6GALNAC1-MUC1 GCNT1-CHPF B4GALNT1-GCNT1 CHPF-C1GALT1C1 andGALNT1-ZNF146pairs weak associations are examined in the form of thin blue lines

(a) (b)

Figure 6 3D visualization of G258V variant withMUC1 (a) Ligand binding of GalNAc-T1 native structure withMUC1 indicated single H-bonding interaction (green line) of Gly258 residue (green portion) with Arg266 which is further connected with the GVTSA (tandem repeatregion ofMUC1) by means of H-bonding interaction (b) Picturing of GalNAc-T1 mutant protein structure is observed with one H-bondingloss between Arg266 residue and theMUC1 tandem repeat motif

12 Computational and Mathematical Methods in Medicine

On the contrary the structure-based approach has limitationsin that it cannot be implemented for proteins with unknown3D structures In silico investigation tools that integrateboth sequence- and structure-based approaches will be ofadded advantage in providing reliable prediction results withwider coverage of different aspects of SNP analysis Howevervariability in prediction outputs of these algorithms reflectsboth advantages and disadvantages Therefore decisionsregarding the selection of a suitable tool for SNP analysismustbe subjective to the specific objectives of the investigationundertaken

Our ConSurf and Clustal Omega results indicated thatnsSNPs at positions S143P and Y414D were found in thehighly conserved region and predicted to have potentialimpact on GalNAc-T1 protein The third mutation (G258) isobserved in the vicinity of conservation groups with stronglysimilar properties shown by ldquordquo in Figure 4 Besides thatthe SNPs which encode functional polypeptides are thoseregulatory SNPs that control the gene expression FastSNPtool helped us to prioritize the deleterious SNPs based upontheir impact on determining protein structure deviance intranscriptional levels of the sequence modification in thepremature translation termination and aberrations in thepositions at promoter region for transcription factor bindingAltered O-glycosylation machinery due to aberrant GalNAc-T1 expression may result in aberrantly glycosylated proteinsand new antigenic targets thereby altering host immuno-genic response in ovarian cells [43 44] The two SNPs i-e rs72964406 (ESE motif CCTCATG score is 2897886)and rs34304568 (ESE motif GGACCTAG score 2088850)were located in coding regions and altered ESE (exonicsplicing enhancers) motifs indicating that all these nsSNPsmay have the ability to affect the level location and timingof gene expression or regulate the splicing of GalNAc-T1 genetranscript

The GalNAc-T1 protein structure carries an N-terminaltransmembrane domain a stem region a lumenal catalyticdomain (catalytic AGT1 domain and catalytic BGalNAc-T domain) and a C-terminal ricin-type lectin region(Figure 4) Although all GalNAc-transferases are reportedto share common structural features and the conservedmotifs described above the exact role of each domain incatalysis remains unknown Several biological roles have beendescribed for GalNAc-T1 for example loss of GalNAc-T1leads to reduced leukocyte recruitment and increased rollingvelocity in vivo suggesting the predominant role for GalNAc-T1 in attaching functionally relevant O-linked glycans toselectin ligands [45] Protein structural analysis is performedto fine-tune the picture drawn by the concordance analysisof SIFT PolyPhen-2 and PANTHER Calculation of sol-vent accessibility and force field energy have provided aninsight into the structural and functional impacts of aminoacid substitutions 3D models are constructed to visualizedeviation among the mutant and wild-type protein modelsIn S143P the required protein framework (hydrophobicityand H-bonding interactions) seems to be disturbed at thisposition and could be damaging for the native structureThe G258V substitution carries the tendency to disturb thelocal structure by reframing the backbone flexibility Y414D

variant is found to be associated with regulation of proteinmisfolding Due to high difference in total energy for nativeand mutant (carrying G258V variant) models the properfunctioning of GalNAc-T1 can be disturbed The analysis ofsequence-based motifs and conserved residues is a reliablemethod to identify the functional amino acid residues thatare important for binding catalysis and stability of proteinsUpon analysingwith FTSite a difference in the ligand bindingsites is observed between wild type (20 binding site) andmutant forms (S143P 18 binding sites G258V 23 bindingsites Y414D 23 binding sites) of GalNAc-T1 protein

Protein-protein interaction analysis is a comprehensiveway to understand the global organization of proteomes inthe context of a functional network Interaction networkdisplays biomolecules as nodes and interactions connectingthe two nodes as edges Currently functional network viewfor a single genome is widely used to improve the statis-tical power in human molecular genetics [39] to aid drugdiscovery to better understand metabolic pathways and toderive genotype-phenotype correlations [18 46] STRINGmaps have shown that GalNAc-T1 interacts with GBGT1CHPF ST6GALNAC1 B4GALNAC-T1 MUC1 C1GALT1C1C1GALT1 GCNT1 B3GNT6 and ZNF146 partners by 36interactions The Strong interaction patterns were observedfor GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 part-ners From KEGG-Pathway analysis biochemical (stereospe-cific) reactions for GalNAc-T1 ST6GALNAC1 B4GALNT1MUC1 C1GALT1C1 C1GALT1 and GCNT1 indicated theindividual and combined role of each enzyme in regulationof glycosylation phenomena to carry out mucin biosynthesispathway (Figure 7) Noticeably most of these enzymatic pro-teins are involved in cell adhesion and membrane transportTherefore aberrant glycosylations in the formofmutations inGalNAc-T1 gene could distort themucin andmany associatedpathways (ERK SRC andNF-kappa-B pathways) involved inmaintaining cellular response and integrity Of the 3GalNAc-T1 variants tested G258Vwas found to disturb the interactionof Arg266 ofGalNAc-T1withN-HofThr (inGVTSA tandemrepeat region) of MUC1 [47] thus it might impair thepossible transfer of GalNAc residue from UDP-GalNAc tohydroxyl group of SerThr during O-glycosylation reaction(Figure 7) Over expression aberrant intracellular localiza-tion and changes in glycosylation of MUC1 and MUC7 areoften reported in tumors of colon breast ovarian lung andpancreatic origin [24 48] Since MUC1 plays an essentialrole in cellular signalling and microbial pathogenicity it isconceivable to expect that G258V mutation is pathogenic tocellular integrity and susceptibility to certain human disease

5 Conclusion

This study represents the first comprehensive investigationthat identified the functional SNPs in GalNAc-T1 geneusing sequence- and structure-based homology algorithmsAlthough there were notable differences in the predictionbasis of selected in silico algorithms our concordance analysiscorroborated the characterization of suspected SNPs that

Computational and Mathematical Methods in Medicine 13

GalNAc1205721-O-Thr

C1GALT1C1GALT1C1

GCNT1

GALNT1

B3GNT6

GCNT1B4GALT5

Gal1205731-3GalNAc1205721-O-Thr

GlcNAc1205731-6GalNAc1205721-O-Thr

GalNAc1205721-3GalNAc1205721-O-Thr

GlcNAc1205731-3GalNAc1205721-O-Thr

Gal1205731-4GlcNAc1205731-6GalNAc1205721-O-Thr

Figure 7 Complex structure of GalNAc-T1 enzyme with UDPMUC1 (tandem repeat region ldquoGSTAPPAHGVTSAPrdquo) and GalNAc residueGalNAc sugar is attached at Thr of tandem repeat region inMUC1 His125 and Asp156 residues from catalytic A domain and Ile315 Trp316and Thr350 from catalytic B domain are found in contact with UDP (orange red color) Under the action of ST6GALNAC1 B4GALNT1C1GALT1C1 C1GALT1 GCNT1 andB3GNT6GalNAc1-O-Thr is stereospecifically converted into different glycan chains to regulate themucinbiosynthesis pathway

could play significant roles in cellular biology We definedthe structural consequences of S143P G258V and Y414Dvariants on GalNAc-T1 protein in the form of solventaccessibility electrostatic interaction energy calculation andmultiple alignment conservation From potential SNPs theG258V mutation is predicted to cause considerable changein total energy electrostatic intramolecular interactions andfunctional interaction behaviour of GalNAc-T1 protein withMUC1 protein and thereby may impact the possible transferof GalNAc residue from UDP-GalNAc to hydroxyl groupof SerThr during O-glycosylation of proteins Additionally2 regulatory SNPs that influence the splicing and expres-sion pattern of GalNAc-T1 gene have also been identifiedAltered GalNAc-T1 function due to genetic variations andmRNA expression might play a critical role in determiningsusceptibility to complex diseases Furthermore protein-protein interaction pathway and protein-ligand docking havehelped us to understand the stereochemical (anomery andlinkage) roles ofGalNAc-T1 and associated enzymes inmucinbiosynthetic pathway Finally the in silico based nsSNPpredictions would not just be of use in deriving genotype-phenotype relations but to some extent explain the molecularbasis for varied interindividual response to certain drugs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are thankful to the reviewers for their criticalreview and fruitful suggestions Additionally the authorsacknowledge the useful comments from Dr MuhammadNadeem Arshad of Chemistry Department King AbdulazizUniversity KSA

References

[1] E P Bennett D O Weghuis G Merkx A G van KesselH Eiberg and H Clausen ldquoGenomic organization andchromosomal localization of three members of the UDP-N-acetylgalactosaminepolypeptide N-acetylgalactosaminyltrans-ferase familyrdquo Glycobiology vol 8 no 6 pp 547ndash555 1998

[2] Y-Z Chen Y-R Tang Z-Y Sheng and Z Zhang ldquoPredictionof mucin-type O-glycosylation sites in mammalian proteinsusing the composition of k-spaced amino acid pairsrdquo BMCBioinformatics vol 9 article 101 2008

[3] H C Hang and C R Bertozzi ldquoThe chemistry and biology ofmucin-type O-linked glycosylationrdquo Bioorganic and MedicinalChemistry vol 13 no 17 pp 5021ndash5034 2005

[4] T A Gerken O Jamison C L Perrine et al ldquoEmerg-ing paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family ofglycosyltransferasesrdquo Journal of Biological Chemistry vol 286no 16 pp 14493ndash14507 2011

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 4: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

4 Computational and Mathematical Methods in Medicine

the risk rankings for SNPs from 1 2 3 4 or 5 (which indicatesthe levels of very low low medium high and the very highfunctional impact resp) The input to query FastSNP isusually a gene symbol SNP reference cluster ID (rs ID) or achromosome position After getting query FastSNP providestwo main options (query by candidate gene and query bySNP) to further generate the SNP function report against eachSNP [33]

28 Analysis of SNP Effects on Surface and Solvent Accessibilityof Protein by NetSurfP Solvent accessibility or accessiblesurface area (ASA) of an amino acid helps in locating thepotential active sites in a three-dimensional structure and inan extended tripeptide conformation of proteins (ShandarAhmad et al 2004) Amino acid FASTA sequence ofGalNAc-T1 was submitted to the NetSurfP (httpwwwcbsdtudkservicesNetSurfP) server to predict its secondary structuresurface and solvent accessibility of amino acids This pre-diction method relies on the Z-score which can predict thesurfaces but not secondary structures of proteins There are3 subclasses defined for solvent accessibility of the aminoacids these include buried (low accessibility) partially buried(moderate accessibility) and exposed (high accessibility)[34] Simulating the secondary structure of proteins isessential to analyze the relationship between amino acidsequence and protein structure Hence we simulated thesecondary structure of GalNAc-T1 protein with I-TASSER(httpzhanglabccmbmedumicheduI-TASSER) by excis-ing continuous fragments from threading alignments andfurther refined them using replica-exchanged Monte-CarlosimulationsThe quality of predictionmodels was reflected inthe form of c-scores (minus5 to 2) Secondary structure predictionand solvent accessibility analysis were an intermediate stepprior to predicting the tertiary structure of GalNAc-T1protein

29 Modelling the Molecular Effects of nsSNP on ProteinStructures The nsSNPs can significantly change the stabil-ity of proteins Consequently in order to investigate thestructural deviations and stability differences between nativeand mutant forms of GalNAc-T1 structural analysis wasundertaken based on the results obtained from highest SIFTPolyPhen-2 SNPeffect and PANTHER scores Prediction ofthe 3 dimensional model of GalNAc-T1 protein structurewas done using the Universal Protein Resource (UniProthttpwwwuniprotorg ) I-TASSER (httpzhanglabccmbmedumicheduI-TASSER) andMultisource protein struc-ture threading (MUSTER httpzhanglabccmbmedumicheduMUSTER) web tools However structural visualizationwas performed by the SWISS-PDB viewer and Chimera [3536]

210 Analysis of Structural Specificity of Functional SNPsTo study the structural effects of mutations on GalNAc-T1 protein Project Have yOur Protein Explained (HOPEhttpwwwcmbirunlhopehome) a unique web serverwas used To simulate structural features of mutations onnative proteinmolecule ProjectHOPEuses the 3D structures

of the proteins that are available in Uniprot database and alsoin DAS prediction servers however the HOPE server canalso build homology models independently if necessary TheHOPE sever predicts the structural variation between nativeand mutant residues [37]

211 Prediction of Ligand Binding Sites on Unbound ProteinStructures by FTSite To solve the classic problem asso-ciated with the elucidation of protein structure-functionrelationship protein engineering and drug design proteinbinding sites are considered as hot spot regions especially forsmall ligand molecules FTSite is an accurate computationalmethod based on experimental evidence to determine theligand binding sites with experimental accuracy of 94The input options contain job name PDB ID or file PDBchain IDs and email (optional) to retrieve the bindingsitesresidues within the candidate unbound protein [38]

212 Predictions of Protein-Protein Interactions Protein-protein interaction networks are important to unveil andannotate all functional interaction among cell proteinsFor this current investigation the online database resourceldquoSearch Tool for the Retrieval of Interacting proteinsrdquo(STRING httpstring-dborg) was used This providedunique coverage and ease of access to both experimentaland theoretical interaction evidence of GalNac-T1 The inputoptions for STRING database include protein name pro-tein sequence and multiple sequences STRING databaseis presently equipped with 5214234 proteins belonging to1133 organisms [39] To predict the functional networkingof GalNAc-T1 enzyme we used KEGG (httpwwwgenomejpkegg) PATHWAY and LIGAND and curated the datafor molecular reaction and interaction networks includingmetabolic pathways regulatory pathways and molecularcomplexes for biological systems

213 Molecular Docking for Protein-Ligand Interaction byPatchDock and FireDock To get the structural insight ofunbound protein-ligand interaction PatchDock and Fire-Dock (Fast Interaction REfinement in molecular DOCK-ing) were considered PatchDock algorithm entails objectrecognition and image segmentation techniques to carry outrigid docking through a three-stage filtering process (a)molecular shape representation (b) surface patch matchingand (c) filteringscoring The FireDock method simulta-neously targets the problem of flexibility and scoring ofsolutions to address the refinement problem of protein-protein docking solutions After generating top 1000 modelsthe PatchDock output was redirected to FireDock (as aninput) to produce the top 10 refined structures of associatedGalNAcT protein-ligand [40]

3 Results

31 SNP Analysis The dbSNP-NCBI database search fornonsynonymous and synonymous SNPs located in exonicintronic and UTR regions ofGalNAc-T1 gene revealed a totalof 891 SNPs of which the exonic region is comprised of 18

Computational and Mathematical Methods in Medicine 5

Table 1 List of nsSNPs with rs IDs found to be functionally significant by SIFT Tool

rs IDs Allele change Amino acid change SIFT score SIFT prediction Medianrs143502430 c13GgtA A5T 055 TOLERATED 295rs141224997 c58TgtG L20V 1 TOLERATED 266rs138822379 c73CgtG L25V 007 TOLERATED 267rs199595808 c137AgtG D46G 03 TOLERATED 251rs72964406 c161CgtT P54L 028 TOLERATED 251rs139185162 c213TgtG D71E 1 TOLERATED 238rs150799030 c211GgtC D71H 007 TOLERATED 238rs113616262 c427TgtC S143P 0 DAMAGING 237rs201338228 c646GgtA V216M 014 TOLERATED 237rs199977475 c773GgtT G258V 004 DAMAGING 237rs144282744 c1135AgtC N379H 01 TOLERATED 237rs146565032 c1150AgtG I384V 052 TOLERATED 238rs142387342 c1202GgtA G401D 034 TOLERATED 237rs34304568 c1240TgtC Y414D 0 DAMAGING 237rs151329739 c1318AgtC N440H 008 TOLERATED 237rs142110831 c1528CgtA P510T 016 TOLERATED 237rs145884536 c1622AgtG N541S 045 TOLERATED 238rs200444543 c1657GgtA V553I 039 TOLERATED 266SIFT score le 005 damaging

(2) nsSNPs and 11 (12) sSNPs whereas intronic regionscontained 836 (93) SNPs and mRNAUTR region consistedof 26 (29) SNPs However search for nonsense frameshiftstopgain mutations did not show any results in NCBI-dbSNPWe selected nonsynonymous SNPs from the exonic regionand UTR SNPs (51015840 and 31015840) from the intronic region for ouranalysis

32 Identification of Functional SNPs in Coding RegionsIdentification of functional SNPs was done by predictingthose which substitute the amino acids that are critical forGalNAc-T1 functionThis in silico analysis was performed andvalidated using 4 different computational tools namely SIFTPolyPhen-2 PANTHER and SNPeffect

321 Analysis of Molecular Phenotypic Effects by SIFT Byusing sequence homology SIFT characterizes the effect ofamino acid substitution on protein function From SIFTresults (Table 1) a total of 3 (166) nsSNPs were predicted asdamaging (score of 000ndash004) by SIFT whereas the remain-ing 15 nsSNPs (8333) were simulated to be tolerated (scoreof 008ndash055) The 3 SNPs (rs113616262 rs34304568 andrs199977475) of GalNAc-T1 gene are filtered to be functionalby SIFT

322 Simulation of Functional Consequences by PolyPhen-2 For a given threshold of Naıve Bayes probabilistic scorePolyPhen-2 calculates the true positive rate as a fraction ofcorrectly predicted mutations From 18 nsSNPs of GalNAc-T1 gene only 4 (22) nsSNPs were predicted ldquoprobablydamagingrdquo (score of 096ndash100) whereas 14 (78) wereclassified as benign (score of 0411ndash000) The ranking ofSNPs on the basis of PolyPhen-2 scores enables us to assess

the potential-quantitative effect of SNPs on native protein(Table 2) PolyPhen-2 predicted functional 4 SNPs includingldquors113616262 rs34304568 and rs199977475rdquo and therebyvalidated the results predicted from SIFT

323 Functional Characterization by PANTHER andFathmm PANTHER characterizes likely functional effectof amino acid variation by means of HMM-based statisticalmodeling and evolutionary relationship We performedPANTHER analysis of GalNAc-T1 nsSNPs in order to addanother layer of refinement in SNPs characterization A totalof 4 SNPs possessed the subPSEC score less than minus3 and weretherefore classified as toleratedThe remaining 14 amino acidvariants (D46G P54L D71E D71H V216M G258V N379HI384V G401D Y414D N440H and P510T) were found tobe deleterious with subPSEC score in between minus3 and minus10(Table 2) PANTHER-cSNP tool simulated rs113616262 andrs34304568 with the highest subPEC score of minus74941 andminus917274 respectively

From Fathmm only rs34304568 showed the damagingeffect of amino acid substitution (Y414D) with minus182 score

324 Functional Analysis by SNPeffect SNPeffect databasepredicts the impact of SNPs on aggregation-prone regionsamyloid-forming regions Hsp70 chaperone binding sitesand structural stability of human proteins TANGO analysisshowed that L25V variant increases (dTANGO score is6927) and Y414D decreases (dTANGO score is minus15410)the aggregation tendency of the GalNAc-T1 protein How-ever WALTZ analysis revealed that Y414D (dWALTZ scoreis minus20948) also possessed the ability to decrease pro-tein amyloid propensity None of the variants were pre-dicted to alter the chaperone binding sites for Hsp70

6 Computational and Mathematical Methods in Medicine

Table 2 Characterization of SNPs by using PolyPhen-2 PANTHER and Fathmm based classification systems

Amino acid change PolyPhen-2 PANTHER FathmmA1A2 Score Prediction Specificitysensitivity subPSEC Score Prediction Score PredictionA5T 0411 Benign 090089 minus296529 Tolerated 059 ToleratedL20V 0000 Benign 000100 minus265438 Tolerated 069 ToleratedL25V 0960 Probably damaging 095078 minus345127 Deleterious 035 ToleratedD46G 0004 Benign 059097 minus334504 Deleterious 051 ToleratedP54L 0001 Benign 015099 minus427568 Deleterious 056 ToleratedD71E 0000 Benign 000100 minus340972 Deleterious 068 ToleratedD71H 0132 Benign 086093 minus524027 Deleterious 049 ToleratedS143P 0999 Probably damaging 099014 minus74941 Deleterious 015 ToleratedV216M 0411 Benign 090089 minus556762 Deleterious 026 ToleratedG258V 1000 Probably damaging 100000 minus440932 Deleterious 035 ToleratedN379H 0000 Benign 000100 minus341433 Deleterious minus035 ToleratedI384V 0000 Benign 000100 minus381526 Deleterious minus031 ToleratedG401D 0000 Benign 000100 minus323851 Deleterious 156 ToleratedY414D 1000 Probably damaging 100000 minus917274 Deleterious minus182 DamagingN440H 0048 Benign 083094 minus406922 Deleterious minus117 ToleratedP510T 0009 Benign 077096 minus346487 Deleterious minus109 ToleratedN541S 0000 Benign 000100 minus280107 Tolerated minus103 ToleratedV553I 0000 Benign 000100 minus2409 Tolerated minus098 ToleratedPolyPhen-2 probably damaging (probabilistic score gt 085) possibly damaging (probabilistic score gt 015) and benign (remaining)PANTHER deleterious or intolerant (subPSEC score le minus3) less deleterious (subPSEC score ge minus3)

chaperones When analyzed by FoldX severe reductionin stability was demonstrated by S143P (750Kcalmol)G258V (782Kcalmol) Y414D (603 Kcalmol) variants whileN440H (167 Kcalmol) P510T (225 Kcalmol) and N541S(055 Kcalmol) were accounted to slightly reduce the stabilityof GalNAc-T1 protein

325 Characterization of Functional SNPs by ConcordanceAnalysis The efficacy of functional SNP prediction can bemademore reliable by combining the results of empirical andsupport vector machine (SVM) based approaches Hencewe performed the concordance analysis to get the integratedpicture with SIFT PolyPhen-2 PANTHER and SNPeffecttools Out of the 18 nsSNPs 3 (1666) were predicted tobe ldquodeleteriousrdquo by SIFT whereas this prediction rate wasincreased to 4 (22) as ldquoprobably damagingrdquo when analyzedby PolyPhen-2 5 (2777) as ldquoseverely destabilizingrdquo by SNPeffect and 14 (78) as ldquodeleteriousrdquo by PANTHER Fromconcordance analysis 3 amino acid variants (S143P G258Vand Y414D) were commonly predicted by all 4 of the in silicotools (SIFT PolyPhen-2 PANTHER and SNPeffect)

326 Identification of Functional SNPs in Regulatory andCon-served Regions Using empirical Bayesian inference Con-Surf database characterizes the evolutionary conservation ofamino acids Our ConSurf results showed that only S143 waslocated in highly conserved region and predicted to havefunctional impact on GalNAc-T1 proteinThe remaining twonsSNPs responsible for G258V and Y414D are found in thevicinity of conserved residues and found to be buried in wildtype residues by NetSurfP (Table 3)

FastSNP assigns the risk ranking for SNPs after excisingfunctional effect information From FastSNP results wefound 2 SNPs with rs72964406 (ESE motif CCTCATGscore is 2897886) and rs34304568 (ESE motif GGACCTAGscore is 2088850) located in coding regions and altered ESE(exonic splicing enhancers) motifs indicating that all thesensSNPs may have the ability to affect the level position andtiming of gene expression or regulate the splicing of GalNAc-T1 gene transcript No functional role was predicted for theSNPs which are found be located in 31015840 UTR region

33 AMBER-Energy Minimization Solvent Accessibilityand Electrostatic Interaction Effects of Functional SNPson GalNAc-T1 Protein Based on multiple-threadingalignments I-TASSER builds high-quality 3D structuresof protein molecules from their amino acid sequenceAfter retrieving protein FASTA sequence of GalNAc-T1from UniProt (ID ldquoQ10472rdquo) database protein sequencewas given to I-TASSER as an input The I-TASSER toolcreated the 5 full-length models for GalNAc-T1 protein(with C-scores minus075 minus129 minus189 minus271 and minus283) byexcising top 10 structures with C-scores after targeting thePDB library hits (Table 4) Among the 5 predicted modelsmodel 1 carried the high-quality confidence in the formof C-score (minus075) TM-Score (062 plusmn 014) and RMSD(93plusmn 46A) hence it was selected for further analysis usingChimera Swiss Pdb viewer GROMOS96 and AMBERff99SB software to visualize and compute the total energybefore and after energy minimization The superimposedstructures of mutant and wild-type residues with theirsurface were visualized in Figure 2 The total energy of

Computational and Mathematical Methods in Medicine 7

Table 3 Surface accessibility of wild-type and mutant variants in GalNAc-T1

Amino acid change Class assignment Relative surfaceaccessibility (RSA)

Absolute SurfaceAccessibility prediction

minusminus

minusminus

S143P

G258V

Y414D

Z-fit score for RSA

(a) (b) (c)

Figure 2 Superimposed structures of GalNAc-T1 native (camel color) and mutant (blue color) models to visualize the stereochemicalconformation of wild type and mutant residues at 143 258 and 414 positions

Table 4 Top 10 templates used by I-TASSER to create the high-quality models for human GALNT1 secondary structure

Rank PDB Hit Iden1 Iden2 Cov Norm 119885-score1 2ffuA 045 041 088 3122 2d7iA 043 042 091 7153 2ffuA 045 041 088 8164 2ffuA 045 041 088 5975 2ffuA 045 041 088 4416 2d7iA 043 042 091 7117 2ffuA 045 041 088 7998 2d7rA 043 042 092 6429 1xhbA 099 079 080 30610 2ffuA 045 041 088 621Rank of templates represents the top ten threading templates used by I-TASSER ldquoIdent1rdquo is the percentage sequence identity of the templates in thethreading aligned region with the query sequence ldquoIdent2rdquo is the percentagesequence identity of the whole template chains with query sequence ldquoCovrdquorepresents the coverage of the threading alignment and is equal to thenumber of aligned residues divided by the length of query protein ldquoNorm119885-scorerdquo is the normalized 119885-score of the threading alignments Alignmentwith a Normalized 119885-score gt 1 mean a good alignment and vice versa Thetop 10 alignments reported above (in order of their ranking) are taken fromthe following threading programs 1 MUSTER 2 HHSEARCH 3 SP3 4PROSPECT2 5 PPA-I 6 HHSEARCH I 7 SPARKS 8 SAMT99 9 MUSTER10 HHSEARCH

native structure was found to be 13294691 kJmol before theenergyminimization stage and it was minus13882539 kJmol afterthe energy minimization step whereas mutant structures

Table 5 Total energy of native and mutant structures before andafter energy minimization

Amino acidvariants

Total energy beforeenergy minimization

(kJmol)

Total energy afterenergy minimization

(kJmol)Native 13140137 minus13882539S143P 13392757 minus13767374G258V 59777910 minus10727144Y414D 12995687 minus13918198

exhibited deviation in total energy value calculated beforeand after energy minimization (Table 5) Among 3 screenedmutations G258V showed the increase in total energy ofGalNAc-T1 before (597779100 kJmol) and after energyminimization (minus10727144 kJmol) Chimera and Swiss PDBviewer were used to visualize the structural features of aminoacids in native and mutant protein chains During structuralvisualization for all 3 mutations only mutant residue (valine)at 258 position showed a network of clashes with Tyr268(Figure 3) Additionally S143P G258V and Y414D variantswere analyzed for solvent accessibility (in form of Z-score)and stability and a decrease in both parameters was observedfor all three variants (Table 3)

34 The Structural Impacts of Functional GalNAc-T1Mutations Project HOPE simulates the structural featuresof amino acid substitution on native protein structure

8 Computational and Mathematical Methods in Medicine

(a)

(b)

(c)

Figure 3 H-bonding (green discontinuous line) interactions and clashes (pink discontinuous line) of wild type and mutant analogues withthe vicinal amino acid residues (a) Ser143 is examined with single H-bonding with Val139 and converted into clash as a result of Pro atthe same position (b) At 258 position one H-bond is observed with Arg266 in both native (Gly258) and mutant (Val258) structures but anetwork of clashes appeared between Val258 and Tyr268 (c) Tyr414 is visualized with five H-bonding interactions for Pro410 and Asn417and one H-bond is distorted due to appearance of mutant aspartic acid at the same position

Computational and Mathematical Methods in Medicine 9

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

lowast lowastlowast lowastlowastlowastlowastlowast lowastlowast

lowast lowast lowastlowastlowast lowastlowast

lowast

lowast

lowast lowast

lowastlowastlowastlowast lowast lowast lowastlowast

lowast lowast lowastlowastlowast lowast lowast lowastlowastlowastlowastlowast lowast lowastlowast lowast lowast lowastlowast lowast lowastlowast

lowast lowastlowastlowast lowast lowast lowastlowast lowast lowast143 Catalytic subdomain A

258 Catalytic subdomain B

414

Ricin B-type lectin

Figure 4 Multiple sequence alignments and evolutionary conservation behaviour among human GalNAc-T1 to GalNAc-T10 members ofGALNTs family S143 and Y414 residues are found conserved in the catalytic subdomain A and linker B domain (area between catalyticsubdomain B and Ricin B-type lectin) respectively G258 is observed in the vicinity of conservation groups with strongly similar properties(shown with ldquordquo)

The S143 residue is located within a stretch of residuesannotated in Uniprot as catalytic domain A (Figure 4 fromClustal Omega) The hydrophobicity and size differencebetween wild-type and mutant residue (Pro) can disrupt theH-bonding interactions with the neighbouring residues andhence the protein framework The wild-type residue formsa hydrogen bond with the valine on position 139 Due tohigh rigidity of proline moiety this mutation might abolishthe required flexibility of the protein at this position andcould affect the catalytic tendency Based on conservationbehaviour this mutation is probably damaging to the protein

For G258V the wild-type residue is resided in thelinker region between catalytic subdomains A and B andcan distract the required flexibility of core structure bysubstituting Gly258 with valine Additionally the structuralanalysis ofVal258 showed some clashes for Tyr268whichmaycontribute to the extra energy in the protein structure andhence the decrease in stability

In theY414Dvariant Tyr414 indicated fiveH-interactionswith Pro410 Phe411 and Asn417 whereas mutant Asp414exhibited the four H-bonding interactions in the core of theprotein (or protein complex) due to the differences in charge

10 Computational and Mathematical Methods in Medicine

Table 6 Ligand binding sites for wild type andmutant forms (S143PG258V and Y414D)

Wild type S143P G258V Y414DAsn82 Asn82 Asn82 Asn82Phe84 Val123 Phe84 Phe84Val123 Phe124 Val123 Val123Phe124 His125 Phe124 Phe124His125 Asn126 His125 His125Asn126 Glu127 Asn126 Asn126Glu127 Ala128 Glu127 Glu127Thr131 Lau132 Asp156 Asp156Asp156 Asp156 Gly188 Leu189Gly188 Leu189 Leu189 Asp209Leu189 Asp209 Asp209 Ala210Arg193 Ala210 Ala210 His211Asp209 His211 His211 Ile315Ala210 Ile315 Ile315 Trp316His211 Trp316 Trp316 His344Ile315 Arg347 His344 Val345Trp316 Thr350 Val345 Phe346His344 Pro351 Phe346 Arg347Val345 mdash Arg347 Lys348Phe346 mdash Lys348 Ala349mdash mdash Ala349 Thr350mdash mdash Thr350 Pro351mdash mdash Pro351 Tyr352

density and hydrophobicity between wild-type and mutantresidue

35 Simulation for GalNAc-T1 Molecular Binding Sites andProtein-Protein Interactions Including the structure-basedprediction of protein function FTSite identifies the ligandbinding sites for a particular protein Upon querying withFTSite a considerable difference in the number of ligandbinding sites was observed between wild type and mutantforms (S143P G258V and Y414D) of GalNAc-T1 protein(Table 6)

STRING database annotates the functional interactionsamong the proteins in a cell STRING results predictedthe functional association pattern of GalNAc-T1 protein(physical and functional) with MUC1 GBGT1 CHPFST6GALNAC1 B4GalNAc-T1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6 partners with high confidenceshown with bold lines in Figure 5 Based on the confidencescores of GalNAc-T1 protein interactions MUC1 a denselyO-glycosylated transmembrane protein critical to cellularintegrity was chosen for molecular docking analysis inorder to identify the plausible structural and functionalimplications of S143P G258V and Y414D variants TheH-bonding interaction existing between Gly258 (greenportion) and Arg266 residues of native GalNAc-T1 is in turncritical for its interaction with N-H ofThr located in GVTSA(tandem repeat region) of MUC1 by means of H-bonding

However due to the replacement of Gly at 258 position withVal in GalNAc-T1 the H-bonding interaction with MUC1was distorted (Figure 6)

After STRINGmucin biosynthetic pathwaywas retrievedfrom KEGG database to retrieve the molecular reactionand interaction behaviour for GalNAc-T1 and other partnerenzymes (ST6GALNAC1 B4GALNT1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6) to corroborate the STRING results

4 Discussion

The human GalNAc-T1 gene is located on chromosome18q121 region with 11 exons (5882 kb) [1] A 3847 bp lengthm-RNA encodes the functional GalNAc-T1 protein with559 amino acids (642 kDa pI 74) However alternativesplicing of mRNA yields different polynucleotide chainlengths that is 1770 bps (499 aa) and 1120 bps (105 aa)amino acids respectively So far approximately 900 variantslocated in noncoding coding and regulatory regions ofhuman GalNAc-T1 gene are described in dbSNP databaseto date With the advent of high throughput (whole exomeand genome) sequencing practices the number of geneticvariations is growing day by day in an efficient manner [1824 41 42] Hence an important task of human genetics liesin delineating those amino acid variants which can imposespecific structural and functional consequences on proteinfunction [41]

In the current investigation screening for functionalGalNAc-T1 genetic variants in coding region was performedusing sequence- and structure-based algorithms such as SIFTPolyPhen-2 PANTHER and SNP effect From functionalanalysis of SNPs SIFT predicted that 17 of total nsS-NPs are of deleterious type whereas PolyPhen-2 predictedsim22 of total nsSNPs to be highly deleterious But commonpredictions of SIFT and PolyPhen-2 showed that 16 ofvariants were functionalThe significant difference in outputsof SIFT and PolyPhen-2 algorithms is most likely due toutilization of different protein sequence alignments used tocharacterize the variants [42] However good coherence andaccuracy between PANTHER and SIFT (use of similar scor-ingmatrices) have predicted 4 (2222) nsSNPs ofGalNAc-T1to be functional SNPeffect predicted that 3 (166) nsSNPsof the variants to be highly destabilizingThe accuracy of thisprediction percentage remained the same even when com-bined with PolyPhen-2 analysis but increased to 6 (3333)nsSNPs with PANTHER By comparing the output of allthe 4 different in silico tools (SIFT PolyPhen-2 PANTHER-cSNP and SNPeffect) nsSNPs that encodes S143P G258Vand Y414D variants were found to be functionally significantThe differences in prediction capabilities can be attributed tothe fact that every method uses different sets of sequencesand alignments When compared sequence-based predic-tion analysis has a number of advantages over structure-based analysis due to the fact that it considers all types ofeffects at the protein level and is well suitable for proteinswith known relatives But sequence-based predictions areunable to explain the underlying mechanisms between geno-type and phenotype relationships for most of the proteins

Computational and Mathematical Methods in Medicine 11

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

(a) (b)

GALNT1 functional partnersGBGT1CHPFST6GALNAC1B4GALNT1MUC1C1GALT1C1C1GALT1GCNT1B3GNT6ZNF146

0970096809520926092509000900089908990680

Nei

ghbo

rhoo

dG

ene f

usio

nC

oocc

urre

nce

Coe

xpre

ssio

nEx

perim

ents

Dat

abas

esTe

xtm

inig

[Hom

olog

y]

Scor

e

Globoside 120572-13-N-acetylgalactosaminyltransferase 1 (347aa)Chondroitin polymerizing factor (775 aa)ST6(120572-N-acetyl-neuraminyl-23-120573-galactosyl-13)-N-acetylgalactosaminide 120572-26-sialy (600 aa)120573-14-N-acetyl-galactosaminyltransferase1 (533 aa)Mucin 1 cell surface associated (273 aa)C1GALT1-specific chaperone 1 (318 aa)Core1 synthase glycoprotein-N-acetylgalactosamine 3-120573-galactosyltransferase 1 (363 aa)Glucosaminyl (N-acetyl) transferase 1 core 2(120573-16-N-acetylglucosaminyltransferase) (428 aa)UDP-GlcNAc-120573 Gal 120573-13-N-acetylglucosaminyltransferase 6 (core 3 synthase) (384 aa)Zinc finger protein 146 (292 aa)

Figure 5 GalNAc-T1 protein-protein interactions with 09 partners One colour is given to each type of evidence in the predicted functionallinks (edges) among eight coloured lines (a) From experimental basis (with score 0925) onlyMUC1 is observed for interaction withGalNAc-T1 From text-mining data GalNAc-T1 interactions are observed for GBGT1 CHPF ST6GALNAC1 B4GALNT1 and MUC1 proteins with0970 0968 0952 0926 and 0925 scores respectively The remaining interactions with C1GALT1C1 CIGALT1 GCNT1 and B3GNT6proteins are simulated with STRING score ranging from 0900 to 0899 (b) Strong association pattern (thick blue lines) of GalNAc-T1 ispredicted for GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1 C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 partners with highconfidence For ST6GALNAC1-GBGT1 ST6GALNAC1-MUC1 GCNT1-CHPF B4GALNT1-GCNT1 CHPF-C1GALT1C1 andGALNT1-ZNF146pairs weak associations are examined in the form of thin blue lines

(a) (b)

Figure 6 3D visualization of G258V variant withMUC1 (a) Ligand binding of GalNAc-T1 native structure withMUC1 indicated single H-bonding interaction (green line) of Gly258 residue (green portion) with Arg266 which is further connected with the GVTSA (tandem repeatregion ofMUC1) by means of H-bonding interaction (b) Picturing of GalNAc-T1 mutant protein structure is observed with one H-bondingloss between Arg266 residue and theMUC1 tandem repeat motif

12 Computational and Mathematical Methods in Medicine

On the contrary the structure-based approach has limitationsin that it cannot be implemented for proteins with unknown3D structures In silico investigation tools that integrateboth sequence- and structure-based approaches will be ofadded advantage in providing reliable prediction results withwider coverage of different aspects of SNP analysis Howevervariability in prediction outputs of these algorithms reflectsboth advantages and disadvantages Therefore decisionsregarding the selection of a suitable tool for SNP analysismustbe subjective to the specific objectives of the investigationundertaken

Our ConSurf and Clustal Omega results indicated thatnsSNPs at positions S143P and Y414D were found in thehighly conserved region and predicted to have potentialimpact on GalNAc-T1 protein The third mutation (G258) isobserved in the vicinity of conservation groups with stronglysimilar properties shown by ldquordquo in Figure 4 Besides thatthe SNPs which encode functional polypeptides are thoseregulatory SNPs that control the gene expression FastSNPtool helped us to prioritize the deleterious SNPs based upontheir impact on determining protein structure deviance intranscriptional levels of the sequence modification in thepremature translation termination and aberrations in thepositions at promoter region for transcription factor bindingAltered O-glycosylation machinery due to aberrant GalNAc-T1 expression may result in aberrantly glycosylated proteinsand new antigenic targets thereby altering host immuno-genic response in ovarian cells [43 44] The two SNPs i-e rs72964406 (ESE motif CCTCATG score is 2897886)and rs34304568 (ESE motif GGACCTAG score 2088850)were located in coding regions and altered ESE (exonicsplicing enhancers) motifs indicating that all these nsSNPsmay have the ability to affect the level location and timingof gene expression or regulate the splicing of GalNAc-T1 genetranscript

The GalNAc-T1 protein structure carries an N-terminaltransmembrane domain a stem region a lumenal catalyticdomain (catalytic AGT1 domain and catalytic BGalNAc-T domain) and a C-terminal ricin-type lectin region(Figure 4) Although all GalNAc-transferases are reportedto share common structural features and the conservedmotifs described above the exact role of each domain incatalysis remains unknown Several biological roles have beendescribed for GalNAc-T1 for example loss of GalNAc-T1leads to reduced leukocyte recruitment and increased rollingvelocity in vivo suggesting the predominant role for GalNAc-T1 in attaching functionally relevant O-linked glycans toselectin ligands [45] Protein structural analysis is performedto fine-tune the picture drawn by the concordance analysisof SIFT PolyPhen-2 and PANTHER Calculation of sol-vent accessibility and force field energy have provided aninsight into the structural and functional impacts of aminoacid substitutions 3D models are constructed to visualizedeviation among the mutant and wild-type protein modelsIn S143P the required protein framework (hydrophobicityand H-bonding interactions) seems to be disturbed at thisposition and could be damaging for the native structureThe G258V substitution carries the tendency to disturb thelocal structure by reframing the backbone flexibility Y414D

variant is found to be associated with regulation of proteinmisfolding Due to high difference in total energy for nativeand mutant (carrying G258V variant) models the properfunctioning of GalNAc-T1 can be disturbed The analysis ofsequence-based motifs and conserved residues is a reliablemethod to identify the functional amino acid residues thatare important for binding catalysis and stability of proteinsUpon analysingwith FTSite a difference in the ligand bindingsites is observed between wild type (20 binding site) andmutant forms (S143P 18 binding sites G258V 23 bindingsites Y414D 23 binding sites) of GalNAc-T1 protein

Protein-protein interaction analysis is a comprehensiveway to understand the global organization of proteomes inthe context of a functional network Interaction networkdisplays biomolecules as nodes and interactions connectingthe two nodes as edges Currently functional network viewfor a single genome is widely used to improve the statis-tical power in human molecular genetics [39] to aid drugdiscovery to better understand metabolic pathways and toderive genotype-phenotype correlations [18 46] STRINGmaps have shown that GalNAc-T1 interacts with GBGT1CHPF ST6GALNAC1 B4GALNAC-T1 MUC1 C1GALT1C1C1GALT1 GCNT1 B3GNT6 and ZNF146 partners by 36interactions The Strong interaction patterns were observedfor GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 part-ners From KEGG-Pathway analysis biochemical (stereospe-cific) reactions for GalNAc-T1 ST6GALNAC1 B4GALNT1MUC1 C1GALT1C1 C1GALT1 and GCNT1 indicated theindividual and combined role of each enzyme in regulationof glycosylation phenomena to carry out mucin biosynthesispathway (Figure 7) Noticeably most of these enzymatic pro-teins are involved in cell adhesion and membrane transportTherefore aberrant glycosylations in the formofmutations inGalNAc-T1 gene could distort themucin andmany associatedpathways (ERK SRC andNF-kappa-B pathways) involved inmaintaining cellular response and integrity Of the 3GalNAc-T1 variants tested G258Vwas found to disturb the interactionof Arg266 ofGalNAc-T1withN-HofThr (inGVTSA tandemrepeat region) of MUC1 [47] thus it might impair thepossible transfer of GalNAc residue from UDP-GalNAc tohydroxyl group of SerThr during O-glycosylation reaction(Figure 7) Over expression aberrant intracellular localiza-tion and changes in glycosylation of MUC1 and MUC7 areoften reported in tumors of colon breast ovarian lung andpancreatic origin [24 48] Since MUC1 plays an essentialrole in cellular signalling and microbial pathogenicity it isconceivable to expect that G258V mutation is pathogenic tocellular integrity and susceptibility to certain human disease

5 Conclusion

This study represents the first comprehensive investigationthat identified the functional SNPs in GalNAc-T1 geneusing sequence- and structure-based homology algorithmsAlthough there were notable differences in the predictionbasis of selected in silico algorithms our concordance analysiscorroborated the characterization of suspected SNPs that

Computational and Mathematical Methods in Medicine 13

GalNAc1205721-O-Thr

C1GALT1C1GALT1C1

GCNT1

GALNT1

B3GNT6

GCNT1B4GALT5

Gal1205731-3GalNAc1205721-O-Thr

GlcNAc1205731-6GalNAc1205721-O-Thr

GalNAc1205721-3GalNAc1205721-O-Thr

GlcNAc1205731-3GalNAc1205721-O-Thr

Gal1205731-4GlcNAc1205731-6GalNAc1205721-O-Thr

Figure 7 Complex structure of GalNAc-T1 enzyme with UDPMUC1 (tandem repeat region ldquoGSTAPPAHGVTSAPrdquo) and GalNAc residueGalNAc sugar is attached at Thr of tandem repeat region inMUC1 His125 and Asp156 residues from catalytic A domain and Ile315 Trp316and Thr350 from catalytic B domain are found in contact with UDP (orange red color) Under the action of ST6GALNAC1 B4GALNT1C1GALT1C1 C1GALT1 GCNT1 andB3GNT6GalNAc1-O-Thr is stereospecifically converted into different glycan chains to regulate themucinbiosynthesis pathway

could play significant roles in cellular biology We definedthe structural consequences of S143P G258V and Y414Dvariants on GalNAc-T1 protein in the form of solventaccessibility electrostatic interaction energy calculation andmultiple alignment conservation From potential SNPs theG258V mutation is predicted to cause considerable changein total energy electrostatic intramolecular interactions andfunctional interaction behaviour of GalNAc-T1 protein withMUC1 protein and thereby may impact the possible transferof GalNAc residue from UDP-GalNAc to hydroxyl groupof SerThr during O-glycosylation of proteins Additionally2 regulatory SNPs that influence the splicing and expres-sion pattern of GalNAc-T1 gene have also been identifiedAltered GalNAc-T1 function due to genetic variations andmRNA expression might play a critical role in determiningsusceptibility to complex diseases Furthermore protein-protein interaction pathway and protein-ligand docking havehelped us to understand the stereochemical (anomery andlinkage) roles ofGalNAc-T1 and associated enzymes inmucinbiosynthetic pathway Finally the in silico based nsSNPpredictions would not just be of use in deriving genotype-phenotype relations but to some extent explain the molecularbasis for varied interindividual response to certain drugs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are thankful to the reviewers for their criticalreview and fruitful suggestions Additionally the authorsacknowledge the useful comments from Dr MuhammadNadeem Arshad of Chemistry Department King AbdulazizUniversity KSA

References

[1] E P Bennett D O Weghuis G Merkx A G van KesselH Eiberg and H Clausen ldquoGenomic organization andchromosomal localization of three members of the UDP-N-acetylgalactosaminepolypeptide N-acetylgalactosaminyltrans-ferase familyrdquo Glycobiology vol 8 no 6 pp 547ndash555 1998

[2] Y-Z Chen Y-R Tang Z-Y Sheng and Z Zhang ldquoPredictionof mucin-type O-glycosylation sites in mammalian proteinsusing the composition of k-spaced amino acid pairsrdquo BMCBioinformatics vol 9 article 101 2008

[3] H C Hang and C R Bertozzi ldquoThe chemistry and biology ofmucin-type O-linked glycosylationrdquo Bioorganic and MedicinalChemistry vol 13 no 17 pp 5021ndash5034 2005

[4] T A Gerken O Jamison C L Perrine et al ldquoEmerg-ing paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family ofglycosyltransferasesrdquo Journal of Biological Chemistry vol 286no 16 pp 14493ndash14507 2011

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 5: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

Computational and Mathematical Methods in Medicine 5

Table 1 List of nsSNPs with rs IDs found to be functionally significant by SIFT Tool

rs IDs Allele change Amino acid change SIFT score SIFT prediction Medianrs143502430 c13GgtA A5T 055 TOLERATED 295rs141224997 c58TgtG L20V 1 TOLERATED 266rs138822379 c73CgtG L25V 007 TOLERATED 267rs199595808 c137AgtG D46G 03 TOLERATED 251rs72964406 c161CgtT P54L 028 TOLERATED 251rs139185162 c213TgtG D71E 1 TOLERATED 238rs150799030 c211GgtC D71H 007 TOLERATED 238rs113616262 c427TgtC S143P 0 DAMAGING 237rs201338228 c646GgtA V216M 014 TOLERATED 237rs199977475 c773GgtT G258V 004 DAMAGING 237rs144282744 c1135AgtC N379H 01 TOLERATED 237rs146565032 c1150AgtG I384V 052 TOLERATED 238rs142387342 c1202GgtA G401D 034 TOLERATED 237rs34304568 c1240TgtC Y414D 0 DAMAGING 237rs151329739 c1318AgtC N440H 008 TOLERATED 237rs142110831 c1528CgtA P510T 016 TOLERATED 237rs145884536 c1622AgtG N541S 045 TOLERATED 238rs200444543 c1657GgtA V553I 039 TOLERATED 266SIFT score le 005 damaging

(2) nsSNPs and 11 (12) sSNPs whereas intronic regionscontained 836 (93) SNPs and mRNAUTR region consistedof 26 (29) SNPs However search for nonsense frameshiftstopgain mutations did not show any results in NCBI-dbSNPWe selected nonsynonymous SNPs from the exonic regionand UTR SNPs (51015840 and 31015840) from the intronic region for ouranalysis

32 Identification of Functional SNPs in Coding RegionsIdentification of functional SNPs was done by predictingthose which substitute the amino acids that are critical forGalNAc-T1 functionThis in silico analysis was performed andvalidated using 4 different computational tools namely SIFTPolyPhen-2 PANTHER and SNPeffect

321 Analysis of Molecular Phenotypic Effects by SIFT Byusing sequence homology SIFT characterizes the effect ofamino acid substitution on protein function From SIFTresults (Table 1) a total of 3 (166) nsSNPs were predicted asdamaging (score of 000ndash004) by SIFT whereas the remain-ing 15 nsSNPs (8333) were simulated to be tolerated (scoreof 008ndash055) The 3 SNPs (rs113616262 rs34304568 andrs199977475) of GalNAc-T1 gene are filtered to be functionalby SIFT

322 Simulation of Functional Consequences by PolyPhen-2 For a given threshold of Naıve Bayes probabilistic scorePolyPhen-2 calculates the true positive rate as a fraction ofcorrectly predicted mutations From 18 nsSNPs of GalNAc-T1 gene only 4 (22) nsSNPs were predicted ldquoprobablydamagingrdquo (score of 096ndash100) whereas 14 (78) wereclassified as benign (score of 0411ndash000) The ranking ofSNPs on the basis of PolyPhen-2 scores enables us to assess

the potential-quantitative effect of SNPs on native protein(Table 2) PolyPhen-2 predicted functional 4 SNPs includingldquors113616262 rs34304568 and rs199977475rdquo and therebyvalidated the results predicted from SIFT

323 Functional Characterization by PANTHER andFathmm PANTHER characterizes likely functional effectof amino acid variation by means of HMM-based statisticalmodeling and evolutionary relationship We performedPANTHER analysis of GalNAc-T1 nsSNPs in order to addanother layer of refinement in SNPs characterization A totalof 4 SNPs possessed the subPSEC score less than minus3 and weretherefore classified as toleratedThe remaining 14 amino acidvariants (D46G P54L D71E D71H V216M G258V N379HI384V G401D Y414D N440H and P510T) were found tobe deleterious with subPSEC score in between minus3 and minus10(Table 2) PANTHER-cSNP tool simulated rs113616262 andrs34304568 with the highest subPEC score of minus74941 andminus917274 respectively

From Fathmm only rs34304568 showed the damagingeffect of amino acid substitution (Y414D) with minus182 score

324 Functional Analysis by SNPeffect SNPeffect databasepredicts the impact of SNPs on aggregation-prone regionsamyloid-forming regions Hsp70 chaperone binding sitesand structural stability of human proteins TANGO analysisshowed that L25V variant increases (dTANGO score is6927) and Y414D decreases (dTANGO score is minus15410)the aggregation tendency of the GalNAc-T1 protein How-ever WALTZ analysis revealed that Y414D (dWALTZ scoreis minus20948) also possessed the ability to decrease pro-tein amyloid propensity None of the variants were pre-dicted to alter the chaperone binding sites for Hsp70

6 Computational and Mathematical Methods in Medicine

Table 2 Characterization of SNPs by using PolyPhen-2 PANTHER and Fathmm based classification systems

Amino acid change PolyPhen-2 PANTHER FathmmA1A2 Score Prediction Specificitysensitivity subPSEC Score Prediction Score PredictionA5T 0411 Benign 090089 minus296529 Tolerated 059 ToleratedL20V 0000 Benign 000100 minus265438 Tolerated 069 ToleratedL25V 0960 Probably damaging 095078 minus345127 Deleterious 035 ToleratedD46G 0004 Benign 059097 minus334504 Deleterious 051 ToleratedP54L 0001 Benign 015099 minus427568 Deleterious 056 ToleratedD71E 0000 Benign 000100 minus340972 Deleterious 068 ToleratedD71H 0132 Benign 086093 minus524027 Deleterious 049 ToleratedS143P 0999 Probably damaging 099014 minus74941 Deleterious 015 ToleratedV216M 0411 Benign 090089 minus556762 Deleterious 026 ToleratedG258V 1000 Probably damaging 100000 minus440932 Deleterious 035 ToleratedN379H 0000 Benign 000100 minus341433 Deleterious minus035 ToleratedI384V 0000 Benign 000100 minus381526 Deleterious minus031 ToleratedG401D 0000 Benign 000100 minus323851 Deleterious 156 ToleratedY414D 1000 Probably damaging 100000 minus917274 Deleterious minus182 DamagingN440H 0048 Benign 083094 minus406922 Deleterious minus117 ToleratedP510T 0009 Benign 077096 minus346487 Deleterious minus109 ToleratedN541S 0000 Benign 000100 minus280107 Tolerated minus103 ToleratedV553I 0000 Benign 000100 minus2409 Tolerated minus098 ToleratedPolyPhen-2 probably damaging (probabilistic score gt 085) possibly damaging (probabilistic score gt 015) and benign (remaining)PANTHER deleterious or intolerant (subPSEC score le minus3) less deleterious (subPSEC score ge minus3)

chaperones When analyzed by FoldX severe reductionin stability was demonstrated by S143P (750Kcalmol)G258V (782Kcalmol) Y414D (603 Kcalmol) variants whileN440H (167 Kcalmol) P510T (225 Kcalmol) and N541S(055 Kcalmol) were accounted to slightly reduce the stabilityof GalNAc-T1 protein

325 Characterization of Functional SNPs by ConcordanceAnalysis The efficacy of functional SNP prediction can bemademore reliable by combining the results of empirical andsupport vector machine (SVM) based approaches Hencewe performed the concordance analysis to get the integratedpicture with SIFT PolyPhen-2 PANTHER and SNPeffecttools Out of the 18 nsSNPs 3 (1666) were predicted tobe ldquodeleteriousrdquo by SIFT whereas this prediction rate wasincreased to 4 (22) as ldquoprobably damagingrdquo when analyzedby PolyPhen-2 5 (2777) as ldquoseverely destabilizingrdquo by SNPeffect and 14 (78) as ldquodeleteriousrdquo by PANTHER Fromconcordance analysis 3 amino acid variants (S143P G258Vand Y414D) were commonly predicted by all 4 of the in silicotools (SIFT PolyPhen-2 PANTHER and SNPeffect)

326 Identification of Functional SNPs in Regulatory andCon-served Regions Using empirical Bayesian inference Con-Surf database characterizes the evolutionary conservation ofamino acids Our ConSurf results showed that only S143 waslocated in highly conserved region and predicted to havefunctional impact on GalNAc-T1 proteinThe remaining twonsSNPs responsible for G258V and Y414D are found in thevicinity of conserved residues and found to be buried in wildtype residues by NetSurfP (Table 3)

FastSNP assigns the risk ranking for SNPs after excisingfunctional effect information From FastSNP results wefound 2 SNPs with rs72964406 (ESE motif CCTCATGscore is 2897886) and rs34304568 (ESE motif GGACCTAGscore is 2088850) located in coding regions and altered ESE(exonic splicing enhancers) motifs indicating that all thesensSNPs may have the ability to affect the level position andtiming of gene expression or regulate the splicing of GalNAc-T1 gene transcript No functional role was predicted for theSNPs which are found be located in 31015840 UTR region

33 AMBER-Energy Minimization Solvent Accessibilityand Electrostatic Interaction Effects of Functional SNPson GalNAc-T1 Protein Based on multiple-threadingalignments I-TASSER builds high-quality 3D structuresof protein molecules from their amino acid sequenceAfter retrieving protein FASTA sequence of GalNAc-T1from UniProt (ID ldquoQ10472rdquo) database protein sequencewas given to I-TASSER as an input The I-TASSER toolcreated the 5 full-length models for GalNAc-T1 protein(with C-scores minus075 minus129 minus189 minus271 and minus283) byexcising top 10 structures with C-scores after targeting thePDB library hits (Table 4) Among the 5 predicted modelsmodel 1 carried the high-quality confidence in the formof C-score (minus075) TM-Score (062 plusmn 014) and RMSD(93plusmn 46A) hence it was selected for further analysis usingChimera Swiss Pdb viewer GROMOS96 and AMBERff99SB software to visualize and compute the total energybefore and after energy minimization The superimposedstructures of mutant and wild-type residues with theirsurface were visualized in Figure 2 The total energy of

Computational and Mathematical Methods in Medicine 7

Table 3 Surface accessibility of wild-type and mutant variants in GalNAc-T1

Amino acid change Class assignment Relative surfaceaccessibility (RSA)

Absolute SurfaceAccessibility prediction

minusminus

minusminus

S143P

G258V

Y414D

Z-fit score for RSA

(a) (b) (c)

Figure 2 Superimposed structures of GalNAc-T1 native (camel color) and mutant (blue color) models to visualize the stereochemicalconformation of wild type and mutant residues at 143 258 and 414 positions

Table 4 Top 10 templates used by I-TASSER to create the high-quality models for human GALNT1 secondary structure

Rank PDB Hit Iden1 Iden2 Cov Norm 119885-score1 2ffuA 045 041 088 3122 2d7iA 043 042 091 7153 2ffuA 045 041 088 8164 2ffuA 045 041 088 5975 2ffuA 045 041 088 4416 2d7iA 043 042 091 7117 2ffuA 045 041 088 7998 2d7rA 043 042 092 6429 1xhbA 099 079 080 30610 2ffuA 045 041 088 621Rank of templates represents the top ten threading templates used by I-TASSER ldquoIdent1rdquo is the percentage sequence identity of the templates in thethreading aligned region with the query sequence ldquoIdent2rdquo is the percentagesequence identity of the whole template chains with query sequence ldquoCovrdquorepresents the coverage of the threading alignment and is equal to thenumber of aligned residues divided by the length of query protein ldquoNorm119885-scorerdquo is the normalized 119885-score of the threading alignments Alignmentwith a Normalized 119885-score gt 1 mean a good alignment and vice versa Thetop 10 alignments reported above (in order of their ranking) are taken fromthe following threading programs 1 MUSTER 2 HHSEARCH 3 SP3 4PROSPECT2 5 PPA-I 6 HHSEARCH I 7 SPARKS 8 SAMT99 9 MUSTER10 HHSEARCH

native structure was found to be 13294691 kJmol before theenergyminimization stage and it was minus13882539 kJmol afterthe energy minimization step whereas mutant structures

Table 5 Total energy of native and mutant structures before andafter energy minimization

Amino acidvariants

Total energy beforeenergy minimization

(kJmol)

Total energy afterenergy minimization

(kJmol)Native 13140137 minus13882539S143P 13392757 minus13767374G258V 59777910 minus10727144Y414D 12995687 minus13918198

exhibited deviation in total energy value calculated beforeand after energy minimization (Table 5) Among 3 screenedmutations G258V showed the increase in total energy ofGalNAc-T1 before (597779100 kJmol) and after energyminimization (minus10727144 kJmol) Chimera and Swiss PDBviewer were used to visualize the structural features of aminoacids in native and mutant protein chains During structuralvisualization for all 3 mutations only mutant residue (valine)at 258 position showed a network of clashes with Tyr268(Figure 3) Additionally S143P G258V and Y414D variantswere analyzed for solvent accessibility (in form of Z-score)and stability and a decrease in both parameters was observedfor all three variants (Table 3)

34 The Structural Impacts of Functional GalNAc-T1Mutations Project HOPE simulates the structural featuresof amino acid substitution on native protein structure

8 Computational and Mathematical Methods in Medicine

(a)

(b)

(c)

Figure 3 H-bonding (green discontinuous line) interactions and clashes (pink discontinuous line) of wild type and mutant analogues withthe vicinal amino acid residues (a) Ser143 is examined with single H-bonding with Val139 and converted into clash as a result of Pro atthe same position (b) At 258 position one H-bond is observed with Arg266 in both native (Gly258) and mutant (Val258) structures but anetwork of clashes appeared between Val258 and Tyr268 (c) Tyr414 is visualized with five H-bonding interactions for Pro410 and Asn417and one H-bond is distorted due to appearance of mutant aspartic acid at the same position

Computational and Mathematical Methods in Medicine 9

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

lowast lowastlowast lowastlowastlowastlowastlowast lowastlowast

lowast lowast lowastlowastlowast lowastlowast

lowast

lowast

lowast lowast

lowastlowastlowastlowast lowast lowast lowastlowast

lowast lowast lowastlowastlowast lowast lowast lowastlowastlowastlowastlowast lowast lowastlowast lowast lowast lowastlowast lowast lowastlowast

lowast lowastlowastlowast lowast lowast lowastlowast lowast lowast143 Catalytic subdomain A

258 Catalytic subdomain B

414

Ricin B-type lectin

Figure 4 Multiple sequence alignments and evolutionary conservation behaviour among human GalNAc-T1 to GalNAc-T10 members ofGALNTs family S143 and Y414 residues are found conserved in the catalytic subdomain A and linker B domain (area between catalyticsubdomain B and Ricin B-type lectin) respectively G258 is observed in the vicinity of conservation groups with strongly similar properties(shown with ldquordquo)

The S143 residue is located within a stretch of residuesannotated in Uniprot as catalytic domain A (Figure 4 fromClustal Omega) The hydrophobicity and size differencebetween wild-type and mutant residue (Pro) can disrupt theH-bonding interactions with the neighbouring residues andhence the protein framework The wild-type residue formsa hydrogen bond with the valine on position 139 Due tohigh rigidity of proline moiety this mutation might abolishthe required flexibility of the protein at this position andcould affect the catalytic tendency Based on conservationbehaviour this mutation is probably damaging to the protein

For G258V the wild-type residue is resided in thelinker region between catalytic subdomains A and B andcan distract the required flexibility of core structure bysubstituting Gly258 with valine Additionally the structuralanalysis ofVal258 showed some clashes for Tyr268whichmaycontribute to the extra energy in the protein structure andhence the decrease in stability

In theY414Dvariant Tyr414 indicated fiveH-interactionswith Pro410 Phe411 and Asn417 whereas mutant Asp414exhibited the four H-bonding interactions in the core of theprotein (or protein complex) due to the differences in charge

10 Computational and Mathematical Methods in Medicine

Table 6 Ligand binding sites for wild type andmutant forms (S143PG258V and Y414D)

Wild type S143P G258V Y414DAsn82 Asn82 Asn82 Asn82Phe84 Val123 Phe84 Phe84Val123 Phe124 Val123 Val123Phe124 His125 Phe124 Phe124His125 Asn126 His125 His125Asn126 Glu127 Asn126 Asn126Glu127 Ala128 Glu127 Glu127Thr131 Lau132 Asp156 Asp156Asp156 Asp156 Gly188 Leu189Gly188 Leu189 Leu189 Asp209Leu189 Asp209 Asp209 Ala210Arg193 Ala210 Ala210 His211Asp209 His211 His211 Ile315Ala210 Ile315 Ile315 Trp316His211 Trp316 Trp316 His344Ile315 Arg347 His344 Val345Trp316 Thr350 Val345 Phe346His344 Pro351 Phe346 Arg347Val345 mdash Arg347 Lys348Phe346 mdash Lys348 Ala349mdash mdash Ala349 Thr350mdash mdash Thr350 Pro351mdash mdash Pro351 Tyr352

density and hydrophobicity between wild-type and mutantresidue

35 Simulation for GalNAc-T1 Molecular Binding Sites andProtein-Protein Interactions Including the structure-basedprediction of protein function FTSite identifies the ligandbinding sites for a particular protein Upon querying withFTSite a considerable difference in the number of ligandbinding sites was observed between wild type and mutantforms (S143P G258V and Y414D) of GalNAc-T1 protein(Table 6)

STRING database annotates the functional interactionsamong the proteins in a cell STRING results predictedthe functional association pattern of GalNAc-T1 protein(physical and functional) with MUC1 GBGT1 CHPFST6GALNAC1 B4GalNAc-T1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6 partners with high confidenceshown with bold lines in Figure 5 Based on the confidencescores of GalNAc-T1 protein interactions MUC1 a denselyO-glycosylated transmembrane protein critical to cellularintegrity was chosen for molecular docking analysis inorder to identify the plausible structural and functionalimplications of S143P G258V and Y414D variants TheH-bonding interaction existing between Gly258 (greenportion) and Arg266 residues of native GalNAc-T1 is in turncritical for its interaction with N-H ofThr located in GVTSA(tandem repeat region) of MUC1 by means of H-bonding

However due to the replacement of Gly at 258 position withVal in GalNAc-T1 the H-bonding interaction with MUC1was distorted (Figure 6)

After STRINGmucin biosynthetic pathwaywas retrievedfrom KEGG database to retrieve the molecular reactionand interaction behaviour for GalNAc-T1 and other partnerenzymes (ST6GALNAC1 B4GALNT1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6) to corroborate the STRING results

4 Discussion

The human GalNAc-T1 gene is located on chromosome18q121 region with 11 exons (5882 kb) [1] A 3847 bp lengthm-RNA encodes the functional GalNAc-T1 protein with559 amino acids (642 kDa pI 74) However alternativesplicing of mRNA yields different polynucleotide chainlengths that is 1770 bps (499 aa) and 1120 bps (105 aa)amino acids respectively So far approximately 900 variantslocated in noncoding coding and regulatory regions ofhuman GalNAc-T1 gene are described in dbSNP databaseto date With the advent of high throughput (whole exomeand genome) sequencing practices the number of geneticvariations is growing day by day in an efficient manner [1824 41 42] Hence an important task of human genetics liesin delineating those amino acid variants which can imposespecific structural and functional consequences on proteinfunction [41]

In the current investigation screening for functionalGalNAc-T1 genetic variants in coding region was performedusing sequence- and structure-based algorithms such as SIFTPolyPhen-2 PANTHER and SNP effect From functionalanalysis of SNPs SIFT predicted that 17 of total nsS-NPs are of deleterious type whereas PolyPhen-2 predictedsim22 of total nsSNPs to be highly deleterious But commonpredictions of SIFT and PolyPhen-2 showed that 16 ofvariants were functionalThe significant difference in outputsof SIFT and PolyPhen-2 algorithms is most likely due toutilization of different protein sequence alignments used tocharacterize the variants [42] However good coherence andaccuracy between PANTHER and SIFT (use of similar scor-ingmatrices) have predicted 4 (2222) nsSNPs ofGalNAc-T1to be functional SNPeffect predicted that 3 (166) nsSNPsof the variants to be highly destabilizingThe accuracy of thisprediction percentage remained the same even when com-bined with PolyPhen-2 analysis but increased to 6 (3333)nsSNPs with PANTHER By comparing the output of allthe 4 different in silico tools (SIFT PolyPhen-2 PANTHER-cSNP and SNPeffect) nsSNPs that encodes S143P G258Vand Y414D variants were found to be functionally significantThe differences in prediction capabilities can be attributed tothe fact that every method uses different sets of sequencesand alignments When compared sequence-based predic-tion analysis has a number of advantages over structure-based analysis due to the fact that it considers all types ofeffects at the protein level and is well suitable for proteinswith known relatives But sequence-based predictions areunable to explain the underlying mechanisms between geno-type and phenotype relationships for most of the proteins

Computational and Mathematical Methods in Medicine 11

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

(a) (b)

GALNT1 functional partnersGBGT1CHPFST6GALNAC1B4GALNT1MUC1C1GALT1C1C1GALT1GCNT1B3GNT6ZNF146

0970096809520926092509000900089908990680

Nei

ghbo

rhoo

dG

ene f

usio

nC

oocc

urre

nce

Coe

xpre

ssio

nEx

perim

ents

Dat

abas

esTe

xtm

inig

[Hom

olog

y]

Scor

e

Globoside 120572-13-N-acetylgalactosaminyltransferase 1 (347aa)Chondroitin polymerizing factor (775 aa)ST6(120572-N-acetyl-neuraminyl-23-120573-galactosyl-13)-N-acetylgalactosaminide 120572-26-sialy (600 aa)120573-14-N-acetyl-galactosaminyltransferase1 (533 aa)Mucin 1 cell surface associated (273 aa)C1GALT1-specific chaperone 1 (318 aa)Core1 synthase glycoprotein-N-acetylgalactosamine 3-120573-galactosyltransferase 1 (363 aa)Glucosaminyl (N-acetyl) transferase 1 core 2(120573-16-N-acetylglucosaminyltransferase) (428 aa)UDP-GlcNAc-120573 Gal 120573-13-N-acetylglucosaminyltransferase 6 (core 3 synthase) (384 aa)Zinc finger protein 146 (292 aa)

Figure 5 GalNAc-T1 protein-protein interactions with 09 partners One colour is given to each type of evidence in the predicted functionallinks (edges) among eight coloured lines (a) From experimental basis (with score 0925) onlyMUC1 is observed for interaction withGalNAc-T1 From text-mining data GalNAc-T1 interactions are observed for GBGT1 CHPF ST6GALNAC1 B4GALNT1 and MUC1 proteins with0970 0968 0952 0926 and 0925 scores respectively The remaining interactions with C1GALT1C1 CIGALT1 GCNT1 and B3GNT6proteins are simulated with STRING score ranging from 0900 to 0899 (b) Strong association pattern (thick blue lines) of GalNAc-T1 ispredicted for GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1 C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 partners with highconfidence For ST6GALNAC1-GBGT1 ST6GALNAC1-MUC1 GCNT1-CHPF B4GALNT1-GCNT1 CHPF-C1GALT1C1 andGALNT1-ZNF146pairs weak associations are examined in the form of thin blue lines

(a) (b)

Figure 6 3D visualization of G258V variant withMUC1 (a) Ligand binding of GalNAc-T1 native structure withMUC1 indicated single H-bonding interaction (green line) of Gly258 residue (green portion) with Arg266 which is further connected with the GVTSA (tandem repeatregion ofMUC1) by means of H-bonding interaction (b) Picturing of GalNAc-T1 mutant protein structure is observed with one H-bondingloss between Arg266 residue and theMUC1 tandem repeat motif

12 Computational and Mathematical Methods in Medicine

On the contrary the structure-based approach has limitationsin that it cannot be implemented for proteins with unknown3D structures In silico investigation tools that integrateboth sequence- and structure-based approaches will be ofadded advantage in providing reliable prediction results withwider coverage of different aspects of SNP analysis Howevervariability in prediction outputs of these algorithms reflectsboth advantages and disadvantages Therefore decisionsregarding the selection of a suitable tool for SNP analysismustbe subjective to the specific objectives of the investigationundertaken

Our ConSurf and Clustal Omega results indicated thatnsSNPs at positions S143P and Y414D were found in thehighly conserved region and predicted to have potentialimpact on GalNAc-T1 protein The third mutation (G258) isobserved in the vicinity of conservation groups with stronglysimilar properties shown by ldquordquo in Figure 4 Besides thatthe SNPs which encode functional polypeptides are thoseregulatory SNPs that control the gene expression FastSNPtool helped us to prioritize the deleterious SNPs based upontheir impact on determining protein structure deviance intranscriptional levels of the sequence modification in thepremature translation termination and aberrations in thepositions at promoter region for transcription factor bindingAltered O-glycosylation machinery due to aberrant GalNAc-T1 expression may result in aberrantly glycosylated proteinsand new antigenic targets thereby altering host immuno-genic response in ovarian cells [43 44] The two SNPs i-e rs72964406 (ESE motif CCTCATG score is 2897886)and rs34304568 (ESE motif GGACCTAG score 2088850)were located in coding regions and altered ESE (exonicsplicing enhancers) motifs indicating that all these nsSNPsmay have the ability to affect the level location and timingof gene expression or regulate the splicing of GalNAc-T1 genetranscript

The GalNAc-T1 protein structure carries an N-terminaltransmembrane domain a stem region a lumenal catalyticdomain (catalytic AGT1 domain and catalytic BGalNAc-T domain) and a C-terminal ricin-type lectin region(Figure 4) Although all GalNAc-transferases are reportedto share common structural features and the conservedmotifs described above the exact role of each domain incatalysis remains unknown Several biological roles have beendescribed for GalNAc-T1 for example loss of GalNAc-T1leads to reduced leukocyte recruitment and increased rollingvelocity in vivo suggesting the predominant role for GalNAc-T1 in attaching functionally relevant O-linked glycans toselectin ligands [45] Protein structural analysis is performedto fine-tune the picture drawn by the concordance analysisof SIFT PolyPhen-2 and PANTHER Calculation of sol-vent accessibility and force field energy have provided aninsight into the structural and functional impacts of aminoacid substitutions 3D models are constructed to visualizedeviation among the mutant and wild-type protein modelsIn S143P the required protein framework (hydrophobicityand H-bonding interactions) seems to be disturbed at thisposition and could be damaging for the native structureThe G258V substitution carries the tendency to disturb thelocal structure by reframing the backbone flexibility Y414D

variant is found to be associated with regulation of proteinmisfolding Due to high difference in total energy for nativeand mutant (carrying G258V variant) models the properfunctioning of GalNAc-T1 can be disturbed The analysis ofsequence-based motifs and conserved residues is a reliablemethod to identify the functional amino acid residues thatare important for binding catalysis and stability of proteinsUpon analysingwith FTSite a difference in the ligand bindingsites is observed between wild type (20 binding site) andmutant forms (S143P 18 binding sites G258V 23 bindingsites Y414D 23 binding sites) of GalNAc-T1 protein

Protein-protein interaction analysis is a comprehensiveway to understand the global organization of proteomes inthe context of a functional network Interaction networkdisplays biomolecules as nodes and interactions connectingthe two nodes as edges Currently functional network viewfor a single genome is widely used to improve the statis-tical power in human molecular genetics [39] to aid drugdiscovery to better understand metabolic pathways and toderive genotype-phenotype correlations [18 46] STRINGmaps have shown that GalNAc-T1 interacts with GBGT1CHPF ST6GALNAC1 B4GALNAC-T1 MUC1 C1GALT1C1C1GALT1 GCNT1 B3GNT6 and ZNF146 partners by 36interactions The Strong interaction patterns were observedfor GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 part-ners From KEGG-Pathway analysis biochemical (stereospe-cific) reactions for GalNAc-T1 ST6GALNAC1 B4GALNT1MUC1 C1GALT1C1 C1GALT1 and GCNT1 indicated theindividual and combined role of each enzyme in regulationof glycosylation phenomena to carry out mucin biosynthesispathway (Figure 7) Noticeably most of these enzymatic pro-teins are involved in cell adhesion and membrane transportTherefore aberrant glycosylations in the formofmutations inGalNAc-T1 gene could distort themucin andmany associatedpathways (ERK SRC andNF-kappa-B pathways) involved inmaintaining cellular response and integrity Of the 3GalNAc-T1 variants tested G258Vwas found to disturb the interactionof Arg266 ofGalNAc-T1withN-HofThr (inGVTSA tandemrepeat region) of MUC1 [47] thus it might impair thepossible transfer of GalNAc residue from UDP-GalNAc tohydroxyl group of SerThr during O-glycosylation reaction(Figure 7) Over expression aberrant intracellular localiza-tion and changes in glycosylation of MUC1 and MUC7 areoften reported in tumors of colon breast ovarian lung andpancreatic origin [24 48] Since MUC1 plays an essentialrole in cellular signalling and microbial pathogenicity it isconceivable to expect that G258V mutation is pathogenic tocellular integrity and susceptibility to certain human disease

5 Conclusion

This study represents the first comprehensive investigationthat identified the functional SNPs in GalNAc-T1 geneusing sequence- and structure-based homology algorithmsAlthough there were notable differences in the predictionbasis of selected in silico algorithms our concordance analysiscorroborated the characterization of suspected SNPs that

Computational and Mathematical Methods in Medicine 13

GalNAc1205721-O-Thr

C1GALT1C1GALT1C1

GCNT1

GALNT1

B3GNT6

GCNT1B4GALT5

Gal1205731-3GalNAc1205721-O-Thr

GlcNAc1205731-6GalNAc1205721-O-Thr

GalNAc1205721-3GalNAc1205721-O-Thr

GlcNAc1205731-3GalNAc1205721-O-Thr

Gal1205731-4GlcNAc1205731-6GalNAc1205721-O-Thr

Figure 7 Complex structure of GalNAc-T1 enzyme with UDPMUC1 (tandem repeat region ldquoGSTAPPAHGVTSAPrdquo) and GalNAc residueGalNAc sugar is attached at Thr of tandem repeat region inMUC1 His125 and Asp156 residues from catalytic A domain and Ile315 Trp316and Thr350 from catalytic B domain are found in contact with UDP (orange red color) Under the action of ST6GALNAC1 B4GALNT1C1GALT1C1 C1GALT1 GCNT1 andB3GNT6GalNAc1-O-Thr is stereospecifically converted into different glycan chains to regulate themucinbiosynthesis pathway

could play significant roles in cellular biology We definedthe structural consequences of S143P G258V and Y414Dvariants on GalNAc-T1 protein in the form of solventaccessibility electrostatic interaction energy calculation andmultiple alignment conservation From potential SNPs theG258V mutation is predicted to cause considerable changein total energy electrostatic intramolecular interactions andfunctional interaction behaviour of GalNAc-T1 protein withMUC1 protein and thereby may impact the possible transferof GalNAc residue from UDP-GalNAc to hydroxyl groupof SerThr during O-glycosylation of proteins Additionally2 regulatory SNPs that influence the splicing and expres-sion pattern of GalNAc-T1 gene have also been identifiedAltered GalNAc-T1 function due to genetic variations andmRNA expression might play a critical role in determiningsusceptibility to complex diseases Furthermore protein-protein interaction pathway and protein-ligand docking havehelped us to understand the stereochemical (anomery andlinkage) roles ofGalNAc-T1 and associated enzymes inmucinbiosynthetic pathway Finally the in silico based nsSNPpredictions would not just be of use in deriving genotype-phenotype relations but to some extent explain the molecularbasis for varied interindividual response to certain drugs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are thankful to the reviewers for their criticalreview and fruitful suggestions Additionally the authorsacknowledge the useful comments from Dr MuhammadNadeem Arshad of Chemistry Department King AbdulazizUniversity KSA

References

[1] E P Bennett D O Weghuis G Merkx A G van KesselH Eiberg and H Clausen ldquoGenomic organization andchromosomal localization of three members of the UDP-N-acetylgalactosaminepolypeptide N-acetylgalactosaminyltrans-ferase familyrdquo Glycobiology vol 8 no 6 pp 547ndash555 1998

[2] Y-Z Chen Y-R Tang Z-Y Sheng and Z Zhang ldquoPredictionof mucin-type O-glycosylation sites in mammalian proteinsusing the composition of k-spaced amino acid pairsrdquo BMCBioinformatics vol 9 article 101 2008

[3] H C Hang and C R Bertozzi ldquoThe chemistry and biology ofmucin-type O-linked glycosylationrdquo Bioorganic and MedicinalChemistry vol 13 no 17 pp 5021ndash5034 2005

[4] T A Gerken O Jamison C L Perrine et al ldquoEmerg-ing paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family ofglycosyltransferasesrdquo Journal of Biological Chemistry vol 286no 16 pp 14493ndash14507 2011

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 6: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

6 Computational and Mathematical Methods in Medicine

Table 2 Characterization of SNPs by using PolyPhen-2 PANTHER and Fathmm based classification systems

Amino acid change PolyPhen-2 PANTHER FathmmA1A2 Score Prediction Specificitysensitivity subPSEC Score Prediction Score PredictionA5T 0411 Benign 090089 minus296529 Tolerated 059 ToleratedL20V 0000 Benign 000100 minus265438 Tolerated 069 ToleratedL25V 0960 Probably damaging 095078 minus345127 Deleterious 035 ToleratedD46G 0004 Benign 059097 minus334504 Deleterious 051 ToleratedP54L 0001 Benign 015099 minus427568 Deleterious 056 ToleratedD71E 0000 Benign 000100 minus340972 Deleterious 068 ToleratedD71H 0132 Benign 086093 minus524027 Deleterious 049 ToleratedS143P 0999 Probably damaging 099014 minus74941 Deleterious 015 ToleratedV216M 0411 Benign 090089 minus556762 Deleterious 026 ToleratedG258V 1000 Probably damaging 100000 minus440932 Deleterious 035 ToleratedN379H 0000 Benign 000100 minus341433 Deleterious minus035 ToleratedI384V 0000 Benign 000100 minus381526 Deleterious minus031 ToleratedG401D 0000 Benign 000100 minus323851 Deleterious 156 ToleratedY414D 1000 Probably damaging 100000 minus917274 Deleterious minus182 DamagingN440H 0048 Benign 083094 minus406922 Deleterious minus117 ToleratedP510T 0009 Benign 077096 minus346487 Deleterious minus109 ToleratedN541S 0000 Benign 000100 minus280107 Tolerated minus103 ToleratedV553I 0000 Benign 000100 minus2409 Tolerated minus098 ToleratedPolyPhen-2 probably damaging (probabilistic score gt 085) possibly damaging (probabilistic score gt 015) and benign (remaining)PANTHER deleterious or intolerant (subPSEC score le minus3) less deleterious (subPSEC score ge minus3)

chaperones When analyzed by FoldX severe reductionin stability was demonstrated by S143P (750Kcalmol)G258V (782Kcalmol) Y414D (603 Kcalmol) variants whileN440H (167 Kcalmol) P510T (225 Kcalmol) and N541S(055 Kcalmol) were accounted to slightly reduce the stabilityof GalNAc-T1 protein

325 Characterization of Functional SNPs by ConcordanceAnalysis The efficacy of functional SNP prediction can bemademore reliable by combining the results of empirical andsupport vector machine (SVM) based approaches Hencewe performed the concordance analysis to get the integratedpicture with SIFT PolyPhen-2 PANTHER and SNPeffecttools Out of the 18 nsSNPs 3 (1666) were predicted tobe ldquodeleteriousrdquo by SIFT whereas this prediction rate wasincreased to 4 (22) as ldquoprobably damagingrdquo when analyzedby PolyPhen-2 5 (2777) as ldquoseverely destabilizingrdquo by SNPeffect and 14 (78) as ldquodeleteriousrdquo by PANTHER Fromconcordance analysis 3 amino acid variants (S143P G258Vand Y414D) were commonly predicted by all 4 of the in silicotools (SIFT PolyPhen-2 PANTHER and SNPeffect)

326 Identification of Functional SNPs in Regulatory andCon-served Regions Using empirical Bayesian inference Con-Surf database characterizes the evolutionary conservation ofamino acids Our ConSurf results showed that only S143 waslocated in highly conserved region and predicted to havefunctional impact on GalNAc-T1 proteinThe remaining twonsSNPs responsible for G258V and Y414D are found in thevicinity of conserved residues and found to be buried in wildtype residues by NetSurfP (Table 3)

FastSNP assigns the risk ranking for SNPs after excisingfunctional effect information From FastSNP results wefound 2 SNPs with rs72964406 (ESE motif CCTCATGscore is 2897886) and rs34304568 (ESE motif GGACCTAGscore is 2088850) located in coding regions and altered ESE(exonic splicing enhancers) motifs indicating that all thesensSNPs may have the ability to affect the level position andtiming of gene expression or regulate the splicing of GalNAc-T1 gene transcript No functional role was predicted for theSNPs which are found be located in 31015840 UTR region

33 AMBER-Energy Minimization Solvent Accessibilityand Electrostatic Interaction Effects of Functional SNPson GalNAc-T1 Protein Based on multiple-threadingalignments I-TASSER builds high-quality 3D structuresof protein molecules from their amino acid sequenceAfter retrieving protein FASTA sequence of GalNAc-T1from UniProt (ID ldquoQ10472rdquo) database protein sequencewas given to I-TASSER as an input The I-TASSER toolcreated the 5 full-length models for GalNAc-T1 protein(with C-scores minus075 minus129 minus189 minus271 and minus283) byexcising top 10 structures with C-scores after targeting thePDB library hits (Table 4) Among the 5 predicted modelsmodel 1 carried the high-quality confidence in the formof C-score (minus075) TM-Score (062 plusmn 014) and RMSD(93plusmn 46A) hence it was selected for further analysis usingChimera Swiss Pdb viewer GROMOS96 and AMBERff99SB software to visualize and compute the total energybefore and after energy minimization The superimposedstructures of mutant and wild-type residues with theirsurface were visualized in Figure 2 The total energy of

Computational and Mathematical Methods in Medicine 7

Table 3 Surface accessibility of wild-type and mutant variants in GalNAc-T1

Amino acid change Class assignment Relative surfaceaccessibility (RSA)

Absolute SurfaceAccessibility prediction

minusminus

minusminus

S143P

G258V

Y414D

Z-fit score for RSA

(a) (b) (c)

Figure 2 Superimposed structures of GalNAc-T1 native (camel color) and mutant (blue color) models to visualize the stereochemicalconformation of wild type and mutant residues at 143 258 and 414 positions

Table 4 Top 10 templates used by I-TASSER to create the high-quality models for human GALNT1 secondary structure

Rank PDB Hit Iden1 Iden2 Cov Norm 119885-score1 2ffuA 045 041 088 3122 2d7iA 043 042 091 7153 2ffuA 045 041 088 8164 2ffuA 045 041 088 5975 2ffuA 045 041 088 4416 2d7iA 043 042 091 7117 2ffuA 045 041 088 7998 2d7rA 043 042 092 6429 1xhbA 099 079 080 30610 2ffuA 045 041 088 621Rank of templates represents the top ten threading templates used by I-TASSER ldquoIdent1rdquo is the percentage sequence identity of the templates in thethreading aligned region with the query sequence ldquoIdent2rdquo is the percentagesequence identity of the whole template chains with query sequence ldquoCovrdquorepresents the coverage of the threading alignment and is equal to thenumber of aligned residues divided by the length of query protein ldquoNorm119885-scorerdquo is the normalized 119885-score of the threading alignments Alignmentwith a Normalized 119885-score gt 1 mean a good alignment and vice versa Thetop 10 alignments reported above (in order of their ranking) are taken fromthe following threading programs 1 MUSTER 2 HHSEARCH 3 SP3 4PROSPECT2 5 PPA-I 6 HHSEARCH I 7 SPARKS 8 SAMT99 9 MUSTER10 HHSEARCH

native structure was found to be 13294691 kJmol before theenergyminimization stage and it was minus13882539 kJmol afterthe energy minimization step whereas mutant structures

Table 5 Total energy of native and mutant structures before andafter energy minimization

Amino acidvariants

Total energy beforeenergy minimization

(kJmol)

Total energy afterenergy minimization

(kJmol)Native 13140137 minus13882539S143P 13392757 minus13767374G258V 59777910 minus10727144Y414D 12995687 minus13918198

exhibited deviation in total energy value calculated beforeand after energy minimization (Table 5) Among 3 screenedmutations G258V showed the increase in total energy ofGalNAc-T1 before (597779100 kJmol) and after energyminimization (minus10727144 kJmol) Chimera and Swiss PDBviewer were used to visualize the structural features of aminoacids in native and mutant protein chains During structuralvisualization for all 3 mutations only mutant residue (valine)at 258 position showed a network of clashes with Tyr268(Figure 3) Additionally S143P G258V and Y414D variantswere analyzed for solvent accessibility (in form of Z-score)and stability and a decrease in both parameters was observedfor all three variants (Table 3)

34 The Structural Impacts of Functional GalNAc-T1Mutations Project HOPE simulates the structural featuresof amino acid substitution on native protein structure

8 Computational and Mathematical Methods in Medicine

(a)

(b)

(c)

Figure 3 H-bonding (green discontinuous line) interactions and clashes (pink discontinuous line) of wild type and mutant analogues withthe vicinal amino acid residues (a) Ser143 is examined with single H-bonding with Val139 and converted into clash as a result of Pro atthe same position (b) At 258 position one H-bond is observed with Arg266 in both native (Gly258) and mutant (Val258) structures but anetwork of clashes appeared between Val258 and Tyr268 (c) Tyr414 is visualized with five H-bonding interactions for Pro410 and Asn417and one H-bond is distorted due to appearance of mutant aspartic acid at the same position

Computational and Mathematical Methods in Medicine 9

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

lowast lowastlowast lowastlowastlowastlowastlowast lowastlowast

lowast lowast lowastlowastlowast lowastlowast

lowast

lowast

lowast lowast

lowastlowastlowastlowast lowast lowast lowastlowast

lowast lowast lowastlowastlowast lowast lowast lowastlowastlowastlowastlowast lowast lowastlowast lowast lowast lowastlowast lowast lowastlowast

lowast lowastlowastlowast lowast lowast lowastlowast lowast lowast143 Catalytic subdomain A

258 Catalytic subdomain B

414

Ricin B-type lectin

Figure 4 Multiple sequence alignments and evolutionary conservation behaviour among human GalNAc-T1 to GalNAc-T10 members ofGALNTs family S143 and Y414 residues are found conserved in the catalytic subdomain A and linker B domain (area between catalyticsubdomain B and Ricin B-type lectin) respectively G258 is observed in the vicinity of conservation groups with strongly similar properties(shown with ldquordquo)

The S143 residue is located within a stretch of residuesannotated in Uniprot as catalytic domain A (Figure 4 fromClustal Omega) The hydrophobicity and size differencebetween wild-type and mutant residue (Pro) can disrupt theH-bonding interactions with the neighbouring residues andhence the protein framework The wild-type residue formsa hydrogen bond with the valine on position 139 Due tohigh rigidity of proline moiety this mutation might abolishthe required flexibility of the protein at this position andcould affect the catalytic tendency Based on conservationbehaviour this mutation is probably damaging to the protein

For G258V the wild-type residue is resided in thelinker region between catalytic subdomains A and B andcan distract the required flexibility of core structure bysubstituting Gly258 with valine Additionally the structuralanalysis ofVal258 showed some clashes for Tyr268whichmaycontribute to the extra energy in the protein structure andhence the decrease in stability

In theY414Dvariant Tyr414 indicated fiveH-interactionswith Pro410 Phe411 and Asn417 whereas mutant Asp414exhibited the four H-bonding interactions in the core of theprotein (or protein complex) due to the differences in charge

10 Computational and Mathematical Methods in Medicine

Table 6 Ligand binding sites for wild type andmutant forms (S143PG258V and Y414D)

Wild type S143P G258V Y414DAsn82 Asn82 Asn82 Asn82Phe84 Val123 Phe84 Phe84Val123 Phe124 Val123 Val123Phe124 His125 Phe124 Phe124His125 Asn126 His125 His125Asn126 Glu127 Asn126 Asn126Glu127 Ala128 Glu127 Glu127Thr131 Lau132 Asp156 Asp156Asp156 Asp156 Gly188 Leu189Gly188 Leu189 Leu189 Asp209Leu189 Asp209 Asp209 Ala210Arg193 Ala210 Ala210 His211Asp209 His211 His211 Ile315Ala210 Ile315 Ile315 Trp316His211 Trp316 Trp316 His344Ile315 Arg347 His344 Val345Trp316 Thr350 Val345 Phe346His344 Pro351 Phe346 Arg347Val345 mdash Arg347 Lys348Phe346 mdash Lys348 Ala349mdash mdash Ala349 Thr350mdash mdash Thr350 Pro351mdash mdash Pro351 Tyr352

density and hydrophobicity between wild-type and mutantresidue

35 Simulation for GalNAc-T1 Molecular Binding Sites andProtein-Protein Interactions Including the structure-basedprediction of protein function FTSite identifies the ligandbinding sites for a particular protein Upon querying withFTSite a considerable difference in the number of ligandbinding sites was observed between wild type and mutantforms (S143P G258V and Y414D) of GalNAc-T1 protein(Table 6)

STRING database annotates the functional interactionsamong the proteins in a cell STRING results predictedthe functional association pattern of GalNAc-T1 protein(physical and functional) with MUC1 GBGT1 CHPFST6GALNAC1 B4GalNAc-T1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6 partners with high confidenceshown with bold lines in Figure 5 Based on the confidencescores of GalNAc-T1 protein interactions MUC1 a denselyO-glycosylated transmembrane protein critical to cellularintegrity was chosen for molecular docking analysis inorder to identify the plausible structural and functionalimplications of S143P G258V and Y414D variants TheH-bonding interaction existing between Gly258 (greenportion) and Arg266 residues of native GalNAc-T1 is in turncritical for its interaction with N-H ofThr located in GVTSA(tandem repeat region) of MUC1 by means of H-bonding

However due to the replacement of Gly at 258 position withVal in GalNAc-T1 the H-bonding interaction with MUC1was distorted (Figure 6)

After STRINGmucin biosynthetic pathwaywas retrievedfrom KEGG database to retrieve the molecular reactionand interaction behaviour for GalNAc-T1 and other partnerenzymes (ST6GALNAC1 B4GALNT1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6) to corroborate the STRING results

4 Discussion

The human GalNAc-T1 gene is located on chromosome18q121 region with 11 exons (5882 kb) [1] A 3847 bp lengthm-RNA encodes the functional GalNAc-T1 protein with559 amino acids (642 kDa pI 74) However alternativesplicing of mRNA yields different polynucleotide chainlengths that is 1770 bps (499 aa) and 1120 bps (105 aa)amino acids respectively So far approximately 900 variantslocated in noncoding coding and regulatory regions ofhuman GalNAc-T1 gene are described in dbSNP databaseto date With the advent of high throughput (whole exomeand genome) sequencing practices the number of geneticvariations is growing day by day in an efficient manner [1824 41 42] Hence an important task of human genetics liesin delineating those amino acid variants which can imposespecific structural and functional consequences on proteinfunction [41]

In the current investigation screening for functionalGalNAc-T1 genetic variants in coding region was performedusing sequence- and structure-based algorithms such as SIFTPolyPhen-2 PANTHER and SNP effect From functionalanalysis of SNPs SIFT predicted that 17 of total nsS-NPs are of deleterious type whereas PolyPhen-2 predictedsim22 of total nsSNPs to be highly deleterious But commonpredictions of SIFT and PolyPhen-2 showed that 16 ofvariants were functionalThe significant difference in outputsof SIFT and PolyPhen-2 algorithms is most likely due toutilization of different protein sequence alignments used tocharacterize the variants [42] However good coherence andaccuracy between PANTHER and SIFT (use of similar scor-ingmatrices) have predicted 4 (2222) nsSNPs ofGalNAc-T1to be functional SNPeffect predicted that 3 (166) nsSNPsof the variants to be highly destabilizingThe accuracy of thisprediction percentage remained the same even when com-bined with PolyPhen-2 analysis but increased to 6 (3333)nsSNPs with PANTHER By comparing the output of allthe 4 different in silico tools (SIFT PolyPhen-2 PANTHER-cSNP and SNPeffect) nsSNPs that encodes S143P G258Vand Y414D variants were found to be functionally significantThe differences in prediction capabilities can be attributed tothe fact that every method uses different sets of sequencesand alignments When compared sequence-based predic-tion analysis has a number of advantages over structure-based analysis due to the fact that it considers all types ofeffects at the protein level and is well suitable for proteinswith known relatives But sequence-based predictions areunable to explain the underlying mechanisms between geno-type and phenotype relationships for most of the proteins

Computational and Mathematical Methods in Medicine 11

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

(a) (b)

GALNT1 functional partnersGBGT1CHPFST6GALNAC1B4GALNT1MUC1C1GALT1C1C1GALT1GCNT1B3GNT6ZNF146

0970096809520926092509000900089908990680

Nei

ghbo

rhoo

dG

ene f

usio

nC

oocc

urre

nce

Coe

xpre

ssio

nEx

perim

ents

Dat

abas

esTe

xtm

inig

[Hom

olog

y]

Scor

e

Globoside 120572-13-N-acetylgalactosaminyltransferase 1 (347aa)Chondroitin polymerizing factor (775 aa)ST6(120572-N-acetyl-neuraminyl-23-120573-galactosyl-13)-N-acetylgalactosaminide 120572-26-sialy (600 aa)120573-14-N-acetyl-galactosaminyltransferase1 (533 aa)Mucin 1 cell surface associated (273 aa)C1GALT1-specific chaperone 1 (318 aa)Core1 synthase glycoprotein-N-acetylgalactosamine 3-120573-galactosyltransferase 1 (363 aa)Glucosaminyl (N-acetyl) transferase 1 core 2(120573-16-N-acetylglucosaminyltransferase) (428 aa)UDP-GlcNAc-120573 Gal 120573-13-N-acetylglucosaminyltransferase 6 (core 3 synthase) (384 aa)Zinc finger protein 146 (292 aa)

Figure 5 GalNAc-T1 protein-protein interactions with 09 partners One colour is given to each type of evidence in the predicted functionallinks (edges) among eight coloured lines (a) From experimental basis (with score 0925) onlyMUC1 is observed for interaction withGalNAc-T1 From text-mining data GalNAc-T1 interactions are observed for GBGT1 CHPF ST6GALNAC1 B4GALNT1 and MUC1 proteins with0970 0968 0952 0926 and 0925 scores respectively The remaining interactions with C1GALT1C1 CIGALT1 GCNT1 and B3GNT6proteins are simulated with STRING score ranging from 0900 to 0899 (b) Strong association pattern (thick blue lines) of GalNAc-T1 ispredicted for GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1 C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 partners with highconfidence For ST6GALNAC1-GBGT1 ST6GALNAC1-MUC1 GCNT1-CHPF B4GALNT1-GCNT1 CHPF-C1GALT1C1 andGALNT1-ZNF146pairs weak associations are examined in the form of thin blue lines

(a) (b)

Figure 6 3D visualization of G258V variant withMUC1 (a) Ligand binding of GalNAc-T1 native structure withMUC1 indicated single H-bonding interaction (green line) of Gly258 residue (green portion) with Arg266 which is further connected with the GVTSA (tandem repeatregion ofMUC1) by means of H-bonding interaction (b) Picturing of GalNAc-T1 mutant protein structure is observed with one H-bondingloss between Arg266 residue and theMUC1 tandem repeat motif

12 Computational and Mathematical Methods in Medicine

On the contrary the structure-based approach has limitationsin that it cannot be implemented for proteins with unknown3D structures In silico investigation tools that integrateboth sequence- and structure-based approaches will be ofadded advantage in providing reliable prediction results withwider coverage of different aspects of SNP analysis Howevervariability in prediction outputs of these algorithms reflectsboth advantages and disadvantages Therefore decisionsregarding the selection of a suitable tool for SNP analysismustbe subjective to the specific objectives of the investigationundertaken

Our ConSurf and Clustal Omega results indicated thatnsSNPs at positions S143P and Y414D were found in thehighly conserved region and predicted to have potentialimpact on GalNAc-T1 protein The third mutation (G258) isobserved in the vicinity of conservation groups with stronglysimilar properties shown by ldquordquo in Figure 4 Besides thatthe SNPs which encode functional polypeptides are thoseregulatory SNPs that control the gene expression FastSNPtool helped us to prioritize the deleterious SNPs based upontheir impact on determining protein structure deviance intranscriptional levels of the sequence modification in thepremature translation termination and aberrations in thepositions at promoter region for transcription factor bindingAltered O-glycosylation machinery due to aberrant GalNAc-T1 expression may result in aberrantly glycosylated proteinsand new antigenic targets thereby altering host immuno-genic response in ovarian cells [43 44] The two SNPs i-e rs72964406 (ESE motif CCTCATG score is 2897886)and rs34304568 (ESE motif GGACCTAG score 2088850)were located in coding regions and altered ESE (exonicsplicing enhancers) motifs indicating that all these nsSNPsmay have the ability to affect the level location and timingof gene expression or regulate the splicing of GalNAc-T1 genetranscript

The GalNAc-T1 protein structure carries an N-terminaltransmembrane domain a stem region a lumenal catalyticdomain (catalytic AGT1 domain and catalytic BGalNAc-T domain) and a C-terminal ricin-type lectin region(Figure 4) Although all GalNAc-transferases are reportedto share common structural features and the conservedmotifs described above the exact role of each domain incatalysis remains unknown Several biological roles have beendescribed for GalNAc-T1 for example loss of GalNAc-T1leads to reduced leukocyte recruitment and increased rollingvelocity in vivo suggesting the predominant role for GalNAc-T1 in attaching functionally relevant O-linked glycans toselectin ligands [45] Protein structural analysis is performedto fine-tune the picture drawn by the concordance analysisof SIFT PolyPhen-2 and PANTHER Calculation of sol-vent accessibility and force field energy have provided aninsight into the structural and functional impacts of aminoacid substitutions 3D models are constructed to visualizedeviation among the mutant and wild-type protein modelsIn S143P the required protein framework (hydrophobicityand H-bonding interactions) seems to be disturbed at thisposition and could be damaging for the native structureThe G258V substitution carries the tendency to disturb thelocal structure by reframing the backbone flexibility Y414D

variant is found to be associated with regulation of proteinmisfolding Due to high difference in total energy for nativeand mutant (carrying G258V variant) models the properfunctioning of GalNAc-T1 can be disturbed The analysis ofsequence-based motifs and conserved residues is a reliablemethod to identify the functional amino acid residues thatare important for binding catalysis and stability of proteinsUpon analysingwith FTSite a difference in the ligand bindingsites is observed between wild type (20 binding site) andmutant forms (S143P 18 binding sites G258V 23 bindingsites Y414D 23 binding sites) of GalNAc-T1 protein

Protein-protein interaction analysis is a comprehensiveway to understand the global organization of proteomes inthe context of a functional network Interaction networkdisplays biomolecules as nodes and interactions connectingthe two nodes as edges Currently functional network viewfor a single genome is widely used to improve the statis-tical power in human molecular genetics [39] to aid drugdiscovery to better understand metabolic pathways and toderive genotype-phenotype correlations [18 46] STRINGmaps have shown that GalNAc-T1 interacts with GBGT1CHPF ST6GALNAC1 B4GALNAC-T1 MUC1 C1GALT1C1C1GALT1 GCNT1 B3GNT6 and ZNF146 partners by 36interactions The Strong interaction patterns were observedfor GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 part-ners From KEGG-Pathway analysis biochemical (stereospe-cific) reactions for GalNAc-T1 ST6GALNAC1 B4GALNT1MUC1 C1GALT1C1 C1GALT1 and GCNT1 indicated theindividual and combined role of each enzyme in regulationof glycosylation phenomena to carry out mucin biosynthesispathway (Figure 7) Noticeably most of these enzymatic pro-teins are involved in cell adhesion and membrane transportTherefore aberrant glycosylations in the formofmutations inGalNAc-T1 gene could distort themucin andmany associatedpathways (ERK SRC andNF-kappa-B pathways) involved inmaintaining cellular response and integrity Of the 3GalNAc-T1 variants tested G258Vwas found to disturb the interactionof Arg266 ofGalNAc-T1withN-HofThr (inGVTSA tandemrepeat region) of MUC1 [47] thus it might impair thepossible transfer of GalNAc residue from UDP-GalNAc tohydroxyl group of SerThr during O-glycosylation reaction(Figure 7) Over expression aberrant intracellular localiza-tion and changes in glycosylation of MUC1 and MUC7 areoften reported in tumors of colon breast ovarian lung andpancreatic origin [24 48] Since MUC1 plays an essentialrole in cellular signalling and microbial pathogenicity it isconceivable to expect that G258V mutation is pathogenic tocellular integrity and susceptibility to certain human disease

5 Conclusion

This study represents the first comprehensive investigationthat identified the functional SNPs in GalNAc-T1 geneusing sequence- and structure-based homology algorithmsAlthough there were notable differences in the predictionbasis of selected in silico algorithms our concordance analysiscorroborated the characterization of suspected SNPs that

Computational and Mathematical Methods in Medicine 13

GalNAc1205721-O-Thr

C1GALT1C1GALT1C1

GCNT1

GALNT1

B3GNT6

GCNT1B4GALT5

Gal1205731-3GalNAc1205721-O-Thr

GlcNAc1205731-6GalNAc1205721-O-Thr

GalNAc1205721-3GalNAc1205721-O-Thr

GlcNAc1205731-3GalNAc1205721-O-Thr

Gal1205731-4GlcNAc1205731-6GalNAc1205721-O-Thr

Figure 7 Complex structure of GalNAc-T1 enzyme with UDPMUC1 (tandem repeat region ldquoGSTAPPAHGVTSAPrdquo) and GalNAc residueGalNAc sugar is attached at Thr of tandem repeat region inMUC1 His125 and Asp156 residues from catalytic A domain and Ile315 Trp316and Thr350 from catalytic B domain are found in contact with UDP (orange red color) Under the action of ST6GALNAC1 B4GALNT1C1GALT1C1 C1GALT1 GCNT1 andB3GNT6GalNAc1-O-Thr is stereospecifically converted into different glycan chains to regulate themucinbiosynthesis pathway

could play significant roles in cellular biology We definedthe structural consequences of S143P G258V and Y414Dvariants on GalNAc-T1 protein in the form of solventaccessibility electrostatic interaction energy calculation andmultiple alignment conservation From potential SNPs theG258V mutation is predicted to cause considerable changein total energy electrostatic intramolecular interactions andfunctional interaction behaviour of GalNAc-T1 protein withMUC1 protein and thereby may impact the possible transferof GalNAc residue from UDP-GalNAc to hydroxyl groupof SerThr during O-glycosylation of proteins Additionally2 regulatory SNPs that influence the splicing and expres-sion pattern of GalNAc-T1 gene have also been identifiedAltered GalNAc-T1 function due to genetic variations andmRNA expression might play a critical role in determiningsusceptibility to complex diseases Furthermore protein-protein interaction pathway and protein-ligand docking havehelped us to understand the stereochemical (anomery andlinkage) roles ofGalNAc-T1 and associated enzymes inmucinbiosynthetic pathway Finally the in silico based nsSNPpredictions would not just be of use in deriving genotype-phenotype relations but to some extent explain the molecularbasis for varied interindividual response to certain drugs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are thankful to the reviewers for their criticalreview and fruitful suggestions Additionally the authorsacknowledge the useful comments from Dr MuhammadNadeem Arshad of Chemistry Department King AbdulazizUniversity KSA

References

[1] E P Bennett D O Weghuis G Merkx A G van KesselH Eiberg and H Clausen ldquoGenomic organization andchromosomal localization of three members of the UDP-N-acetylgalactosaminepolypeptide N-acetylgalactosaminyltrans-ferase familyrdquo Glycobiology vol 8 no 6 pp 547ndash555 1998

[2] Y-Z Chen Y-R Tang Z-Y Sheng and Z Zhang ldquoPredictionof mucin-type O-glycosylation sites in mammalian proteinsusing the composition of k-spaced amino acid pairsrdquo BMCBioinformatics vol 9 article 101 2008

[3] H C Hang and C R Bertozzi ldquoThe chemistry and biology ofmucin-type O-linked glycosylationrdquo Bioorganic and MedicinalChemistry vol 13 no 17 pp 5021ndash5034 2005

[4] T A Gerken O Jamison C L Perrine et al ldquoEmerg-ing paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family ofglycosyltransferasesrdquo Journal of Biological Chemistry vol 286no 16 pp 14493ndash14507 2011

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 7: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

Computational and Mathematical Methods in Medicine 7

Table 3 Surface accessibility of wild-type and mutant variants in GalNAc-T1

Amino acid change Class assignment Relative surfaceaccessibility (RSA)

Absolute SurfaceAccessibility prediction

minusminus

minusminus

S143P

G258V

Y414D

Z-fit score for RSA

(a) (b) (c)

Figure 2 Superimposed structures of GalNAc-T1 native (camel color) and mutant (blue color) models to visualize the stereochemicalconformation of wild type and mutant residues at 143 258 and 414 positions

Table 4 Top 10 templates used by I-TASSER to create the high-quality models for human GALNT1 secondary structure

Rank PDB Hit Iden1 Iden2 Cov Norm 119885-score1 2ffuA 045 041 088 3122 2d7iA 043 042 091 7153 2ffuA 045 041 088 8164 2ffuA 045 041 088 5975 2ffuA 045 041 088 4416 2d7iA 043 042 091 7117 2ffuA 045 041 088 7998 2d7rA 043 042 092 6429 1xhbA 099 079 080 30610 2ffuA 045 041 088 621Rank of templates represents the top ten threading templates used by I-TASSER ldquoIdent1rdquo is the percentage sequence identity of the templates in thethreading aligned region with the query sequence ldquoIdent2rdquo is the percentagesequence identity of the whole template chains with query sequence ldquoCovrdquorepresents the coverage of the threading alignment and is equal to thenumber of aligned residues divided by the length of query protein ldquoNorm119885-scorerdquo is the normalized 119885-score of the threading alignments Alignmentwith a Normalized 119885-score gt 1 mean a good alignment and vice versa Thetop 10 alignments reported above (in order of their ranking) are taken fromthe following threading programs 1 MUSTER 2 HHSEARCH 3 SP3 4PROSPECT2 5 PPA-I 6 HHSEARCH I 7 SPARKS 8 SAMT99 9 MUSTER10 HHSEARCH

native structure was found to be 13294691 kJmol before theenergyminimization stage and it was minus13882539 kJmol afterthe energy minimization step whereas mutant structures

Table 5 Total energy of native and mutant structures before andafter energy minimization

Amino acidvariants

Total energy beforeenergy minimization

(kJmol)

Total energy afterenergy minimization

(kJmol)Native 13140137 minus13882539S143P 13392757 minus13767374G258V 59777910 minus10727144Y414D 12995687 minus13918198

exhibited deviation in total energy value calculated beforeand after energy minimization (Table 5) Among 3 screenedmutations G258V showed the increase in total energy ofGalNAc-T1 before (597779100 kJmol) and after energyminimization (minus10727144 kJmol) Chimera and Swiss PDBviewer were used to visualize the structural features of aminoacids in native and mutant protein chains During structuralvisualization for all 3 mutations only mutant residue (valine)at 258 position showed a network of clashes with Tyr268(Figure 3) Additionally S143P G258V and Y414D variantswere analyzed for solvent accessibility (in form of Z-score)and stability and a decrease in both parameters was observedfor all three variants (Table 3)

34 The Structural Impacts of Functional GalNAc-T1Mutations Project HOPE simulates the structural featuresof amino acid substitution on native protein structure

8 Computational and Mathematical Methods in Medicine

(a)

(b)

(c)

Figure 3 H-bonding (green discontinuous line) interactions and clashes (pink discontinuous line) of wild type and mutant analogues withthe vicinal amino acid residues (a) Ser143 is examined with single H-bonding with Val139 and converted into clash as a result of Pro atthe same position (b) At 258 position one H-bond is observed with Arg266 in both native (Gly258) and mutant (Val258) structures but anetwork of clashes appeared between Val258 and Tyr268 (c) Tyr414 is visualized with five H-bonding interactions for Pro410 and Asn417and one H-bond is distorted due to appearance of mutant aspartic acid at the same position

Computational and Mathematical Methods in Medicine 9

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

lowast lowastlowast lowastlowastlowastlowastlowast lowastlowast

lowast lowast lowastlowastlowast lowastlowast

lowast

lowast

lowast lowast

lowastlowastlowastlowast lowast lowast lowastlowast

lowast lowast lowastlowastlowast lowast lowast lowastlowastlowastlowastlowast lowast lowastlowast lowast lowast lowastlowast lowast lowastlowast

lowast lowastlowastlowast lowast lowast lowastlowast lowast lowast143 Catalytic subdomain A

258 Catalytic subdomain B

414

Ricin B-type lectin

Figure 4 Multiple sequence alignments and evolutionary conservation behaviour among human GalNAc-T1 to GalNAc-T10 members ofGALNTs family S143 and Y414 residues are found conserved in the catalytic subdomain A and linker B domain (area between catalyticsubdomain B and Ricin B-type lectin) respectively G258 is observed in the vicinity of conservation groups with strongly similar properties(shown with ldquordquo)

The S143 residue is located within a stretch of residuesannotated in Uniprot as catalytic domain A (Figure 4 fromClustal Omega) The hydrophobicity and size differencebetween wild-type and mutant residue (Pro) can disrupt theH-bonding interactions with the neighbouring residues andhence the protein framework The wild-type residue formsa hydrogen bond with the valine on position 139 Due tohigh rigidity of proline moiety this mutation might abolishthe required flexibility of the protein at this position andcould affect the catalytic tendency Based on conservationbehaviour this mutation is probably damaging to the protein

For G258V the wild-type residue is resided in thelinker region between catalytic subdomains A and B andcan distract the required flexibility of core structure bysubstituting Gly258 with valine Additionally the structuralanalysis ofVal258 showed some clashes for Tyr268whichmaycontribute to the extra energy in the protein structure andhence the decrease in stability

In theY414Dvariant Tyr414 indicated fiveH-interactionswith Pro410 Phe411 and Asn417 whereas mutant Asp414exhibited the four H-bonding interactions in the core of theprotein (or protein complex) due to the differences in charge

10 Computational and Mathematical Methods in Medicine

Table 6 Ligand binding sites for wild type andmutant forms (S143PG258V and Y414D)

Wild type S143P G258V Y414DAsn82 Asn82 Asn82 Asn82Phe84 Val123 Phe84 Phe84Val123 Phe124 Val123 Val123Phe124 His125 Phe124 Phe124His125 Asn126 His125 His125Asn126 Glu127 Asn126 Asn126Glu127 Ala128 Glu127 Glu127Thr131 Lau132 Asp156 Asp156Asp156 Asp156 Gly188 Leu189Gly188 Leu189 Leu189 Asp209Leu189 Asp209 Asp209 Ala210Arg193 Ala210 Ala210 His211Asp209 His211 His211 Ile315Ala210 Ile315 Ile315 Trp316His211 Trp316 Trp316 His344Ile315 Arg347 His344 Val345Trp316 Thr350 Val345 Phe346His344 Pro351 Phe346 Arg347Val345 mdash Arg347 Lys348Phe346 mdash Lys348 Ala349mdash mdash Ala349 Thr350mdash mdash Thr350 Pro351mdash mdash Pro351 Tyr352

density and hydrophobicity between wild-type and mutantresidue

35 Simulation for GalNAc-T1 Molecular Binding Sites andProtein-Protein Interactions Including the structure-basedprediction of protein function FTSite identifies the ligandbinding sites for a particular protein Upon querying withFTSite a considerable difference in the number of ligandbinding sites was observed between wild type and mutantforms (S143P G258V and Y414D) of GalNAc-T1 protein(Table 6)

STRING database annotates the functional interactionsamong the proteins in a cell STRING results predictedthe functional association pattern of GalNAc-T1 protein(physical and functional) with MUC1 GBGT1 CHPFST6GALNAC1 B4GalNAc-T1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6 partners with high confidenceshown with bold lines in Figure 5 Based on the confidencescores of GalNAc-T1 protein interactions MUC1 a denselyO-glycosylated transmembrane protein critical to cellularintegrity was chosen for molecular docking analysis inorder to identify the plausible structural and functionalimplications of S143P G258V and Y414D variants TheH-bonding interaction existing between Gly258 (greenportion) and Arg266 residues of native GalNAc-T1 is in turncritical for its interaction with N-H ofThr located in GVTSA(tandem repeat region) of MUC1 by means of H-bonding

However due to the replacement of Gly at 258 position withVal in GalNAc-T1 the H-bonding interaction with MUC1was distorted (Figure 6)

After STRINGmucin biosynthetic pathwaywas retrievedfrom KEGG database to retrieve the molecular reactionand interaction behaviour for GalNAc-T1 and other partnerenzymes (ST6GALNAC1 B4GALNT1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6) to corroborate the STRING results

4 Discussion

The human GalNAc-T1 gene is located on chromosome18q121 region with 11 exons (5882 kb) [1] A 3847 bp lengthm-RNA encodes the functional GalNAc-T1 protein with559 amino acids (642 kDa pI 74) However alternativesplicing of mRNA yields different polynucleotide chainlengths that is 1770 bps (499 aa) and 1120 bps (105 aa)amino acids respectively So far approximately 900 variantslocated in noncoding coding and regulatory regions ofhuman GalNAc-T1 gene are described in dbSNP databaseto date With the advent of high throughput (whole exomeand genome) sequencing practices the number of geneticvariations is growing day by day in an efficient manner [1824 41 42] Hence an important task of human genetics liesin delineating those amino acid variants which can imposespecific structural and functional consequences on proteinfunction [41]

In the current investigation screening for functionalGalNAc-T1 genetic variants in coding region was performedusing sequence- and structure-based algorithms such as SIFTPolyPhen-2 PANTHER and SNP effect From functionalanalysis of SNPs SIFT predicted that 17 of total nsS-NPs are of deleterious type whereas PolyPhen-2 predictedsim22 of total nsSNPs to be highly deleterious But commonpredictions of SIFT and PolyPhen-2 showed that 16 ofvariants were functionalThe significant difference in outputsof SIFT and PolyPhen-2 algorithms is most likely due toutilization of different protein sequence alignments used tocharacterize the variants [42] However good coherence andaccuracy between PANTHER and SIFT (use of similar scor-ingmatrices) have predicted 4 (2222) nsSNPs ofGalNAc-T1to be functional SNPeffect predicted that 3 (166) nsSNPsof the variants to be highly destabilizingThe accuracy of thisprediction percentage remained the same even when com-bined with PolyPhen-2 analysis but increased to 6 (3333)nsSNPs with PANTHER By comparing the output of allthe 4 different in silico tools (SIFT PolyPhen-2 PANTHER-cSNP and SNPeffect) nsSNPs that encodes S143P G258Vand Y414D variants were found to be functionally significantThe differences in prediction capabilities can be attributed tothe fact that every method uses different sets of sequencesand alignments When compared sequence-based predic-tion analysis has a number of advantages over structure-based analysis due to the fact that it considers all types ofeffects at the protein level and is well suitable for proteinswith known relatives But sequence-based predictions areunable to explain the underlying mechanisms between geno-type and phenotype relationships for most of the proteins

Computational and Mathematical Methods in Medicine 11

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

(a) (b)

GALNT1 functional partnersGBGT1CHPFST6GALNAC1B4GALNT1MUC1C1GALT1C1C1GALT1GCNT1B3GNT6ZNF146

0970096809520926092509000900089908990680

Nei

ghbo

rhoo

dG

ene f

usio

nC

oocc

urre

nce

Coe

xpre

ssio

nEx

perim

ents

Dat

abas

esTe

xtm

inig

[Hom

olog

y]

Scor

e

Globoside 120572-13-N-acetylgalactosaminyltransferase 1 (347aa)Chondroitin polymerizing factor (775 aa)ST6(120572-N-acetyl-neuraminyl-23-120573-galactosyl-13)-N-acetylgalactosaminide 120572-26-sialy (600 aa)120573-14-N-acetyl-galactosaminyltransferase1 (533 aa)Mucin 1 cell surface associated (273 aa)C1GALT1-specific chaperone 1 (318 aa)Core1 synthase glycoprotein-N-acetylgalactosamine 3-120573-galactosyltransferase 1 (363 aa)Glucosaminyl (N-acetyl) transferase 1 core 2(120573-16-N-acetylglucosaminyltransferase) (428 aa)UDP-GlcNAc-120573 Gal 120573-13-N-acetylglucosaminyltransferase 6 (core 3 synthase) (384 aa)Zinc finger protein 146 (292 aa)

Figure 5 GalNAc-T1 protein-protein interactions with 09 partners One colour is given to each type of evidence in the predicted functionallinks (edges) among eight coloured lines (a) From experimental basis (with score 0925) onlyMUC1 is observed for interaction withGalNAc-T1 From text-mining data GalNAc-T1 interactions are observed for GBGT1 CHPF ST6GALNAC1 B4GALNT1 and MUC1 proteins with0970 0968 0952 0926 and 0925 scores respectively The remaining interactions with C1GALT1C1 CIGALT1 GCNT1 and B3GNT6proteins are simulated with STRING score ranging from 0900 to 0899 (b) Strong association pattern (thick blue lines) of GalNAc-T1 ispredicted for GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1 C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 partners with highconfidence For ST6GALNAC1-GBGT1 ST6GALNAC1-MUC1 GCNT1-CHPF B4GALNT1-GCNT1 CHPF-C1GALT1C1 andGALNT1-ZNF146pairs weak associations are examined in the form of thin blue lines

(a) (b)

Figure 6 3D visualization of G258V variant withMUC1 (a) Ligand binding of GalNAc-T1 native structure withMUC1 indicated single H-bonding interaction (green line) of Gly258 residue (green portion) with Arg266 which is further connected with the GVTSA (tandem repeatregion ofMUC1) by means of H-bonding interaction (b) Picturing of GalNAc-T1 mutant protein structure is observed with one H-bondingloss between Arg266 residue and theMUC1 tandem repeat motif

12 Computational and Mathematical Methods in Medicine

On the contrary the structure-based approach has limitationsin that it cannot be implemented for proteins with unknown3D structures In silico investigation tools that integrateboth sequence- and structure-based approaches will be ofadded advantage in providing reliable prediction results withwider coverage of different aspects of SNP analysis Howevervariability in prediction outputs of these algorithms reflectsboth advantages and disadvantages Therefore decisionsregarding the selection of a suitable tool for SNP analysismustbe subjective to the specific objectives of the investigationundertaken

Our ConSurf and Clustal Omega results indicated thatnsSNPs at positions S143P and Y414D were found in thehighly conserved region and predicted to have potentialimpact on GalNAc-T1 protein The third mutation (G258) isobserved in the vicinity of conservation groups with stronglysimilar properties shown by ldquordquo in Figure 4 Besides thatthe SNPs which encode functional polypeptides are thoseregulatory SNPs that control the gene expression FastSNPtool helped us to prioritize the deleterious SNPs based upontheir impact on determining protein structure deviance intranscriptional levels of the sequence modification in thepremature translation termination and aberrations in thepositions at promoter region for transcription factor bindingAltered O-glycosylation machinery due to aberrant GalNAc-T1 expression may result in aberrantly glycosylated proteinsand new antigenic targets thereby altering host immuno-genic response in ovarian cells [43 44] The two SNPs i-e rs72964406 (ESE motif CCTCATG score is 2897886)and rs34304568 (ESE motif GGACCTAG score 2088850)were located in coding regions and altered ESE (exonicsplicing enhancers) motifs indicating that all these nsSNPsmay have the ability to affect the level location and timingof gene expression or regulate the splicing of GalNAc-T1 genetranscript

The GalNAc-T1 protein structure carries an N-terminaltransmembrane domain a stem region a lumenal catalyticdomain (catalytic AGT1 domain and catalytic BGalNAc-T domain) and a C-terminal ricin-type lectin region(Figure 4) Although all GalNAc-transferases are reportedto share common structural features and the conservedmotifs described above the exact role of each domain incatalysis remains unknown Several biological roles have beendescribed for GalNAc-T1 for example loss of GalNAc-T1leads to reduced leukocyte recruitment and increased rollingvelocity in vivo suggesting the predominant role for GalNAc-T1 in attaching functionally relevant O-linked glycans toselectin ligands [45] Protein structural analysis is performedto fine-tune the picture drawn by the concordance analysisof SIFT PolyPhen-2 and PANTHER Calculation of sol-vent accessibility and force field energy have provided aninsight into the structural and functional impacts of aminoacid substitutions 3D models are constructed to visualizedeviation among the mutant and wild-type protein modelsIn S143P the required protein framework (hydrophobicityand H-bonding interactions) seems to be disturbed at thisposition and could be damaging for the native structureThe G258V substitution carries the tendency to disturb thelocal structure by reframing the backbone flexibility Y414D

variant is found to be associated with regulation of proteinmisfolding Due to high difference in total energy for nativeand mutant (carrying G258V variant) models the properfunctioning of GalNAc-T1 can be disturbed The analysis ofsequence-based motifs and conserved residues is a reliablemethod to identify the functional amino acid residues thatare important for binding catalysis and stability of proteinsUpon analysingwith FTSite a difference in the ligand bindingsites is observed between wild type (20 binding site) andmutant forms (S143P 18 binding sites G258V 23 bindingsites Y414D 23 binding sites) of GalNAc-T1 protein

Protein-protein interaction analysis is a comprehensiveway to understand the global organization of proteomes inthe context of a functional network Interaction networkdisplays biomolecules as nodes and interactions connectingthe two nodes as edges Currently functional network viewfor a single genome is widely used to improve the statis-tical power in human molecular genetics [39] to aid drugdiscovery to better understand metabolic pathways and toderive genotype-phenotype correlations [18 46] STRINGmaps have shown that GalNAc-T1 interacts with GBGT1CHPF ST6GALNAC1 B4GALNAC-T1 MUC1 C1GALT1C1C1GALT1 GCNT1 B3GNT6 and ZNF146 partners by 36interactions The Strong interaction patterns were observedfor GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 part-ners From KEGG-Pathway analysis biochemical (stereospe-cific) reactions for GalNAc-T1 ST6GALNAC1 B4GALNT1MUC1 C1GALT1C1 C1GALT1 and GCNT1 indicated theindividual and combined role of each enzyme in regulationof glycosylation phenomena to carry out mucin biosynthesispathway (Figure 7) Noticeably most of these enzymatic pro-teins are involved in cell adhesion and membrane transportTherefore aberrant glycosylations in the formofmutations inGalNAc-T1 gene could distort themucin andmany associatedpathways (ERK SRC andNF-kappa-B pathways) involved inmaintaining cellular response and integrity Of the 3GalNAc-T1 variants tested G258Vwas found to disturb the interactionof Arg266 ofGalNAc-T1withN-HofThr (inGVTSA tandemrepeat region) of MUC1 [47] thus it might impair thepossible transfer of GalNAc residue from UDP-GalNAc tohydroxyl group of SerThr during O-glycosylation reaction(Figure 7) Over expression aberrant intracellular localiza-tion and changes in glycosylation of MUC1 and MUC7 areoften reported in tumors of colon breast ovarian lung andpancreatic origin [24 48] Since MUC1 plays an essentialrole in cellular signalling and microbial pathogenicity it isconceivable to expect that G258V mutation is pathogenic tocellular integrity and susceptibility to certain human disease

5 Conclusion

This study represents the first comprehensive investigationthat identified the functional SNPs in GalNAc-T1 geneusing sequence- and structure-based homology algorithmsAlthough there were notable differences in the predictionbasis of selected in silico algorithms our concordance analysiscorroborated the characterization of suspected SNPs that

Computational and Mathematical Methods in Medicine 13

GalNAc1205721-O-Thr

C1GALT1C1GALT1C1

GCNT1

GALNT1

B3GNT6

GCNT1B4GALT5

Gal1205731-3GalNAc1205721-O-Thr

GlcNAc1205731-6GalNAc1205721-O-Thr

GalNAc1205721-3GalNAc1205721-O-Thr

GlcNAc1205731-3GalNAc1205721-O-Thr

Gal1205731-4GlcNAc1205731-6GalNAc1205721-O-Thr

Figure 7 Complex structure of GalNAc-T1 enzyme with UDPMUC1 (tandem repeat region ldquoGSTAPPAHGVTSAPrdquo) and GalNAc residueGalNAc sugar is attached at Thr of tandem repeat region inMUC1 His125 and Asp156 residues from catalytic A domain and Ile315 Trp316and Thr350 from catalytic B domain are found in contact with UDP (orange red color) Under the action of ST6GALNAC1 B4GALNT1C1GALT1C1 C1GALT1 GCNT1 andB3GNT6GalNAc1-O-Thr is stereospecifically converted into different glycan chains to regulate themucinbiosynthesis pathway

could play significant roles in cellular biology We definedthe structural consequences of S143P G258V and Y414Dvariants on GalNAc-T1 protein in the form of solventaccessibility electrostatic interaction energy calculation andmultiple alignment conservation From potential SNPs theG258V mutation is predicted to cause considerable changein total energy electrostatic intramolecular interactions andfunctional interaction behaviour of GalNAc-T1 protein withMUC1 protein and thereby may impact the possible transferof GalNAc residue from UDP-GalNAc to hydroxyl groupof SerThr during O-glycosylation of proteins Additionally2 regulatory SNPs that influence the splicing and expres-sion pattern of GalNAc-T1 gene have also been identifiedAltered GalNAc-T1 function due to genetic variations andmRNA expression might play a critical role in determiningsusceptibility to complex diseases Furthermore protein-protein interaction pathway and protein-ligand docking havehelped us to understand the stereochemical (anomery andlinkage) roles ofGalNAc-T1 and associated enzymes inmucinbiosynthetic pathway Finally the in silico based nsSNPpredictions would not just be of use in deriving genotype-phenotype relations but to some extent explain the molecularbasis for varied interindividual response to certain drugs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are thankful to the reviewers for their criticalreview and fruitful suggestions Additionally the authorsacknowledge the useful comments from Dr MuhammadNadeem Arshad of Chemistry Department King AbdulazizUniversity KSA

References

[1] E P Bennett D O Weghuis G Merkx A G van KesselH Eiberg and H Clausen ldquoGenomic organization andchromosomal localization of three members of the UDP-N-acetylgalactosaminepolypeptide N-acetylgalactosaminyltrans-ferase familyrdquo Glycobiology vol 8 no 6 pp 547ndash555 1998

[2] Y-Z Chen Y-R Tang Z-Y Sheng and Z Zhang ldquoPredictionof mucin-type O-glycosylation sites in mammalian proteinsusing the composition of k-spaced amino acid pairsrdquo BMCBioinformatics vol 9 article 101 2008

[3] H C Hang and C R Bertozzi ldquoThe chemistry and biology ofmucin-type O-linked glycosylationrdquo Bioorganic and MedicinalChemistry vol 13 no 17 pp 5021ndash5034 2005

[4] T A Gerken O Jamison C L Perrine et al ldquoEmerg-ing paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family ofglycosyltransferasesrdquo Journal of Biological Chemistry vol 286no 16 pp 14493ndash14507 2011

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 8: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

8 Computational and Mathematical Methods in Medicine

(a)

(b)

(c)

Figure 3 H-bonding (green discontinuous line) interactions and clashes (pink discontinuous line) of wild type and mutant analogues withthe vicinal amino acid residues (a) Ser143 is examined with single H-bonding with Val139 and converted into clash as a result of Pro atthe same position (b) At 258 position one H-bond is observed with Arg266 in both native (Gly258) and mutant (Val258) structures but anetwork of clashes appeared between Val258 and Tyr268 (c) Tyr414 is visualized with five H-bonding interactions for Pro410 and Asn417and one H-bond is distorted due to appearance of mutant aspartic acid at the same position

Computational and Mathematical Methods in Medicine 9

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

lowast lowastlowast lowastlowastlowastlowastlowast lowastlowast

lowast lowast lowastlowastlowast lowastlowast

lowast

lowast

lowast lowast

lowastlowastlowastlowast lowast lowast lowastlowast

lowast lowast lowastlowastlowast lowast lowast lowastlowastlowastlowastlowast lowast lowastlowast lowast lowast lowastlowast lowast lowastlowast

lowast lowastlowastlowast lowast lowast lowastlowast lowast lowast143 Catalytic subdomain A

258 Catalytic subdomain B

414

Ricin B-type lectin

Figure 4 Multiple sequence alignments and evolutionary conservation behaviour among human GalNAc-T1 to GalNAc-T10 members ofGALNTs family S143 and Y414 residues are found conserved in the catalytic subdomain A and linker B domain (area between catalyticsubdomain B and Ricin B-type lectin) respectively G258 is observed in the vicinity of conservation groups with strongly similar properties(shown with ldquordquo)

The S143 residue is located within a stretch of residuesannotated in Uniprot as catalytic domain A (Figure 4 fromClustal Omega) The hydrophobicity and size differencebetween wild-type and mutant residue (Pro) can disrupt theH-bonding interactions with the neighbouring residues andhence the protein framework The wild-type residue formsa hydrogen bond with the valine on position 139 Due tohigh rigidity of proline moiety this mutation might abolishthe required flexibility of the protein at this position andcould affect the catalytic tendency Based on conservationbehaviour this mutation is probably damaging to the protein

For G258V the wild-type residue is resided in thelinker region between catalytic subdomains A and B andcan distract the required flexibility of core structure bysubstituting Gly258 with valine Additionally the structuralanalysis ofVal258 showed some clashes for Tyr268whichmaycontribute to the extra energy in the protein structure andhence the decrease in stability

In theY414Dvariant Tyr414 indicated fiveH-interactionswith Pro410 Phe411 and Asn417 whereas mutant Asp414exhibited the four H-bonding interactions in the core of theprotein (or protein complex) due to the differences in charge

10 Computational and Mathematical Methods in Medicine

Table 6 Ligand binding sites for wild type andmutant forms (S143PG258V and Y414D)

Wild type S143P G258V Y414DAsn82 Asn82 Asn82 Asn82Phe84 Val123 Phe84 Phe84Val123 Phe124 Val123 Val123Phe124 His125 Phe124 Phe124His125 Asn126 His125 His125Asn126 Glu127 Asn126 Asn126Glu127 Ala128 Glu127 Glu127Thr131 Lau132 Asp156 Asp156Asp156 Asp156 Gly188 Leu189Gly188 Leu189 Leu189 Asp209Leu189 Asp209 Asp209 Ala210Arg193 Ala210 Ala210 His211Asp209 His211 His211 Ile315Ala210 Ile315 Ile315 Trp316His211 Trp316 Trp316 His344Ile315 Arg347 His344 Val345Trp316 Thr350 Val345 Phe346His344 Pro351 Phe346 Arg347Val345 mdash Arg347 Lys348Phe346 mdash Lys348 Ala349mdash mdash Ala349 Thr350mdash mdash Thr350 Pro351mdash mdash Pro351 Tyr352

density and hydrophobicity between wild-type and mutantresidue

35 Simulation for GalNAc-T1 Molecular Binding Sites andProtein-Protein Interactions Including the structure-basedprediction of protein function FTSite identifies the ligandbinding sites for a particular protein Upon querying withFTSite a considerable difference in the number of ligandbinding sites was observed between wild type and mutantforms (S143P G258V and Y414D) of GalNAc-T1 protein(Table 6)

STRING database annotates the functional interactionsamong the proteins in a cell STRING results predictedthe functional association pattern of GalNAc-T1 protein(physical and functional) with MUC1 GBGT1 CHPFST6GALNAC1 B4GalNAc-T1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6 partners with high confidenceshown with bold lines in Figure 5 Based on the confidencescores of GalNAc-T1 protein interactions MUC1 a denselyO-glycosylated transmembrane protein critical to cellularintegrity was chosen for molecular docking analysis inorder to identify the plausible structural and functionalimplications of S143P G258V and Y414D variants TheH-bonding interaction existing between Gly258 (greenportion) and Arg266 residues of native GalNAc-T1 is in turncritical for its interaction with N-H ofThr located in GVTSA(tandem repeat region) of MUC1 by means of H-bonding

However due to the replacement of Gly at 258 position withVal in GalNAc-T1 the H-bonding interaction with MUC1was distorted (Figure 6)

After STRINGmucin biosynthetic pathwaywas retrievedfrom KEGG database to retrieve the molecular reactionand interaction behaviour for GalNAc-T1 and other partnerenzymes (ST6GALNAC1 B4GALNT1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6) to corroborate the STRING results

4 Discussion

The human GalNAc-T1 gene is located on chromosome18q121 region with 11 exons (5882 kb) [1] A 3847 bp lengthm-RNA encodes the functional GalNAc-T1 protein with559 amino acids (642 kDa pI 74) However alternativesplicing of mRNA yields different polynucleotide chainlengths that is 1770 bps (499 aa) and 1120 bps (105 aa)amino acids respectively So far approximately 900 variantslocated in noncoding coding and regulatory regions ofhuman GalNAc-T1 gene are described in dbSNP databaseto date With the advent of high throughput (whole exomeand genome) sequencing practices the number of geneticvariations is growing day by day in an efficient manner [1824 41 42] Hence an important task of human genetics liesin delineating those amino acid variants which can imposespecific structural and functional consequences on proteinfunction [41]

In the current investigation screening for functionalGalNAc-T1 genetic variants in coding region was performedusing sequence- and structure-based algorithms such as SIFTPolyPhen-2 PANTHER and SNP effect From functionalanalysis of SNPs SIFT predicted that 17 of total nsS-NPs are of deleterious type whereas PolyPhen-2 predictedsim22 of total nsSNPs to be highly deleterious But commonpredictions of SIFT and PolyPhen-2 showed that 16 ofvariants were functionalThe significant difference in outputsof SIFT and PolyPhen-2 algorithms is most likely due toutilization of different protein sequence alignments used tocharacterize the variants [42] However good coherence andaccuracy between PANTHER and SIFT (use of similar scor-ingmatrices) have predicted 4 (2222) nsSNPs ofGalNAc-T1to be functional SNPeffect predicted that 3 (166) nsSNPsof the variants to be highly destabilizingThe accuracy of thisprediction percentage remained the same even when com-bined with PolyPhen-2 analysis but increased to 6 (3333)nsSNPs with PANTHER By comparing the output of allthe 4 different in silico tools (SIFT PolyPhen-2 PANTHER-cSNP and SNPeffect) nsSNPs that encodes S143P G258Vand Y414D variants were found to be functionally significantThe differences in prediction capabilities can be attributed tothe fact that every method uses different sets of sequencesand alignments When compared sequence-based predic-tion analysis has a number of advantages over structure-based analysis due to the fact that it considers all types ofeffects at the protein level and is well suitable for proteinswith known relatives But sequence-based predictions areunable to explain the underlying mechanisms between geno-type and phenotype relationships for most of the proteins

Computational and Mathematical Methods in Medicine 11

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

(a) (b)

GALNT1 functional partnersGBGT1CHPFST6GALNAC1B4GALNT1MUC1C1GALT1C1C1GALT1GCNT1B3GNT6ZNF146

0970096809520926092509000900089908990680

Nei

ghbo

rhoo

dG

ene f

usio

nC

oocc

urre

nce

Coe

xpre

ssio

nEx

perim

ents

Dat

abas

esTe

xtm

inig

[Hom

olog

y]

Scor

e

Globoside 120572-13-N-acetylgalactosaminyltransferase 1 (347aa)Chondroitin polymerizing factor (775 aa)ST6(120572-N-acetyl-neuraminyl-23-120573-galactosyl-13)-N-acetylgalactosaminide 120572-26-sialy (600 aa)120573-14-N-acetyl-galactosaminyltransferase1 (533 aa)Mucin 1 cell surface associated (273 aa)C1GALT1-specific chaperone 1 (318 aa)Core1 synthase glycoprotein-N-acetylgalactosamine 3-120573-galactosyltransferase 1 (363 aa)Glucosaminyl (N-acetyl) transferase 1 core 2(120573-16-N-acetylglucosaminyltransferase) (428 aa)UDP-GlcNAc-120573 Gal 120573-13-N-acetylglucosaminyltransferase 6 (core 3 synthase) (384 aa)Zinc finger protein 146 (292 aa)

Figure 5 GalNAc-T1 protein-protein interactions with 09 partners One colour is given to each type of evidence in the predicted functionallinks (edges) among eight coloured lines (a) From experimental basis (with score 0925) onlyMUC1 is observed for interaction withGalNAc-T1 From text-mining data GalNAc-T1 interactions are observed for GBGT1 CHPF ST6GALNAC1 B4GALNT1 and MUC1 proteins with0970 0968 0952 0926 and 0925 scores respectively The remaining interactions with C1GALT1C1 CIGALT1 GCNT1 and B3GNT6proteins are simulated with STRING score ranging from 0900 to 0899 (b) Strong association pattern (thick blue lines) of GalNAc-T1 ispredicted for GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1 C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 partners with highconfidence For ST6GALNAC1-GBGT1 ST6GALNAC1-MUC1 GCNT1-CHPF B4GALNT1-GCNT1 CHPF-C1GALT1C1 andGALNT1-ZNF146pairs weak associations are examined in the form of thin blue lines

(a) (b)

Figure 6 3D visualization of G258V variant withMUC1 (a) Ligand binding of GalNAc-T1 native structure withMUC1 indicated single H-bonding interaction (green line) of Gly258 residue (green portion) with Arg266 which is further connected with the GVTSA (tandem repeatregion ofMUC1) by means of H-bonding interaction (b) Picturing of GalNAc-T1 mutant protein structure is observed with one H-bondingloss between Arg266 residue and theMUC1 tandem repeat motif

12 Computational and Mathematical Methods in Medicine

On the contrary the structure-based approach has limitationsin that it cannot be implemented for proteins with unknown3D structures In silico investigation tools that integrateboth sequence- and structure-based approaches will be ofadded advantage in providing reliable prediction results withwider coverage of different aspects of SNP analysis Howevervariability in prediction outputs of these algorithms reflectsboth advantages and disadvantages Therefore decisionsregarding the selection of a suitable tool for SNP analysismustbe subjective to the specific objectives of the investigationundertaken

Our ConSurf and Clustal Omega results indicated thatnsSNPs at positions S143P and Y414D were found in thehighly conserved region and predicted to have potentialimpact on GalNAc-T1 protein The third mutation (G258) isobserved in the vicinity of conservation groups with stronglysimilar properties shown by ldquordquo in Figure 4 Besides thatthe SNPs which encode functional polypeptides are thoseregulatory SNPs that control the gene expression FastSNPtool helped us to prioritize the deleterious SNPs based upontheir impact on determining protein structure deviance intranscriptional levels of the sequence modification in thepremature translation termination and aberrations in thepositions at promoter region for transcription factor bindingAltered O-glycosylation machinery due to aberrant GalNAc-T1 expression may result in aberrantly glycosylated proteinsand new antigenic targets thereby altering host immuno-genic response in ovarian cells [43 44] The two SNPs i-e rs72964406 (ESE motif CCTCATG score is 2897886)and rs34304568 (ESE motif GGACCTAG score 2088850)were located in coding regions and altered ESE (exonicsplicing enhancers) motifs indicating that all these nsSNPsmay have the ability to affect the level location and timingof gene expression or regulate the splicing of GalNAc-T1 genetranscript

The GalNAc-T1 protein structure carries an N-terminaltransmembrane domain a stem region a lumenal catalyticdomain (catalytic AGT1 domain and catalytic BGalNAc-T domain) and a C-terminal ricin-type lectin region(Figure 4) Although all GalNAc-transferases are reportedto share common structural features and the conservedmotifs described above the exact role of each domain incatalysis remains unknown Several biological roles have beendescribed for GalNAc-T1 for example loss of GalNAc-T1leads to reduced leukocyte recruitment and increased rollingvelocity in vivo suggesting the predominant role for GalNAc-T1 in attaching functionally relevant O-linked glycans toselectin ligands [45] Protein structural analysis is performedto fine-tune the picture drawn by the concordance analysisof SIFT PolyPhen-2 and PANTHER Calculation of sol-vent accessibility and force field energy have provided aninsight into the structural and functional impacts of aminoacid substitutions 3D models are constructed to visualizedeviation among the mutant and wild-type protein modelsIn S143P the required protein framework (hydrophobicityand H-bonding interactions) seems to be disturbed at thisposition and could be damaging for the native structureThe G258V substitution carries the tendency to disturb thelocal structure by reframing the backbone flexibility Y414D

variant is found to be associated with regulation of proteinmisfolding Due to high difference in total energy for nativeand mutant (carrying G258V variant) models the properfunctioning of GalNAc-T1 can be disturbed The analysis ofsequence-based motifs and conserved residues is a reliablemethod to identify the functional amino acid residues thatare important for binding catalysis and stability of proteinsUpon analysingwith FTSite a difference in the ligand bindingsites is observed between wild type (20 binding site) andmutant forms (S143P 18 binding sites G258V 23 bindingsites Y414D 23 binding sites) of GalNAc-T1 protein

Protein-protein interaction analysis is a comprehensiveway to understand the global organization of proteomes inthe context of a functional network Interaction networkdisplays biomolecules as nodes and interactions connectingthe two nodes as edges Currently functional network viewfor a single genome is widely used to improve the statis-tical power in human molecular genetics [39] to aid drugdiscovery to better understand metabolic pathways and toderive genotype-phenotype correlations [18 46] STRINGmaps have shown that GalNAc-T1 interacts with GBGT1CHPF ST6GALNAC1 B4GALNAC-T1 MUC1 C1GALT1C1C1GALT1 GCNT1 B3GNT6 and ZNF146 partners by 36interactions The Strong interaction patterns were observedfor GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 part-ners From KEGG-Pathway analysis biochemical (stereospe-cific) reactions for GalNAc-T1 ST6GALNAC1 B4GALNT1MUC1 C1GALT1C1 C1GALT1 and GCNT1 indicated theindividual and combined role of each enzyme in regulationof glycosylation phenomena to carry out mucin biosynthesispathway (Figure 7) Noticeably most of these enzymatic pro-teins are involved in cell adhesion and membrane transportTherefore aberrant glycosylations in the formofmutations inGalNAc-T1 gene could distort themucin andmany associatedpathways (ERK SRC andNF-kappa-B pathways) involved inmaintaining cellular response and integrity Of the 3GalNAc-T1 variants tested G258Vwas found to disturb the interactionof Arg266 ofGalNAc-T1withN-HofThr (inGVTSA tandemrepeat region) of MUC1 [47] thus it might impair thepossible transfer of GalNAc residue from UDP-GalNAc tohydroxyl group of SerThr during O-glycosylation reaction(Figure 7) Over expression aberrant intracellular localiza-tion and changes in glycosylation of MUC1 and MUC7 areoften reported in tumors of colon breast ovarian lung andpancreatic origin [24 48] Since MUC1 plays an essentialrole in cellular signalling and microbial pathogenicity it isconceivable to expect that G258V mutation is pathogenic tocellular integrity and susceptibility to certain human disease

5 Conclusion

This study represents the first comprehensive investigationthat identified the functional SNPs in GalNAc-T1 geneusing sequence- and structure-based homology algorithmsAlthough there were notable differences in the predictionbasis of selected in silico algorithms our concordance analysiscorroborated the characterization of suspected SNPs that

Computational and Mathematical Methods in Medicine 13

GalNAc1205721-O-Thr

C1GALT1C1GALT1C1

GCNT1

GALNT1

B3GNT6

GCNT1B4GALT5

Gal1205731-3GalNAc1205721-O-Thr

GlcNAc1205731-6GalNAc1205721-O-Thr

GalNAc1205721-3GalNAc1205721-O-Thr

GlcNAc1205731-3GalNAc1205721-O-Thr

Gal1205731-4GlcNAc1205731-6GalNAc1205721-O-Thr

Figure 7 Complex structure of GalNAc-T1 enzyme with UDPMUC1 (tandem repeat region ldquoGSTAPPAHGVTSAPrdquo) and GalNAc residueGalNAc sugar is attached at Thr of tandem repeat region inMUC1 His125 and Asp156 residues from catalytic A domain and Ile315 Trp316and Thr350 from catalytic B domain are found in contact with UDP (orange red color) Under the action of ST6GALNAC1 B4GALNT1C1GALT1C1 C1GALT1 GCNT1 andB3GNT6GalNAc1-O-Thr is stereospecifically converted into different glycan chains to regulate themucinbiosynthesis pathway

could play significant roles in cellular biology We definedthe structural consequences of S143P G258V and Y414Dvariants on GalNAc-T1 protein in the form of solventaccessibility electrostatic interaction energy calculation andmultiple alignment conservation From potential SNPs theG258V mutation is predicted to cause considerable changein total energy electrostatic intramolecular interactions andfunctional interaction behaviour of GalNAc-T1 protein withMUC1 protein and thereby may impact the possible transferof GalNAc residue from UDP-GalNAc to hydroxyl groupof SerThr during O-glycosylation of proteins Additionally2 regulatory SNPs that influence the splicing and expres-sion pattern of GalNAc-T1 gene have also been identifiedAltered GalNAc-T1 function due to genetic variations andmRNA expression might play a critical role in determiningsusceptibility to complex diseases Furthermore protein-protein interaction pathway and protein-ligand docking havehelped us to understand the stereochemical (anomery andlinkage) roles ofGalNAc-T1 and associated enzymes inmucinbiosynthetic pathway Finally the in silico based nsSNPpredictions would not just be of use in deriving genotype-phenotype relations but to some extent explain the molecularbasis for varied interindividual response to certain drugs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are thankful to the reviewers for their criticalreview and fruitful suggestions Additionally the authorsacknowledge the useful comments from Dr MuhammadNadeem Arshad of Chemistry Department King AbdulazizUniversity KSA

References

[1] E P Bennett D O Weghuis G Merkx A G van KesselH Eiberg and H Clausen ldquoGenomic organization andchromosomal localization of three members of the UDP-N-acetylgalactosaminepolypeptide N-acetylgalactosaminyltrans-ferase familyrdquo Glycobiology vol 8 no 6 pp 547ndash555 1998

[2] Y-Z Chen Y-R Tang Z-Y Sheng and Z Zhang ldquoPredictionof mucin-type O-glycosylation sites in mammalian proteinsusing the composition of k-spaced amino acid pairsrdquo BMCBioinformatics vol 9 article 101 2008

[3] H C Hang and C R Bertozzi ldquoThe chemistry and biology ofmucin-type O-linked glycosylationrdquo Bioorganic and MedicinalChemistry vol 13 no 17 pp 5021ndash5034 2005

[4] T A Gerken O Jamison C L Perrine et al ldquoEmerg-ing paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family ofglycosyltransferasesrdquo Journal of Biological Chemistry vol 286no 16 pp 14493ndash14507 2011

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 9: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

Computational and Mathematical Methods in Medicine 9

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

sp |Q10472 |GALT1 HUMANsp |Q10471 |GALT2 HUMANsp |Q14435 |GALT3 HUMANsp |Q8N4A0 |GALT4 HUMANsp |Q7Z7M9 |GALT5 HUMANsp |Q8NCL4 |GALT6 HUMANsp |Q86SF2 |GALT 7 HUMANsp |Q9NY28 |GALT8 HUMANsp |Q9HCQ5 |GALT9 HUMANsp | Q86SR1 |GLT10 HUMAN

lowast lowastlowast lowastlowastlowastlowastlowast lowastlowast

lowast lowast lowastlowastlowast lowastlowast

lowast

lowast

lowast lowast

lowastlowastlowastlowast lowast lowast lowastlowast

lowast lowast lowastlowastlowast lowast lowast lowastlowastlowastlowastlowast lowast lowastlowast lowast lowast lowastlowast lowast lowastlowast

lowast lowastlowastlowast lowast lowast lowastlowast lowast lowast143 Catalytic subdomain A

258 Catalytic subdomain B

414

Ricin B-type lectin

Figure 4 Multiple sequence alignments and evolutionary conservation behaviour among human GalNAc-T1 to GalNAc-T10 members ofGALNTs family S143 and Y414 residues are found conserved in the catalytic subdomain A and linker B domain (area between catalyticsubdomain B and Ricin B-type lectin) respectively G258 is observed in the vicinity of conservation groups with strongly similar properties(shown with ldquordquo)

The S143 residue is located within a stretch of residuesannotated in Uniprot as catalytic domain A (Figure 4 fromClustal Omega) The hydrophobicity and size differencebetween wild-type and mutant residue (Pro) can disrupt theH-bonding interactions with the neighbouring residues andhence the protein framework The wild-type residue formsa hydrogen bond with the valine on position 139 Due tohigh rigidity of proline moiety this mutation might abolishthe required flexibility of the protein at this position andcould affect the catalytic tendency Based on conservationbehaviour this mutation is probably damaging to the protein

For G258V the wild-type residue is resided in thelinker region between catalytic subdomains A and B andcan distract the required flexibility of core structure bysubstituting Gly258 with valine Additionally the structuralanalysis ofVal258 showed some clashes for Tyr268whichmaycontribute to the extra energy in the protein structure andhence the decrease in stability

In theY414Dvariant Tyr414 indicated fiveH-interactionswith Pro410 Phe411 and Asn417 whereas mutant Asp414exhibited the four H-bonding interactions in the core of theprotein (or protein complex) due to the differences in charge

10 Computational and Mathematical Methods in Medicine

Table 6 Ligand binding sites for wild type andmutant forms (S143PG258V and Y414D)

Wild type S143P G258V Y414DAsn82 Asn82 Asn82 Asn82Phe84 Val123 Phe84 Phe84Val123 Phe124 Val123 Val123Phe124 His125 Phe124 Phe124His125 Asn126 His125 His125Asn126 Glu127 Asn126 Asn126Glu127 Ala128 Glu127 Glu127Thr131 Lau132 Asp156 Asp156Asp156 Asp156 Gly188 Leu189Gly188 Leu189 Leu189 Asp209Leu189 Asp209 Asp209 Ala210Arg193 Ala210 Ala210 His211Asp209 His211 His211 Ile315Ala210 Ile315 Ile315 Trp316His211 Trp316 Trp316 His344Ile315 Arg347 His344 Val345Trp316 Thr350 Val345 Phe346His344 Pro351 Phe346 Arg347Val345 mdash Arg347 Lys348Phe346 mdash Lys348 Ala349mdash mdash Ala349 Thr350mdash mdash Thr350 Pro351mdash mdash Pro351 Tyr352

density and hydrophobicity between wild-type and mutantresidue

35 Simulation for GalNAc-T1 Molecular Binding Sites andProtein-Protein Interactions Including the structure-basedprediction of protein function FTSite identifies the ligandbinding sites for a particular protein Upon querying withFTSite a considerable difference in the number of ligandbinding sites was observed between wild type and mutantforms (S143P G258V and Y414D) of GalNAc-T1 protein(Table 6)

STRING database annotates the functional interactionsamong the proteins in a cell STRING results predictedthe functional association pattern of GalNAc-T1 protein(physical and functional) with MUC1 GBGT1 CHPFST6GALNAC1 B4GalNAc-T1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6 partners with high confidenceshown with bold lines in Figure 5 Based on the confidencescores of GalNAc-T1 protein interactions MUC1 a denselyO-glycosylated transmembrane protein critical to cellularintegrity was chosen for molecular docking analysis inorder to identify the plausible structural and functionalimplications of S143P G258V and Y414D variants TheH-bonding interaction existing between Gly258 (greenportion) and Arg266 residues of native GalNAc-T1 is in turncritical for its interaction with N-H ofThr located in GVTSA(tandem repeat region) of MUC1 by means of H-bonding

However due to the replacement of Gly at 258 position withVal in GalNAc-T1 the H-bonding interaction with MUC1was distorted (Figure 6)

After STRINGmucin biosynthetic pathwaywas retrievedfrom KEGG database to retrieve the molecular reactionand interaction behaviour for GalNAc-T1 and other partnerenzymes (ST6GALNAC1 B4GALNT1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6) to corroborate the STRING results

4 Discussion

The human GalNAc-T1 gene is located on chromosome18q121 region with 11 exons (5882 kb) [1] A 3847 bp lengthm-RNA encodes the functional GalNAc-T1 protein with559 amino acids (642 kDa pI 74) However alternativesplicing of mRNA yields different polynucleotide chainlengths that is 1770 bps (499 aa) and 1120 bps (105 aa)amino acids respectively So far approximately 900 variantslocated in noncoding coding and regulatory regions ofhuman GalNAc-T1 gene are described in dbSNP databaseto date With the advent of high throughput (whole exomeand genome) sequencing practices the number of geneticvariations is growing day by day in an efficient manner [1824 41 42] Hence an important task of human genetics liesin delineating those amino acid variants which can imposespecific structural and functional consequences on proteinfunction [41]

In the current investigation screening for functionalGalNAc-T1 genetic variants in coding region was performedusing sequence- and structure-based algorithms such as SIFTPolyPhen-2 PANTHER and SNP effect From functionalanalysis of SNPs SIFT predicted that 17 of total nsS-NPs are of deleterious type whereas PolyPhen-2 predictedsim22 of total nsSNPs to be highly deleterious But commonpredictions of SIFT and PolyPhen-2 showed that 16 ofvariants were functionalThe significant difference in outputsof SIFT and PolyPhen-2 algorithms is most likely due toutilization of different protein sequence alignments used tocharacterize the variants [42] However good coherence andaccuracy between PANTHER and SIFT (use of similar scor-ingmatrices) have predicted 4 (2222) nsSNPs ofGalNAc-T1to be functional SNPeffect predicted that 3 (166) nsSNPsof the variants to be highly destabilizingThe accuracy of thisprediction percentage remained the same even when com-bined with PolyPhen-2 analysis but increased to 6 (3333)nsSNPs with PANTHER By comparing the output of allthe 4 different in silico tools (SIFT PolyPhen-2 PANTHER-cSNP and SNPeffect) nsSNPs that encodes S143P G258Vand Y414D variants were found to be functionally significantThe differences in prediction capabilities can be attributed tothe fact that every method uses different sets of sequencesand alignments When compared sequence-based predic-tion analysis has a number of advantages over structure-based analysis due to the fact that it considers all types ofeffects at the protein level and is well suitable for proteinswith known relatives But sequence-based predictions areunable to explain the underlying mechanisms between geno-type and phenotype relationships for most of the proteins

Computational and Mathematical Methods in Medicine 11

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

(a) (b)

GALNT1 functional partnersGBGT1CHPFST6GALNAC1B4GALNT1MUC1C1GALT1C1C1GALT1GCNT1B3GNT6ZNF146

0970096809520926092509000900089908990680

Nei

ghbo

rhoo

dG

ene f

usio

nC

oocc

urre

nce

Coe

xpre

ssio

nEx

perim

ents

Dat

abas

esTe

xtm

inig

[Hom

olog

y]

Scor

e

Globoside 120572-13-N-acetylgalactosaminyltransferase 1 (347aa)Chondroitin polymerizing factor (775 aa)ST6(120572-N-acetyl-neuraminyl-23-120573-galactosyl-13)-N-acetylgalactosaminide 120572-26-sialy (600 aa)120573-14-N-acetyl-galactosaminyltransferase1 (533 aa)Mucin 1 cell surface associated (273 aa)C1GALT1-specific chaperone 1 (318 aa)Core1 synthase glycoprotein-N-acetylgalactosamine 3-120573-galactosyltransferase 1 (363 aa)Glucosaminyl (N-acetyl) transferase 1 core 2(120573-16-N-acetylglucosaminyltransferase) (428 aa)UDP-GlcNAc-120573 Gal 120573-13-N-acetylglucosaminyltransferase 6 (core 3 synthase) (384 aa)Zinc finger protein 146 (292 aa)

Figure 5 GalNAc-T1 protein-protein interactions with 09 partners One colour is given to each type of evidence in the predicted functionallinks (edges) among eight coloured lines (a) From experimental basis (with score 0925) onlyMUC1 is observed for interaction withGalNAc-T1 From text-mining data GalNAc-T1 interactions are observed for GBGT1 CHPF ST6GALNAC1 B4GALNT1 and MUC1 proteins with0970 0968 0952 0926 and 0925 scores respectively The remaining interactions with C1GALT1C1 CIGALT1 GCNT1 and B3GNT6proteins are simulated with STRING score ranging from 0900 to 0899 (b) Strong association pattern (thick blue lines) of GalNAc-T1 ispredicted for GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1 C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 partners with highconfidence For ST6GALNAC1-GBGT1 ST6GALNAC1-MUC1 GCNT1-CHPF B4GALNT1-GCNT1 CHPF-C1GALT1C1 andGALNT1-ZNF146pairs weak associations are examined in the form of thin blue lines

(a) (b)

Figure 6 3D visualization of G258V variant withMUC1 (a) Ligand binding of GalNAc-T1 native structure withMUC1 indicated single H-bonding interaction (green line) of Gly258 residue (green portion) with Arg266 which is further connected with the GVTSA (tandem repeatregion ofMUC1) by means of H-bonding interaction (b) Picturing of GalNAc-T1 mutant protein structure is observed with one H-bondingloss between Arg266 residue and theMUC1 tandem repeat motif

12 Computational and Mathematical Methods in Medicine

On the contrary the structure-based approach has limitationsin that it cannot be implemented for proteins with unknown3D structures In silico investigation tools that integrateboth sequence- and structure-based approaches will be ofadded advantage in providing reliable prediction results withwider coverage of different aspects of SNP analysis Howevervariability in prediction outputs of these algorithms reflectsboth advantages and disadvantages Therefore decisionsregarding the selection of a suitable tool for SNP analysismustbe subjective to the specific objectives of the investigationundertaken

Our ConSurf and Clustal Omega results indicated thatnsSNPs at positions S143P and Y414D were found in thehighly conserved region and predicted to have potentialimpact on GalNAc-T1 protein The third mutation (G258) isobserved in the vicinity of conservation groups with stronglysimilar properties shown by ldquordquo in Figure 4 Besides thatthe SNPs which encode functional polypeptides are thoseregulatory SNPs that control the gene expression FastSNPtool helped us to prioritize the deleterious SNPs based upontheir impact on determining protein structure deviance intranscriptional levels of the sequence modification in thepremature translation termination and aberrations in thepositions at promoter region for transcription factor bindingAltered O-glycosylation machinery due to aberrant GalNAc-T1 expression may result in aberrantly glycosylated proteinsand new antigenic targets thereby altering host immuno-genic response in ovarian cells [43 44] The two SNPs i-e rs72964406 (ESE motif CCTCATG score is 2897886)and rs34304568 (ESE motif GGACCTAG score 2088850)were located in coding regions and altered ESE (exonicsplicing enhancers) motifs indicating that all these nsSNPsmay have the ability to affect the level location and timingof gene expression or regulate the splicing of GalNAc-T1 genetranscript

The GalNAc-T1 protein structure carries an N-terminaltransmembrane domain a stem region a lumenal catalyticdomain (catalytic AGT1 domain and catalytic BGalNAc-T domain) and a C-terminal ricin-type lectin region(Figure 4) Although all GalNAc-transferases are reportedto share common structural features and the conservedmotifs described above the exact role of each domain incatalysis remains unknown Several biological roles have beendescribed for GalNAc-T1 for example loss of GalNAc-T1leads to reduced leukocyte recruitment and increased rollingvelocity in vivo suggesting the predominant role for GalNAc-T1 in attaching functionally relevant O-linked glycans toselectin ligands [45] Protein structural analysis is performedto fine-tune the picture drawn by the concordance analysisof SIFT PolyPhen-2 and PANTHER Calculation of sol-vent accessibility and force field energy have provided aninsight into the structural and functional impacts of aminoacid substitutions 3D models are constructed to visualizedeviation among the mutant and wild-type protein modelsIn S143P the required protein framework (hydrophobicityand H-bonding interactions) seems to be disturbed at thisposition and could be damaging for the native structureThe G258V substitution carries the tendency to disturb thelocal structure by reframing the backbone flexibility Y414D

variant is found to be associated with regulation of proteinmisfolding Due to high difference in total energy for nativeand mutant (carrying G258V variant) models the properfunctioning of GalNAc-T1 can be disturbed The analysis ofsequence-based motifs and conserved residues is a reliablemethod to identify the functional amino acid residues thatare important for binding catalysis and stability of proteinsUpon analysingwith FTSite a difference in the ligand bindingsites is observed between wild type (20 binding site) andmutant forms (S143P 18 binding sites G258V 23 bindingsites Y414D 23 binding sites) of GalNAc-T1 protein

Protein-protein interaction analysis is a comprehensiveway to understand the global organization of proteomes inthe context of a functional network Interaction networkdisplays biomolecules as nodes and interactions connectingthe two nodes as edges Currently functional network viewfor a single genome is widely used to improve the statis-tical power in human molecular genetics [39] to aid drugdiscovery to better understand metabolic pathways and toderive genotype-phenotype correlations [18 46] STRINGmaps have shown that GalNAc-T1 interacts with GBGT1CHPF ST6GALNAC1 B4GALNAC-T1 MUC1 C1GALT1C1C1GALT1 GCNT1 B3GNT6 and ZNF146 partners by 36interactions The Strong interaction patterns were observedfor GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 part-ners From KEGG-Pathway analysis biochemical (stereospe-cific) reactions for GalNAc-T1 ST6GALNAC1 B4GALNT1MUC1 C1GALT1C1 C1GALT1 and GCNT1 indicated theindividual and combined role of each enzyme in regulationof glycosylation phenomena to carry out mucin biosynthesispathway (Figure 7) Noticeably most of these enzymatic pro-teins are involved in cell adhesion and membrane transportTherefore aberrant glycosylations in the formofmutations inGalNAc-T1 gene could distort themucin andmany associatedpathways (ERK SRC andNF-kappa-B pathways) involved inmaintaining cellular response and integrity Of the 3GalNAc-T1 variants tested G258Vwas found to disturb the interactionof Arg266 ofGalNAc-T1withN-HofThr (inGVTSA tandemrepeat region) of MUC1 [47] thus it might impair thepossible transfer of GalNAc residue from UDP-GalNAc tohydroxyl group of SerThr during O-glycosylation reaction(Figure 7) Over expression aberrant intracellular localiza-tion and changes in glycosylation of MUC1 and MUC7 areoften reported in tumors of colon breast ovarian lung andpancreatic origin [24 48] Since MUC1 plays an essentialrole in cellular signalling and microbial pathogenicity it isconceivable to expect that G258V mutation is pathogenic tocellular integrity and susceptibility to certain human disease

5 Conclusion

This study represents the first comprehensive investigationthat identified the functional SNPs in GalNAc-T1 geneusing sequence- and structure-based homology algorithmsAlthough there were notable differences in the predictionbasis of selected in silico algorithms our concordance analysiscorroborated the characterization of suspected SNPs that

Computational and Mathematical Methods in Medicine 13

GalNAc1205721-O-Thr

C1GALT1C1GALT1C1

GCNT1

GALNT1

B3GNT6

GCNT1B4GALT5

Gal1205731-3GalNAc1205721-O-Thr

GlcNAc1205731-6GalNAc1205721-O-Thr

GalNAc1205721-3GalNAc1205721-O-Thr

GlcNAc1205731-3GalNAc1205721-O-Thr

Gal1205731-4GlcNAc1205731-6GalNAc1205721-O-Thr

Figure 7 Complex structure of GalNAc-T1 enzyme with UDPMUC1 (tandem repeat region ldquoGSTAPPAHGVTSAPrdquo) and GalNAc residueGalNAc sugar is attached at Thr of tandem repeat region inMUC1 His125 and Asp156 residues from catalytic A domain and Ile315 Trp316and Thr350 from catalytic B domain are found in contact with UDP (orange red color) Under the action of ST6GALNAC1 B4GALNT1C1GALT1C1 C1GALT1 GCNT1 andB3GNT6GalNAc1-O-Thr is stereospecifically converted into different glycan chains to regulate themucinbiosynthesis pathway

could play significant roles in cellular biology We definedthe structural consequences of S143P G258V and Y414Dvariants on GalNAc-T1 protein in the form of solventaccessibility electrostatic interaction energy calculation andmultiple alignment conservation From potential SNPs theG258V mutation is predicted to cause considerable changein total energy electrostatic intramolecular interactions andfunctional interaction behaviour of GalNAc-T1 protein withMUC1 protein and thereby may impact the possible transferof GalNAc residue from UDP-GalNAc to hydroxyl groupof SerThr during O-glycosylation of proteins Additionally2 regulatory SNPs that influence the splicing and expres-sion pattern of GalNAc-T1 gene have also been identifiedAltered GalNAc-T1 function due to genetic variations andmRNA expression might play a critical role in determiningsusceptibility to complex diseases Furthermore protein-protein interaction pathway and protein-ligand docking havehelped us to understand the stereochemical (anomery andlinkage) roles ofGalNAc-T1 and associated enzymes inmucinbiosynthetic pathway Finally the in silico based nsSNPpredictions would not just be of use in deriving genotype-phenotype relations but to some extent explain the molecularbasis for varied interindividual response to certain drugs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are thankful to the reviewers for their criticalreview and fruitful suggestions Additionally the authorsacknowledge the useful comments from Dr MuhammadNadeem Arshad of Chemistry Department King AbdulazizUniversity KSA

References

[1] E P Bennett D O Weghuis G Merkx A G van KesselH Eiberg and H Clausen ldquoGenomic organization andchromosomal localization of three members of the UDP-N-acetylgalactosaminepolypeptide N-acetylgalactosaminyltrans-ferase familyrdquo Glycobiology vol 8 no 6 pp 547ndash555 1998

[2] Y-Z Chen Y-R Tang Z-Y Sheng and Z Zhang ldquoPredictionof mucin-type O-glycosylation sites in mammalian proteinsusing the composition of k-spaced amino acid pairsrdquo BMCBioinformatics vol 9 article 101 2008

[3] H C Hang and C R Bertozzi ldquoThe chemistry and biology ofmucin-type O-linked glycosylationrdquo Bioorganic and MedicinalChemistry vol 13 no 17 pp 5021ndash5034 2005

[4] T A Gerken O Jamison C L Perrine et al ldquoEmerg-ing paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family ofglycosyltransferasesrdquo Journal of Biological Chemistry vol 286no 16 pp 14493ndash14507 2011

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 10: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

10 Computational and Mathematical Methods in Medicine

Table 6 Ligand binding sites for wild type andmutant forms (S143PG258V and Y414D)

Wild type S143P G258V Y414DAsn82 Asn82 Asn82 Asn82Phe84 Val123 Phe84 Phe84Val123 Phe124 Val123 Val123Phe124 His125 Phe124 Phe124His125 Asn126 His125 His125Asn126 Glu127 Asn126 Asn126Glu127 Ala128 Glu127 Glu127Thr131 Lau132 Asp156 Asp156Asp156 Asp156 Gly188 Leu189Gly188 Leu189 Leu189 Asp209Leu189 Asp209 Asp209 Ala210Arg193 Ala210 Ala210 His211Asp209 His211 His211 Ile315Ala210 Ile315 Ile315 Trp316His211 Trp316 Trp316 His344Ile315 Arg347 His344 Val345Trp316 Thr350 Val345 Phe346His344 Pro351 Phe346 Arg347Val345 mdash Arg347 Lys348Phe346 mdash Lys348 Ala349mdash mdash Ala349 Thr350mdash mdash Thr350 Pro351mdash mdash Pro351 Tyr352

density and hydrophobicity between wild-type and mutantresidue

35 Simulation for GalNAc-T1 Molecular Binding Sites andProtein-Protein Interactions Including the structure-basedprediction of protein function FTSite identifies the ligandbinding sites for a particular protein Upon querying withFTSite a considerable difference in the number of ligandbinding sites was observed between wild type and mutantforms (S143P G258V and Y414D) of GalNAc-T1 protein(Table 6)

STRING database annotates the functional interactionsamong the proteins in a cell STRING results predictedthe functional association pattern of GalNAc-T1 protein(physical and functional) with MUC1 GBGT1 CHPFST6GALNAC1 B4GalNAc-T1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6 partners with high confidenceshown with bold lines in Figure 5 Based on the confidencescores of GalNAc-T1 protein interactions MUC1 a denselyO-glycosylated transmembrane protein critical to cellularintegrity was chosen for molecular docking analysis inorder to identify the plausible structural and functionalimplications of S143P G258V and Y414D variants TheH-bonding interaction existing between Gly258 (greenportion) and Arg266 residues of native GalNAc-T1 is in turncritical for its interaction with N-H ofThr located in GVTSA(tandem repeat region) of MUC1 by means of H-bonding

However due to the replacement of Gly at 258 position withVal in GalNAc-T1 the H-bonding interaction with MUC1was distorted (Figure 6)

After STRINGmucin biosynthetic pathwaywas retrievedfrom KEGG database to retrieve the molecular reactionand interaction behaviour for GalNAc-T1 and other partnerenzymes (ST6GALNAC1 B4GALNT1 C1GALT1C1 C1GALT1GCNT1 and B3GNT6) to corroborate the STRING results

4 Discussion

The human GalNAc-T1 gene is located on chromosome18q121 region with 11 exons (5882 kb) [1] A 3847 bp lengthm-RNA encodes the functional GalNAc-T1 protein with559 amino acids (642 kDa pI 74) However alternativesplicing of mRNA yields different polynucleotide chainlengths that is 1770 bps (499 aa) and 1120 bps (105 aa)amino acids respectively So far approximately 900 variantslocated in noncoding coding and regulatory regions ofhuman GalNAc-T1 gene are described in dbSNP databaseto date With the advent of high throughput (whole exomeand genome) sequencing practices the number of geneticvariations is growing day by day in an efficient manner [1824 41 42] Hence an important task of human genetics liesin delineating those amino acid variants which can imposespecific structural and functional consequences on proteinfunction [41]

In the current investigation screening for functionalGalNAc-T1 genetic variants in coding region was performedusing sequence- and structure-based algorithms such as SIFTPolyPhen-2 PANTHER and SNP effect From functionalanalysis of SNPs SIFT predicted that 17 of total nsS-NPs are of deleterious type whereas PolyPhen-2 predictedsim22 of total nsSNPs to be highly deleterious But commonpredictions of SIFT and PolyPhen-2 showed that 16 ofvariants were functionalThe significant difference in outputsof SIFT and PolyPhen-2 algorithms is most likely due toutilization of different protein sequence alignments used tocharacterize the variants [42] However good coherence andaccuracy between PANTHER and SIFT (use of similar scor-ingmatrices) have predicted 4 (2222) nsSNPs ofGalNAc-T1to be functional SNPeffect predicted that 3 (166) nsSNPsof the variants to be highly destabilizingThe accuracy of thisprediction percentage remained the same even when com-bined with PolyPhen-2 analysis but increased to 6 (3333)nsSNPs with PANTHER By comparing the output of allthe 4 different in silico tools (SIFT PolyPhen-2 PANTHER-cSNP and SNPeffect) nsSNPs that encodes S143P G258Vand Y414D variants were found to be functionally significantThe differences in prediction capabilities can be attributed tothe fact that every method uses different sets of sequencesand alignments When compared sequence-based predic-tion analysis has a number of advantages over structure-based analysis due to the fact that it considers all types ofeffects at the protein level and is well suitable for proteinswith known relatives But sequence-based predictions areunable to explain the underlying mechanisms between geno-type and phenotype relationships for most of the proteins

Computational and Mathematical Methods in Medicine 11

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

(a) (b)

GALNT1 functional partnersGBGT1CHPFST6GALNAC1B4GALNT1MUC1C1GALT1C1C1GALT1GCNT1B3GNT6ZNF146

0970096809520926092509000900089908990680

Nei

ghbo

rhoo

dG

ene f

usio

nC

oocc

urre

nce

Coe

xpre

ssio

nEx

perim

ents

Dat

abas

esTe

xtm

inig

[Hom

olog

y]

Scor

e

Globoside 120572-13-N-acetylgalactosaminyltransferase 1 (347aa)Chondroitin polymerizing factor (775 aa)ST6(120572-N-acetyl-neuraminyl-23-120573-galactosyl-13)-N-acetylgalactosaminide 120572-26-sialy (600 aa)120573-14-N-acetyl-galactosaminyltransferase1 (533 aa)Mucin 1 cell surface associated (273 aa)C1GALT1-specific chaperone 1 (318 aa)Core1 synthase glycoprotein-N-acetylgalactosamine 3-120573-galactosyltransferase 1 (363 aa)Glucosaminyl (N-acetyl) transferase 1 core 2(120573-16-N-acetylglucosaminyltransferase) (428 aa)UDP-GlcNAc-120573 Gal 120573-13-N-acetylglucosaminyltransferase 6 (core 3 synthase) (384 aa)Zinc finger protein 146 (292 aa)

Figure 5 GalNAc-T1 protein-protein interactions with 09 partners One colour is given to each type of evidence in the predicted functionallinks (edges) among eight coloured lines (a) From experimental basis (with score 0925) onlyMUC1 is observed for interaction withGalNAc-T1 From text-mining data GalNAc-T1 interactions are observed for GBGT1 CHPF ST6GALNAC1 B4GALNT1 and MUC1 proteins with0970 0968 0952 0926 and 0925 scores respectively The remaining interactions with C1GALT1C1 CIGALT1 GCNT1 and B3GNT6proteins are simulated with STRING score ranging from 0900 to 0899 (b) Strong association pattern (thick blue lines) of GalNAc-T1 ispredicted for GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1 C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 partners with highconfidence For ST6GALNAC1-GBGT1 ST6GALNAC1-MUC1 GCNT1-CHPF B4GALNT1-GCNT1 CHPF-C1GALT1C1 andGALNT1-ZNF146pairs weak associations are examined in the form of thin blue lines

(a) (b)

Figure 6 3D visualization of G258V variant withMUC1 (a) Ligand binding of GalNAc-T1 native structure withMUC1 indicated single H-bonding interaction (green line) of Gly258 residue (green portion) with Arg266 which is further connected with the GVTSA (tandem repeatregion ofMUC1) by means of H-bonding interaction (b) Picturing of GalNAc-T1 mutant protein structure is observed with one H-bondingloss between Arg266 residue and theMUC1 tandem repeat motif

12 Computational and Mathematical Methods in Medicine

On the contrary the structure-based approach has limitationsin that it cannot be implemented for proteins with unknown3D structures In silico investigation tools that integrateboth sequence- and structure-based approaches will be ofadded advantage in providing reliable prediction results withwider coverage of different aspects of SNP analysis Howevervariability in prediction outputs of these algorithms reflectsboth advantages and disadvantages Therefore decisionsregarding the selection of a suitable tool for SNP analysismustbe subjective to the specific objectives of the investigationundertaken

Our ConSurf and Clustal Omega results indicated thatnsSNPs at positions S143P and Y414D were found in thehighly conserved region and predicted to have potentialimpact on GalNAc-T1 protein The third mutation (G258) isobserved in the vicinity of conservation groups with stronglysimilar properties shown by ldquordquo in Figure 4 Besides thatthe SNPs which encode functional polypeptides are thoseregulatory SNPs that control the gene expression FastSNPtool helped us to prioritize the deleterious SNPs based upontheir impact on determining protein structure deviance intranscriptional levels of the sequence modification in thepremature translation termination and aberrations in thepositions at promoter region for transcription factor bindingAltered O-glycosylation machinery due to aberrant GalNAc-T1 expression may result in aberrantly glycosylated proteinsand new antigenic targets thereby altering host immuno-genic response in ovarian cells [43 44] The two SNPs i-e rs72964406 (ESE motif CCTCATG score is 2897886)and rs34304568 (ESE motif GGACCTAG score 2088850)were located in coding regions and altered ESE (exonicsplicing enhancers) motifs indicating that all these nsSNPsmay have the ability to affect the level location and timingof gene expression or regulate the splicing of GalNAc-T1 genetranscript

The GalNAc-T1 protein structure carries an N-terminaltransmembrane domain a stem region a lumenal catalyticdomain (catalytic AGT1 domain and catalytic BGalNAc-T domain) and a C-terminal ricin-type lectin region(Figure 4) Although all GalNAc-transferases are reportedto share common structural features and the conservedmotifs described above the exact role of each domain incatalysis remains unknown Several biological roles have beendescribed for GalNAc-T1 for example loss of GalNAc-T1leads to reduced leukocyte recruitment and increased rollingvelocity in vivo suggesting the predominant role for GalNAc-T1 in attaching functionally relevant O-linked glycans toselectin ligands [45] Protein structural analysis is performedto fine-tune the picture drawn by the concordance analysisof SIFT PolyPhen-2 and PANTHER Calculation of sol-vent accessibility and force field energy have provided aninsight into the structural and functional impacts of aminoacid substitutions 3D models are constructed to visualizedeviation among the mutant and wild-type protein modelsIn S143P the required protein framework (hydrophobicityand H-bonding interactions) seems to be disturbed at thisposition and could be damaging for the native structureThe G258V substitution carries the tendency to disturb thelocal structure by reframing the backbone flexibility Y414D

variant is found to be associated with regulation of proteinmisfolding Due to high difference in total energy for nativeand mutant (carrying G258V variant) models the properfunctioning of GalNAc-T1 can be disturbed The analysis ofsequence-based motifs and conserved residues is a reliablemethod to identify the functional amino acid residues thatare important for binding catalysis and stability of proteinsUpon analysingwith FTSite a difference in the ligand bindingsites is observed between wild type (20 binding site) andmutant forms (S143P 18 binding sites G258V 23 bindingsites Y414D 23 binding sites) of GalNAc-T1 protein

Protein-protein interaction analysis is a comprehensiveway to understand the global organization of proteomes inthe context of a functional network Interaction networkdisplays biomolecules as nodes and interactions connectingthe two nodes as edges Currently functional network viewfor a single genome is widely used to improve the statis-tical power in human molecular genetics [39] to aid drugdiscovery to better understand metabolic pathways and toderive genotype-phenotype correlations [18 46] STRINGmaps have shown that GalNAc-T1 interacts with GBGT1CHPF ST6GALNAC1 B4GALNAC-T1 MUC1 C1GALT1C1C1GALT1 GCNT1 B3GNT6 and ZNF146 partners by 36interactions The Strong interaction patterns were observedfor GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 part-ners From KEGG-Pathway analysis biochemical (stereospe-cific) reactions for GalNAc-T1 ST6GALNAC1 B4GALNT1MUC1 C1GALT1C1 C1GALT1 and GCNT1 indicated theindividual and combined role of each enzyme in regulationof glycosylation phenomena to carry out mucin biosynthesispathway (Figure 7) Noticeably most of these enzymatic pro-teins are involved in cell adhesion and membrane transportTherefore aberrant glycosylations in the formofmutations inGalNAc-T1 gene could distort themucin andmany associatedpathways (ERK SRC andNF-kappa-B pathways) involved inmaintaining cellular response and integrity Of the 3GalNAc-T1 variants tested G258Vwas found to disturb the interactionof Arg266 ofGalNAc-T1withN-HofThr (inGVTSA tandemrepeat region) of MUC1 [47] thus it might impair thepossible transfer of GalNAc residue from UDP-GalNAc tohydroxyl group of SerThr during O-glycosylation reaction(Figure 7) Over expression aberrant intracellular localiza-tion and changes in glycosylation of MUC1 and MUC7 areoften reported in tumors of colon breast ovarian lung andpancreatic origin [24 48] Since MUC1 plays an essentialrole in cellular signalling and microbial pathogenicity it isconceivable to expect that G258V mutation is pathogenic tocellular integrity and susceptibility to certain human disease

5 Conclusion

This study represents the first comprehensive investigationthat identified the functional SNPs in GalNAc-T1 geneusing sequence- and structure-based homology algorithmsAlthough there were notable differences in the predictionbasis of selected in silico algorithms our concordance analysiscorroborated the characterization of suspected SNPs that

Computational and Mathematical Methods in Medicine 13

GalNAc1205721-O-Thr

C1GALT1C1GALT1C1

GCNT1

GALNT1

B3GNT6

GCNT1B4GALT5

Gal1205731-3GalNAc1205721-O-Thr

GlcNAc1205731-6GalNAc1205721-O-Thr

GalNAc1205721-3GalNAc1205721-O-Thr

GlcNAc1205731-3GalNAc1205721-O-Thr

Gal1205731-4GlcNAc1205731-6GalNAc1205721-O-Thr

Figure 7 Complex structure of GalNAc-T1 enzyme with UDPMUC1 (tandem repeat region ldquoGSTAPPAHGVTSAPrdquo) and GalNAc residueGalNAc sugar is attached at Thr of tandem repeat region inMUC1 His125 and Asp156 residues from catalytic A domain and Ile315 Trp316and Thr350 from catalytic B domain are found in contact with UDP (orange red color) Under the action of ST6GALNAC1 B4GALNT1C1GALT1C1 C1GALT1 GCNT1 andB3GNT6GalNAc1-O-Thr is stereospecifically converted into different glycan chains to regulate themucinbiosynthesis pathway

could play significant roles in cellular biology We definedthe structural consequences of S143P G258V and Y414Dvariants on GalNAc-T1 protein in the form of solventaccessibility electrostatic interaction energy calculation andmultiple alignment conservation From potential SNPs theG258V mutation is predicted to cause considerable changein total energy electrostatic intramolecular interactions andfunctional interaction behaviour of GalNAc-T1 protein withMUC1 protein and thereby may impact the possible transferof GalNAc residue from UDP-GalNAc to hydroxyl groupof SerThr during O-glycosylation of proteins Additionally2 regulatory SNPs that influence the splicing and expres-sion pattern of GalNAc-T1 gene have also been identifiedAltered GalNAc-T1 function due to genetic variations andmRNA expression might play a critical role in determiningsusceptibility to complex diseases Furthermore protein-protein interaction pathway and protein-ligand docking havehelped us to understand the stereochemical (anomery andlinkage) roles ofGalNAc-T1 and associated enzymes inmucinbiosynthetic pathway Finally the in silico based nsSNPpredictions would not just be of use in deriving genotype-phenotype relations but to some extent explain the molecularbasis for varied interindividual response to certain drugs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are thankful to the reviewers for their criticalreview and fruitful suggestions Additionally the authorsacknowledge the useful comments from Dr MuhammadNadeem Arshad of Chemistry Department King AbdulazizUniversity KSA

References

[1] E P Bennett D O Weghuis G Merkx A G van KesselH Eiberg and H Clausen ldquoGenomic organization andchromosomal localization of three members of the UDP-N-acetylgalactosaminepolypeptide N-acetylgalactosaminyltrans-ferase familyrdquo Glycobiology vol 8 no 6 pp 547ndash555 1998

[2] Y-Z Chen Y-R Tang Z-Y Sheng and Z Zhang ldquoPredictionof mucin-type O-glycosylation sites in mammalian proteinsusing the composition of k-spaced amino acid pairsrdquo BMCBioinformatics vol 9 article 101 2008

[3] H C Hang and C R Bertozzi ldquoThe chemistry and biology ofmucin-type O-linked glycosylationrdquo Bioorganic and MedicinalChemistry vol 13 no 17 pp 5021ndash5034 2005

[4] T A Gerken O Jamison C L Perrine et al ldquoEmerg-ing paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family ofglycosyltransferasesrdquo Journal of Biological Chemistry vol 286no 16 pp 14493ndash14507 2011

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 11: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

Computational and Mathematical Methods in Medicine 11

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

MUC1

B4GALNT1

GBGT1

CHPFGALNT1

C1GALT1C1

GCNT1

ST6GALNAC1

B3GNT6

C1GALT1

ZNF146

(a) (b)

GALNT1 functional partnersGBGT1CHPFST6GALNAC1B4GALNT1MUC1C1GALT1C1C1GALT1GCNT1B3GNT6ZNF146

0970096809520926092509000900089908990680

Nei

ghbo

rhoo

dG

ene f

usio

nC

oocc

urre

nce

Coe

xpre

ssio

nEx

perim

ents

Dat

abas

esTe

xtm

inig

[Hom

olog

y]

Scor

e

Globoside 120572-13-N-acetylgalactosaminyltransferase 1 (347aa)Chondroitin polymerizing factor (775 aa)ST6(120572-N-acetyl-neuraminyl-23-120573-galactosyl-13)-N-acetylgalactosaminide 120572-26-sialy (600 aa)120573-14-N-acetyl-galactosaminyltransferase1 (533 aa)Mucin 1 cell surface associated (273 aa)C1GALT1-specific chaperone 1 (318 aa)Core1 synthase glycoprotein-N-acetylgalactosamine 3-120573-galactosyltransferase 1 (363 aa)Glucosaminyl (N-acetyl) transferase 1 core 2(120573-16-N-acetylglucosaminyltransferase) (428 aa)UDP-GlcNAc-120573 Gal 120573-13-N-acetylglucosaminyltransferase 6 (core 3 synthase) (384 aa)Zinc finger protein 146 (292 aa)

Figure 5 GalNAc-T1 protein-protein interactions with 09 partners One colour is given to each type of evidence in the predicted functionallinks (edges) among eight coloured lines (a) From experimental basis (with score 0925) onlyMUC1 is observed for interaction withGalNAc-T1 From text-mining data GalNAc-T1 interactions are observed for GBGT1 CHPF ST6GALNAC1 B4GALNT1 and MUC1 proteins with0970 0968 0952 0926 and 0925 scores respectively The remaining interactions with C1GALT1C1 CIGALT1 GCNT1 and B3GNT6proteins are simulated with STRING score ranging from 0900 to 0899 (b) Strong association pattern (thick blue lines) of GalNAc-T1 ispredicted for GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1 C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 partners with highconfidence For ST6GALNAC1-GBGT1 ST6GALNAC1-MUC1 GCNT1-CHPF B4GALNT1-GCNT1 CHPF-C1GALT1C1 andGALNT1-ZNF146pairs weak associations are examined in the form of thin blue lines

(a) (b)

Figure 6 3D visualization of G258V variant withMUC1 (a) Ligand binding of GalNAc-T1 native structure withMUC1 indicated single H-bonding interaction (green line) of Gly258 residue (green portion) with Arg266 which is further connected with the GVTSA (tandem repeatregion ofMUC1) by means of H-bonding interaction (b) Picturing of GalNAc-T1 mutant protein structure is observed with one H-bondingloss between Arg266 residue and theMUC1 tandem repeat motif

12 Computational and Mathematical Methods in Medicine

On the contrary the structure-based approach has limitationsin that it cannot be implemented for proteins with unknown3D structures In silico investigation tools that integrateboth sequence- and structure-based approaches will be ofadded advantage in providing reliable prediction results withwider coverage of different aspects of SNP analysis Howevervariability in prediction outputs of these algorithms reflectsboth advantages and disadvantages Therefore decisionsregarding the selection of a suitable tool for SNP analysismustbe subjective to the specific objectives of the investigationundertaken

Our ConSurf and Clustal Omega results indicated thatnsSNPs at positions S143P and Y414D were found in thehighly conserved region and predicted to have potentialimpact on GalNAc-T1 protein The third mutation (G258) isobserved in the vicinity of conservation groups with stronglysimilar properties shown by ldquordquo in Figure 4 Besides thatthe SNPs which encode functional polypeptides are thoseregulatory SNPs that control the gene expression FastSNPtool helped us to prioritize the deleterious SNPs based upontheir impact on determining protein structure deviance intranscriptional levels of the sequence modification in thepremature translation termination and aberrations in thepositions at promoter region for transcription factor bindingAltered O-glycosylation machinery due to aberrant GalNAc-T1 expression may result in aberrantly glycosylated proteinsand new antigenic targets thereby altering host immuno-genic response in ovarian cells [43 44] The two SNPs i-e rs72964406 (ESE motif CCTCATG score is 2897886)and rs34304568 (ESE motif GGACCTAG score 2088850)were located in coding regions and altered ESE (exonicsplicing enhancers) motifs indicating that all these nsSNPsmay have the ability to affect the level location and timingof gene expression or regulate the splicing of GalNAc-T1 genetranscript

The GalNAc-T1 protein structure carries an N-terminaltransmembrane domain a stem region a lumenal catalyticdomain (catalytic AGT1 domain and catalytic BGalNAc-T domain) and a C-terminal ricin-type lectin region(Figure 4) Although all GalNAc-transferases are reportedto share common structural features and the conservedmotifs described above the exact role of each domain incatalysis remains unknown Several biological roles have beendescribed for GalNAc-T1 for example loss of GalNAc-T1leads to reduced leukocyte recruitment and increased rollingvelocity in vivo suggesting the predominant role for GalNAc-T1 in attaching functionally relevant O-linked glycans toselectin ligands [45] Protein structural analysis is performedto fine-tune the picture drawn by the concordance analysisof SIFT PolyPhen-2 and PANTHER Calculation of sol-vent accessibility and force field energy have provided aninsight into the structural and functional impacts of aminoacid substitutions 3D models are constructed to visualizedeviation among the mutant and wild-type protein modelsIn S143P the required protein framework (hydrophobicityand H-bonding interactions) seems to be disturbed at thisposition and could be damaging for the native structureThe G258V substitution carries the tendency to disturb thelocal structure by reframing the backbone flexibility Y414D

variant is found to be associated with regulation of proteinmisfolding Due to high difference in total energy for nativeand mutant (carrying G258V variant) models the properfunctioning of GalNAc-T1 can be disturbed The analysis ofsequence-based motifs and conserved residues is a reliablemethod to identify the functional amino acid residues thatare important for binding catalysis and stability of proteinsUpon analysingwith FTSite a difference in the ligand bindingsites is observed between wild type (20 binding site) andmutant forms (S143P 18 binding sites G258V 23 bindingsites Y414D 23 binding sites) of GalNAc-T1 protein

Protein-protein interaction analysis is a comprehensiveway to understand the global organization of proteomes inthe context of a functional network Interaction networkdisplays biomolecules as nodes and interactions connectingthe two nodes as edges Currently functional network viewfor a single genome is widely used to improve the statis-tical power in human molecular genetics [39] to aid drugdiscovery to better understand metabolic pathways and toderive genotype-phenotype correlations [18 46] STRINGmaps have shown that GalNAc-T1 interacts with GBGT1CHPF ST6GALNAC1 B4GALNAC-T1 MUC1 C1GALT1C1C1GALT1 GCNT1 B3GNT6 and ZNF146 partners by 36interactions The Strong interaction patterns were observedfor GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 part-ners From KEGG-Pathway analysis biochemical (stereospe-cific) reactions for GalNAc-T1 ST6GALNAC1 B4GALNT1MUC1 C1GALT1C1 C1GALT1 and GCNT1 indicated theindividual and combined role of each enzyme in regulationof glycosylation phenomena to carry out mucin biosynthesispathway (Figure 7) Noticeably most of these enzymatic pro-teins are involved in cell adhesion and membrane transportTherefore aberrant glycosylations in the formofmutations inGalNAc-T1 gene could distort themucin andmany associatedpathways (ERK SRC andNF-kappa-B pathways) involved inmaintaining cellular response and integrity Of the 3GalNAc-T1 variants tested G258Vwas found to disturb the interactionof Arg266 ofGalNAc-T1withN-HofThr (inGVTSA tandemrepeat region) of MUC1 [47] thus it might impair thepossible transfer of GalNAc residue from UDP-GalNAc tohydroxyl group of SerThr during O-glycosylation reaction(Figure 7) Over expression aberrant intracellular localiza-tion and changes in glycosylation of MUC1 and MUC7 areoften reported in tumors of colon breast ovarian lung andpancreatic origin [24 48] Since MUC1 plays an essentialrole in cellular signalling and microbial pathogenicity it isconceivable to expect that G258V mutation is pathogenic tocellular integrity and susceptibility to certain human disease

5 Conclusion

This study represents the first comprehensive investigationthat identified the functional SNPs in GalNAc-T1 geneusing sequence- and structure-based homology algorithmsAlthough there were notable differences in the predictionbasis of selected in silico algorithms our concordance analysiscorroborated the characterization of suspected SNPs that

Computational and Mathematical Methods in Medicine 13

GalNAc1205721-O-Thr

C1GALT1C1GALT1C1

GCNT1

GALNT1

B3GNT6

GCNT1B4GALT5

Gal1205731-3GalNAc1205721-O-Thr

GlcNAc1205731-6GalNAc1205721-O-Thr

GalNAc1205721-3GalNAc1205721-O-Thr

GlcNAc1205731-3GalNAc1205721-O-Thr

Gal1205731-4GlcNAc1205731-6GalNAc1205721-O-Thr

Figure 7 Complex structure of GalNAc-T1 enzyme with UDPMUC1 (tandem repeat region ldquoGSTAPPAHGVTSAPrdquo) and GalNAc residueGalNAc sugar is attached at Thr of tandem repeat region inMUC1 His125 and Asp156 residues from catalytic A domain and Ile315 Trp316and Thr350 from catalytic B domain are found in contact with UDP (orange red color) Under the action of ST6GALNAC1 B4GALNT1C1GALT1C1 C1GALT1 GCNT1 andB3GNT6GalNAc1-O-Thr is stereospecifically converted into different glycan chains to regulate themucinbiosynthesis pathway

could play significant roles in cellular biology We definedthe structural consequences of S143P G258V and Y414Dvariants on GalNAc-T1 protein in the form of solventaccessibility electrostatic interaction energy calculation andmultiple alignment conservation From potential SNPs theG258V mutation is predicted to cause considerable changein total energy electrostatic intramolecular interactions andfunctional interaction behaviour of GalNAc-T1 protein withMUC1 protein and thereby may impact the possible transferof GalNAc residue from UDP-GalNAc to hydroxyl groupof SerThr during O-glycosylation of proteins Additionally2 regulatory SNPs that influence the splicing and expres-sion pattern of GalNAc-T1 gene have also been identifiedAltered GalNAc-T1 function due to genetic variations andmRNA expression might play a critical role in determiningsusceptibility to complex diseases Furthermore protein-protein interaction pathway and protein-ligand docking havehelped us to understand the stereochemical (anomery andlinkage) roles ofGalNAc-T1 and associated enzymes inmucinbiosynthetic pathway Finally the in silico based nsSNPpredictions would not just be of use in deriving genotype-phenotype relations but to some extent explain the molecularbasis for varied interindividual response to certain drugs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are thankful to the reviewers for their criticalreview and fruitful suggestions Additionally the authorsacknowledge the useful comments from Dr MuhammadNadeem Arshad of Chemistry Department King AbdulazizUniversity KSA

References

[1] E P Bennett D O Weghuis G Merkx A G van KesselH Eiberg and H Clausen ldquoGenomic organization andchromosomal localization of three members of the UDP-N-acetylgalactosaminepolypeptide N-acetylgalactosaminyltrans-ferase familyrdquo Glycobiology vol 8 no 6 pp 547ndash555 1998

[2] Y-Z Chen Y-R Tang Z-Y Sheng and Z Zhang ldquoPredictionof mucin-type O-glycosylation sites in mammalian proteinsusing the composition of k-spaced amino acid pairsrdquo BMCBioinformatics vol 9 article 101 2008

[3] H C Hang and C R Bertozzi ldquoThe chemistry and biology ofmucin-type O-linked glycosylationrdquo Bioorganic and MedicinalChemistry vol 13 no 17 pp 5021ndash5034 2005

[4] T A Gerken O Jamison C L Perrine et al ldquoEmerg-ing paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family ofglycosyltransferasesrdquo Journal of Biological Chemistry vol 286no 16 pp 14493ndash14507 2011

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 12: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

12 Computational and Mathematical Methods in Medicine

On the contrary the structure-based approach has limitationsin that it cannot be implemented for proteins with unknown3D structures In silico investigation tools that integrateboth sequence- and structure-based approaches will be ofadded advantage in providing reliable prediction results withwider coverage of different aspects of SNP analysis Howevervariability in prediction outputs of these algorithms reflectsboth advantages and disadvantages Therefore decisionsregarding the selection of a suitable tool for SNP analysismustbe subjective to the specific objectives of the investigationundertaken

Our ConSurf and Clustal Omega results indicated thatnsSNPs at positions S143P and Y414D were found in thehighly conserved region and predicted to have potentialimpact on GalNAc-T1 protein The third mutation (G258) isobserved in the vicinity of conservation groups with stronglysimilar properties shown by ldquordquo in Figure 4 Besides thatthe SNPs which encode functional polypeptides are thoseregulatory SNPs that control the gene expression FastSNPtool helped us to prioritize the deleterious SNPs based upontheir impact on determining protein structure deviance intranscriptional levels of the sequence modification in thepremature translation termination and aberrations in thepositions at promoter region for transcription factor bindingAltered O-glycosylation machinery due to aberrant GalNAc-T1 expression may result in aberrantly glycosylated proteinsand new antigenic targets thereby altering host immuno-genic response in ovarian cells [43 44] The two SNPs i-e rs72964406 (ESE motif CCTCATG score is 2897886)and rs34304568 (ESE motif GGACCTAG score 2088850)were located in coding regions and altered ESE (exonicsplicing enhancers) motifs indicating that all these nsSNPsmay have the ability to affect the level location and timingof gene expression or regulate the splicing of GalNAc-T1 genetranscript

The GalNAc-T1 protein structure carries an N-terminaltransmembrane domain a stem region a lumenal catalyticdomain (catalytic AGT1 domain and catalytic BGalNAc-T domain) and a C-terminal ricin-type lectin region(Figure 4) Although all GalNAc-transferases are reportedto share common structural features and the conservedmotifs described above the exact role of each domain incatalysis remains unknown Several biological roles have beendescribed for GalNAc-T1 for example loss of GalNAc-T1leads to reduced leukocyte recruitment and increased rollingvelocity in vivo suggesting the predominant role for GalNAc-T1 in attaching functionally relevant O-linked glycans toselectin ligands [45] Protein structural analysis is performedto fine-tune the picture drawn by the concordance analysisof SIFT PolyPhen-2 and PANTHER Calculation of sol-vent accessibility and force field energy have provided aninsight into the structural and functional impacts of aminoacid substitutions 3D models are constructed to visualizedeviation among the mutant and wild-type protein modelsIn S143P the required protein framework (hydrophobicityand H-bonding interactions) seems to be disturbed at thisposition and could be damaging for the native structureThe G258V substitution carries the tendency to disturb thelocal structure by reframing the backbone flexibility Y414D

variant is found to be associated with regulation of proteinmisfolding Due to high difference in total energy for nativeand mutant (carrying G258V variant) models the properfunctioning of GalNAc-T1 can be disturbed The analysis ofsequence-based motifs and conserved residues is a reliablemethod to identify the functional amino acid residues thatare important for binding catalysis and stability of proteinsUpon analysingwith FTSite a difference in the ligand bindingsites is observed between wild type (20 binding site) andmutant forms (S143P 18 binding sites G258V 23 bindingsites Y414D 23 binding sites) of GalNAc-T1 protein

Protein-protein interaction analysis is a comprehensiveway to understand the global organization of proteomes inthe context of a functional network Interaction networkdisplays biomolecules as nodes and interactions connectingthe two nodes as edges Currently functional network viewfor a single genome is widely used to improve the statis-tical power in human molecular genetics [39] to aid drugdiscovery to better understand metabolic pathways and toderive genotype-phenotype correlations [18 46] STRINGmaps have shown that GalNAc-T1 interacts with GBGT1CHPF ST6GALNAC1 B4GALNAC-T1 MUC1 C1GALT1C1C1GALT1 GCNT1 B3GNT6 and ZNF146 partners by 36interactions The Strong interaction patterns were observedfor GBGT1 CHPF ST6GALNAC1 B4GALNT1 MUC1C1GALT1C1 C1GALT1 GCNT1 B3GNT6 and ZNF146 part-ners From KEGG-Pathway analysis biochemical (stereospe-cific) reactions for GalNAc-T1 ST6GALNAC1 B4GALNT1MUC1 C1GALT1C1 C1GALT1 and GCNT1 indicated theindividual and combined role of each enzyme in regulationof glycosylation phenomena to carry out mucin biosynthesispathway (Figure 7) Noticeably most of these enzymatic pro-teins are involved in cell adhesion and membrane transportTherefore aberrant glycosylations in the formofmutations inGalNAc-T1 gene could distort themucin andmany associatedpathways (ERK SRC andNF-kappa-B pathways) involved inmaintaining cellular response and integrity Of the 3GalNAc-T1 variants tested G258Vwas found to disturb the interactionof Arg266 ofGalNAc-T1withN-HofThr (inGVTSA tandemrepeat region) of MUC1 [47] thus it might impair thepossible transfer of GalNAc residue from UDP-GalNAc tohydroxyl group of SerThr during O-glycosylation reaction(Figure 7) Over expression aberrant intracellular localiza-tion and changes in glycosylation of MUC1 and MUC7 areoften reported in tumors of colon breast ovarian lung andpancreatic origin [24 48] Since MUC1 plays an essentialrole in cellular signalling and microbial pathogenicity it isconceivable to expect that G258V mutation is pathogenic tocellular integrity and susceptibility to certain human disease

5 Conclusion

This study represents the first comprehensive investigationthat identified the functional SNPs in GalNAc-T1 geneusing sequence- and structure-based homology algorithmsAlthough there were notable differences in the predictionbasis of selected in silico algorithms our concordance analysiscorroborated the characterization of suspected SNPs that

Computational and Mathematical Methods in Medicine 13

GalNAc1205721-O-Thr

C1GALT1C1GALT1C1

GCNT1

GALNT1

B3GNT6

GCNT1B4GALT5

Gal1205731-3GalNAc1205721-O-Thr

GlcNAc1205731-6GalNAc1205721-O-Thr

GalNAc1205721-3GalNAc1205721-O-Thr

GlcNAc1205731-3GalNAc1205721-O-Thr

Gal1205731-4GlcNAc1205731-6GalNAc1205721-O-Thr

Figure 7 Complex structure of GalNAc-T1 enzyme with UDPMUC1 (tandem repeat region ldquoGSTAPPAHGVTSAPrdquo) and GalNAc residueGalNAc sugar is attached at Thr of tandem repeat region inMUC1 His125 and Asp156 residues from catalytic A domain and Ile315 Trp316and Thr350 from catalytic B domain are found in contact with UDP (orange red color) Under the action of ST6GALNAC1 B4GALNT1C1GALT1C1 C1GALT1 GCNT1 andB3GNT6GalNAc1-O-Thr is stereospecifically converted into different glycan chains to regulate themucinbiosynthesis pathway

could play significant roles in cellular biology We definedthe structural consequences of S143P G258V and Y414Dvariants on GalNAc-T1 protein in the form of solventaccessibility electrostatic interaction energy calculation andmultiple alignment conservation From potential SNPs theG258V mutation is predicted to cause considerable changein total energy electrostatic intramolecular interactions andfunctional interaction behaviour of GalNAc-T1 protein withMUC1 protein and thereby may impact the possible transferof GalNAc residue from UDP-GalNAc to hydroxyl groupof SerThr during O-glycosylation of proteins Additionally2 regulatory SNPs that influence the splicing and expres-sion pattern of GalNAc-T1 gene have also been identifiedAltered GalNAc-T1 function due to genetic variations andmRNA expression might play a critical role in determiningsusceptibility to complex diseases Furthermore protein-protein interaction pathway and protein-ligand docking havehelped us to understand the stereochemical (anomery andlinkage) roles ofGalNAc-T1 and associated enzymes inmucinbiosynthetic pathway Finally the in silico based nsSNPpredictions would not just be of use in deriving genotype-phenotype relations but to some extent explain the molecularbasis for varied interindividual response to certain drugs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are thankful to the reviewers for their criticalreview and fruitful suggestions Additionally the authorsacknowledge the useful comments from Dr MuhammadNadeem Arshad of Chemistry Department King AbdulazizUniversity KSA

References

[1] E P Bennett D O Weghuis G Merkx A G van KesselH Eiberg and H Clausen ldquoGenomic organization andchromosomal localization of three members of the UDP-N-acetylgalactosaminepolypeptide N-acetylgalactosaminyltrans-ferase familyrdquo Glycobiology vol 8 no 6 pp 547ndash555 1998

[2] Y-Z Chen Y-R Tang Z-Y Sheng and Z Zhang ldquoPredictionof mucin-type O-glycosylation sites in mammalian proteinsusing the composition of k-spaced amino acid pairsrdquo BMCBioinformatics vol 9 article 101 2008

[3] H C Hang and C R Bertozzi ldquoThe chemistry and biology ofmucin-type O-linked glycosylationrdquo Bioorganic and MedicinalChemistry vol 13 no 17 pp 5021ndash5034 2005

[4] T A Gerken O Jamison C L Perrine et al ldquoEmerg-ing paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family ofglycosyltransferasesrdquo Journal of Biological Chemistry vol 286no 16 pp 14493ndash14507 2011

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 13: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

Computational and Mathematical Methods in Medicine 13

GalNAc1205721-O-Thr

C1GALT1C1GALT1C1

GCNT1

GALNT1

B3GNT6

GCNT1B4GALT5

Gal1205731-3GalNAc1205721-O-Thr

GlcNAc1205731-6GalNAc1205721-O-Thr

GalNAc1205721-3GalNAc1205721-O-Thr

GlcNAc1205731-3GalNAc1205721-O-Thr

Gal1205731-4GlcNAc1205731-6GalNAc1205721-O-Thr

Figure 7 Complex structure of GalNAc-T1 enzyme with UDPMUC1 (tandem repeat region ldquoGSTAPPAHGVTSAPrdquo) and GalNAc residueGalNAc sugar is attached at Thr of tandem repeat region inMUC1 His125 and Asp156 residues from catalytic A domain and Ile315 Trp316and Thr350 from catalytic B domain are found in contact with UDP (orange red color) Under the action of ST6GALNAC1 B4GALNT1C1GALT1C1 C1GALT1 GCNT1 andB3GNT6GalNAc1-O-Thr is stereospecifically converted into different glycan chains to regulate themucinbiosynthesis pathway

could play significant roles in cellular biology We definedthe structural consequences of S143P G258V and Y414Dvariants on GalNAc-T1 protein in the form of solventaccessibility electrostatic interaction energy calculation andmultiple alignment conservation From potential SNPs theG258V mutation is predicted to cause considerable changein total energy electrostatic intramolecular interactions andfunctional interaction behaviour of GalNAc-T1 protein withMUC1 protein and thereby may impact the possible transferof GalNAc residue from UDP-GalNAc to hydroxyl groupof SerThr during O-glycosylation of proteins Additionally2 regulatory SNPs that influence the splicing and expres-sion pattern of GalNAc-T1 gene have also been identifiedAltered GalNAc-T1 function due to genetic variations andmRNA expression might play a critical role in determiningsusceptibility to complex diseases Furthermore protein-protein interaction pathway and protein-ligand docking havehelped us to understand the stereochemical (anomery andlinkage) roles ofGalNAc-T1 and associated enzymes inmucinbiosynthetic pathway Finally the in silico based nsSNPpredictions would not just be of use in deriving genotype-phenotype relations but to some extent explain the molecularbasis for varied interindividual response to certain drugs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are thankful to the reviewers for their criticalreview and fruitful suggestions Additionally the authorsacknowledge the useful comments from Dr MuhammadNadeem Arshad of Chemistry Department King AbdulazizUniversity KSA

References

[1] E P Bennett D O Weghuis G Merkx A G van KesselH Eiberg and H Clausen ldquoGenomic organization andchromosomal localization of three members of the UDP-N-acetylgalactosaminepolypeptide N-acetylgalactosaminyltrans-ferase familyrdquo Glycobiology vol 8 no 6 pp 547ndash555 1998

[2] Y-Z Chen Y-R Tang Z-Y Sheng and Z Zhang ldquoPredictionof mucin-type O-glycosylation sites in mammalian proteinsusing the composition of k-spaced amino acid pairsrdquo BMCBioinformatics vol 9 article 101 2008

[3] H C Hang and C R Bertozzi ldquoThe chemistry and biology ofmucin-type O-linked glycosylationrdquo Bioorganic and MedicinalChemistry vol 13 no 17 pp 5021ndash5034 2005

[4] T A Gerken O Jamison C L Perrine et al ldquoEmerg-ing paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family ofglycosyltransferasesrdquo Journal of Biological Chemistry vol 286no 16 pp 14493ndash14507 2011

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 14: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

14 Computational and Mathematical Methods in Medicine

[5] M R M Hussain ldquoThe role of Galactose in human health anddiseaserdquo Central European Journal of Medicine vol 7 no 4 pp409ndash419 2012

[6] M R M Hussain H Asfour M Yasir et al ldquoThe microbialpathology of Neu5Ac and gal epitopesrdquo Journal of CarbohydrateChemistry vol 32 no 3 pp 169ndash183 2013

[7] R P McEver and R D Cummings ldquoPerspectives series celladhesion in vascular biology Role of PSGL-1 binding toselectins in leukocyte recruitmentrdquo The Journal of ClinicalInvestigation vol 100 no 3 pp 485ndash491 1997

[8] T Schwientek E P Bennett C Flores et al ldquoFunctionalconservation of subfamilies of putative UDP-N-acetylgalac-tosaminepolypeptide N-acetylgalactosaminyltrans- ferases inDrosophila Caenorhabditis elegans and mammals One sub-family composed of l(2)35Aa is essential in Drosophilardquo Journalof Biological Chemistry vol 277 no 25 pp 22623ndash22638 2002

[9] E P Bennett U Mandel H Clausen T A Gerken T AFritz and L A Tabak ldquoControl of mucin-typeO-glycosylationa classification of the polypeptide GalNAc-transferase genefamilyrdquo Glycobiology vol 22 no 6 pp 736ndash756 2012

[10] C M Phelan Y-Y Tsai E L Goode et al ldquoPolymorphism intheGALNT1 gene and epithelial ovarian cancer in non-hispanicwhite women the Ovarian Cancer Association ConsortiumrdquoCancer Epidemiology Biomarkers and Prevention vol 19 no 2pp 600ndash604 2010

[11] I Tietjen G K Hovingh R R Singaraja et al ldquoSegregationof LIPG CETP and GALNT2 mutations in Caucasian familieswith extremely high HDL cholesterolrdquo PLoS ONE vol 7 no 8Article ID e37437 2012

[12] E L Duncan P Danoy J P Kemp et al ldquoGenome-wideassociation study using extreme truncate selection identifiesnovel genes affecting bone mineral density and fracture riskrdquoPLoS Genetics vol 7 no 4 Article ID e1001372 2011

[13] A M OrsquoHalloran C C Patterson P Horan et al ldquoGeneticpolymorphisms in platelet-related proteins and coronaryartery disease investigation of candidate genes including N-acetylgalactosaminyltransferase 4 (GALNT4) and sulphotrans-ferase 1A12 (SULT1A12)rdquo Journal ofThrombosis andThrombol-ysis vol 27 no 2 pp 175ndash184 2009

[14] K Guda H Moinova J He et al ldquoInactivating germ-lineand somatic mutations in polypeptide N-acetylgalactosam-inyltransferase 12 in human colon cancersrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 106 no 31 pp 12921ndash12925 2009

[15] R Grantham ldquoAmino acid difference formula to help explainprotein evolutionrdquo Science vol 185 no 4154 pp 862ndash864 1974

[16] D Chasman and R M Adams ldquoPredicting the functionalconsequences of non-synonymous single nucleotide polymor-phisms structure-based assessment of amino acid variationrdquoJournal of Molecular Biology vol 307 no 2 pp 683ndash706 2001

[17] S Sunyaev V Ramensky I Koch W Lathe III A S Kon-drashov and P Bork ldquoPrediction of deleterious human allelesrdquoHuman Molecular Genetics vol 10 no 6 pp 591ndash597 2001

[18] Z Wang and J Moult ldquoSNPs protein structure and diseaserdquoHuman Mutation vol 17 no 4 pp 263ndash270 2001

[19] C Baynes C S Healey K A Pooley et al ldquoCommon variantsin the ATM BRCA1 BRCA2 CHEK2 and TP53 cancer suscep-tibility genes are unlikely to increase breast cancer riskrdquo BreastCancer Research vol 9 no 2 article R27 2007

[20] M R M Hussain N A Shaik J Y Al-Aama et al ldquoIn silicoanalysis of Single Nucleotide Polymorphisms (SNPs) in humanBRAFrdquo Gene vol 508 no 2 pp 188ndash196 2012

[21] W W Young Jr D R Holcomb K G Ten Hagen andL A Tabak ldquoExpression of UDP-GalNAcpolypeptide N-acetylgalactosaminyltransferase isoforms in murine tissuesdetermined by real-time PCR a new view of a large familyrdquoGlycobiology vol 13 no 7 pp 549ndash557 2003

[22] M Tenno KOhtsubo F KHagen et al ldquoInitiation of proteinOglycosylation by the polypeptide GalNAcT-1 in vascular biologyand humoral immunityrdquoMolecular and Cellular Biology vol 27no 24 pp 8783ndash8796 2007

[23] P C Ng and S Henikoff ldquoPredicting the effects of amino acidsubstitutions on protein functionrdquo Annual Review of Genomicsand Human Genetics vol 7 pp 61ndash80 2006

[24] M R M Hussain J Nasir and J Y Al-Aama ldquoClinicallysignificant missense variants in human GALNT3 GALNT8GALNT12 and GALNT13 genes intriguing in silico findingsrdquoJournal of Cellular Biochemistry vol 115 no 2 pp 313ndash327 2013

[25] V Ramensky P Bork and S Sunyaev ldquoHuman non-synonymous SNPs server and surveyrdquo Nucleic Acids Researchvol 30 no 17 pp 3894ndash3900 2002

[26] I A Adzhubei S Schmidt L Peshkin et al ldquoA method andserver for predicting damaging missense mutationsrdquo NatureMethods vol 7 no 4 pp 248ndash249 2010

[27] P D Thomas A Kejariwal N Guo et al ldquoApplicationsfor protein sequence-function evolution data mRNAproteinexpression analysis and coding SNP scoring toolsrdquoNucleic AcidsResearch vol 34 pp W645ndashW650 2006

[28] H A Shihab J Gough D N Cooper et al ldquoPredictingthe functional molecular and phenotypic consequences ofamino acid substitutions using hiddenMarkovmodelsrdquoHumanMutation vol 34 no 1 pp 57ndash65 2013

[29] G de Baets J van Durme J Reumers et al ldquoSNPeffect 40 on-line prediction of molecular and structural effects of protein-coding variantsrdquo Nucleic Acids Research vol 40 pp D935ndashD939 2012

[30] I Mayrose D Graur N Ben-Tal and T Pupko ldquoComparisonof site-specific rate-inference methods for protein sequencesempirical Bayesian methods are superiorrdquo Molecular Biologyand Evolution vol 21 no 9 pp 1781ndash1791 2004

[31] T Pupko R E Bell I Mayrose F Glaser and N Ben-TalldquoRate4Site an algorithmic tool for the identification of func-tional regions in proteins by surface mapping of evolutionarydeterminants within their homologuesrdquo Bioinformatics vol 18no 1 pp S71ndashS77 2002

[32] H Ashkenazy E Erez E Martz T Pupko and N Ben-Tal ldquoConSurf 2010 calculating evolutionary conservation insequence and structure of proteins and nucleic acidsrdquo NucleicAcids Research vol 38 no 2 Article ID gkq399 pp W529ndashW533 2010

[33] L Prokunina C Castillejo-Lopez F Oberg et al ldquoA regulatorypolymorphism in PDCD1 is associated with susceptibility tosystemic lupus erythematosus in humansrdquoNature Genetics vol32 no 4 pp 666ndash669 2002

[34] D Gilis and M Rooman ldquoStability changes upon mutation ofsolvent-accessible residues in proteins evaluated by database-derived potentialsrdquo Journal of Molecular Biology vol 257 no5 pp 1112ndash1126 1996

[35] MU Johansson V Zoete OMichielin andNGuex ldquoDefiningand searching for structural motifs using DeepViewSwiss-PdbViewerrdquo BMC Bioinformatics vol 13 article 173 2012

[36] E F Pettersen T D Goddard C C Huang et al ldquoUCSFChimeramdasha visualization system for exploratory research and

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 15: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

Computational and Mathematical Methods in Medicine 15

analysisrdquo Journal of Computational Chemistry vol 25 no 13pp 1605ndash1612 2004

[37] H Venselaar T A H te Beek R K P Kuipers M L Hekkel-man and G Vriend ldquoProtein structure analysis of mutationscausing inheritable diseases An e-Science approach with lifescientist friendly interfacesrdquo BMC Bioinformatics vol 11 article548 2010

[38] C-H Ngan D R Hall B Zerbe L E Grove D Kozakov and SVajda ldquoFTSite high accuracy detection of ligand binding siteson unbound protein structuresrdquo Bioinformatics vol 28 no 2Article ID btr651 pp 286ndash287 2012

[39] D Szklarczyk A Franceschini M Kuhn et al ldquoThe STRINGdatabase in 2011 functional interaction networks of proteinsglobally integrated and scoredrdquo Nucleic Acids Research vol 39no 1 pp D561ndashD568 2011

[40] E Mashiach D Schneidman-Duhovny A Peri Y Shavit RNussinov andH JWolfson ldquoAn integrated suite of fast dockingalgorithmsrdquo Proteins vol 78 no 15 pp 3197ndash3204 2010

[41] M Alanazi Z Abduljaleel W Khan et al ldquoIn silico analysisof single nucleotide polymorphism (SNPs) in human 120573-globingenerdquo PLoS ONE vol 6 no 10 Article ID e25876 2011

[42] S A de Alencar and J C D Lopes ldquoA Comprehensive in silicoanalysis of the functional and structural impact of SNPs inthe IGF1R generdquo Journal of Biomedicine and Biotechnology vol2010 Article ID 715139 8 pages 2010

[43] T A Sellers Y Huang J Cunningham et al ldquoAssociationof single nucleotide polymorphisms in glycosylation geneswith risk of epithelial ovarian cancerrdquo Cancer EpidemiologyBiomarkers and Prevention vol 17 no 2 pp 397ndash404 2008

[44] B Li H J An C Kirmiz C B Lebrilla K S Lam and SMiyamoto ldquoGlycoproteomic analyses of ovarian cancer celllines and sera from ovarian cancer patients show distinct gly-cosylation changes in individual proteinsrdquo Journal of ProteomeResearch vol 7 no 9 pp 3776ndash3788 2008

[45] H Block K Ley and A Zarbock ldquoSevere impairment ofleukocyte recruitment in ppGalNAcT-1-deficient micerdquo TheJournal of Immunology vol 188 no 11 pp 5674ndash5681 2012

[46] L-L Wang Y Li and S-F Zhou ldquoA bioinformatics approachfor the phenotype prediction of nonsynonymous singlenucleotide polymorphisms in human cytochromes P450ErdquoDrug Metabolism and Disposition vol 37 no 5 pp 977ndash9912009

[47] H H Wandall H Hassan E Mirgorodskaya et al ldquoSubstratespecificities of three members of the human UDP-N-acetyl-120572- D-galactosaminepolypeptide N-acetylgalactosaminyltrans-ferase family GalNAc- T1 -T2 and -T3rdquo Journal of BiologicalChemistry vol 272 no 38 pp 23503ndash23514 1997

[48] S J Gendler ldquoMUC1 the renaissance moleculerdquo Journal ofMammary Gland Biology and Neoplasia vol 6 no 3 pp 339ndash353 2001

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 16: In Silico GalNAc-T1 - Hindawi Publishing Corporationdownloads.hindawi.com/journals/cmmm/2014/904052.pdf · e in silico approaches adopted in this investigation o er advantages over

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom