The Transporter Classification Database - Nucleic Acids Research

Milton H. Saier Jr*, Vamsee S. Reddy, Dorjee G. Tamang and Åke Västermark

Department of Molecular Biology, University of California at San Diego, La Jolla, CA 92093-0116, USA

Received August 21, 2013; Revised October 17, 2013; Accepted October 18, 2013

ABSTRACT

The Transporter Classification Database (TCDB; http://www.tcdb.org) serves as a common reference point for transport protein research. The database contains more than 10 000 non-redundant proteins that represent all currently recognized families of transmembrane molecular transport systems. Proteins in TCDB are organized in a five-level hierarchical system, where the first two levels are the class and subclass, the second two are the family and subfamily, and the last one is the transport system. Superfamilies that contain multiple families are included as hyperlinks to the five-tier TC hierarchy. TCDB includes proteins from all types of living organisms and is the only transporter classification system that is both universal and recognized by the International Union of Biochemistry and Molecular Biology. It has been expanded by manual curation, contains extensive text descriptions providing structural, functional, mechanistic and evolutionary information, is supported by unique software and is interconnected to many other relevant databases. TCDB is of increasing usefulness to the international scientific community and can serve as a model for the expansion of database technologies. This manuscript describes an update of the database descriptions previously featured in NAR database issues.

INTRODUCTION: THE TC SYSTEM: DESIGN AND RATIONALIZATION

In 1995, Fleischmann et al. (1) published the full genome sequence of a living organism, Haemophilus influenzae, the first time such a feat had been accomplished. This revolutionary event marked the beginning of the genomics era.
Because of our long-standing interest in molecular transmembrane transport, members of the Saier laboratory recognized the need for a classification system for transport systems equivalent to the Enzyme Commission (EC) system already in existence for enzymes (2). The EC system classified enzymes strictly on the basis of function, as it was designed before sequence and phylogenetic data were available. Even before the advent of the genomics revolution, it became clear that the EC system was tremendously deficient because it could not accommodate phylogenetic data without restructuring the entire system. Although considered desirable by many, such a restructuring of the EC system has never been achieved.

Even before 1995, our laboratory was conducting phylogenetic analyses of transport proteins [for review, see (3)]. We realized that phylogeny reflects protein structure, function and mechanism, and therefore, is an essential component of any molecular classification system. With a desire to conduct whole genome analyses of transporters, we recognized a need for a universal system of transport protein classification that took cognizance of both function and phylogeny. With this conviction in mind, we designed what is now known as the Transporter Classification (TC) system.

Transporters in the TC Database (TCDB) are classified using a functional/phylogenetic five-tier system (4,5) as follows: N1.L1.N2.N3.N4, where N is a number and L is a letter: N1 is the class; L1 is the subclass; N2 is the family (sometimes actually a superfamily); N3 is the subfamily (or family in the case of a superfamily); and N4 is the actual transport system.
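The N1.L1.N2.N3.N4 layout lends itself to mechanical parsing. The following is a minimal sketch, not TCDB software: the names `TC_ID`, `TCNumber`, `parse_tc_id` and `ancestors` are hypothetical, and the sketch assumes that the subclass is a single capital letter and that shorter identifiers (family or subfamily level) should also be accepted.

```python
import re
from typing import NamedTuple, Optional

# Pattern for N1.L1.N2.N3.N4. The two deepest tiers are optional so that
# family-level ("2.A.1") and subfamily-level IDs also parse (an assumption
# made for this sketch, not a rule stated in the text).
TC_ID = re.compile(r"^(\d+)\.([A-Z])\.(\d+)(?:\.(\d+))?(?:\.(\d+))?$")


class TCNumber(NamedTuple):
    tc_class: int              # N1: class
    subclass: str              # L1: subclass
    family: int                # N2: family (sometimes a superfamily)
    subfamily: Optional[int]   # N3: subfamily
    system: Optional[int]      # N4: transport system


def parse_tc_id(text: str) -> TCNumber:
    """Split a TC identifier into its tiers, raising ValueError if malformed."""
    match = TC_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not a TC identifier: {text!r}")
    n1, l1, n2, n3, n4 = match.groups()
    return TCNumber(int(n1), l1, int(n2),
                    int(n3) if n3 else None,
                    int(n4) if n4 else None)


def ancestors(text: str) -> list[str]:
    """Return the identifier at every tier, from class down to the input."""
    parts = [str(p) for p in parse_tc_id(text) if p is not None]
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
```

For example, `ancestors("2.A.1.1.1")` returns `['2', '2.A', '2.A.1', '2.A.1.1', '2.A.1.1.1']`, which mirrors the descent from class to individual transport system.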
Classes 1–5 are well defined (channels, secondary carriers, primary active transporters, group translocators and transmembrane electron carriers, respectively); classes 6–7 are presently empty, being reserved for yet to be discovered classes, and classes 8 and 9 represent accessory proteins and incompletely characterized proteins, respectively. This system, describing transport systems from all types of living organisms, was formally adopted by the International Union of Biochemistry and Molecular Biology (IUBMB) in June 2001 and has served the international scientific community effectively ever since (6–9).

*To whom correspondence should be addressed. Tel: +1 858 534 4084; Fax: +1 858 534 7108; Email: msaier@ucsd.edu

Published online 12 November 2013. Nucleic Acids Research, 2014, Vol. 42, Database issue D251–D258. doi:10.1093/nar/gkt1097

© The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

DATABASE CONTENT AND ACCESS

Encoded within the relational database schema is the functional/phylogenetic TC taxonomy (Figure 1). Users can access the information through our intuitive interface, where information can be viewed at different levels of granularity by returning populated HTML data to the web browser client (the superficial tier). Users can enter


at the top levels for information about classes and families and descend to the deepest level about individual proteins.

Since its last publication in the NAR database issue in 2009 (5), there has been significant change in the database design (Figure 1). Some basic issues pertaining to data integrity, redundancy and management have led to conversion of the MySQL table engine from MyISAM to InnoDB. Perhaps the most important justification for this conversion is the fact that different levels of TC classification have a type of parent–child relationship. A foreign key constraint should allow cascading action when a row (tuple) is inserted/updated/deleted. Thus, all related tables are affected, leaving no orphaned records. Roughly one half of the schema follows the standard relationships between class, subclass, superfamily, family, cluster or subfamily, and system, and the other half shows tables of information pertaining to unique UniProt protein accession numbers.

The steps involved and basic ideas behind the TCDB Admin interface for curation are the same as above and follow the DB design schema. However, the look and feel of the interface has changed since its update in 2010, along with some new options such as ‘View Task Queue’ and ‘View Staff Logs’. We share our mapping files with different databases, and these files are automatically updated every time a new protein is added to the database.
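The cascading foreign-key behaviour that motivated the move to InnoDB can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the two tables (`family`, `transport_system`), their columns and the function name are hypothetical simplifications, not the TCDB schema shown in Figure 1.

```python
import sqlite3


def orphans_after_family_delete() -> int:
    """Delete a parent family row and count child rows left behind."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE family (
            tcid TEXT PRIMARY KEY,          -- e.g. '2.A.1'
            name TEXT NOT NULL
        );
        CREATE TABLE transport_system (
            tcid        TEXT PRIMARY KEY,   -- e.g. '2.A.1.1.1'
            family_tcid TEXT NOT NULL
                REFERENCES family(tcid)
                ON UPDATE CASCADE ON DELETE CASCADE
        );
    """)
    conn.execute("INSERT INTO family VALUES ('2.A.1', 'Major facilitator (MFS)')")
    conn.executemany(
        "INSERT INTO transport_system VALUES (?, ?)",
        [("2.A.1.1.1", "2.A.1"), ("2.A.1.2.1", "2.A.1")],
    )
    # Removing the parent cascades to both child rows.
    conn.execute("DELETE FROM family WHERE tcid = '2.A.1'")
    remaining = conn.execute("SELECT COUNT(*) FROM transport_system").fetchone()[0]
    conn.close()
    return remaining


if __name__ == "__main__":
    print("child rows left after deleting the family:", orphans_after_family_delete())
```

With the constraint in place the count of remaining child rows is zero, which is the "no orphaned records" property described above. A storage engine without foreign-key support, such as MyISAM, would leave that bookkeeping to application code.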

The entire web interface has been revamped. The new look and feel should be consistent across all major browsers, easier to navigate, URL friendly and, overall, a huge improvement from the previous HTML frame-based web pages. For example, the browse tab for viewing the TC System (http://www.tcdb.org/browse.php) has been entirely redesigned using jQuery. For a more detailed description of the capabilities available to the user, see Wakabayashi et al. (10).

In addition to the search option under the search tab, one can search TCDB from a search box on the main page using single or multiple terms, including TC ID, keyword, protein name or abbreviation, organismal source, author name, UniProt accession number, PDB ID number, associated disease, reference, etc. The following details are returned with a protein search or can be easily accessed following such a search:

(i) TC ID; (ii) reference; (iii) accession number; (iv) protein name; (v) length; (vi) molecular weight; (vii) species; (viii) predicted number of TMSs; (ix) location/topology/orientation and (x) database of interacting proteins (DIPs) and Pfam reference.

Figure 1. Current MySQL schema displayed using Workbench 6.0 CE and showing the tables currently in TCDB’s database architecture. Each line in a table represents a column and displays which datatype (such as int, varchar, text, etc.) can be stored. Ten tables, which are not being used directly by TCDB but that have been used for maintenance tasks, are not shown in the diagram: test lang error proteinold tc2acc broke tc2acc 1flags cflags temp_tms temp_preds and misc. A table that has a trifork (entity relationships) pointing toward it contains a column with explicit IDs from another table. The tables having no entity relationships are grouped on the left. The diagram contains four layers (left to right and from top to bottom): the protein layer (green), the family layer (yellow), the ontology layer (blue) and the compounds layer (red).

The user is also given an option of either BLASTing or PSI-BLASTing the protein against the non-redundant National Center for Biotechnology Information (NCBI) database or TCDB (accessed from the sidebar). Additional analysis options, such as predicting the number of TMSs through hydropathy plots, are also available (see below).
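To make the idea of a hydropathy plot concrete, the sketch below computes a sliding-window profile with the published Kyte–Doolittle hydropathy scale and flags stretches whose window average stays above a cutoff. This is a generic textbook approach, not the algorithm used by the TCDB tools; the function names, the 19-residue window and the 1.6 cutoff are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; entry i covers residues i..i+window-1."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]  # raises KeyError on non-standard residues
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile


def candidate_segments(profile: list[float], window: int = 19,
                       threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge consecutive above-threshold windows into (start, end) residue
    ranges, 0-based and half-open."""
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments
```

Calling `candidate_segments(hydropathy_profile(sequence))` yields candidate hydrophobic stretches. Dedicated topology predictors such as those cited in this article use considerably more information than a single hydropathy scale.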

TCDB collaborates with many important databases (see Reference 10 for recent technical improvements) and shares cross-database links with them; these are available on the individual protein pages. Internal hyperlinks connecting references to classes, families and proteins have been updated.

RECENT TECHNICAL IMPROVEMENTS (2011–13)

We have:

(1) Incorporated an improved administration page, built-in semi-automatic machine learning tools (11) and user contributions, allowing protein history tracking; see Wakabayashi et al. (10).

(2) Updated software to BLAST 2.2.27.

(3) Replaced the WHAT program (12) with a functionally similar Python version to increase speed and reliability.

(4) Made the TCDB BLAST database available, generated in real-time.

(5) Made the TMSTATS Program (13) available for analyzing topological (TMS) statistics using three different topological prediction programs, HMMTOP (14), MEMSAT (15) and SPOCTOPUS (16), giving histograms of TMS distribution for any protein or for any TC class, subclass, family, subfamily or any combination of these.

(6) Made Global Sequence Alignment Tool (GSAT) (13) available for performing pairwise alignments. GSAT performs a shuffle-based alignment to detect distant homologs using the Needleman and Wunsch algorithm.

(7) Implemented Protocols 1 and 2. Protocol 1 runs a PSI-BLAST search of the NCBI protein database with iterations, collects results, removes redundant/small/similar sequences, annotates, tabulates and counts TMSs. Protocol 2 allows the rapid identification and quantitative evaluation of homologs between any two FASTA files using the GSAT program (13).

(8) Established a homology section that replaces the GAP (17) and ICC programs with GSAT and Protocol 2 (13) and included class-wide comparisons that can be performed with these programs.

(9) Incorporated a semi-automatic protein screening program.

(10) Cross-referenced TCDB with HOGENOM (http://pbil.univ-lyon1.fr/databases/hogenom/acceuil.php), DIP (18), RefSeq (19), Entrez (20), Pfam (21), BioCyc (22), KEGG (23), PDB (24) and DrugBank.

(11) Improved search tools that now separate results by system, cluster, family, superfamily and reference.

(12) Implemented GBLAST, which provides a search tool designed to identify potential transporters in fully sequenced genomes or DNA segments (25–27).

(13) Implemented Ancient Rep, which provides horizontal and vertical search approaches to find transmembrane repeat units within a single protein or a list of homologs, respectively (13).

(14) Updated UniProtKB (28) cross-reference files with a continuously updated dynamic version as of 15 August 2013.

(15) Provided links to DrugBank (29), allowing resolution to the well-known validated human drug targets presented by Rask-Andersen et al. (30), as well as bacterial drug targets.

(16) Implemented the Superfamily Tree programs, SFT1 and SFT2, which use tens of thousands of BLAST bit scores instead of multiple alignments, thus avoiding the pitfalls often encountered when determining the phylogeny of distantly related proteins (31–33). While SFT1 constructs trees allowing visualization of individual proteins, SFT2 allows depiction of family/subfamily relationships (31–33).

(17) Provided a mechanism for user-generated input.

GROWTH OF THE DATABASE (2010–13)

A file containing the current sequence set is available for download from http://www.tcdb.org/public/tcdb. About 150 TC families are introduced each year, reflecting the extensive and continual manual curation work being conducted. Figure 2 shows the parallel growth of TCDB protein, family and superfamily compositions from 2010 to 2013. However, it should be noted that each year, several families in Class 9 are moved to classes 1–5 when sufficient information becomes available to allow definition of their mechanisms of action.
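Growth figures of this kind can in principle be recomputed from the downloadable sequence file. The sketch below counts distinct families per class from FASTA header lines. It assumes, without confirmation from the text, that each header contains a full five-tier TC identifier somewhere on the line; the function name and the sample headers in the usage note are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# A full five-tier TC identifier appearing anywhere in a header line.
TC_TOKEN = re.compile(r"\b(\d+)\.([A-Z])\.(\d+)\.(\d+)\.(\d+)\b")


def families_per_class(fasta_lines: Iterable[str]) -> Counter:
    """Count distinct families (N1.L1.N2) observed under each class (N1)."""
    families = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        match = TC_TOKEN.search(line)
        if match:
            n1, l1, n2 = match.group(1, 2, 3)
            families.add((n1, f"{n1}.{l1}.{n2}"))
    return Counter(n1 for n1, _ in families)
```

For instance, passing the lines `">seqA 2.A.1.1.1"`, `">seqB 2.A.1.2.3"`, `">seqC 2.A.6.1.1"` and `">seqD 3.A.1.1.1"` gives two families in class 2 and one in class 3. Headers without a recognizable identifier are ignored rather than guessed at.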

SUPERFAMILY ADDITIONS (2011–13)

Analyses (34–43) have revealed distant relationships between preexisting TC families. These relationships have been integrated into TCDB as a hyperlink, and superfamily relationships are mentioned with hyperlinks in the description of each constituent family. The number of superfamilies that are either new or expanded (marked with superscript ‘a’ in Table 1) has more than doubled during the last 3 years (Figure 2), and the further expansion of such knowledge continues.

ESTABLISHING HOMOLOGY BETWEEN PROTEINS USING TCDB-RELATED SOFTWARE

Affiliation with a family requires satisfying rigorous statistical criteria of homology. Superfamily status is based on the superfamily principle (44,45), stating that if protein A is homologous to protein B, and protein B is homologous to protein C, then protein A must be homologous to protein C, regardless of the degree of sequence similarity observed between proteins A and C. To avoid the concern of convergent evolution, the minimal length of aligned sequences to establish homology is 60 residues, and the comparison score must be at least 12 standard deviations using the GSAT program [see also Wakabayashi et al. (10)]. As the protein databases grow, this value must be increased (44–46). It should be noted that homology means ‘derived from a common evolutionary origin’. Homology is therefore an absolute term and does not require a specific degree of sequence similarity between any two protein sequences, such as sequences A and C discussed above (45).

Summarizing, we have developed and perfected novel tools suited for the analysis of transporters (http://saier-144-21.ucsd.edu). These are geared toward (i) superfamily recognition; (ii) detection of internal repeats; (iii) genome analyses of transporters (25,26,47,48); (iv) integral membrane topological analyses (31–33,49,50) and (v) family (38,51–58)/superfamily phylogenetic tree construction using two very different methods (31–33). These programs can be found in the ‘BioTools’ link of TCDB. A reference resource providing detailed information on these programs can be found in our Wiki (http://132.239.144.24) and in a chapter of a recent book edited by Christine A. Orengo (10).
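The comparison score expressed in standard deviations can be understood through a small sketch of a shuffle-based test. This is not the GSAT implementation: it uses a plain Needleman–Wunsch score with an assumed match/mismatch/linear-gap scheme instead of a substitution matrix, shuffles only one of the two sequences, and all names and default parameters are hypothetical.

```python
import random
import statistics


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1,
             gap: int = -2) -> int:
    """Global alignment score (Needleman-Wunsch with a linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_z_score(a: str, b: str, shuffles: int = 200, seed: int = 0) -> float:
    """Number of standard deviations by which the real alignment score
    exceeds the mean score against shuffled versions of b."""
    rng = random.Random(seed)
    real = nw_score(a, b)
    residues = list(b)
    background = []
    for _ in range(shuffles):
        rng.shuffle(residues)  # keeps composition, destroys residue order
        background.append(nw_score(a, "".join(residues)))
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (real - statistics.fmean(background)) / sd
```

Shuffling preserves amino acid composition while destroying order, so a high value indicates similarity that composition alone does not explain. Under the criteria quoted above, a pair would count as homologous only if the aligned region spans at least 60 residues and the score reaches 12 standard deviations; the numbers produced by this simplified scoring scheme are not directly comparable to GSAT output.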

OTHER TRANSPORT DATABASES

Only TCDB is comprehensive, including transport systems from all living organisms, and only TCDB has been adopted by the IUBMB. However, several databases have been developed (see Table 2) which represent transporters in restricted groups of organisms or are restricted to a certain category of transporter:

(i) TransportDB (59) contains computerized annotations of transport proteins in organisms with fully sequenced genomes and classifies them according to TCDB using a semi-automated pipeline.

(ii) YTPdb (60) includes 298 Saccharomyces cerevisiae transporter proteins. It is organized by TC class, although TCs are not provided. Each entry is a wiki where users can contribute. It is easy to use but lacks the detailed text descriptions of sequences and families that can be found in TCDB.

(iii) Aramemnon (61) provides manually curated protein descriptions for six plant species using a clustering algorithm that has been applied on a matrix of pairwise distances between sequences.

(iv) The Medicago truncatula transporter database (62) focuses on transporters in a single plant genome based on TCDB.

(v) ABCdb (63) contains lists of ABC transporters in prokaryotes in 21 families, with functional predictions improved by the addition of references to TCDB.

(vi) ABCISSE (64) tabulates 34 324 partners of 13 276 ABC transporter systems in 276 genomes. It is built around a phylogeny of 34 families of ABC ATPases (not the membrane constituents) organized in three classes, with text descriptions only for the families. TCDB currently includes 92 families of ABC transporter systems: 35 families of uptake porters, 45 families of prokaryotic exporters and 12 families of eukaryotic exporters.

(vii) The Human ATP-Binding Cassette Transporters (http://nutrigene.4t.com/humanabc.htm) categorizes 49 transport systems into subfamilies A–G (65). It is a list, not a database, providing some links to other resources. All these human transporters have been entered into TCDB.

(viii) SLC tables (66) classify secondary carriers in mammals, especially human and mouse. SLC contains 52 families compared with 115 in the equivalent TC subclass of 2.A. We have interconnected the two systems and included all human carriers in TCDB. The tables revealing the family relationships between the TC and SLC systems can be found at the top of subclass 2.A in TCDB. The worm SLC database lists multiple homologs of individual SLCs in Caenorhabditis elegans.

(ix) The membrane proteins of known three-dimensional structure database (67) contains 379 entries that constitute a subset of PDB, not all of them transporters. PDB entries are grouped broadly by type.

(x) The UCSF PMT is a SNP database showing schematic diagrams of transporters with SNPs marked out in the sequence but does not attempt to provide TC numbers.

Figure 2. Growth of TCDB since August 2010. (A) Number of thousands of proteins (solid line); (B) number of hundreds of families (broken line); (C) number of superfamilies (dashed line). Numbers of proteins, families and superfamilies in TCDB as of 19 August 2013 were 9853, 778 and 49, respectively.


(xi) The ARDB contains antibiotic resistance genes, providing a list of four types of multidrug resistance transporter: ABC (TC 3.A.1), MFS (TC 2.A.1), RND (TC 2.A.6) and SMR (TC 2.A.7.1).

HARMONIZATION AND FUTURE GOALS

The most important goals we have identified for future development of TCDB include (i) the creation of an ontology for the TCDB database; (ii) improving our integration with Pfam and (iii) streamlining the use of phylogeny and synteny information to provide functional predictions. Some of the new functions will be implemented as links and some as software. Synteny should probably be implemented as links because the information is often already available elsewhere (Microbes Online, JGI’s intuitive resource IMG, SEED and RegPredict). Pfam may prove more difficult because many families in Pfam are incomplete or not appropriately arranged in clans. Working with Pfam as we have in the past (69), we plan to improve upon the transport protein section of this database.

It is well-known that many families that include domain duplicated transporters do not accurately reflect the domain borders in the way hidden Markov models (HMMs) have been trained (68). Currently, we do not show ‘repeat units’ in TCDB, even though this information is presented in many of our publications. We will continue to work with Pfam to integrate and coordinate information in both databases in a systematic way (69). Ideally, such a process should be automated or semi-automated.

Another worthwhile goal is to establish the user base so we can serve the needs of the scientific community more effectively. We plan to collect more access statistics to understand the needs of the user community. Google Analytics was installed in 2011, but improvements are required so we can recognize which TCDB features are most used.

One million PubMed abstracts are created every year, and 10% of the 2012 abstracts were not indexed as of May 2013. Other databases that link to TCDB, such as EcoGene (70), manually review literature. ‘Transporter’ is a MESH term PubMed uses, but there is a 6-month delay to add MESH terms, and sometimes the word ‘Transporter’ is not obvious from the title. TCDB uses machine learning classifiers as well as keyword searches, which are continuously extracted from TCDB and used as search terms to identify relevant articles. We are considering new ways for users to provide sequence data and information, either with or without the use of email; suggestions by email could be used as test sets to evaluate the efficiency of an automated process. We are also considering implementing links for reference, sequence and information input from users. Adding a feature allowing TCDB to be searched as a library of HMMs is also under consideration. Current TCDB users report that the present system of presenting search results is satisfactory, but we constantly strive to improve the database, and suggestions from users are most welcome.

TCDB needs an ontological hierarchical system and a controlled vocabulary. EBI’s ChemDB (71) has created a chemical classification system, and we have already set up a prototype, which can be accessed from this link: http://www.tcdb.org/ontology. The substrate text needs to be extracted from the description and then correlated with ChemDB. One system already exists, but due to inconsistencies in the description, it has been difficult to implement. If we could link with gene ontology, TC numbers would be more accessible. Another important area for improvement concerns user access to the most recent entries. Perhaps TCDB should have ‘recent releases’ such as those of Pfam. Since we already track protein histories, adding this feature would not be difficult. Some basic statistics where database growth can be followed are already available at http://www.tcdb.org/search/index.php.

We are currently undertaking the development of standardized workflows to confirm homology results from TCDB’s in-house statistical methods, based on structural superimposition and HMM:HMM comparisons. For instance, we use structural superimposition in addition to sequence statistical analyses to identify or confirm structural and evolutionary relationships between members of a superfamily (40). This helps to establish reference points in structural space for homology detection.

Table 1. Transport protein superfamilies in TCDB

1. Aerolysin^a
2. Amino acid/Polyamine/organoCation (APC)^a
3. ATP-Binding Cassette-1 (ABC1)
4. ATP-Binding Cassette-2 (ABC2) with the ECF sub-superfamily
5. ATP-Binding Cassette-3 (ABC3)
6. Bacterial bacteriocin (BB)^a
7. Bile/arsenite/riboflavin transporter (BART)^a
8. Cation diffusion facilitator (CDF)^a
9. Cation:Proton antiporter (CPA)
10. Cecropin
11. Circular bacterial bacteriocin (CBB)^a
12. Claudin^a
13. Corynebacterial PorA/PorH^a
14. Defensin
15. Drug/metabolite transporter (DMT)
16. Endomembrane protein translocon (EMPT)^a
17. Epithelial Na+ channel (ENaC/P2X)
18. Gap junction (GJ)^a
19. General bacterial porin (GBP)
20. Holin I^a
21. Holin II^a
22. Holin III^a
23. Holin IV^a
24. Holin V^a
25. Holin VI^a
26. Holin VII^a
27. Huwentoxin
28. Ion transporter (IT)
29. Lysine exporter (LysE)
30. Major facilitator (MFS)^a
31. Major intrinsic protein (MIP)^a
32. Melittin
33. Membrane attack complex/perforin (MACPF)^a
34. Mercury (Mer)
35. Mitochondrial carrier (MC)
36. Mycobacterial/nocardial porin (MspA)^a
37. Multidrug/oligosaccharidyl-lipid/polysaccharide (MOP) Flippase^a
38. P-type ATPase (P-ATPase)
39. Phosphotransferase system Asc/Gat (PTS-AG)
40. Phosphotransferase system Glc/Fru/Lac (PTS-GFL)
41. Resistance-nodulation-cell division (RND)
42. RTX-toxin
43. T4 immunity (T4 IMM)^a
44. Transmembrane inner membrane-17 (Tim17)
45. Transporter/opsin/G protein-coupled receptor (TOG)
46. TRCTAMP-B (TRCTAMP)^a
47. Outer membrane protein (OMP) insertase (YaeT/TpsB)
48. Voltage-gated ion channel (42)
49. Viral envelope glycoprotein (Env)^a

^a New or recently expanded superfamilies.

CONCLUSION

In 2006, TCDB contained 3000 proteins classified into 400 families, but in 2013, it exceeded 10 000 proteins in 750 families. The availability of TCDB has allowed major basic research advances, including answering fundamental biological questions, determining the routes of evolution taken for the appearance of these proteins, identifying superfamily relationships and allowing structural, functional and mechanistic predictions. Within practical limits, TCDB reflects the current state of our knowledge concerning its constituent parts.

FUNDING

TCDB is supported by NIH [GM 077402-05 and GM 094610-01]. Funding for open access charge: NIH.

Conflict of interest statement. None declared.

REFERENCES

1. Fleischmann,R.D., Adams,M.D., White,O., Clayton,R.A., Kirkness,E.F., Kerlavage,A.R., Bult,C.J., Tomb,J.F., Dougherty,B.A., Merrick,J.M. et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269, 496–512.

2. Bairoch,A. (1994) The ENZYME data bank. Nucleic Acids Res., 22, 3626–3627.

3. Saier,M.H. Jr (1994) Computer-aided analyses of transport protein sequences: gleaning evidence concerning function, structure, biogenesis, and evolution. Microbiol. Rev., 58, 71–93.

4. Saier,M.H. Jr, Tran,C.V. and Barabote,R.D. (2006) TCDB: the Transporter Classification Database for membrane transport protein analyses and information. Nucleic Acids Res., 34, D181–D186.

5. Saier,M.H. Jr, Yen,M.R., Noto,K., Tamang,D.G. and Elkan,C. (2009) The Transporter Classification Database: recent advances. Nucleic Acids Res., 37, D274–D278.

6. Saier,M.H. Jr (2000) A functional-phylogenetic classification system for transmembrane solute transporters. Microbiol. Mol. Biol. Rev., 64, 354–411.

7. Busch,W. and Saier,M.H. Jr (2004) The IUBMB-endorsed transporter classification system. Mol. Biotechnol., 27, 253–262.

8. Busch,W. and Saier,M.H. Jr (2003) The IUBMB-endorsed transporter classification system. Methods Mol. Biol., 227, 21–36.

9. Busch,W. and Saier,M.H. Jr (2002) The transporter classification (TC) system, 2002. Crit. Rev. Biochem. Mol. Biol., 37, 287–337.

10. Wakabayashi,S.T., Shlykov,M.A., Kumar,U., Reddy,V., Malhotra,A., Clarke,E.L., Chen,J.S., Castillo,R., De La Mare,R., Sun,E.I. et al. (2013) Deducing transport protein evolution based on sequence, structure, and function. In: Christine,A.O. and Alex,B. (eds), Protein Families: Relating Protein Sequence, Structure, and Function, 1st edn. Wiley, Hoboken, NJ.

11. Sehgal,A.K., Das,S., Noto,K., Saier,M.H. Jr and Elkan,C. (2011) Identifying relevant data for a biological database: handcrafted rules versus machine learning. IEEE/ACM Trans. Comput. Biol. Bioinform., 8, 851–857.

12. Zhai,Y. and Saier,M.H. Jr (2001) A web-based program (WHAT) for the simultaneous prediction of hydropathy, amphipathicity, secondary structure and transmembrane topology for a single protein sequence. J. Mol. Microbiol. Biotechnol., 3, 501–502.

13. Reddy,V.S. and Saier,M.H. Jr (2012) BioV Suite--a collection of programs for the study of transport protein evolution. FEBS J., 279, 2036–2046.

14. Tusnady,G.E. and Simon,I. (2001) The HMMTOP transmembrane topology prediction server. Bioinformatics, 17, 849–850.

15. Jones,D.T. (2007) Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics, 23, 538–544.

16. Viklund,H., Bernsel,A., Skwark,M. and Elofsson,A. (2008) SPOCTOPUS: a combined predictor of signal peptides and membrane protein topology. Bioinformatics, 24, 2928–2929.

17. Devereux,J., Haeberli,P. and Smithies,O. (1984) A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res., 12, 387–395.

18. Xenarios,I., Rice,D.W., Salwinski,L., Baron,M.K., Marcotte,E.M. and Eisenberg,D. (2000) DIP: the database of interacting proteins. Nucleic Acids Res., 28, 289–291.

19. Pruitt,K.D., Tatusova,T., Brown,G.R. and Maglott,D.R. (2012) NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res., 40, D130–D135.

Table 2. List of known transporter databases

Name                URL                                                               Interconnected with TCDB
TransportDB         http://www.membranetransport.org                                  Yes
YTPdb               http://ytpdb.biopark-it.be                                        Yes
Aramemnon           http://aramemnon.botanik.uni-koeln.de                             No
M. truncatula TDB   http://bioinformatics.cau.edu.cn/MtTransporter/browse.php         Yes
ABCdb               https://www-abcdb.biotoul.fr                                      Yes
ABCISSE             http://www1.pasteur.fr/recherche/unites/pmtg/abc/database.iphtml  No
Human ABC TDB       http://nutrigene.4t.com/humanabc.htm                              Yes
SLC tables          http://www.bioparadigms.org/slc/intro.htm                         Yes, in TCDB
Worm SLC db         http://www.WormSLC.org                                            No
MP struc            http://blanco.biomol.uci.edu/mpstruc                              No
UCSF PMT            http://pharmacogenetics.ucsf.edu                                  No
ARDB                http://ardb.cbcb.umd.edu                                          No

D256 Nucleic Acids Research, 2014, Vol. 42, Database issue

Downloaded from https://academic.oup.com/nar/article/42/D1/D251/1049640 by guest on 02 January 2022

20. Maglott,D., Ostell,J., Pruitt,K.D. and Tatusova,T. (2011) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res., 39, D52–D57.

21. Punta,M., Coggill,P.C., Eberhardt,R.Y., Mistry,J., Tate,J., Boursnell,C., Pang,N., Forslund,K., Ceric,G., Clements,J. et al. (2012) The Pfam protein families database. Nucleic Acids Res., 40, D290–D301.

22. Latendresse,M., Paley,S. and Karp,P.D. (2012) Browsing metabolic and regulatory networks with BioCyc. Methods Mol. Biol., 804, 197–216.

23. Kanehisa,M., Goto,S., Sato,Y., Furumichi,M. and Tanabe,M. (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res., 40, D109–D114.

24. Rose,P.W., Bi,C., Bluhm,W.F., Christie,C.H., Dimitropoulos,D., Dutta,S., Green,R.K., Goodsell,D.S., Prlic,A., Quesada,M. et al. (2013) The RCSB Protein Data Bank: new resources for research and education. Nucleic Acids Res., 40, D475–D482.

25. Youm,J. and Saier,M.H. Jr (2012) Comparative analyses of transport proteins encoded within the genomes of Mycobacterium tuberculosis and Mycobacterium leprae. Biochim. Biophys. Acta, 1818, 776–797.

26. Tamang,D.G., Rabus,R., Barabote,R.D. and Saier,M.H. Jr (2009) Comprehensive analyses of transport proteins encoded within the genome of "Aromatoleum aromaticum" strain EbN1. J. Membr. Biol., 229, 53–90.

27. Paparoditis,P., Vastermark,A., Le,A.J., Fuerst,J.A. and Saier,M.H. Jr (2013) Bioinformatic analyses of integral membrane transport proteins encoded within the genome of the planctomycetes species, Rhodopirellula baltica. Biochim. Biophys. Acta, 1838, 193–215.

28. UniProt Consortium (2013) Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res., 41, D43–D47.

29. Knox,C., Law,V., Jewison,T., Liu,P., Ly,S., Frolkis,A., Pon,A., Banco,K., Mak,C., Neveu,V. et al. (2011) DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res., 39, D1035–D1041.

30. Rask-Andersen,M., Almen,M.S. and Schioth,H.B. (2011) Trends in the exploitation of novel drug targets. Nat. Rev. Drug Discov., 10, 579–590.

31. Chen,J.S., Reddy,V., Chen,J.H., Shlykov,M.A., Zheng,W.H., Cho,J., Yen,M.R. and Saier,M.H. Jr (2011) Phylogenetic characterization of transport protein superfamilies: superiority of SuperfamilyTree programs over those based on multiple alignments. J. Mol. Microbiol. Biotechnol., 21, 83–96.

32. Yen,M.R., Choi,J. and Saier,M.H. Jr (2009) Bioinformatic analyses of transmembrane transport: novel software for deducing protein phylogeny, topology, and evolution. J. Mol. Microbiol. Biotechnol., 17, 163–176.

33. Yen,M.R., Chen,J.S., Marquez,J.L., Sun,E.I. and Saier,M.H. (2010) Multidrug resistance: phylogenetic characterization of superfamilies of secondary carriers that include drug exporters. Methods Mol. Biol., 637, 47–64.

34. Wong,F.H., Chen,J.S., Reddy,V., Day,J.L., Shlykov,M.A., Wakabayashi,S.T. and Saier,M.H. Jr (2012) The amino acid-polyamine-organocation superfamily. J. Mol. Microbiol. Biotechnol., 22, 105–113.

35. Reddy,V.S., Shlykov,M.A., Castillo,R., Sun,E.I. and Saier,M.H. Jr (2012) The major facilitator superfamily (MFS) revisited. FEBS J., 279, 2022–2035.

36. Shlykov,M.A., Zheng,W.H., Chen,J.S. and Saier,M.H. Jr (2012) Bioinformatic characterization of the 4-Toluene Sulfonate Uptake Permease (TSUP) family of transmembrane proteins. Biochim. Biophys. Acta, 1818, 703–717.

37. Chan,H., Babayan,V., Blyumin,E., Gandhi,C., Hak,K., Harake,D., Kumar,K., Lee,P., Li,T.T., Liu,H.Y. et al. (2010) The p-type ATPase superfamily. J. Mol. Microbiol. Biotechnol., 19, 5–104.

38. Rettner,R.E. and Saier,M.H. Jr (2010) The autoinducer-2 exporter superfamily. J. Mol. Microbiol. Biotechnol., 18, 195–205.

39. Lam,V.H., Lee,J.H., Silverio,A., Chan,H., Gomolplitinant,K.M., Povolotsky,T.L., Orlova,E., Sun,E.I., Welliver,C.H. and Saier,M.H. Jr (2011) Pathways of transport protein evolution: recent advances. Biol. Chem., 392, 5–12.

40. Zheng,W.H., Vastermark,A., Shlykov,M.A., Reddy,V., Sun,E.I. and Saier,M.H. Jr (2013) Evolutionary relationships of ATP-Binding Cassette (ABC) uptake porters. BMC Microbiol., 13, 98.

41. Matias,M.G., Gomolplitinant,K.M., Tamang,D.G. and Saier,M.H. Jr (2010) Animal Ca2+ release-activated Ca2+ (CRAC) channels appear to be homologous to and derived from the ubiquitous cation diffusion facilitators. BMC Res. Notes, 3, 158.

42. Wang,B., Dukarevich,M., Sun,E.I., Yen,M.R. and Saier,M.H. Jr (2009) Membrane porters of ATP-binding cassette transport systems are polyphyletic. J. Membr. Biol., 231, 1–10.

43. Yee,D.C., Shlykov,M.A., Vastermark,A., Reddy,V.S., Arora,S., Sun,E.I. and Saier,M.H. Jr (2013) The Transporter-Opsin-G protein-coupled receptor (TOG) Superfamily. FEBS J., 280, 5780–5800.

44. Saier,M.H. Jr (1994) Computer-aided analyses of transport protein sequences: gleaning evidence concerning function, structure, biogenesis, and evolution. Microbiol. Rev., 58, 71–93.

45. Doolittle,R.F. (1994) Convergent evolution: the need to be explicit. Trends Biochem. Sci., 19, 15–18.

46. Dayhoff,M.O., Barker,W.C. and Hunt,L.T. (1983) Establishing homologies in protein sequences. Methods Enzymol., 91, 524–545.

47. Coyne,R.S., Hannick,L., Shanmugam,D., Hostetler,J.B., Brami,D., Joardar,V.S., Johnson,J., Radune,D., Singh,I., Badger,J.H. et al. (2011) Comparative genomics of the pathogenic ciliate Ichthyophthirius multifiliis, its free-living relatives and a host species provide insights into adoption of a parasitic lifestyle and prospects for disease control. Genome Biol., 12, R100.

48. Podar,M., Anderson,I., Makarova,K.S., Elkins,J.G., Ivanova,N., Wall,M.A., Lykidis,A., Mavromatis,K., Sun,H., Hudson,M.E. et al. (2008) A genomic analysis of the archaeal system Ignicoccus hospitalis-Nanoarchaeum equitans. Genome Biol., 9, R158.

49. Zhai,Y. and Saier,M.H. Jr (2002) A simple sensitive program for detecting internal repeats in sets of multiply aligned homologous proteins. J. Mol. Microbiol. Biotechnol., 4, 375–377.

50. Zhai,Y. and Saier,M.H. Jr (2001) A web-based program for the prediction of average hydropathy, average amphipathicity and average similarity of multiply aligned homologous proteins. J. Mol. Microbiol. Biotechnol., 3, 285–286.

51. Silverio,A.L. and Saier,M.H. Jr (2011) Bioinformatic characterization of the trimeric intracellular cation-specific channel protein family. J. Membr. Biol., 241, 77–101.

52. Gomolplitinant,K.M. and Saier,M.H. Jr (2011) Evolution of the oligopeptide transporter family. J. Membr. Biol., 240, 89–110.

53. Tsai,J.C., Yen,M.R., Castillo,R., Leyton,D.L., Henderson,I.R. and Saier,M.H. Jr (2010) The bacterial intimins and invasins: a large and novel family of secreted proteins. PLoS One, 5, e14403.

54. Castillo,R. and Saier,M.H. (2010) Functional promiscuity of homologues of the bacterial ArsA ATPases. Int. J. Microbiol., 2010, 187373.

55. Povolotsky,T.L., Orlova,E., Tamang,D.G. and Saier,M.H. Jr (2010) Defense against cannibalism: the SdpI family of bacterial immunity/signal transduction proteins. J. Membr. Biol., 235, 145–162.

56. Xiao,A.Y., Wang,J. and Saier,M.H. (2010) Bacterial adaptor membrane fusion proteins and the structurally dissimilar outer membrane auxiliary proteins have exchanged central domains in alpha-proteobacteria. Int. J. Microbiol., 2010, 589391.

57. Thever,M.D. and Saier,M.H. Jr (2009) Bioinformatic characterization of p-type ATPases encoded within the fully sequenced genomes of 26 eukaryotes. J. Membr. Biol., 229, 115–130.

58. Vastermark,A. and Saier,M.H. Jr (2013) Evolutionary relationship between 5+5 and 7+7 inverted repeat folds within the amino acid-polyamine-organocation superfamily. Proteins, August 28 (doi: 10.1002/prot.24401; epub ahead of print).

59. Ren,Q., Chen,K. and Paulsen,I.T. (2007) TransportDB: a comprehensive database resource for cytoplasmic membrane transport systems and outer membrane channels. Nucleic Acids Res., 35, D274–D279.

60. Brohee,S., Barriot,R., Moreau,Y. and Andre,B. (2010) YTPdb: a wiki database of yeast membrane transporters. Biochim. Biophys. Acta, 1798, 1908–1912.

61. Schwacke,R., Schneider,A., van der Graaff,E., Fischer,K., Catoni,E., Desimone,M., Frommer,W.B., Flugge,U.I. and Kunze,R. (2003) ARAMEMNON, a novel database for Arabidopsis integral membrane proteins. Plant Physiol., 131, 16–26.

62. Miao,Z., Li,D., Zhang,Z., Dong,J., Su,Z. and Wang,T. (2012) Medicago truncatula transporter database: a comprehensive database resource for M. truncatula transporters. BMC Genomics, 13, 60.

63. Fichant,G., Basse,M.J. and Quentin,Y. (2006) ABCdb: an online resource for ABC transporter repertories from sequenced archaeal and bacterial genomes. FEMS Microbiol. Lett., 256, 333–339.

64. Bouige,P., Laurent,D., Piloyan,L. and Dassa,E. (2002) Phylogenetic and functional classification of ATP-binding cassette (ABC) systems. Curr. Protein Pept. Sci., 3, 541–559.

65. Vasiliou,V., Vasiliou,K. and Nebert,D.W. (2009) Human ATP-binding cassette (ABC) transporter family. Hum. Genomics, 3, 281–290.

66. Hediger,M.A., Clemencon,B., Burrier,R.E. and Bruford,E.A. (2013) The ABCs of membrane transporters in health and disease (SLC series): introduction. Mol. Aspects Med., 34, 95–107.

67. White,S.H. (2009) Biophysical dissection of membrane proteins. Nature, 459, 344–346.

68. Vastermark,A., Almen,M.S., Simmen,M.W., Fredriksson,R. and Schioth,H.B. (2011) Functional specialization in nucleotide sugar transporters occurred through differentiation of the gene cluster EamA (DUF6) before the radiation of Viridiplantae. BMC Evol. Biol., 11, 123.

69. Reddy,B.L. and Saier,M.H. Jr (2013) Topological and phylogenetic analyses of bacterial holin families and superfamilies. Biochim. Biophys. Acta, 1828, 2654–2671.

70. Zhou,J. and Rudd,K.E. (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.

71. Chen,J., Swamidass,S.J., Dou,Y., Bruand,J. and Baldi,P. (2005) ChemDB: a public database of small molecules and related chemoinformatics resources. Bioinformatics, 21, 4133–4139.



at the top levels for information about classes and families and descend to the deepest level about individual proteins.

Since its last publication in the NAR database issue in 2009 (5), there has been significant change in the database design (schema above). Some basic issues pertaining to data integrity, redundancy and management have led to conversion of the MySQL Table Engine from MyISAM to InnoDB. Perhaps the most important justification for this conversion is the fact that different levels of TC classification have a type of parent–child relationship. A foreign key constraint should allow cascading action when a row (tuple) is inserted/updated/deleted. Thus, all related tables are affected, leaving no orphaned records. Roughly one half of the schema follows the standard relationships between class, subclass, superfamily, family, cluster or subfamily, and system, and the other half shows tables of information pertaining to unique UniProt protein accession numbers.

The steps involved and basic ideas behind the TCDB Admin interface for curation are the same as above and follow the DB design schema. However, the look and feel of the interface has changed since its update in 2010, along with some new options such as 'View Task Queue' and 'View Staff Logs'. We share our mapping file with different databases, and these files are automatically updated every time a new protein is added to the database.
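The cascading behaviour that motivated the move to InnoDB can be illustrated with a small sketch. It is not TCDB's schema: it uses Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB, and the table names (family, transport_system), column names and the family label are hypothetical. It only shows how deleting a parent row removes its child rows so that no orphaned records remain.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory parent (family) / child (transport_system) pair."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE family (tcid TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        """CREATE TABLE transport_system (
               tcid TEXT PRIMARY KEY,
               family_tcid TEXT NOT NULL
                   REFERENCES family(tcid)
                   ON DELETE CASCADE ON UPDATE CASCADE
           )"""
    )
    # Hypothetical rows: one family with two child systems.
    conn.execute("INSERT INTO family VALUES ('2.A.1', 'MFS')")
    conn.executemany(
        "INSERT INTO transport_system VALUES (?, ?)",
        [("2.A.1.1.1", "2.A.1"), ("2.A.1.1.2", "2.A.1")],
    )
    return conn


def count_systems(conn):
    """Return the number of rows left in the child table."""
    return conn.execute("SELECT COUNT(*) FROM transport_system").fetchone()[0]


conn = build_demo_db()
before = count_systems(conn)
# Deleting the parent row cascades to every child that references it.
conn.execute("DELETE FROM family WHERE tcid = '2.A.1'")
after = count_systems(conn)
print(before, after)
```

With the MyISAM engine, foreign key clauses are parsed but not enforced, so the same delete would leave the two child rows behind.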

The entire web interface has been revamped. The new look and feel should be consistent across all major browsers, easier to navigate, URL friendly and, overall, a huge improvement from the previous HTML frame-based web pages. For example, the browse tab for viewing the TC System (http://www.tcdb.org/browse.php) has been entirely redesigned using jQuery. For a more detailed description of the capabilities available to the user, see Wakabayashi et al. (10).

In addition to the search option under the search tab, one can search TCDB from a search box on the main page using single or multiple terms, including TC ID, keyword, protein name or abbreviation, organismal source, author name, UniProt accession number, PDB ID number, associated disease, reference, etc. The following details are returned with a protein search or can be easily accessed following such a search:

(i) TC ID, (ii) reference, (iii) accession number, (iv) protein name, (v) length, (vi) molecular weight, (vii) species, (viii) predicted number of TMSs, (ix) location/topology/orientation and (x) database of interacting proteins (DIPs) and Pfam reference.

Figure 1. Current MySQL schema, displayed using Workbench 6.0 CE and showing the tables currently in TCDB's database architecture. Each line in a table represents a column and displays which datatype (such as int, varchar, text, etc.) can be stored. Ten tables, which are not being used directly by TCDB but that have been used for maintenance tasks, are not shown in the diagram: test, lang, error, proteinold, tc2acc broke, tc2acc 1, flags, cflags, temp_tms, temp_preds and misc. A table that has a trifork (entity relationships) pointing toward it contains a column with explicit IDs from another table. The tables having no entity relationships are grouped on the left. The diagram contains four layers (left to right and from top to bottom): the protein layer (green), the family layer (yellow), the ontology layer (blue) and the compounds layer (red).

The user can also run a BLAST or PSI-BLAST search with the protein against either the non-redundant National Center for Biotechnology Information (NCBI) database or TCDB (accessed from the sidebar). Additional analysis options, such as predicting the number of TMSs through hydropathy plots, are also available (see below).
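A hydropathy plot of the kind mentioned above can be approximated with a sliding-window average. The sketch below assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a cutoff of 1.6; these are common textbook choices and not necessarily the settings used by WHAT or by its Python replacement. The function names and the test sequence are hypothetical, and counting peaks this way is only a crude proxy for the dedicated topology predictors cited in this article.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq."""
    values = [KD[aa] for aa in seq.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def count_hydrophobic_peaks(profile, threshold=1.6):
    """Count contiguous runs of windows whose mean exceeds threshold."""
    peaks, inside = 0, False
    for value in profile:
        if value > threshold and not inside:
            peaks += 1
        inside = value > threshold
    return peaks


# Artificial sequence: two hydrophobic stretches separated by charged residues.
demo_seq = "K" * 20 + "L" * 22 + "D" * 20 + "I" * 22 + "K" * 20
demo_profile = hydropathy_profile(demo_seq)
demo_peaks = count_hydrophobic_peaks(demo_profile)
print(len(demo_profile), demo_peaks)
```

Real proteins give noisier profiles, so a peak count from a single threshold should be treated as a first look rather than a topology prediction.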

TCDB collaborates with many important databases (see Reference 10 for recent technical improvements) and shares cross-database links with them; these are available on the individual protein pages. Internal hyperlinks connecting references to classes, families and proteins have been updated.

RECENT TECHNICAL IMPROVEMENTS (2011–13)

We have:

(1) Incorporated an improved administration page, built-in semi-automatic machine learning tools (11) and user contributions, allowing protein history tracking; see Wakabayashi et al. (10).

(2) Updated software to BLAST 2.2.27.

(3) Replaced the WHAT program (12) with a functionally similar Python version to increase speed and reliability.

(4) Made the TCDB BLAST database available, generated in real-time.

(5) Made the TMSTATS Program (13) available for analyzing topological (TMS) statistics using three different topological prediction programs, HMMTOP (14), MEMSAT (15) and SPOCTOPUS (16), giving histograms of TMS distribution for any protein or for any TC class, subclass, family, subfamily or any combination of these.

(6) Made Global Sequence Alignment Tool (GSAT) (13) available for performing pairwise alignments. GSAT performs a shuffle-based alignment to detect distant homologs using the Needleman and Wunsch algorithm.

(7) Implemented Protocols 1 and 2. Protocol 1 runs a PSI-BLAST search of the NCBI protein database with iterations, collects results, removes redundant/small/similar sequences, annotates, tabulates and counts TMSs. Protocol 2 allows the rapid identification and quantitative evaluation of homologs between any two FASTA files using the GSAT program (13).

(8) Established a homology section that replaces the GAP (17) and ICC programs with GSAT and Protocol 2 (13), and included class-wide comparisons that can be performed with these programs.

(9) Incorporated a semi-automatic protein screening program.

(10) Cross-referenced TCDB with HOGENOM (http://pbil.univ-lyon1.fr/databases/hogenom/acceuil.php), DIP (18), RefSeq (19), Entrez (20), Pfam (21), BioCyc (22), KEGG (23), PDB (24) and DrugBank.

(11) Improved search tools that now separate results by system, cluster, family, superfamily and reference.

(12) Implemented GBLAST, which provides a search tool designed to identify potential transporters in fully sequenced genomes or DNA segments (25–27).

(13) Implemented Ancient Rep, which provides horizontal and vertical search approaches to find transmembrane repeat units within a single protein or a list of homologs, respectively (13).

(14) Updated UniProtKB (28) cross-reference files with a continuously updated dynamic version as of 15 August 2013.

(15) Provided links to DrugBank (29), allowing resolution to the well-known validated human drug targets presented by Rask-Andersen et al. (30), as well as bacterial drug targets.

(16) Implemented the Superfamily Tree programs, SFT1 and SFT2, which use tens of thousands of BLAST bit scores instead of multiple alignments, thus avoiding the pitfalls often encountered when determining the phylogeny of distantly related proteins (31–33). While SFT1 constructs trees allowing visualization of individual proteins, SFT2 allows depiction of family/subfamily relationships (31–33).

(17) Provided a mechanism for user-generated input.
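Item (6) describes GSAT as a shuffle-based comparison built on the Needleman and Wunsch algorithm. The general idea can be sketched as follows: score the real global alignment, score alignments against many shuffled copies of one sequence, and express the real score in standard deviations above the shuffled mean. This is a minimal illustration, not GSAT itself. The match, mismatch and gap values, the number of shuffles and the function names are all assumptions; a real tool would normally use an amino acid substitution matrix and affine gap penalties.

```python
import random
import statistics


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch) with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_z_score(a, b, shuffles=200, seed=0):
    """Standard deviations by which the real score exceeds shuffled scores."""
    rng = random.Random(seed)
    real = nw_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(shuffles):
        # Shuffling keeps the composition of b but destroys its order.
        rng.shuffle(letters)
        null_scores.append(nw_score(a, "".join(letters)))
    spread = statistics.pstdev(null_scores)
    if spread == 0:
        raise ValueError("shuffled scores have no variance")
    return (real - statistics.mean(null_scores)) / spread


# Hypothetical sequence compared with itself: the score should stand far
# above the shuffled background.
demo = "MKTLLVAGAVLLSGCAVWQDERKPNHFYIT"
demo_z = shuffle_z_score(demo, demo)
print(round(demo_z, 1))
```

A score expressed this way depends on the scoring scheme and on sequence composition, so thresholds taken from one tool should not be carried over to another without recalibration.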

GROWTH OF THE DATABASE (2010–13)

A file containing the current sequence set is available for download from http://www.tcdb.org/public/tcdb. About 150 TC families are introduced each year, reflecting the extensive and continual manual curation work being conducted. Figure 2 shows the parallel growth of TCDB protein, family and superfamily compositions from 2010 to 2013. However, it should be noted that each year, several families in Class 9 are moved to classes 1–5 when sufficient information becomes available to allow definition of their mechanisms of action.
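Counts of the kind plotted in Figure 2 could be recomputed from the downloadable sequence file with a short script. The sketch below assumes that each FASTA header has the form `>gnl|TC-DB|<accession>|<TC ID> description`; that layout is an assumption and should be checked against the actual file. The sample records, accessions and function names are hypothetical, and the sample is embedded as a string so that the code runs without network access.

```python
from collections import Counter

# Hypothetical records in the assumed header layout.
SAMPLE_FASTA = """\
>gnl|TC-DB|X00001|1.A.1.1.1 Hypothetical example entry
MPPMLSGLLA
>gnl|TC-DB|X00002|2.A.1.1.1 Hypothetical example entry
MKTAYIAKQR
>gnl|TC-DB|X00003|2.A.1.2.3 Hypothetical example entry
MSTNPKPQRK
"""


def parse_tc_headers(fasta_text):
    """Yield (accession, TC ID) for every header line in fasta_text."""
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue
        # Assumed layout: gnl|TC-DB|<accession>|<TC ID>
        fields = line[1:].split()[0].split("|")
        yield fields[2], fields[3]


def family_of(tcid):
    """Return the class.subclass.family prefix of a five-part TC ID."""
    return ".".join(tcid.split(".")[:3])


family_counts = Counter(
    family_of(tcid) for _, tcid in parse_tc_headers(SAMPLE_FASTA)
)
print(len(family_counts), dict(family_counts))
```

Running the same function over dated snapshots of the file would give protein and family totals per release.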

SUPERFAMILY ADDITIONS (2011–13)

Analyses (34–43) have revealed distant relationships between preexisting TC families. These relationships have been integrated into TCDB as a hyperlink, and superfamily relationships are mentioned with hyperlinks in the description of each constituent family. The number of superfamilies that are either new or expanded (marked with superscript 'a' in Table 1) has more than doubled during the last 3 years (Figure 2), and the further expansion of such knowledge continues.

ESTABLISHING HOMOLOGY BETWEEN PROTEINS USING TCDB-RELATED SOFTWARE

Affiliation with a family requires satisfying rigorous statistical criteria of homology. Superfamily status is based on the superfamily principle (44,45), stating that if protein A is homologous to protein B, and protein B is homologous to protein C, then protein A must be homologous to protein C, regardless of the degree of sequence similarity observed between proteins A and C. To avoid the concern of convergent evolution, the minimal length of aligned sequences to establish homology is 60 residues, and the comparison score must be at least 12 standard deviations using the GSAT program [see also Wakabayashi et al. (10)]. As the protein databases grow, this value must be increased (44–46). It should be noted that homology means 'derived from a common evolutionary origin'. Homology is, therefore, an absolute term and does not require a specific degree of sequence similarity between any two protein sequences, such as sequences A and C discussed above (45).

Summarizing, we have developed and perfected novel tools suited for the analysis of transporters (http://saier-144-21.ucsd.edu). These are geared toward (i) superfamily recognition, (ii) detection of internal repeats, (iii) genome analyses of transporters (25,26,47,48), (iv) integral membrane topological analyses (31–33,49,50) and (v) family (38,51–58)/superfamily phylogenetic tree construction using two very different methods (31–33). These programs can be found in the 'BioTools' link of TCDB. A reference resource providing detailed information on these programs can be found in our Wiki (http://132.239.144.24) and in a chapter of a recent book edited by Christine A. Orengo (10).
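Computationally, the superfamily principle stated above amounts to taking the transitive closure of pairwise homology. A minimal sketch, assuming that each input pair has already passed the statistical criteria, groups proteins or families with a union-find structure. The labels A to E and the function name are hypothetical, and this is not a description of how TCDB assigns superfamilies in practice.

```python
def group_by_homology(pairs):
    """Group items into sets connected by chains of pairwise homology."""
    parent = {}

    def find(item):
        # Register unseen items as their own root, then walk to the root,
        # shortening the path along the way.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(group) for group in groups.values())


# A~B and B~C place A, B and C together even though A and C were never
# compared directly; D~E forms a separate group.
demo_groups = group_by_homology([("A", "B"), ("B", "C"), ("D", "E")])
print(demo_groups)
```

Because a single false link merges two groups, the reliability of the result depends entirely on how strictly each pair was tested.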

OTHER TRANSPORT DATABASES

Only TCDB is comprehensive, including transport systems from all living organisms, and only TCDB has been adopted by the IUBMB. However, several databases have been developed (see Table 2) which represent transporters in restricted groups of organisms or are restricted to a certain category of transporter:

(i) TransportDB (59) contains computerized annotations of transport proteins in organisms with fully sequenced genomes and classifies them according to TCDB using a semi-automated pipeline.

(ii) YTPdb (60) includes 298 Saccharomyces cerevisiae transporter proteins. It is organized by TC class, although TCs are not provided. Each entry is a wiki where users can contribute. It is easy to use but lacks the detailed text descriptions of sequences and families that can be found in TCDB.

(iii) Aramemnon (61) provides manually curated protein descriptions for six plant species, using a clustering algorithm that has been applied on a matrix of pairwise distances between sequences.

(iv) The Medicago truncatula transporter database (62) focuses on transporters in a single plant genome, based on TCDB.

(v) ABCdb (63) contains lists of ABC transporters in prokaryotes in 21 families, with functional predictions improved by the addition of references to TCDB.

(vi) ABCISSE (64) tabulates 34 324 partners of 13 276 ABC transporter systems in 276 genomes. It is built around a phylogeny of 34 families of ABC ATPases (not the membrane constituents), organized in three classes, with text descriptions only for the families. TCDB currently includes 92 families of ABC transporter systems: 35 families of uptake porters, 45 families of prokaryotic exporters and 12 families of eukaryotic exporters.

(vii) The Human ATP-Binding Cassette Transporters (http://nutrigene.4t.com/humanabc.htm) categorizes 49 transport systems into subfamilies A–G (65). It is a list, not a database, providing some links to other resources. All these human transporters have been entered into TCDB.

(viii) SLC tables (66) classify secondary carriers in mammals, especially human and mouse. SLC contains 52 families compared with 115 in the equivalent TC subclass of 2.A. We have interconnected the two systems and included all human carriers in TCDB. The tables revealing the family relationships between the TC and SLC systems can be found at the top of subclass 2.A in TCDB. The worm SLC database lists multiple homologs of individual SLCs in Caenorhabditis elegans.

(ix) The membrane proteins of known three-dimensional structure database (67) contains 379 entries that constitute a subset of PDB, not all of them transporters. PDB entries are grouped broadly by type.

(x) The UCSF PMT is a SNP database showing schematic diagrams of transporters with SNPs marked out in the sequence, but does not attempt to provide TC numbers.

[Figure 2: line plot. x-axis: year (2010–2013); y-axes: number of proteins (× 10⁻³), number of families (× 10⁻²) and number of superfamilies.]

Figure 2. Growth of TCDB since August 2010. (A) Number of thousands of proteins (solid line), (B) number of hundreds of families (broken line), (C) number of superfamilies (dashed line). Numbers of proteins, families and superfamilies in TCDB as of 19 August 2013 were 9853, 778 and 49, respectively.


(xi) The ARDB contains antibiotic resistance genes, providing a list of four multidrug resistance transporter types: ABC (TC 3.A.1), MFS (TC 2.A.1), RND (TC 2.A.6) and SMR (TC 2.A.7.1).

HARMONIZATION AND FUTURE GOALS

The most important goals we have identified for future development of TCDB include (i) the creation of an ontology for the TCDB database, (ii) improving our integration with Pfam and (iii) streamlining the use of phylogeny and synteny information to provide functional predictions. Some of the new functions will be implemented as links and some as software. Synteny should probably be implemented as links because the information is often already available elsewhere (Microbes Online, JGI's intuitive resource IMG, SEED and RegPredict). Pfam may prove more difficult because many families in Pfam are incomplete or not appropriately arranged in clans. Working with Pfam as we have in the past (69), we plan to improve upon the transport protein section of this database.

It is well-known that many families that include domain duplicated transporters do not accurately reflect the domain borders in the way hidden Markov models (HMMs) have been trained (68). Currently, we do not show 'repeat units' in TCDB, even though this information is presented in many of our publications. We will continue to work with Pfam to integrate and coordinate information in both databases in a systematic way (69). Ideally, such a process should be automated or semi-automated.

Another worthwhile goal is to establish the user base so we can serve the needs of the scientific community more effectively. We plan to collect more access statistics to understand the needs of the user community. Google Analytics was installed in 2011, but improvements are required so we can recognize which TCDB features are most used.

One million PubMed abstracts are created every year, and 10% of the 2012 abstracts were not indexed as of May 2013. Other databases that link to TCDB, such as EcoGene (70), manually review literature. 'Transporter' is a MESH term PubMed uses, but there is a 6-month delay to add MESH terms, and sometimes the word 'Transporter' is not obvious from the title. TCDB uses machine learning classifiers as well as keyword searches, which are continuously extracted from TCDB and used as search terms to identify relevant articles. We are considering new ways for users to provide sequence data and information, either with or without the use of email; suggestions by email could be used as test sets to evaluate the efficiency of an automated process. We are also considering implementing links for reference, sequence and information input from users. Adding a feature allowing TCDB to be searched as a library of HMMs is also under consideration. Current TCDB users report that the present system of presenting search results is satisfactory, but we constantly strive to improve the database, and suggestions from users are most welcome.

TCDB needs an ontological hierarchical system and a controlled vocabulary. EBI's ChemDB (71) has created a chemical classification system, and we have already set up a prototype, which can be accessed from this link: http://www.tcdb.org/ontology. The substrate text needs to be extracted from the description and then correlated with ChemDB. One system already exists, but due to inconsistencies in the description, it has been difficult to implement. If we could link with gene ontology, TC numbers would be more accessible. Another important area for improvement concerns user access to the most recent entries. Perhaps TCDB should have 'recent releases' such as those of Pfam. Since we already track protein histories, adding this feature would not be difficult. Some basic statistics, where database growth can be followed, are already available at http://www.tcdb.org/search/index.php.

We are currently undertaking the development of standardized workflows to confirm homology results from TCDB's in-house statistical methods, based on structural superimposition and HMM:HMM comparisons. For instance, we use structural superimposition in addition to sequence statistical analyses to identify or confirm structural and evolutionary relationships between members of a superfamily (40). This helps to establish reference points in structural space for homology detection.

Table 1. Transport protein superfamilies in TCDB

1. Aerolysin^a
2. Amino acid/Polyamine/organoCation (APC)^a
3. ATP-Binding Cassette-1 (ABC1)
4. ATP-Binding Cassette-2 (ABC2) with the ECF sub-superfamily
5. ATP-Binding Cassette-3 (ABC3)
6. Bacterial bacteriocin (BB)^a
7. Bile/arsenite/riboflavin transporter (BART)^a
8. Cation diffusion facilitator (CDF)^a
9. Cation:Proton antiporter (CPA)
10. Cecropin
11. Circular bacterial bacteriocin (CBB)^a
12. Claudin^a
13. Corynebacterial PorA/PorH^a
14. Defensin
15. Drug/metabolite transporter (DMT)
16. Endomembrane protein translocon (EMPT)^a
17. Epithelial Na+ channel (ENaC/P2X)
18. Gap junction (GJ)^a
19. General bacterial porin (GBP)
20. Holin I^a
21. Holin II^a
22. Holin III^a
23. Holin IV^a
24. Holin V^a
25. Holin VI^a
26. Holin VII^a
27. Huwentoxin
28. Ion transporter (IT)
29. Lysine exporter (LysE)
30. Major facilitator (MFS)^a
31. Major intrinsic protein (MIP)^a
32. Melittin
33. Membrane attack complex/perforin (MACPF)^a
34. Mercury (Mer)
35. Mitochondrial carrier (MC)
36. Mycobacterial/nocardial porin (MspA)^a
37. Multidrug/oligosaccharidyl-lipid/polysaccharide (MOP) Flippase^a
38. P-type ATPase (P-ATPase)
39. Phosphotransferase system Asc/Gat (PTS-AG)
40. Phosphotransferase system Glc/Fru/Lac (PTS-GFL)
41. Resistance-nodulation-cell division (RND)
42. RTX-toxin
43. T4 immunity (T4 IMM)^a
44. Transmembrane inner membrane-17 (Tim17)
45. Transporter/opsin/G protein-coupled receptor (TOG)
46. TRCTAMP-B (TRCTAMP)^a
47. Outer membrane protein (OMP) insertase (YaeT/TpsB)
48. Voltage-gated ion channel (42)
49. Viral envelope glycoprotein (Env)^a

^a New or recently expanded superfamilies.

CONCLUSION

In 2006 TCDB contained 3000 proteins classified into400 families but in 2013 it exceeded 10 000 proteins in750 families The availability of TCDB has allowedmajor basic research advances including answering funda-mental biological questions determining the routes ofevolution taken for the appearance of these proteins iden-tifying superfamily relationships and allowing structuralfunctional and mechanistic predictions Within practicallimits TCDB reflects the current state of our knowledgeconcerning its constituent parts

FUNDING

TCDB is supported by NIH [GM 077402-05 and GM094610-01] Funding for open access charge NIH

Conflict of interest statement None declared

REFERENCES

1. Fleischmann,R.D., Adams,M.D., White,O., Clayton,R.A., Kirkness,E.F., Kerlavage,A.R., Bult,C.J., Tomb,J.F., Dougherty,B.A., Merrick,J.M. et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269, 496–512.

2. Bairoch,A. (1994) The ENZYME data bank. Nucleic Acids Res., 22, 3626–3627.

3. Saier,M.H. Jr (1994) Computer-aided analyses of transport protein sequences: gleaning evidence concerning function, structure, biogenesis and evolution. Microbiol. Rev., 58, 71–93.

4. Saier,M.H. Jr, Tran,C.V. and Barabote,R.D. (2006) TCDB: the Transporter Classification Database for membrane transport protein analyses and information. Nucleic Acids Res., 34, D181–D186.

5. Saier,M.H. Jr, Yen,M.R., Noto,K., Tamang,D.G. and Elkan,C. (2009) The Transporter Classification Database: recent advances. Nucleic Acids Res., 37, D274–D278.

6. Saier,M.H. Jr (2000) A functional-phylogenetic classification system for transmembrane solute transporters. Microbiol. Mol. Biol. Rev., 64, 354–411.

7. Busch,W. and Saier,M.H. Jr (2004) The IUBMB-endorsed transporter classification system. Mol. Biotechnol., 27, 253–262.

8. Busch,W. and Saier,M.H. Jr (2003) The IUBMB-endorsed transporter classification system. Methods Mol. Biol., 227, 21–36.

9. Busch,W. and Saier,M.H. Jr (2002) The transporter classification (TC) system, 2002. Crit. Rev. Biochem. Mol. Biol., 37, 287–337.

10. Wakabayashi,S.T., Shlykov,M.A., Kumar,U., Reddy,V., Malhotra,A., Clarke,E.L., Chen,J.S., Castillo,R., De La Mare,R., Sun,E.I. et al. (2013) Deducing transport protein evolution based on sequence, structure and function. In: Orengo,C.A. and Bateman,A. (eds), Protein Families: Relating Protein Sequence, Structure and Function, 1st edn. Wiley, Hoboken, NJ.

11. Sehgal,A.K., Das,S., Noto,K., Saier,M.H. Jr and Elkan,C. (2011) Identifying relevant data for a biological database: handcrafted rules versus machine learning. IEEE/ACM Trans. Comput. Biol. Bioinform., 8, 851–857.

12. Zhai,Y. and Saier,M.H. Jr (2001) A web-based program (WHAT) for the simultaneous prediction of hydropathy, amphipathicity, secondary structure and transmembrane topology for a single protein sequence. J. Mol. Microbiol. Biotechnol., 3, 501–502.

13. Reddy,V.S. and Saier,M.H. Jr (2012) BioV Suite – a collection of programs for the study of transport protein evolution. FEBS J., 279, 2036–2046.

14. Tusnady,G.E. and Simon,I. (2001) The HMMTOP transmembrane topology prediction server. Bioinformatics, 17, 849–850.

15. Jones,D.T. (2007) Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics, 23, 538–544.

16. Viklund,H., Bernsel,A., Skwark,M. and Elofsson,A. (2008) SPOCTOPUS: a combined predictor of signal peptides and membrane protein topology. Bioinformatics, 24, 2928–2929.

17. Devereux,J., Haeberli,P. and Smithies,O. (1984) A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res., 12, 387–395.

18. Xenarios,I., Rice,D.W., Salwinski,L., Baron,M.K., Marcotte,E.M. and Eisenberg,D. (2000) DIP: the database of interacting proteins. Nucleic Acids Res., 28, 289–291.

19. Pruitt,K.D., Tatusova,T., Brown,G.R. and Maglott,D.R. (2012) NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res., 40, D130–D135.

Table 2. List of known transporter databases

Name                URL                                                               Interconnected with TCDB
TransportDB         http://www.membranetransport.org                                  Yes
YTPdb               http://ytpdb.biopark-it.be                                        Yes
Aramemnon           http://aramemnon.botanik.uni-koeln.de                             No
M. truncatula TDB   http://bioinformatics.cau.edu.cn/MtTransporter/browse.php         Yes
ABCdb               https://www-abcdb.biotoul.fr                                      Yes
ABCISSE             http://www1.pasteur.fr/recherche/unites/pmtg/abc/database.iphtml  No
Human ABC TDB       http://nutrigene.4t.com/humanabc.htm                              Yes
SLC tables          http://www.bioparadigms.org/slc/intro.htm                         Yes, in TCDB
Worm SLC db         http://www.WormSLC.org                                            No
MP struc            http://blanco.biomol.uci.edu/mpstruc                              No
UCSF PMT            http://pharmacogenetics.ucsf.edu                                  No
ARDB                http://ardb.cbcb.umd.edu                                          No

20. Maglott,D., Ostell,J., Pruitt,K.D. and Tatusova,T. (2011) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res., 39, D52–D57.

21. Punta,M., Coggill,P.C., Eberhardt,R.Y., Mistry,J., Tate,J., Boursnell,C., Pang,N., Forslund,K., Ceric,G., Clements,J. et al. (2012) The Pfam protein families database. Nucleic Acids Res., 40, D290–D301.

22. Latendresse,M., Paley,S. and Karp,P.D. (2012) Browsing metabolic and regulatory networks with BioCyc. Methods Mol. Biol., 804, 197–216.

23. Kanehisa,M., Goto,S., Sato,Y., Furumichi,M. and Tanabe,M. (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res., 40, D109–D114.

24. Rose,P.W., Bi,C., Bluhm,W.F., Christie,C.H., Dimitropoulos,D., Dutta,S., Green,R.K., Goodsell,D.S., Prlic,A., Quesada,M. et al. (2013) The RCSB Protein Data Bank: new resources for research and education. Nucleic Acids Res., 41, D475–D482.

25. Youm,J. and Saier,M.H. Jr (2012) Comparative analyses of transport proteins encoded within the genomes of Mycobacterium tuberculosis and Mycobacterium leprae. Biochim. Biophys. Acta, 1818, 776–797.

26. Tamang,D.G., Rabus,R., Barabote,R.D. and Saier,M.H. Jr (2009) Comprehensive analyses of transport proteins encoded within the genome of "Aromatoleum aromaticum" strain EbN1. J. Membr. Biol., 229, 53–90.

27. Paparoditis,P., Vastermark,A., Le,A.J., Fuerst,J.A. and Saier,M.H. Jr (2013) Bioinformatic analyses of integral membrane transport proteins encoded within the genome of the planctomycetes species, Rhodopirellula baltica. Biochim. Biophys. Acta, 1838, 193–215.

28. UniProt Consortium. (2013) Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res., 41, D43–D47.

29. Knox,C., Law,V., Jewison,T., Liu,P., Ly,S., Frolkis,A., Pon,A., Banco,K., Mak,C., Neveu,V. et al. (2011) DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res., 39, D1035–D1041.

30. Rask-Andersen,M., Almen,M.S. and Schioth,H.B. (2011) Trends in the exploitation of novel drug targets. Nat. Rev. Drug Discov., 10, 579–590.

31. Chen,J.S., Reddy,V., Chen,J.H., Shlykov,M.A., Zheng,W.H., Cho,J., Yen,M.R. and Saier,M.H. Jr (2011) Phylogenetic characterization of transport protein superfamilies: superiority of SuperfamilyTree programs over those based on multiple alignments. J. Mol. Microbiol. Biotechnol., 21, 83–96.

32. Yen,M.R., Choi,J. and Saier,M.H. Jr (2009) Bioinformatic analyses of transmembrane transport: novel software for deducing protein phylogeny, topology and evolution. J. Mol. Microbiol. Biotechnol., 17, 163–176.

33. Yen,M.R., Chen,J.S., Marquez,J.L., Sun,E.I. and Saier,M.H. (2010) Multidrug resistance: phylogenetic characterization of superfamilies of secondary carriers that include drug exporters. Methods Mol. Biol., 637, 47–64.

34. Wong,F.H., Chen,J.S., Reddy,V., Day,J.L., Shlykov,M.A., Wakabayashi,S.T. and Saier,M.H. Jr (2012) The amino acid-polyamine-organocation superfamily. J. Mol. Microbiol. Biotechnol., 22, 105–113.

35. Reddy,V.S., Shlykov,M.A., Castillo,R., Sun,E.I. and Saier,M.H. Jr (2012) The major facilitator superfamily (MFS) revisited. FEBS J., 279, 2022–2035.

36. Shlykov,M.A., Zheng,W.H., Chen,J.S. and Saier,M.H. Jr (2012) Bioinformatic characterization of the 4-Toluene Sulfonate Uptake Permease (TSUP) family of transmembrane proteins. Biochim. Biophys. Acta, 1818, 703–717.

37. Chan,H., Babayan,V., Blyumin,E., Gandhi,C., Hak,K., Harake,D., Kumar,K., Lee,P., Li,T.T., Liu,H.Y. et al. (2010) The p-type ATPase superfamily. J. Mol. Microbiol. Biotechnol., 19, 5–104.

38. Rettner,R.E. and Saier,M.H. Jr (2010) The autoinducer-2 exporter superfamily. J. Mol. Microbiol. Biotechnol., 18, 195–205.

39. Lam,V.H., Lee,J.H., Silverio,A., Chan,H., Gomolplitinant,K.M., Povolotsky,T.L., Orlova,E., Sun,E.I., Welliver,C.H. and Saier,M.H. Jr (2011) Pathways of transport protein evolution: recent advances. Biol. Chem., 392, 5–12.

40. Zheng,W.H., Vastermark,A., Shlykov,M.A., Reddy,V., Sun,E.I. and Saier,M.H. Jr (2013) Evolutionary relationships of ATP-Binding Cassette (ABC) uptake porters. BMC Microbiol., 13, 98.

41. Matias,M.G., Gomolplitinant,K.M., Tamang,D.G. and Saier,M.H. Jr (2010) Animal Ca2+ release-activated Ca2+ (CRAC) channels appear to be homologous to and derived from the ubiquitous cation diffusion facilitators. BMC Res. Notes, 3, 158.

42. Wang,B., Dukarevich,M., Sun,E.I., Yen,M.R. and Saier,M.H. Jr (2009) Membrane porters of ATP-binding cassette transport systems are polyphyletic. J. Membr. Biol., 231, 1–10.

43. Yee,D.C., Shlykov,M.A., Vastermark,A., Reddy,V.S., Arora,S., Sun,E.I. and Saier,M.H. Jr (2013) The Transporter-Opsin-G protein-coupled receptor (TOG) Superfamily. FEBS J., 280, 5780–5800.

44. Saier,M.H. Jr (1994) Computer-aided analyses of transport protein sequences: gleaning evidence concerning function, structure, biogenesis and evolution. Microbiol. Rev., 58, 71–93.

45. Doolittle,R.F. (1994) Convergent evolution: the need to be explicit. Trends Biochem. Sci., 19, 15–18.

46. Dayhoff,M.O., Barker,W.C. and Hunt,L.T. (1983) Establishing homologies in protein sequences. Methods Enzymol., 91, 524–545.

47. Coyne,R.S., Hannick,L., Shanmugam,D., Hostetler,J.B., Brami,D., Joardar,V.S., Johnson,J., Radune,D., Singh,I., Badger,J.H. et al. (2011) Comparative genomics of the pathogenic ciliate Ichthyophthirius multifiliis, its free-living relatives and a host species provide insights into adoption of a parasitic lifestyle and prospects for disease control. Genome Biol., 12, R100.

48. Podar,M., Anderson,I., Makarova,K.S., Elkins,J.G., Ivanova,N., Wall,M.A., Lykidis,A., Mavromatis,K., Sun,H., Hudson,M.E. et al. (2008) A genomic analysis of the archaeal system Ignicoccus hospitalis-Nanoarchaeum equitans. Genome Biol., 9, R158.

49. Zhai,Y. and Saier,M.H. Jr (2002) A simple sensitive program for detecting internal repeats in sets of multiply aligned homologous proteins. J. Mol. Microbiol. Biotechnol., 4, 375–377.

50. Zhai,Y. and Saier,M.H. Jr (2001) A web-based program for the prediction of average hydropathy, average amphipathicity and average similarity of multiply aligned homologous proteins. J. Mol. Microbiol. Biotechnol., 3, 285–286.

51. Silverio,A.L. and Saier,M.H. Jr (2011) Bioinformatic characterization of the trimeric intracellular cation-specific channel protein family. J. Membr. Biol., 241, 77–101.

52. Gomolplitinant,K.M. and Saier,M.H. Jr (2011) Evolution of the oligopeptide transporter family. J. Membr. Biol., 240, 89–110.

53. Tsai,J.C., Yen,M.R., Castillo,R., Leyton,D.L., Henderson,I.R. and Saier,M.H. Jr (2010) The bacterial intimins and invasins: a large and novel family of secreted proteins. PLoS One, 5, e14403.

54. Castillo,R. and Saier,M.H. (2010) Functional promiscuity of homologues of the bacterial ArsA ATPases. Int. J. Microbiol., 2010, 187373.

55. Povolotsky,T.L., Orlova,E., Tamang,D.G. and Saier,M.H. Jr (2010) Defense against cannibalism: the SdpI family of bacterial immunity/signal transduction proteins. J. Membr. Biol., 235, 145–162.

56. Xiao,A.Y., Wang,J. and Saier,M.H. (2010) Bacterial adaptor membrane fusion proteins and the structurally dissimilar outer membrane auxiliary proteins have exchanged central domains in alpha-proteobacteria. Int. J. Microbiol., 2010, 589391.

57. Thever,M.D. and Saier,M.H. Jr (2009) Bioinformatic characterization of p-type ATPases encoded within the fully sequenced genomes of 26 eukaryotes. J. Membr. Biol., 229, 115–130.

58. Vastermark,A. and Saier,M.H. Jr (2013) Evolutionary relationship between 5+5 and 7+7 inverted repeat folds within the amino acid-polyamine-organocation superfamily. Proteins, August 28 (doi: 10.1002/prot.24401; epub ahead of print).

59. Ren,Q., Chen,K. and Paulsen,I.T. (2007) TransportDB: a comprehensive database resource for cytoplasmic membrane transport systems and outer membrane channels. Nucleic Acids Res., 35, D274–D279.

60. Brohee,S., Barriot,R., Moreau,Y. and Andre,B. (2010) YTPdb: a wiki database of yeast membrane transporters. Biochim. Biophys. Acta, 1798, 1908–1912.

61. Schwacke,R., Schneider,A., van der Graaff,E., Fischer,K., Catoni,E., Desimone,M., Frommer,W.B., Flugge,U.I. and Kunze,R. (2003) ARAMEMNON, a novel database for Arabidopsis integral membrane proteins. Plant Physiol., 131, 16–26.

62. Miao,Z., Li,D., Zhang,Z., Dong,J., Su,Z. and Wang,T. (2012) Medicago truncatula transporter database: a comprehensive database resource for M. truncatula transporters. BMC Genomics, 13, 60.

63. Fichant,G., Basse,M.J. and Quentin,Y. (2006) ABCdb: an online resource for ABC transporter repertories from sequenced archaeal and bacterial genomes. FEMS Microbiol. Lett., 256, 333–339.

64. Bouige,P., Laurent,D., Piloyan,L. and Dassa,E. (2002) Phylogenetic and functional classification of ATP-binding cassette (ABC) systems. Curr. Protein Pept. Sci., 3, 541–559.

65. Vasiliou,V., Vasiliou,K. and Nebert,D.W. (2009) Human ATP-binding cassette (ABC) transporter family. Hum. Genomics, 3, 281–290.

66. Hediger,M.A., Clemencon,B., Burrier,R.E. and Bruford,E.A. (2013) The ABCs of membrane transporters in health and disease (SLC series): introduction. Mol. Aspects Med., 34, 95–107.

67. White,S.H. (2009) Biophysical dissection of membrane proteins. Nature, 459, 344–346.

68. Vastermark,A., Almen,M.S., Simmen,M.W., Fredriksson,R. and Schioth,H.B. (2011) Functional specialization in nucleotide sugar transporters occurred through differentiation of the gene cluster EamA (DUF6) before the radiation of Viridiplantae. BMC Evol. Biol., 11, 123.

69. Reddy,B.L. and Saier,M.H. Jr (2013) Topological and phylogenetic analyses of bacterial holin families and superfamilies. Biochim. Biophys. Acta, 1828, 2654–2671.

70. Zhou,J. and Rudd,K.E. (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.

71. Chen,J., Swamidass,S.J., Dou,Y., Bruand,J. and Baldi,P. (2005) ChemDB: a public database of small molecules and related chemoinformatics resources. Bioinformatics, 21, 4133–4139.



topology/orientation and (x) database of interacting proteins (DIPs) and Pfam reference.

The user is also given an option of either BLASTING/PSI-BLASTING the protein against the non-redundant National Center for Biotechnology Information (NCBI) or TCDB (accessed from the sidebar). Additional analysis options, such as predicting number of TMSs through hydropathy plots, are also available (see below).

TCDB collaborates with many important databases (see Reference 10 for recent technical improvements) and shares cross-database links with them; these are available on the individual protein pages. Internal hyperlinks connecting references to classes, families and proteins have been updated.

RECENT TECHNICAL IMPROVEMENTS (2011–13)

We have:

(1) Incorporated an improved administration page, built-in semi-automatic machine learning tools (11) and user contributions allowing protein history tracking; see Wakabayashi et al. (10).

(2) Updated software to BLAST 2.2.27.

(3) Replaced the WHAT program (12) with a functionally similar Python version to increase speed and reliability.

(4) Made the TCDB BLAST database available, generated in real-time.

(5) Made the TMSTATS Program (13) available for analyzing topological (TMS) statistics using three different topological prediction programs, HMMTOP (14), MEMSAT (15) and SPOCTOPUS (16), giving histograms of TMS distribution for any protein or for any TC class, subclass, family, subfamily or any combination of these.

(6) Made Global Sequence Alignment Tool (GSAT) (13) available for performing pairwise alignments. GSAT performs a shuffle-based alignment to detect distant homologs using the Needleman and Wunsch algorithm.

(7) Implemented Protocols 1 and 2. Protocol 1 runs a PSI-BLAST search of the NCBI protein database with iterations, collects results, removes redundant/small/similar sequences, annotates, tabulates and counts TMSs. Protocol 2 allows the rapid identification and quantitative evaluation of homologs between any two FASTA files using the GSAT program (13).

(8) Established a homology section that replaces the GAP (17) and ICC programs with GSAT and Protocol 2 (13), and included class-wide comparisons that can be performed with these programs.

(9) Incorporated a semi-automatic protein screening program.

(10) Cross-referenced TCDB with HOGENOM (http://pbil.univ-lyon1.fr/databases/hogenom/acceuil.php), DIP (18), RefSeq (19), Entrez (20), Pfam (21), BioCyc (22), KEGG (23), PDB (24) and DrugBank.

(11) Improved search tools that now separate results by system, cluster, family, superfamily and reference.

(12) Implemented GBLAST, which provides a search tool designed to identify potential transporters in fully sequenced genomes or DNA segments (25–27).

(13) Implemented Ancient Rep, which provides horizontal and vertical search approaches to find transmembrane repeat units within a single protein or a list of homologs, respectively (13).

(14) Updated UniProtKB (28) cross-reference files with a continuously updated dynamic version as of 15 August 2013.

(15) Provided links to DrugBank (29), allowing resolution to the well-known validated human drug targets presented by Rask-Andersen et al. (30), as well as bacterial drug targets.

(16) Implemented the Superfamily Tree programs, SFT1 and SFT2, which use tens of thousands of BLAST bit scores instead of multiple alignments, thus avoiding the pitfalls often encountered when determining the phylogeny of distantly related proteins (31–33). While SFT1 constructs trees allowing visualization of individual proteins, SFT2 allows depiction of family/subfamily relationships (31–33).

(17) Provided a mechanism for user-generated input.
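The kind of per-class or per-family TMS histogram described in item (5) can be pictured with a short sketch. This is not the TMSTATS program itself; the function name `tms_histogram`, the input format (a mapping from TC number to a predicted TMS count) and all example values are hypothetical, chosen only to show how a TC-number prefix selects a class, subclass, family or subfamily.

```python
from collections import Counter

def tms_histogram(tms_counts, tc_prefix=""):
    """Histogram of predicted TMS counts for all entries under a TC prefix.

    tms_counts: dict mapping a TC number such as "2.A.1.1.1" to a predicted
                number of transmembrane segments (hypothetical input format).
    tc_prefix:  "" for everything, or a prefix such as "2", "2.A" or "2.A.1".
    """
    selected = []
    for tc_id, n_tms in tms_counts.items():
        # Match whole dot-separated fields only, so "2.A.1" never matches "2.A.10".
        if tc_prefix == "" or tc_id == tc_prefix or tc_id.startswith(tc_prefix + "."):
            selected.append(n_tms)
    return dict(sorted(Counter(selected).items()))

# Hypothetical predictions, for illustration only (not real TCDB data).
example_counts = {
    "2.A.1.1.1": 12,
    "2.A.1.2.1": 12,
    "2.A.1.3.1": 14,
    "2.A.6.1.1": 12,
    "3.A.1.1.1": 6,
}

print(tms_histogram(example_counts, "2.A.1"))
print(tms_histogram(example_counts, "2.A"))
```

In practice, the counts would come from one of the topology predictors named above, and comparing the histograms obtained from different predictors is one way to spot entries with uncertain topology.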

GROWTH OF THE DATABASE (2010–13)

A file containing the current sequence set is available for download from http://www.tcdb.org/public/tcdb. About 150 TC families are introduced each year, reflecting the extensive and continual manual curation work being conducted. Figure 2 shows the parallel growth of TCDB protein, family and superfamily compositions from 2010 to 2013. However, it should be noted that each year, several families in Class 9 are moved to classes 1–5 when sufficient information becomes available to allow definition of their mechanisms of action.

SUPERFAMILY ADDITIONS (2011–13)

Analyses (34–43) have revealed distant relationships between preexisting TC families. These relationships have been integrated into TCDB as a hyperlink, and superfamily relationships are mentioned with hyperlinks in the description of each constituent family. The number of superfamilies that are either new or expanded (marked with superscript 'a' in Table 1) has more than doubled during the last 3 years (Figure 2), and the further expansion of such knowledge continues.
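One way to picture how pairwise relationships between families aggregate into superfamilies is a union-find (disjoint-set) structure, in which every established family-to-family link merges two groups. The sketch below is illustrative only: the function `group_superfamilies` and the family labels are hypothetical, and it assumes that each link has already passed the statistical criteria for homology.

```python
def group_superfamilies(links):
    """Group families into superfamilies from pairwise homology links.

    links: iterable of (family_a, family_b) pairs, each assumed to be an
           established homology relationship (hypothetical input).
    Returns a list of sorted family lists, one per connected group.
    """
    parent = {}

    def find(x):
        # Register unseen families as their own root, then follow parents
        # to the root while shortening the path (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for family in list(parent):
        groups.setdefault(find(family), set()).add(family)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

# Hypothetical family labels, for illustration only.
links = [("FamA", "FamB"), ("FamB", "FamC"), ("FamD", "FamE")]
print(group_superfamilies(links))
```

Here FamA and FamC end up in the same group although no direct link between them was supplied, which is the transitive behaviour a superfamily assignment relies on.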

ESTABLISHING HOMOLOGY BETWEEN PROTEINS USING TCDB-RELATED SOFTWARE

Affiliation with a family requires satisfying rigorous statistical criteria of homology. Superfamily status is based on the superfamily principle (44,45), stating that if protein A is homologous to protein B, and protein B is homologous to protein C, then protein A must be homologous to protein C, regardless of the degree of sequence similarity observed between proteins A and C. To avoid the concern of convergent evolution, the minimal length of aligned sequences to establish homology is 60 residues, and the comparison score must be at least 12 standard deviations using the GSAT program [see also Wakabayashi et al. (10)]. As the protein databases grow, this value must be increased (44–46). It should be noted that homology means 'derived from a common evolutionary origin'. Homology is, therefore, an absolute term and does not require a specific degree of sequence similarity between any two protein sequences, such as sequences A and C discussed above (45).

Summarizing, we have developed and perfected novel tools suited for the analysis of transporters (http://saier-144-21.ucsd.edu). These are geared toward (i) superfamily recognition, (ii) detection of internal repeats, (iii) genome analyses of transporters (25,26,47,48), (iv) integral membrane topological analyses (31–33,49,50) and (v) family (38,51–58)/superfamily phylogenetic tree construction using two very different methods (31–33). These programs can be found in the 'BioTools' link of TCDB. A reference resource providing detailed information on these programs can be found in our Wiki (http://132.239.144.24) and in a chapter of a recent book edited by Christine A. Orengo (10).
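The idea of a comparison score expressed in standard deviations can be illustrated with a minimal shuffle-based sketch. It is not GSAT: the unit match/mismatch scores, the linear gap penalty, the number of shuffles and the function names are all assumptions made for illustration, whereas real tools typically use an amino acid substitution matrix. Only the thresholds of 12 standard deviations and 60 aligned residues are taken from the text above.

```python
import random
import statistics

def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def shuffle_z_score(a, b, shuffles=100, seed=0):
    """Standard deviations by which the real score exceeds shuffled-sequence scores."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    real = nw_score(a, b)
    residues = list(b)
    null_scores = []
    for _ in range(shuffles):
        rng.shuffle(residues)  # same composition, randomized order
        null_scores.append(nw_score(a, "".join(residues)))
    spread = statistics.pstdev(null_scores)
    if spread == 0:
        return 0.0  # degenerate case, e.g. a homopolymer; no meaningful z-score
    return (real - statistics.mean(null_scores)) / spread

def passes_homology_criteria(z, aligned_length, min_sd=12.0, min_length=60):
    """Apply the two thresholds quoted in the text (12 SD, 60 residues)."""
    return z >= min_sd and aligned_length >= min_length

# Arbitrary test sequence, for illustration only.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"
print(nw_score("GATTACA", "GATTACA"))
print(shuffle_z_score(seq, seq) > 5)
```

Shuffling preserves amino acid composition, so the z-score measures how much of the alignment score is due to residue order rather than to composition alone.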

OTHER TRANSPORT DATABASES

Only TCDB is comprehensive, including transport systems from all living organisms, and only TCDB has been adopted by the IUBMB. However, several databases have been developed (see Table 2), which represent transporters in restricted groups of organisms or are restricted to a certain category of transporter.

(i) TransportDB (59) contains computerized annotations of transport proteins in organisms with fully sequenced genomes and classifies them according to TCDB using a semi-automated pipeline.

(ii) YTPdb (60) includes 298 Saccharomyces cerevisiae transporter proteins. It is organized by TC class, although TCs are not provided. Each entry is a wiki where users can contribute. It is easy to use but lacks the detailed text descriptions of sequences and families that can be found in TCDB.

(iii) Aramemnon (61) provides manually curated protein descriptions for six plant species, using a clustering algorithm that has been applied on a matrix of pairwise distances between sequences.

(iv) The Medicago truncatula transporter database (62) focuses on transporters in a single plant genome, based on TCDB.

(v) ABCdb (63) contains lists of ABC transporters in prokaryotes in 21 families, with functional predictions improved by the addition of references to TCDB.

(vi) ABCISSE (64) tabulates 34 324 partners of 13 276 ABC transporter systems in 276 genomes. It is built around a phylogeny of 34 families of ABC ATPases (not the membrane constituents), organized in three classes, with text descriptions only for the families. TCDB currently includes 92 families of ABC transporter systems: 35 families of uptake porters, 45 families of prokaryotic exporters and 12 families of eukaryotic exporters.

(vii) The Human ATP-Binding Cassette Transporters (http://nutrigene.4t.com/humanabc.htm) categorizes 49 transport systems into subfamilies A–G (65). It is a list, not a database, providing some links to other resources. All these human transporters have been entered into TCDB.

(viii) SLC tables (66) classify secondary carriers in mammals, especially human and mouse. SLC contains 52 families compared with 115 in the equivalent TC subclass of 2.A. We have interconnected the two systems and included all human carriers in TCDB. The tables revealing the family relationships between the TC and SLC systems can be found at the top of subclass 2.A in TCDB. The worm SLC database lists multiple homologs of individual SLCs in Caenorhabditis elegans.

(ix) The membrane proteins of known three-dimensional structure database (67) contains 379 entries that constitute a subset of PDB, not all of them transporters. PDB entries are grouped broadly by type.

(x) The UCSF PMT is a SNP database showing schematic diagrams of transporters with SNPs marked out in the sequence, but does not attempt to provide TC numbers.

[Figure 2 plot: x-axis, year (2010 to 2013); y-axes, number of proteins (x 10^-3), number of families (x 10^-2) and number of superfamilies; series: Protein, Family, Superfamily.]

Figure 2. Growth of TCDB since August 2010. (A) Number of thousands of proteins (solid line); (B) number of hundreds of families (broken line); (C) number of superfamilies (dashed line). Numbers of proteins, families and superfamilies in TCDB as of 19 August 2013 were 9853, 778 and 49, respectively.


(xi) The ARDB contains antibiotic resistance genes, providing a list of four multidrug resistance transporter types: ABC (TC 3.A.1), MFS (TC 2.A.1), RND (TC 2.A.6) and SMR (TC 2.A.7.1).
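TC numbers such as those quoted above follow the N1.L1.N2.N3.N4 pattern (class, subclass, family, subfamily, transport system), so they can be split into their tiers mechanically. The following is a small sketch rather than any official TCDB parser; the names `TCNumber` and `parse_tc` are hypothetical, and the decision to accept partial identifiers (family or subfamily level) is an assumption made so that values like 3.A.1 can be handled.

```python
import re
from typing import NamedTuple, Optional

# class . subclass letter . family [. subfamily [. transport system]]
TC_PATTERN = re.compile(r"^(\d+)\.([A-Z])\.(\d+)(?:\.(\d+))?(?:\.(\d+))?$")

class TCNumber(NamedTuple):
    tc_class: int             # N1
    subclass: str             # L1
    family: int               # N2 (sometimes a superfamily)
    subfamily: Optional[int]  # N3, None if not given
    system: Optional[int]     # N4, None if not given

def parse_tc(text):
    """Split a TC number into its tiers; raise ValueError if malformed."""
    match = TC_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a TC number: {text!r}")
    n1, l1, n2, n3, n4 = match.groups()
    return TCNumber(
        int(n1),
        l1,
        int(n2),
        int(n3) if n3 is not None else None,
        int(n4) if n4 is not None else None,
    )

for tc in ("3.A.1", "2.A.1", "2.A.6", "2.A.7.1"):
    print(tc, parse_tc(tc))
```

Keeping the tiers as separate fields makes it straightforward to group entries by class, subclass or family when cross-referencing another resource against TCDB.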

HARMONIZATION AND FUTURE GOALS

The most important goals we have identified for future development of TCDB include (i) the creation of an ontology for the TCDB database, (ii) improving our integration with Pfam and (iii) streamlining the use of phylogeny and synteny information to provide functional predictions. Some of the new functions will be implemented as links and some as software. Synteny should probably be implemented as links because the information is often already available elsewhere (Microbes Online, JGI's intuitive resource IMG, SEED and RegPredict). Pfam may prove more difficult because many families in Pfam are incomplete or not appropriately arranged in clans. Working with Pfam, as we have in the past (69), we plan to improve upon the transport protein section of this database.

It is well-known that many families that include domain duplicated transporters do not accurately reflect the domain borders in the way hidden Markov models (HMMs) have been trained (68). Currently, we do not show 'repeat units' in TCDB, even though this information is presented in many of our publications. We will continue to work with Pfam to integrate and coordinate information in both databases in a systematic way (69). Ideally, such a process should be automated or semi-automated.

Another worthwhile goal is to establish the user base so we can serve the needs of the scientific community more effectively. We plan to collect more access statistics to understand the needs of the user community. Google Analytics was installed in 2011, but improvements are required so we can recognize which TCDB features are most used.

One million PubMed abstracts are created every year, and 10% of the 2012 abstracts were not indexed as of May 2013. Other databases that link to TCDB, such as EcoGene (70), manually review literature. 'Transporter' is a MESH term PubMed uses, but there is a 6-month delay to add MESH terms, and sometimes the word 'Transporter' is not obvious from the title. TCDB uses machine learning classifiers as well as keyword searches, which are continuously extracted from TCDB and used as search terms to identify relevant articles. We are considering new ways for users to provide sequence data and information, either with or without the use of email; suggestions by email could be used as test sets to evaluate the efficiency of an automated process. We are also considering implementing links for reference, sequence and information input from users. Adding a feature allowing TCDB to be searched as a library of HMMs is also under consideration. Current TCDB users report that the present system of presenting search results is satisfactory, but we constantly strive to improve the database, and suggestions from users are most welcome.

TCDB needs an ontological hierarchical system and a controlled vocabulary. EBI's ChemDB (71) has created a chemical classification system, and we have already set up a prototype, which can be accessed from this link: http://www.tcdb.org/ontology. The substrate text needs to be extracted from the description and then correlated with ChemDB. One system already exists, but due to inconsistencies in the description, it has been difficult to implement. If we could link with gene ontology, TC numbers would be more accessible. Another important area for improvement concerns user access to the most recent entries. Perhaps TCDB should have 'recent releases' such as those of Pfam. Since we already track protein histories, adding this feature would not be difficult. Some basic statistics, where database growth can be followed, are already available at http://www.tcdb.org/search/index.php.

We are currently undertaking the development of standardized workflows to confirm homology results from TCDB's in-house statistical methods, based on structural superimposition and HMM:HMM comparisons. For instance, we use structural superimposition, in addition to sequence statistical analyses, to identify or confirm structural and evolutionary relationships between members of a superfamily (40). This helps to establish reference points in structural space for homology detection.

Table 1. Transport protein superfamilies in TCDB

1. Aerolysin^a
2. Amino acid/Polyamine/organoCation (APC)^a
3. ATP-Binding Cassette-1 (ABC1)
4. ATP-Binding Cassette-2 (ABC2) with the ECF sub-superfamily
5. ATP-Binding Cassette-3 (ABC3)
6. Bacterial bacteriocin (BB)^a
7. Bile/arsenite/riboflavin transporter (BART)^a
8. Cation diffusion facilitator (CDF)^a
9. Cation:Proton antiporter (CPA)
10. Cecropin
11. Circular bacterial bacteriocin (CBB)^a
12. Claudin^a
13. Corynebacterial PorA/PorH^a
14. Defensin
15. Drug/metabolite transporter (DMT)
16. Endomembrane protein translocon (EMPT)^a
17. Epithelial Na+ channel (ENaC/P2X)
18. Gap junction (GJ)^a
19. General bacterial porin (GBP)
20. Holin I^a
21. Holin II^a
22. Holin III^a
23. Holin IV^a
24. Holin V^a
25. Holin VI^a
26. Holin VII^a
27. Huwentoxin
28. Ion transporter (IT)
29. Lysine exporter (LysE)
30. Major facilitator (MFS)^a
31. Major intrinsic protein (MIP)^a
32. Melittin
33. Membrane attack complex/perforin (MACPF)^a
34. Mercury (Mer)
35. Mitochondrial carrier (MC)
36. Mycobacterial/nocardial porin (MspA)^a
37. Multidrug/oligosaccharidyl-lipid/polysaccharide (MOP) Flippase^a
38. P-type ATPase (P-ATPase)
39. Phosphotransferase system Asc/Gat (PTS-AG)
40. Phosphotransferase system Glc/Fru/Lac (PTS-GFL)
41. Resistance-nodulation-cell division (RND)
42. RTX-toxin
43. T4 immunity (T4 IMM)^a
44. Transmembrane inner membrane-17 (Tim17)
45. Transporter/opsin/G protein-coupled receptor (TOG)
46. TRC/TAMP-B (TRC/TAMP)^a
47. Outer membrane protein (OMP) insertase (YaeT/TpsB)
48. Voltage-gated ion channel (42)
49. Viral envelope glycoprotein (Env)^a

^a New or recently expanded superfamilies.

CONCLUSION

In 2006 TCDB contained 3000 proteins classified into400 families but in 2013 it exceeded 10 000 proteins in750 families The availability of TCDB has allowedmajor basic research advances including answering funda-mental biological questions determining the routes ofevolution taken for the appearance of these proteins iden-tifying superfamily relationships and allowing structuralfunctional and mechanistic predictions Within practicallimits TCDB reflects the current state of our knowledgeconcerning its constituent parts

FUNDING

TCDB is supported by NIH [GM 077402-05 and GM094610-01] Funding for open access charge NIH

Conflict of interest statement None declared

REFERENCES

1 FleischmannRD AdamsMD WhiteO ClaytonRAKirknessEF KerlavageAR BultCJ TombJFDoughertyBA MerrickJM et al (1995) Whole-genomerandom sequencing and assembly of Haemophilus influenzae RdScience 269 496ndash512

2 BairochA (1994) The ENZYME data bank Nucleic Acids Res22 3626ndash3627

3 SaierMH Jr (1994) Computer-aided analyses oftransport protein sequences gleaning evidence concerning

function structure biogenesis and evolution Microbiol Rev 5871ndash93

4 SaierMH Jr TranCV and BaraboteRD (2006) TCDB theTransporter Classification Database for membrane transportprotein analyses and information Nucleic Acids Res 34D181ndashD186

5 SaierMH Jr YenMR NotoK TamangDG and ElkanC(2009) The Transporter Classification Database recent advancesNucleic Acids Res 37 D274ndashD278

6 SaierMH Jr (2000) A functional-phylogenetic classificationsystem for transmembrane solute transporters Microbiol MolBiol Rev 64 354ndash411

7 BuschW and SaierMH Jr (2004) The IUBMB-endorsedtransporter classification system Mol Biotechnol 27 253ndash262

8 BuschW and SaierMH Jr (2003) The IUBMB-endorsedtransporter classification system Methods Mol Biol 227 21ndash36

9 BuschW and SaierMH Jr (2002) The transporter classification(TC) system 2002 Crit Rev Biochem Mol Biol 37 287ndash337

10 WakabayashiST ShlykovMA KumarU ReddyVMalhotraA ClarkeEL ChenJS CastilloR De La MareRSunEI et al (2013) Deducing transport protein evolution basedon sequence structure and function In ChristineAO andAlexB (eds) Protein Families Relating Protein SequenceStructure and Function 1st edn Wiley Hoboken NJ

11 SehgalAK DasS NotoK SaierMH Jr and ElkanC (2011)Identifying relevant data for a biological database handcraftedrules versus machine learning IEEEACM Trans Comput BiolBioinform 8 851ndash857

12 ZhaiY and SaierMH Jr (2001) A web-based program (WHAT)for the simultaneous prediction of hydropathy amphipathicitysecondary structure and transmembrane topology for asingle protein sequence J Mol Microbiol Biotechnol 3501ndash502

13 ReddyVS and SaierMH Jr (2012) BioV Suitemdasha collection ofprograms for the study of transport protein evolution FEBS J279 2036ndash2046

14 TusnadyGE and SimonI (2001) The HMMTOPtransmembrane topology prediction server Bioinformatics 17849ndash850

15 JonesDT (2007) Improving the accuracy of transmembraneprotein topology prediction using evolutionary informationBioinformatics 23 538ndash544

16 ViklundH BernselA SkwarkM and ElofssonA (2008)SPOCTOPUS a combined predictor of signal peptides andmembrane protein topology Bioinformatics 24 2928ndash2929

17 DevereuxJ HaeberliP and SmithiesO (1984) A comprehensiveset of sequence analysis programs for the VAX Nucleic AcidsRes 12 387ndash395

18 XenariosI RiceDW SalwinskiL BaronMK MarcotteEMand EisenbergD (2000) DIP the database of interactingproteins Nucleic Acids Res 28 289ndash291

19. Pruitt,K.D., Tatusova,T., Brown,G.R. and Maglott,D.R. (2012) NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res., 40, D130–D135.

Table 2. List of known transporter databases

Name                URL                                                               Interconnected with TCDB
TransportDB         http://www.membranetransport.org                                  Yes
YTPdb               http://ytpdb.biopark-it.be                                        Yes
Aramemnon           http://aramemnon.botanik.uni-koeln.de                             No
M. truncatula TDB   http://bioinformatics.cau.edu.cn/MtTransporter/browse.php         Yes
ABCdb               https://www-abcdb.biotoul.fr                                      Yes
ABCISSE             http://www1.pasteur.fr/recherche/unites/pmtg/abc/database.iphtml  No
Human ABC TDB       http://nutrigene.4t.com/humanabc.htm                              Yes
SLC tables          http://www.bioparadigms.org/slc/intro.htm                         Yes, in TCDB
Worm SLC db         http://www.WormSLC.org                                            No
MP struc            http://blanco.biomol.uci.edu/mpstruc                              No
UCSF PMT            http://pharmacogenetics.ucsf.edu                                  No
ARDB                http://ardb.cbcb.umd.edu                                          No

Nucleic Acids Research, 2014, Vol. 42, Database issue

20. Maglott,D., Ostell,J., Pruitt,K.D. and Tatusova,T. (2011) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res., 39, D52–D57.

21. Punta,M., Coggill,P.C., Eberhardt,R.Y., Mistry,J., Tate,J., Boursnell,C., Pang,N., Forslund,K., Ceric,G., Clements,J. et al. (2012) The Pfam protein families database. Nucleic Acids Res., 40, D290–D301.

22. Latendresse,M., Paley,S. and Karp,P.D. (2012) Browsing metabolic and regulatory networks with BioCyc. Methods Mol. Biol., 804, 197–216.

23. Kanehisa,M., Goto,S., Sato,Y., Furumichi,M. and Tanabe,M. (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res., 40, D109–D114.

24. Rose,P.W., Bi,C., Bluhm,W.F., Christie,C.H., Dimitropoulos,D., Dutta,S., Green,R.K., Goodsell,D.S., Prlic,A., Quesada,M. et al. (2013) The RCSB Protein Data Bank: new resources for research and education. Nucleic Acids Res., 41, D475–D482.

25. Youm,J. and Saier,M.H. Jr (2012) Comparative analyses of transport proteins encoded within the genomes of Mycobacterium tuberculosis and Mycobacterium leprae. Biochim. Biophys. Acta, 1818, 776–797.

26. Tamang,D.G., Rabus,R., Barabote,R.D. and Saier,M.H. Jr (2009) Comprehensive analyses of transport proteins encoded within the genome of "Aromatoleum aromaticum" strain EbN1. J. Membr. Biol., 229, 53–90.

27. Paparoditis,P., Vastermark,A., Le,A.J., Fuerst,J.A. and Saier,M.H. Jr (2013) Bioinformatic analyses of integral membrane transport proteins encoded within the genome of the planctomycetes species, Rhodopirellula baltica. Biochim. Biophys. Acta, 1838, 193–215.

28. UniProt Consortium. (2013) Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res., 41, D43–D47.

29. Knox,C., Law,V., Jewison,T., Liu,P., Ly,S., Frolkis,A., Pon,A., Banco,K., Mak,C., Neveu,V. et al. (2011) DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res., 39, D1035–D1041.

30. Rask-Andersen,M., Almen,M.S. and Schioth,H.B. (2011) Trends in the exploitation of novel drug targets. Nat. Rev. Drug Discov., 10, 579–590.

31. Chen,J.S., Reddy,V., Chen,J.H., Shlykov,M.A., Zheng,W.H., Cho,J., Yen,M.R. and Saier,M.H. Jr (2011) Phylogenetic characterization of transport protein superfamilies: superiority of SuperfamilyTree programs over those based on multiple alignments. J. Mol. Microbiol. Biotechnol., 21, 83–96.

32. Yen,M.R., Choi,J. and Saier,M.H. Jr (2009) Bioinformatic analyses of transmembrane transport: novel software for deducing protein phylogeny, topology, and evolution. J. Mol. Microbiol. Biotechnol., 17, 163–176.

33. Yen,M.R., Chen,J.S., Marquez,J.L., Sun,E.I. and Saier,M.H. (2010) Multidrug resistance: phylogenetic characterization of superfamilies of secondary carriers that include drug exporters. Methods Mol. Biol., 637, 47–64.

34. Wong,F.H., Chen,J.S., Reddy,V., Day,J.L., Shlykov,M.A., Wakabayashi,S.T. and Saier,M.H. Jr (2012) The amino acid-polyamine-organocation superfamily. J. Mol. Microbiol. Biotechnol., 22, 105–113.

35. Reddy,V.S., Shlykov,M.A., Castillo,R., Sun,E.I. and Saier,M.H. Jr (2012) The major facilitator superfamily (MFS) revisited. FEBS J., 279, 2022–2035.

36. Shlykov,M.A., Zheng,W.H., Chen,J.S. and Saier,M.H. Jr (2012) Bioinformatic characterization of the 4-Toluene Sulfonate Uptake Permease (TSUP) family of transmembrane proteins. Biochim. Biophys. Acta, 1818, 703–717.

37. Chan,H., Babayan,V., Blyumin,E., Gandhi,C., Hak,K., Harake,D., Kumar,K., Lee,P., Li,T.T., Liu,H.Y. et al. (2010) The p-type ATPase superfamily. J. Mol. Microbiol. Biotechnol., 19, 5–104.

38. Rettner,R.E. and Saier,M.H. Jr (2010) The autoinducer-2 exporter superfamily. J. Mol. Microbiol. Biotechnol., 18, 195–205.

39. Lam,V.H., Lee,J.H., Silverio,A., Chan,H., Gomolplitinant,K.M., Povolotsky,T.L., Orlova,E., Sun,E.I., Welliver,C.H. and Saier,M.H. Jr (2011) Pathways of transport protein evolution: recent advances. Biol. Chem., 392, 5–12.

40. Zheng,W.H., Vastermark,A., Shlykov,M.A., Reddy,V., Sun,E.I. and Saier,M.H. Jr (2013) Evolutionary relationships of ATP-Binding Cassette (ABC) uptake porters. BMC Microbiol., 13, 98.

41. Matias,M.G., Gomolplitinant,K.M., Tamang,D.G. and Saier,M.H. Jr (2010) Animal Ca2+ release-activated Ca2+ (CRAC) channels appear to be homologous to and derived from the ubiquitous cation diffusion facilitators. BMC Res. Notes, 3, 158.

42. Wang,B., Dukarevich,M., Sun,E.I., Yen,M.R. and Saier,M.H. Jr (2009) Membrane porters of ATP-binding cassette transport systems are polyphyletic. J. Membr. Biol., 231, 1–10.

43. Yee,D.C., Shlykov,M.A., Vastermark,A., Reddy,V.S., Arora,S., Sun,E.I. and Saier,M.H. Jr (2013) The Transporter-Opsin-G protein-coupled receptor (TOG) Superfamily. FEBS J., 280, 5780–5800.

44. Saier,M.H. Jr (1994) Computer-aided analyses of transport protein sequences: gleaning evidence concerning function, structure, biogenesis, and evolution. Microbiol. Rev., 58, 71–93.

45. Doolittle,R.F. (1994) Convergent evolution: the need to be explicit. Trends Biochem. Sci., 19, 15–18.

46. Dayhoff,M.O., Barker,W.C. and Hunt,L.T. (1983) Establishing homologies in protein sequences. Methods Enzymol., 91, 524–545.

47. Coyne,R.S., Hannick,L., Shanmugam,D., Hostetler,J.B., Brami,D., Joardar,V.S., Johnson,J., Radune,D., Singh,I., Badger,J.H. et al. (2011) Comparative genomics of the pathogenic ciliate Ichthyophthirius multifiliis, its free-living relatives and a host species provide insights into adoption of a parasitic lifestyle and prospects for disease control. Genome Biol., 12, R100.

48. Podar,M., Anderson,I., Makarova,K.S., Elkins,J.G., Ivanova,N., Wall,M.A., Lykidis,A., Mavromatis,K., Sun,H., Hudson,M.E. et al. (2008) A genomic analysis of the archaeal system Ignicoccus hospitalis-Nanoarchaeum equitans. Genome Biol., 9, R158.

49. Zhai,Y. and Saier,M.H. Jr (2002) A simple sensitive program for detecting internal repeats in sets of multiply aligned homologous proteins. J. Mol. Microbiol. Biotechnol., 4, 375–377.

50. Zhai,Y. and Saier,M.H. Jr (2001) A web-based program for the prediction of average hydropathy, average amphipathicity and average similarity of multiply aligned homologous proteins. J. Mol. Microbiol. Biotechnol., 3, 285–286.

51. Silverio,A.L. and Saier,M.H. Jr (2011) Bioinformatic characterization of the trimeric intracellular cation-specific channel protein family. J. Membr. Biol., 241, 77–101.

52. Gomolplitinant,K.M. and Saier,M.H. Jr (2011) Evolution of the oligopeptide transporter family. J. Membr. Biol., 240, 89–110.

53. Tsai,J.C., Yen,M.R., Castillo,R., Leyton,D.L., Henderson,I.R. and Saier,M.H. Jr (2010) The bacterial intimins and invasins: a large and novel family of secreted proteins. PLoS One, 5, e14403.

54. Castillo,R. and Saier,M.H. (2010) Functional promiscuity of homologues of the bacterial ArsA ATPases. Int. J. Microbiol., 2010, 187373.

55. Povolotsky,T.L., Orlova,E., Tamang,D.G. and Saier,M.H. Jr (2010) Defense against cannibalism: the SdpI family of bacterial immunity/signal transduction proteins. J. Membr. Biol., 235, 145–162.

56. Xiao,A.Y., Wang,J. and Saier,M.H. (2010) Bacterial adaptor membrane fusion proteins and the structurally dissimilar outer membrane auxiliary proteins have exchanged central domains in alpha-proteobacteria. Int. J. Microbiol., 2010, 589391.

57. Thever,M.D. and Saier,M.H. Jr (2009) Bioinformatic characterization of p-type ATPases encoded within the fully sequenced genomes of 26 eukaryotes. J. Membr. Biol., 229, 115–130.

58. Vastermark,A. and Saier,M.H. Jr (2013) Evolutionary relationship between 5+5 and 7+7 inverted repeat folds within the amino acid-polyamine-organocation superfamily. Proteins, August 28 (doi: 10.1002/prot.24401; epub ahead of print).

59. Ren,Q., Chen,K. and Paulsen,I.T. (2007) TransportDB: a comprehensive database resource for cytoplasmic membrane transport systems and outer membrane channels. Nucleic Acids Res., 35, D274–D279.

60. Brohee,S., Barriot,R., Moreau,Y. and Andre,B. (2010) YTPdb: a wiki database of yeast membrane transporters. Biochim. Biophys. Acta, 1798, 1908–1912.

61. Schwacke,R., Schneider,A., van der Graaff,E., Fischer,K., Catoni,E., Desimone,M., Frommer,W.B., Flugge,U.I. and Kunze,R. (2003) ARAMEMNON, a novel database for Arabidopsis integral membrane proteins. Plant Physiol., 131, 16–26.

62. Miao,Z., Li,D., Zhang,Z., Dong,J., Su,Z. and Wang,T. (2012) Medicago truncatula transporter database: a comprehensive database resource for M. truncatula transporters. BMC Genomics, 13, 60.

63. Fichant,G., Basse,M.J. and Quentin,Y. (2006) ABCdb: an online resource for ABC transporter repertories from sequenced archaeal and bacterial genomes. FEMS Microbiol. Lett., 256, 333–339.

64. Bouige,P., Laurent,D., Piloyan,L. and Dassa,E. (2002) Phylogenetic and functional classification of ATP-binding cassette (ABC) systems. Curr. Protein Pept. Sci., 3, 541–559.

65. Vasiliou,V., Vasiliou,K. and Nebert,D.W. (2009) Human ATP-binding cassette (ABC) transporter family. Hum. Genomics, 3, 281–290.

66. Hediger,M.A., Clemencon,B., Burrier,R.E. and Bruford,E.A. (2013) The ABCs of membrane transporters in health and disease (SLC series): introduction. Mol. Aspects Med., 34, 95–107.

67. White,S.H. (2009) Biophysical dissection of membrane proteins. Nature, 459, 344–346.

68. Vastermark,A., Almen,M.S., Simmen,M.W., Fredriksson,R. and Schioth,H.B. (2011) Functional specialization in nucleotide sugar transporters occurred through differentiation of the gene cluster EamA (DUF6) before the radiation of Viridiplantae. BMC Evol. Biol., 11, 123.

69. Reddy,B.L. and Saier,M.H. Jr (2013) Topological and phylogenetic analyses of bacterial holin families and superfamilies. Biochim. Biophys. Acta, 1828, 2654–2671.

70. Zhou,J. and Rudd,K.E. (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.

71. Chen,J., Swamidass,S.J., Dou,Y., Bruand,J. and Baldi,P. (2005) ChemDB: a public database of small molecules and related chemoinformatics resources. Bioinformatics, 21, 4133–4139.


protein C, regardless of the degree of sequence similarity observed between proteins A and C. To avoid the concern of convergent evolution, the minimal length of aligned sequences to establish homology is 60 residues, and the comparison score must be at least 12 standard deviations using the GSAT program [see also Wakabayashi et al. (10)]. As the protein databases grow, this value must be increased (44–46). It should be noted that homology means 'derived from a common evolutionary origin'. Homology is therefore an absolute term and does not require a specific degree of sequence similarity between any two protein sequences, such as sequences A and C discussed above (45).

Summarizing, we have developed and perfected novel tools suited for the analysis of transporters (http://saier-144-21.ucsd.edu). These are geared toward (i) superfamily recognition, (ii) detection of internal repeats, (iii) genome analyses of transporters (25,26,47,48), (iv) integral membrane topological analyses (31–33,49,50) and (v) family (38,51–58)/superfamily phylogenetic tree construction using two very different methods (31–33). These programs can be found in the 'BioTools' link of TCDB. A reference resource providing detailed information on these programs can be found in our Wiki (http://132.239.144.24) and in a chapter of a recent book edited by Christine A. Orengo (10).
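The homology criteria quoted above (at least 60 aligned residues and a comparison score of at least 12 standard deviations) and the transitive reasoning applied to proteins A and C can be illustrated with a short sketch. This is not the GSAT program. It assumes the comparison score is a z-score measured against alignments of shuffled sequences, and all function names and the input format are hypothetical.

```python
from statistics import mean, stdev

MIN_ALIGNED_RESIDUES = 60  # minimal aligned length quoted in the text
MIN_Z_SCORE = 12.0         # minimal comparison score, in standard deviations


def z_score(real_score, shuffled_scores):
    """Score of the real alignment, in standard deviations above the mean of
    scores obtained with shuffled sequences (assumed scoring scheme).
    Needs at least two shuffled scores that are not all identical."""
    return (real_score - mean(shuffled_scores)) / stdev(shuffled_scores)


def meets_criteria(aligned_length, z):
    """Apply both thresholds from the text to one pairwise comparison."""
    return aligned_length >= MIN_ALIGNED_RESIDUES and z >= MIN_Z_SCORE


def homology_groups(proteins, significant_pairs):
    """Group proteins transitively with a union-find structure:
    if A~B and B~C pass the criteria, A and C land in the same group
    even when A and C themselves show little sequence similarity."""
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for a, b in significant_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())
```

With the pairs A/B and B/C passing both thresholds, `homology_groups(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])` places A, B and C in one group and leaves D on its own, which mirrors the argument that homology between A and C follows without a direct A/C comparison.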

OTHER TRANSPORT DATABASES

Only TCDB is comprehensive, including transport systems from all living organisms, and only TCDB has been adopted by the IUBMB. However, several databases have been developed (see Table 2) that represent transporters in restricted groups of organisms or are restricted to a certain category of transporter. (i) TransportDB (59) contains computerized annotations of transport proteins in organisms with fully sequenced genomes and classifies them according to TCDB using a semi-automated pipeline. (ii) YTPdb (60) includes 298 Saccharomyces cerevisiae transporter proteins. It is organized by TC class, although TCs are not provided. Each entry is a wiki where users can contribute. It is easy to use but lacks the detailed text descriptions of sequences and families that can be found in TCDB. (iii) Aramemnon (61) provides manually curated protein descriptions for six plant species, using a clustering algorithm that has been applied on a matrix of pairwise distances between sequences. (iv) The Medicago truncatula transporter database (62) focuses on transporters in a single plant genome, based on TCDB. (v) ABCdb (63) contains lists of ABC transporters in prokaryotes in 21 families, with functional predictions improved by the addition of references to TCDB. (vi) ABCISSE (64) tabulates 34 324 partners of 13 276 ABC transporter systems in 276 genomes. It is built around a phylogeny of 34 families of ABC ATPases (not the membrane constituents), organized in three classes, with text descriptions only for the families. TCDB currently includes 92 families of ABC transporter systems: 35 families of uptake porters, 45 families of prokaryotic exporters and 12 families of eukaryotic exporters. (vii) The Human ATP-Binding Cassette Transporters (http://nutrigene.4t.com/humanabc.htm) categorizes 49 transport systems into subfamilies A–G (65). It is a list, not a database, providing some links to other resources. All these human transporters have been entered into TCDB. (viii) SLC tables (66) classify secondary carriers in mammals, especially human and mouse. SLC contains 52 families compared with 115 in the equivalent TC subclass of 2.A. We have interconnected the two systems and included all human carriers in TCDB. The tables revealing the family relationships between the TC and SLC systems can be found at the top of subclass 2.A in TCDB. The worm SLC database lists multiple homologs of individual SLCs in Caenorhabditis elegans. (ix) The membrane proteins of known three-dimensional structure database (67) contains 379 entries that constitute a subset of PDB, not all of them transporters. PDB entries are grouped broadly by type. (x) The UCSF PMT is a SNP database showing schematic diagrams of transporters with SNPs marked out in the sequence, but it does not attempt to provide TC numbers.

Figure 2. Growth of TCDB since August 2010. (A) Number of thousands of proteins (solid line); (B) number of hundreds of families (broken line); (C) number of superfamilies (dashed line). Numbers of proteins, families and superfamilies in TCDB as of 19 August 2013 were 9853, 778 and 49, respectively.


(xi) The ARDB contains antibiotic resistance genes, providing a list of four multidrug resistance transporter types: ABC (TC 3.A.1), MFS (TC 2.A.1), RND (TC 2.A.6) and SMR (TC 2.A.7.1).
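Because several of the databases above cross-reference TCDB through TC identifiers, a small parser can make the five-tier N1.L1.N2.N3.N4 layout (class, subclass, family, subfamily, transport system) explicit. The following is a minimal sketch under the assumption that identifiers are dot-separated and may be truncated after any tier from the subclass onward. The names `TCNumber`, `parse_tc` and `is_within` are hypothetical and are not part of any TCDB software.

```python
import re
from typing import NamedTuple, Optional

# N1.L1[.N2[.N3[.N4]]]: a digit class, a letter subclass, up to three numeric tiers
TC_PATTERN = re.compile(r"^(\d+)\.([A-Z])(?:\.(\d+))?(?:\.(\d+))?(?:\.(\d+))?$")


class TCNumber(NamedTuple):
    tc_class: int
    subclass: str
    family: Optional[int]
    subfamily: Optional[int]
    system: Optional[int]


def parse_tc(text):
    """Split a TC identifier such as '2.A.7.1' into its tiers.
    Tiers that are not given are returned as None."""
    match = TC_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a TC identifier: {text!r}")
    n1, l1, n2, n3, n4 = match.groups()

    def to_int(value):
        return int(value) if value is not None else None

    return TCNumber(int(n1), l1, to_int(n2), to_int(n3), to_int(n4))


def is_within(child, parent):
    """True if `child` lies at or below the (possibly truncated) `parent`."""
    for child_tier, parent_tier in zip(parse_tc(child), parse_tc(parent)):
        if parent_tier is None:
            break  # parent is truncated here; deeper tiers are unconstrained
        if child_tier != parent_tier:
            return False
    return True
```

For example, `is_within("2.A.7.1", "2.A.7")` holds because SMR (TC 2.A.7.1) is listed under family 2.A.7, whereas `is_within("2.A.6", "2.A.1")` does not, since RND and MFS are distinct families within subclass 2.A.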

HARMONIZATION AND FUTURE GOALS

The most important goals we have identified for future development of TCDB include (i) the creation of an ontology for the TCDB database, (ii) improving our integration with Pfam and (iii) streamlining the use of phylogeny and synteny information to provide functional predictions. Some of the new functions will be implemented as links and some as software. Synteny should probably be implemented as links because the information is often already available elsewhere (Microbes Online, JGI's intuitive resource IMG, SEED and RegPredict). Pfam may prove more difficult because many families in Pfam are incomplete or not appropriately arranged in clans. Working with Pfam, as we have in the past (69), we plan to improve upon the transport protein section of this database.

It is well-known that many families that include domain duplicated transporters do not accurately reflect the domain borders in the way hidden Markov models (HMMs) have been trained (68). Currently, we do not show 'repeat units' in TCDB, even though this information is presented in many of our publications. We will continue to work with Pfam to integrate and coordinate information in both databases in a systematic way (69). Ideally, such a process should be automated or semi-automated.

Another worthwhile goal is to establish the user base so we can serve the needs of the scientific community more effectively. We plan to collect more access statistics to understand the needs of the user community. Google Analytics was installed in 2011, but improvements are required so we can recognize which TCDB features are most used.

One million PubMed abstracts are created every year, and 10% of the 2012 abstracts were not indexed as of May 2013. Other databases that link to TCDB, such as EcoGene (70), manually review literature. 'Transporter' is a MeSH term PubMed uses, but there is a 6-month delay to add MeSH terms, and sometimes the word 'Transporter' is not obvious from the title. TCDB uses machine learning classifiers as well as keyword searches; the keywords are continuously extracted from TCDB and used as search terms to identify relevant articles. We are considering new ways for users to provide sequence data and information, either with or without the use of email; suggestions by email could be used as test sets to evaluate the efficiency of an automated process. We are also considering implementing links for reference, sequence and information input from users. Adding a feature allowing TCDB to be searched as a library of HMMs is also under consideration. Current TCDB users report that the present system of presenting search results is satisfactory, but we constantly strive to improve the database, and suggestions from users are most welcome.

TCDB needs an ontological hierarchical system and a controlled vocabulary. EBI's ChemDB (71) has created a chemical classification system, and we have already set up a prototype, which can be accessed from this link: http://www.tcdb.org/ontology. The substrate text needs to be extracted from the description and then correlated with ChemDB. One system already exists, but due to inconsistencies in the description, it has been difficult to implement. If we could link with Gene Ontology, TC numbers would be more accessible. Another important area for improvement concerns user access to the most recent entries. Perhaps TCDB should have 'recent releases' such as those of Pfam. Since we already track protein histories, adding this feature would not be difficult. Some basic statistics, where database growth can be followed, are already available at http://www.tcdb.org/search/index.php.

We are currently undertaking the development of standardized workflows to confirm homology results from TCDB's in-house statistical methods, based on structural superimposition and HMM:HMM comparisons. For instance, we use structural superimposition in addition to sequence statistical analyses to identify or confirm structural and evolutionary relationships between members of a superfamily (40). This helps to establish reference points in structural space for homology detection.

Table 1. Transport protein superfamilies in TCDB

1. Aerolysin^a
2. Amino acid/Polyamine/organoCation (APC)^a
3. ATP-Binding Cassette-1 (ABC1)
4. ATP-Binding Cassette-2 (ABC2) with the ECF sub-superfamily
5. ATP-Binding Cassette-3 (ABC3)
6. Bacterial bacteriocin (BB)^a
7. Bile/arsenite/riboflavin transporter (BART)^a
8. Cation diffusion facilitator (CDF)^a
9. Cation:Proton antiporter (CPA)
10. Cecropin
11. Circular bacterial bacteriocin (CBB)^a
12. Claudin^a
13. Corynebacterial PorA/PorH^a
14. Defensin
15. Drug/metabolite transporter (DMT)
16. Endomembrane protein translocon (EMPT)^a
17. Epithelial Na+ channel (ENaC/P2X)
18. Gap junction (GJ)^a
19. General bacterial porin (GBP)
20. Holin I^a
21. Holin II^a
22. Holin III^a
23. Holin IV^a
24. Holin V^a
25. Holin VI^a
26. Holin VII^a
27. Huwentoxin
28. Ion transporter (IT)
29. Lysine exporter (LysE)
30. Major facilitator (MFS)^a
31. Major intrinsic protein (MIP)^a
32. Melittin
33. Membrane attack complex/perforin (MACPF)^a
34. Mercury (Mer)
35. Mitochondrial carrier (MC)
36. Mycobacterial/nocardial porin (MspA)^a
37. Multidrug/oligosaccharidyl-lipid/polysaccharide (MOP) Flippase^a
38. P-type ATPase (P-ATPase)
39. Phosphotransferase system Asc/Gat (PTS-AG)
40. Phosphotransferase system Glc/Fru/Lac (PTS-GFL)
41. Resistance-nodulation-cell division (RND)
42. RTX-toxin
43. T4 immunity (T4 IMM)^a
44. Transmembrane inner membrane-17 (Tim17)
45. Transporter/opsin/G protein-coupled receptor (TOG)
46. TRCTAMP-B (TRCTAMP)^a
47. Outer membrane protein (OMP) insertase (YaeT/TpsB)
48. Voltage-gated ion channel (42)
49. Viral envelope glycoprotein (Env)^a

^a New or recently expanded superfamilies.
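The keyword half of the literature search described in this section (terms drawn from the database and used to flag relevant abstracts) can be sketched as follows. This is an illustration only, not TCDB's pipeline: the machine learning classifiers are not reproduced, the hit threshold is an arbitrary choice, and the function names and input format (a mapping from an article identifier to abstract text) are assumptions.

```python
import re


def build_keyword_set(names):
    """Turn database-derived names (for example family names) into
    lower-cased search terms, dropping empty entries."""
    return {name.strip().lower() for name in names if name.strip()}


def score_abstract(abstract, keywords):
    """Count how many distinct keywords occur as whole words or phrases."""
    text = abstract.lower()
    return sum(
        1
        for keyword in keywords
        if re.search(r"\b" + re.escape(keyword) + r"\b", text)
    )


def triage(abstracts, keywords, min_hits=2):
    """Return identifiers of abstracts that reach the hit threshold,
    in the order they were supplied."""
    return [
        article_id
        for article_id, text in abstracts.items()
        if score_abstract(text, keywords) >= min_hits
    ]
```

Requiring two or more hits is one simple way to address the observation that the word 'Transporter' alone is often absent or uninformative. Articles flagged this way would still need manual review or a trained classifier before entry into the database.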

CONCLUSION

In 2006, TCDB contained 3000 proteins classified into 400 families, but in 2013, it exceeded 10 000 proteins in 750 families. The availability of TCDB has allowed major basic research advances, including answering fundamental biological questions, determining the routes of evolution taken for the appearance of these proteins, identifying superfamily relationships and allowing structural, functional and mechanistic predictions. Within practical limits, TCDB reflects the current state of our knowledge concerning its constituent parts.

FUNDING

TCDB is supported by NIH [GM 077402-05 and GM 094610-01]. Funding for open access charge: NIH.

Conflict of interest statement. None declared.

REFERENCES

1. Fleischmann,R.D., Adams,M.D., White,O., Clayton,R.A., Kirkness,E.F., Kerlavage,A.R., Bult,C.J., Tomb,J.F., Dougherty,B.A., Merrick,J.M. et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269, 496–512.

2. Bairoch,A. (1994) The ENZYME data bank. Nucleic Acids Res., 22, 3626–3627.

3. Saier,M.H. Jr (1994) Computer-aided analyses of transport protein sequences: gleaning evidence concerning function, structure, biogenesis, and evolution. Microbiol. Rev., 58, 71–93.

4. Saier,M.H. Jr, Tran,C.V. and Barabote,R.D. (2006) TCDB: the Transporter Classification Database for membrane transport protein analyses and information. Nucleic Acids Res., 34, D181–D186.

5. Saier,M.H. Jr, Yen,M.R., Noto,K., Tamang,D.G. and Elkan,C. (2009) The Transporter Classification Database: recent advances. Nucleic Acids Res., 37, D274–D278.

6. Saier,M.H. Jr (2000) A functional-phylogenetic classification system for transmembrane solute transporters. Microbiol. Mol. Biol. Rev., 64, 354–411.

7. Busch,W. and Saier,M.H. Jr (2004) The IUBMB-endorsed transporter classification system. Mol. Biotechnol., 27, 253–262.

8 BuschW and SaierMH Jr (2003) The IUBMB-endorsedtransporter classification system Methods Mol Biol 227 21ndash36

9 BuschW and SaierMH Jr (2002) The transporter classification(TC) system 2002 Crit Rev Biochem Mol Biol 37 287ndash337

10 WakabayashiST ShlykovMA KumarU ReddyVMalhotraA ClarkeEL ChenJS CastilloR De La MareRSunEI et al (2013) Deducing transport protein evolution basedon sequence structure and function In ChristineAO andAlexB (eds) Protein Families Relating Protein SequenceStructure and Function 1st edn Wiley Hoboken NJ

11 SehgalAK DasS NotoK SaierMH Jr and ElkanC (2011)Identifying relevant data for a biological database handcraftedrules versus machine learning IEEEACM Trans Comput BiolBioinform 8 851ndash857

12 ZhaiY and SaierMH Jr (2001) A web-based program (WHAT)for the simultaneous prediction of hydropathy amphipathicitysecondary structure and transmembrane topology for asingle protein sequence J Mol Microbiol Biotechnol 3501ndash502

13 ReddyVS and SaierMH Jr (2012) BioV Suitemdasha collection ofprograms for the study of transport protein evolution FEBS J279 2036ndash2046

14 TusnadyGE and SimonI (2001) The HMMTOPtransmembrane topology prediction server Bioinformatics 17849ndash850

15 JonesDT (2007) Improving the accuracy of transmembraneprotein topology prediction using evolutionary informationBioinformatics 23 538ndash544

16 ViklundH BernselA SkwarkM and ElofssonA (2008)SPOCTOPUS a combined predictor of signal peptides andmembrane protein topology Bioinformatics 24 2928ndash2929

17 DevereuxJ HaeberliP and SmithiesO (1984) A comprehensiveset of sequence analysis programs for the VAX Nucleic AcidsRes 12 387ndash395

18 XenariosI RiceDW SalwinskiL BaronMK MarcotteEMand EisenbergD (2000) DIP the database of interactingproteins Nucleic Acids Res 28 289ndash291

19 PruittKD TatusovaT BrownGR and MaglottDR (2012)NCBI Reference Sequences (RefSeq) current status new features

Table 2 List of known transporter databases

Name URL Interconnectedwith TCDB

TransportDB httpwwwmembranetransportorg YesYTPdb httpytpdbbiopark-itbe YesAramemnon httparamemnonbotanikuni-koelnde NoM trunculata TDB httpbioinformaticscaueducnMtTransporterbrowsephp YesABCdb httpswww-abcdbbiotoulfr YesABCISSE httpwww1pasteurfrrechercheunitespmtgabcdatabaseiphtml NoHuman ABC TDB httpnutrigene4tcomhumanabchtm YesSLC tables httpwwwbioparadigmsorgslcintrohtm Yes in TCDBWorm SLC db httpwwwWormSLCorg NoMP struc httpblancobiomoluciedumpstruc NoUCSF PMT httppharmacogeneticsucsfedu NoARDB httpardbcbcbumdedu No

D256 Nucleic Acids Research 2014 Vol 42 Database issue

Dow

nloaded from httpsacadem

icoupcomnararticle42D

1D2511049640 by guest on 02 January 2022

and genome annotation policy Nucleic Acids Res 40D130ndashD135

20 MaglottD OstellJ PruittKD and TatusovaT (2011) EntrezGene gene-centered information at NCBI Nucleic Acids Res 39D52ndashD57

21 PuntaM CoggillPC EberhardtRY MistryJ TateJBoursnellC PangN ForslundK CericG ClementsJ et al(2012) The Pfam protein families database Nucleic Acids Res40 D290ndashD301

22 LatendresseM PaleyS and KarpPD (2012) Browsingmetabolic and regulatory networks with BioCyc Methods MolBiol 804 197ndash216

23 KanehisaM GotoS SatoY FurumichiM and TanabeM(2012) KEGG for integration and interpretation oflarge-scale molecular data sets Nucleic Acids Res 40D109ndashD114

24 RosePW BiC BluhmWF ChristieCH DimitropoulosDDuttaS GreenRK GoodsellDS PrlicA QuesadaM et al(2013) The RCSB Protein Data Bank new resources for researchand education Nucleic Acids Res 40 D475ndashD482

25 YoumJ and SaierMH Jr (2012) Comparative analyses oftransport proteins encoded within the genomes of Mycobacteriumtuberculosis and Mycobacterium leprae Biochim Biophys Acta1818 776ndash797

26 TamangDG RabusR BaraboteRD and SaierMH Jr (2009)Comprehensive analyses of transport proteins encoded within thegenome of lsquolsquoAromatoleum aromaticumrsquorsquo strain EbN1 J MembrBiol 229 53ndash90

27 PaparoditisP VastermarkA LeAJ FuerstJA andSaierMH Jr (2013) Bioinformatic analyses of integralmembrane transport proteins encoded within the genome of theplanctomycetes species Rhodopirellula baltica BiochimBiophys Acta 1838 193ndash215

28 UniProt Consortium (2013) Update on activities at the UniversalProtein Resource (UniProt) in 2013 Nucleic Acids Res 41D43ndashD47

29 KnoxC LawV JewisonT LiuP LyS FrolkisA PonABancoK MakC NeveuV et al (2011) DrugBank 30 acomprehensive resource for lsquoomicsrsquo research on drugs NucleicAcids Res 39 D1035ndashD1041

30 Rask-AndersenM AlmenMS and SchiothHB (2011) Trendsin the exploitation of novel drug targets Nat Rev Drug Discov10 579ndash590

31 ChenJS ReddyV ChenJH ShlykovMA ZhengWHChoJ YenMR and SaierMH Jr (2011) Phylogeneticcharacterization of transport protein superfamiliessuperiority of SuperfamilyTree programs over thosebased on multiple alignments J Mol Microbiol Biotechnol 2183ndash96

32 YenMR ChoiJ and SaierMH Jr (2009) Bioinformaticanalyses of transmembrane transport novel software for deducingprotein phylogeny topology and evolution J Mol MicrobiolBiotechnol 17 163ndash176

33 YenMR ChenJS MarquezJL SunEI and SaierMH(2010) Multidrug resistance phylogenetic characterization ofsuperfamilies of secondary carriers that include drug exportersMethods Mol Biol 637 47ndash64

34 WongFH ChenJS ReddyV DayJL ShlykovMAWakabayashiST and SaierMH Jr (2012) The amino acid-polyamine-organocation superfamily J Mol MicrobiolBiotechnol 22 105ndash113

35 ReddyVS ShlykovMA CastilloR SunEI andSaierMH Jr (2012) The major facilitator superfamily (MFS)revisited FEBS J 279 2022ndash2035

36 ShlykovMA ZhengWH ChenJS and SaierMH Jr (2012)Bioinformatic characterization of the 4-Toluene Sulfonate UptakePermease (TSUP) family of transmembrane proteins BiochimBiophys Acta 1818 703ndash717

37 ChanH BabayanV BlyuminE GandhiC HakKHarakeD KumarK LeeP LiTT LiuHY et al (2010) Thep-type ATPase superfamily J Mol Microbiol Biotechnol 195ndash104

38 RettnerRE and SaierMH Jr (2010) The autoinducer-2 exportersuperfamily J Mol Microbiol Biotechnol 18 195ndash205

39 LamVH LeeJH SilverioA ChanH GomolplitinantKMPovolotskyTL OrlovaE SunEI WelliverCH andSaierMH Jr (2011) Pathways of transport protein evolutionrecent advances Biol Chem 392 5ndash12

40 ZhengWH VastermarkA ShlykovMA ReddyV SunEIand SaierMH Jr (2013) Evolutionary relationships ofATP-Binding Cassette (ABC) uptake porters BMC Microbiol13 98

41 MatiasMG GomolplitinantKM TamangDG andSaierMH Jr (2010) Animal Ca2+ release-activated Ca2+(CRAC) channels appear to be homologous to and derivedfrom the ubiquitous cation diffusion facilitators BMC Res Notes3 158

42 WangB DukarevichM SunEI YenMR and SaierMH Jr(2009) Membrane porters of ATP-binding cassette transportsystems are polyphyletic J Membr Biol 231 1ndash10

43 YeeDC ShlykovMA VastermarkA ReddyVS AroraSSunEI and SaierMH Jr (2013) The Transporter-Opsin-Gprotein-coupled receptor (TOG) Superfamily FEBS J 2805780ndash5800

44 SaierMH Jr (1994) Computer-aided analyses of transportprotein sequences gleaning evidence concerning functionstructure biogenesis and evolution Microbiol Rev 58 71ndash93

45 DoolittleRF (1994) Convergent evolution the need to beexplicit Trends Biochem Sci 19 15ndash18

46 DayhoffMO BarkerWC and HuntLT (1983) Establishinghomologies in protein sequences Methods Enzymol 91 524ndash545


Nucleic Acids Research, 2014, Vol. 42, Database issue, D257


(xi) The ARDB contains antibiotic resistance genes, providing a list of four types of multidrug resistance transporters: ABC (TC 3.A.1), MFS (TC 2.A.1), RND (TC 2.A.6) and SMR (TC 2.A.7.1).
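TC identifiers such as those listed for the ARDB transporter types follow the N1.L1.N2.N3.N4 scheme described in the Introduction (class, subclass, family, subfamily, transport system). As a minimal sketch of how such identifiers might be split into their levels, the following Python snippet can be used. The names `TCNumber` and `parse_tc` are hypothetical and are not part of any TCDB software; the sketch assumes that an identifier may stop at the family or subfamily level.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper for illustration only; not part of TCDB's own software.
# Accepts class.subclass.family with optional .subfamily and .system levels.
TC_PATTERN = re.compile(r"^(\d+)\.([A-Z])\.(\d+)(?:\.(\d+)(?:\.(\d+))?)?$")


class TCNumber(NamedTuple):
    tc_class: int                    # N1: class
    subclass: str                    # L1: subclass (a letter)
    family: int                      # N2: family (or superfamily)
    subfamily: Optional[int] = None  # N3: subfamily, if given
    system: Optional[int] = None     # N4: transport system, if given


def parse_tc(tc_id: str) -> TCNumber:
    """Split a TC identifier such as '2.A.7.1' into its hierarchy levels."""
    match = TC_PATTERN.match(tc_id.strip())
    if match is None:
        raise ValueError(f"not a TC identifier: {tc_id!r}")
    n1, l1, n2, n3, n4 = match.groups()
    return TCNumber(
        int(n1),
        l1,
        int(n2),
        int(n3) if n3 is not None else None,
        int(n4) if n4 is not None else None,
    )


# The four multidrug resistance transporter types listed for the ARDB.
MDR_TYPES = {"ABC": "3.A.1", "MFS": "2.A.1", "RND": "2.A.6", "SMR": "2.A.7.1"}

for name, tc_id in MDR_TYPES.items():
    print(name, parse_tc(tc_id))
```

Parsing into named levels makes it straightforward to group entries by class or family, for example when comparing the ARDB transporter types against other TC families.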

Table 1. Transport protein superfamilies in TCDB

1. Aerolysin^a
2. Amino acid/Polyamine/organoCation (APC)^a
3. ATP-Binding Cassette-1 (ABC1)
4. ATP-Binding Cassette-2 (ABC2) with the ECF sub-superfamily
5. ATP-Binding Cassette-3 (ABC3)
6. Bacterial bacteriocin (BB)^a
7. Bile/arsenite/riboflavin transporter (BART)^a
8. Cation diffusion facilitator (CDF)^a
9. Cation:Proton antiporter (CPA)
10. Cecropin
11. Circular bacterial bacteriocin (CBB)^a
12. Claudin^a
13. Corynebacterial PorA/PorH^a
14. Defensin
15. Drug/metabolite transporter (DMT)
16. Endomembrane protein translocon (EMPT)^a
17. Epithelial Na+ channel (ENaC/P2X)
18. Gap junction (GJ)^a
19. General bacterial porin (GBP)
20. Holin I^a
21. Holin II^a
22. Holin III^a
23. Holin IV^a
24. Holin V^a
25. Holin VI^a
26. Holin VII^a
27. Huwentoxin
28. Ion transporter (IT)
29. Lysine exporter (LysE)
30. Major facilitator (MFS)^a
31. Major intrinsic protein (MIP)^a
32. Melittin
33. Membrane attack complex/perforin (MACPF)^a
34. Mercury (Mer)
35. Mitochondrial carrier (MC)
36. Mycobacterial/nocardial porin (MspA)^a
37. Multidrug/oligosaccharidyl-lipid/polysaccharide (MOP) Flippase^a
38. P-type ATPase (P-ATPase)
39. Phosphotransferase system Asc/Gat (PTS-AG)
40. Phosphotransferase system Glc/Fru/Lac (PTS-GFL)
41. Resistance-nodulation-cell division (RND)
42. RTX-toxin
43. T4 immunity (T4 IMM)^a
44. Transmembrane inner membrane-17 (Tim17)
45. Transporter/opsin/G protein-coupled receptor (TOG)
46. TRC/TAMP-B (TRC/TAMP)^a
47. Outer membrane protein (OMP) insertase (YaeT/TpsB)
48. Voltage-gated ion channel (42)
49. Viral envelope glycoprotein (Env)^a

^a New or recently expanded superfamilies.

HARMONIZATION AND FUTURE GOALS

The most important goals we have identified for future development of TCDB include (i) the creation of an ontology for the TCDB database, (ii) improving our integration with Pfam and (iii) streamlining the use of phylogeny and synteny information to provide functional predictions. Some of the new functions will be implemented as links and some as software. Synteny should probably be implemented as links because the information is often already available elsewhere (Microbes Online, JGI’s intuitive resource IMG, SEED and RegPredict). Pfam may prove more difficult because many families in Pfam are incomplete or not appropriately arranged in clans. Working with Pfam, as we have in the past (69), we plan to improve upon the transport protein section of this database.

It is well known that, for many families that include domain-duplicated transporters, the hidden Markov models (HMMs), as they have been trained, do not accurately reflect the domain borders (68). Currently, we do not show ‘repeat units’ in TCDB, even though this information is presented in many of our publications. We will continue to work with Pfam to integrate and coordinate information in both databases in a systematic way (69). Ideally, such a process should be automated or semi-automated.

Another worthwhile goal is to establish the user base so we can serve the needs of the scientific community more effectively. We plan to collect more access statistics to understand the needs of the user community. Google Analytics was installed in 2011, but improvements are required so we can recognize which TCDB features are most used.

One million PubMed abstracts are created every year, and 10% of the 2012 abstracts were not indexed as of May 2013. Other databases that link to TCDB, such as EcoGene (70), manually review literature. ‘Transporter’ is a MeSH term that PubMed uses, but there is a 6-month delay in adding MeSH terms, and sometimes the word ‘Transporter’ is not obvious from the title. TCDB uses machine learning classifiers as well as keyword searches; the keywords are continuously extracted from TCDB and used as search terms to identify relevant articles. We are considering new ways for users to provide sequence data and information, either with or without the use of email; suggestions by email could be used as test sets to evaluate the efficiency of an automated process. We are also considering implementing links for reference, sequence and information input from users. Adding a feature allowing TCDB to be searched as a library of HMMs is also under consideration. Current TCDB users report that the present system of presenting search results is satisfactory, but we constantly strive to improve the database, and suggestions from users are most welcome.

TCDB needs an ontological hierarchical system and a controlled vocabulary. EBI’s ChemDB (71) has created a chemical classification system, and we have already set up a prototype, which can be accessed from this link: http://www.tcdb.org/ontology. The substrate text needs to be extracted from the description and then correlated with ChemDB. One system already exists, but due to inconsistencies in the description, it has been difficult to implement. If we could link with gene ontology, TC numbers would be more accessible. Another important area for improvement concerns user access to the most recent entries. Perhaps TCDB should have ‘recent releases’ such as those of Pfam. Since we already track protein histories, adding this feature would not be difficult. Some basic statistics, where database growth can be followed, are already available at http://www.tcdb.org/search/index.php.

We are currently undertaking the development of standardized workflows to confirm homology results from TCDB’s in-house statistical methods, based on structural superimposition and HMM:HMM comparisons. For instance, we use structural superimposition in addition to sequence statistical analyses to identify or confirm structural and evolutionary relationships between members of a superfamily (40). This helps to establish reference points in structural space for homology detection.
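The keyword-based part of the literature screening mentioned earlier in this section can be illustrated with a short sketch. This is not the classifier or search procedure actually used for TCDB (see reference 11 for the published approach); it only shows, under simple assumptions, how a list of keywords might be matched against abstracts. The function names, the example keywords and the example abstracts are all hypothetical.

```python
import re

def build_keyword_pattern(keywords):
    """Compile one case-insensitive pattern from a list of keywords.

    Longer keywords are tried first so that multi-word terms are matched
    as a whole; a trailing 's' is tolerated to catch simple plurals.
    """
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")s?\b", re.IGNORECASE)


def screen_abstracts(abstracts, keywords, min_hits=1):
    """Return (identifier, matched terms) for abstracts with enough distinct hits."""
    pattern = build_keyword_pattern(keywords)
    selected = []
    for identifier, text in abstracts.items():
        hits = {m.group(0).lower() for m in pattern.finditer(text)}
        if len(hits) >= min_hits:
            selected.append((identifier, sorted(hits)))
    return selected


# Hypothetical keywords and abstracts, invented for illustration.
keywords = ["transporter", "antiporter", "major facilitator superfamily"]
abstracts = {
    "A1": "Two novel ABC transporters were characterized in Escherichia coli.",
    "A2": "We report the crystal structure of a soluble kinase.",
    "A3": "A sodium/proton antiporter of the major facilitator superfamily.",
}

for identifier, terms in screen_abstracts(abstracts, keywords):
    print(identifier, terms)
```

A plain keyword match of this kind misses articles in which the relevant term does not appear in the text, which is one reason the approach described above combines keyword searches with machine learning classifiers.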

CONCLUSION

In 2006, TCDB contained 3000 proteins classified into 400 families, but in 2013, it exceeded 10 000 proteins in 750 families. The availability of TCDB has allowed major basic research advances, including answering fundamental biological questions, determining the routes of evolution taken for the appearance of these proteins, identifying superfamily relationships and allowing structural, functional and mechanistic predictions. Within practical limits, TCDB reflects the current state of our knowledge concerning its constituent parts.

FUNDING

TCDB is supported by NIH [GM 077402-05 and GM 094610-01]. Funding for open access charge: NIH.

Conflict of interest statement. None declared.

Table 2. List of known transporter databases

Name                URL                                                                 Interconnected with TCDB
TransportDB         http://www.membranetransport.org                                    Yes
YTPdb               http://ytpdb.biopark-it.be                                          Yes
Aramemnon           http://aramemnon.botanik.uni-koeln.de                               No
M. truncatula TDB   http://bioinformatics.cau.edu.cn/MtTransporter/browse.php           Yes
ABCdb               https://www-abcdb.biotoul.fr                                        Yes
ABCISSE             http://www1.pasteur.fr/recherche/unites/pmtg/abc/database.iphtml    No
Human ABC TDB       http://nutrigene.4t.com/humanabc.htm                                Yes
SLC tables          http://www.bioparadigms.org/slc/intro.htm                           Yes, in TCDB
Worm SLC db         http://www.WormSLC.org                                              No
MP struc            http://blanco.biomol.uci.edu/mpstruc                                No
UCSF PMT            http://pharmacogenetics.ucsf.edu                                    No
ARDB                http://ardb.cbcb.umd.edu                                            No

REFERENCES

1. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269, 496–512.

2. Bairoch A (1994) The ENZYME data bank. Nucleic Acids Res., 22, 3626–3627.

3. Saier MH Jr (1994) Computer-aided analyses of transport protein sequences: gleaning evidence concerning function, structure, biogenesis and evolution. Microbiol. Rev., 58, 71–93.

4. Saier MH Jr, Tran CV and Barabote RD (2006) TCDB: the Transporter Classification Database for membrane transport protein analyses and information. Nucleic Acids Res., 34, D181–D186.

5. Saier MH Jr, Yen MR, Noto K, Tamang DG and Elkan C (2009) The Transporter Classification Database: recent advances. Nucleic Acids Res., 37, D274–D278.

6. Saier MH Jr (2000) A functional-phylogenetic classification system for transmembrane solute transporters. Microbiol. Mol. Biol. Rev., 64, 354–411.

7. Busch W and Saier MH Jr (2004) The IUBMB-endorsed transporter classification system. Mol. Biotechnol., 27, 253–262.

8. Busch W and Saier MH Jr (2003) The IUBMB-endorsed transporter classification system. Methods Mol. Biol., 227, 21–36.

9. Busch W and Saier MH Jr (2002) The transporter classification (TC) system, 2002. Crit. Rev. Biochem. Mol. Biol., 37, 287–337.

10. Wakabayashi ST, Shlykov MA, Kumar U, Reddy V, Malhotra A, Clarke EL, Chen JS, Castillo R, De La Mare R, Sun EI et al. (2013) Deducing transport protein evolution based on sequence, structure and function. In: Christine AO and Alex B (eds), Protein Families: Relating Protein Sequence, Structure and Function, 1st edn. Wiley, Hoboken, NJ.

11. Sehgal AK, Das S, Noto K, Saier MH Jr and Elkan C (2011) Identifying relevant data for a biological database: handcrafted rules versus machine learning. IEEE/ACM Trans. Comput. Biol. Bioinform., 8, 851–857.

12. Zhai Y and Saier MH Jr (2001) A web-based program (WHAT) for the simultaneous prediction of hydropathy, amphipathicity, secondary structure and transmembrane topology for a single protein sequence. J. Mol. Microbiol. Biotechnol., 3, 501–502.

13. Reddy VS and Saier MH Jr (2012) BioV Suite – a collection of programs for the study of transport protein evolution. FEBS J., 279, 2036–2046.

14. Tusnady GE and Simon I (2001) The HMMTOP transmembrane topology prediction server. Bioinformatics, 17, 849–850.

15. Jones DT (2007) Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics, 23, 538–544.

16. Viklund H, Bernsel A, Skwark M and Elofsson A (2008) SPOCTOPUS: a combined predictor of signal peptides and membrane protein topology. Bioinformatics, 24, 2928–2929.

17. Devereux J, Haeberli P and Smithies O (1984) A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res., 12, 387–395.

18. Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM and Eisenberg D (2000) DIP: the database of interacting proteins. Nucleic Acids Res., 28, 289–291.

19. Pruitt KD, Tatusova T, Brown GR and Maglott DR (2012) NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res., 40, D130–D135.

20. Maglott D, Ostell J, Pruitt KD and Tatusova T (2011) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res., 39, D52–D57.

21. Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J et al. (2012) The Pfam protein families database. Nucleic Acids Res., 40, D290–D301.

22. Latendresse M, Paley S and Karp PD (2012) Browsing metabolic and regulatory networks with BioCyc. Methods Mol. Biol., 804, 197–216.

23. Kanehisa M, Goto S, Sato Y, Furumichi M and Tanabe M (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res., 40, D109–D114.

24. Rose PW, Bi C, Bluhm WF, Christie CH, Dimitropoulos D, Dutta S, Green RK, Goodsell DS, Prlic A, Quesada M et al. (2013) The RCSB Protein Data Bank: new resources for research and education. Nucleic Acids Res., 40, D475–D482.

25. Youm J and Saier MH Jr (2012) Comparative analyses of transport proteins encoded within the genomes of Mycobacterium tuberculosis and Mycobacterium leprae. Biochim. Biophys. Acta, 1818, 776–797.

26. Tamang DG, Rabus R, Barabote RD and Saier MH Jr (2009) Comprehensive analyses of transport proteins encoded within the genome of ‘‘Aromatoleum aromaticum’’ strain EbN1. J. Membr. Biol., 229, 53–90.

27. Paparoditis P, Vastermark A, Le AJ, Fuerst JA and Saier MH Jr (2013) Bioinformatic analyses of integral membrane transport proteins encoded within the genome of the planctomycetes species, Rhodopirellula baltica. Biochim. Biophys. Acta, 1838, 193–215.

28. UniProt Consortium (2013) Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res., 41, D43–D47.

29. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V et al. (2011) DrugBank 3.0: a comprehensive resource for ‘omics’ research on drugs. Nucleic Acids Res., 39, D1035–D1041.

30. Rask-Andersen M, Almen MS and Schioth HB (2011) Trends in the exploitation of novel drug targets. Nat. Rev. Drug Discov., 10, 579–590.

31. Chen JS, Reddy V, Chen JH, Shlykov MA, Zheng WH, Cho J, Yen MR and Saier MH Jr (2011) Phylogenetic characterization of transport protein superfamilies: superiority of SuperfamilyTree programs over those based on multiple alignments. J. Mol. Microbiol. Biotechnol., 21, 83–96.

32. Yen MR, Choi J and Saier MH Jr (2009) Bioinformatic analyses of transmembrane transport: novel software for deducing protein phylogeny, topology and evolution. J. Mol. Microbiol. Biotechnol., 17, 163–176.

33. Yen MR, Chen JS, Marquez JL, Sun EI and Saier MH (2010) Multidrug resistance: phylogenetic characterization of superfamilies of secondary carriers that include drug exporters. Methods Mol. Biol., 637, 47–64.

34. Wong FH, Chen JS, Reddy V, Day JL, Shlykov MA, Wakabayashi ST and Saier MH Jr (2012) The amino acid-polyamine-organocation superfamily. J. Mol. Microbiol. Biotechnol., 22, 105–113.

35. Reddy VS, Shlykov MA, Castillo R, Sun EI and Saier MH Jr (2012) The major facilitator superfamily (MFS) revisited. FEBS J., 279, 2022–2035.

36. Shlykov MA, Zheng WH, Chen JS and Saier MH Jr (2012) Bioinformatic characterization of the 4-Toluene Sulfonate Uptake Permease (TSUP) family of transmembrane proteins. Biochim. Biophys. Acta, 1818, 703–717.

37. Chan H, Babayan V, Blyumin E, Gandhi C, Hak K, Harake D, Kumar K, Lee P, Li TT, Liu HY et al. (2010) The p-type ATPase superfamily. J. Mol. Microbiol. Biotechnol., 19, 5–104.

38. Rettner RE and Saier MH Jr (2010) The autoinducer-2 exporter superfamily. J. Mol. Microbiol. Biotechnol., 18, 195–205.

39. Lam VH, Lee JH, Silverio A, Chan H, Gomolplitinant KM, Povolotsky TL, Orlova E, Sun EI, Welliver CH and Saier MH Jr (2011) Pathways of transport protein evolution: recent advances. Biol. Chem., 392, 5–12.

40. Zheng WH, Vastermark A, Shlykov MA, Reddy V, Sun EI and Saier MH Jr (2013) Evolutionary relationships of ATP-Binding Cassette (ABC) uptake porters. BMC Microbiol., 13, 98.

41. Matias MG, Gomolplitinant KM, Tamang DG and Saier MH Jr (2010) Animal Ca2+ release-activated Ca2+ (CRAC) channels appear to be homologous to and derived from the ubiquitous cation diffusion facilitators. BMC Res. Notes, 3, 158.

42. Wang B, Dukarevich M, Sun EI, Yen MR and Saier MH Jr (2009) Membrane porters of ATP-binding cassette transport systems are polyphyletic. J. Membr. Biol., 231, 1–10.

43. Yee DC, Shlykov MA, Vastermark A, Reddy VS, Arora S, Sun EI and Saier MH Jr (2013) The Transporter-Opsin-G protein-coupled receptor (TOG) Superfamily. FEBS J., 280, 5780–5800.

44. Saier MH Jr (1994) Computer-aided analyses of transport protein sequences: gleaning evidence concerning function, structure, biogenesis and evolution. Microbiol. Rev., 58, 71–93.

45. Doolittle RF (1994) Convergent evolution: the need to be explicit. Trends Biochem. Sci., 19, 15–18.

46. Dayhoff MO, Barker WC and Hunt LT (1983) Establishing homologies in protein sequences. Methods Enzymol., 91, 524–545.

47. Coyne RS, Hannick L, Shanmugam D, Hostetler JB, Brami D, Joardar VS, Johnson J, Radune D, Singh I, Badger JH et al. (2011) Comparative genomics of the pathogenic ciliate Ichthyophthirius multifiliis, its free-living relatives and a host species provide insights into adoption of a parasitic lifestyle and prospects for disease control. Genome Biol., 12, R100.

48. Podar M, Anderson I, Makarova KS, Elkins JG, Ivanova N, Wall MA, Lykidis A, Mavromatis K, Sun H, Hudson ME et al. (2008) A genomic analysis of the archaeal system Ignicoccus hospitalis-Nanoarchaeum equitans. Genome Biol., 9, R158.

49. Zhai Y and Saier MH Jr (2002) A simple sensitive program for detecting internal repeats in sets of multiply aligned homologous proteins. J. Mol. Microbiol. Biotechnol., 4, 375–377.

50. Zhai Y and Saier MH Jr (2001) A web-based program for the prediction of average hydropathy, average amphipathicity and average similarity of multiply aligned homologous proteins. J. Mol. Microbiol. Biotechnol., 3, 285–286.

51. Silverio AL and Saier MH Jr (2011) Bioinformatic characterization of the trimeric intracellular cation-specific channel protein family. J. Membr. Biol., 241, 77–101.

52. Gomolplitinant KM and Saier MH Jr (2011) Evolution of the oligopeptide transporter family. J. Membr. Biol., 240, 89–110.

53. Tsai JC, Yen MR, Castillo R, Leyton DL, Henderson IR and Saier MH Jr (2010) The bacterial intimins and invasins: a large and novel family of secreted proteins. PLoS One, 5, e14403.

54. Castillo R and Saier MH (2010) Functional promiscuity of homologues of the bacterial ArsA ATPases. Int. J. Microbiol., 2010, 187373.

55. Povolotsky TL, Orlova E, Tamang DG and Saier MH Jr (2010) Defense against cannibalism: the SdpI family of bacterial immunity/signal transduction proteins. J. Membr. Biol., 235, 145–162.

56. Xiao AY, Wang J and Saier MH (2010) Bacterial adaptor membrane fusion proteins and the structurally dissimilar outer membrane auxiliary proteins have exchanged central domains in alpha-proteobacteria. Int. J. Microbiol., 2010, 589391.

57. Thever MD and Saier MH Jr (2009) Bioinformatic characterization of p-type ATPases encoded within the fully sequenced genomes of 26 eukaryotes. J. Membr. Biol., 229, 115–130.

58. Vastermark A and Saier MH Jr (2013) Evolutionary relationship between 5+5 and 7+7 inverted repeat folds within the amino acid-polyamine-organocation superfamily. Proteins, August 28 (doi:10.1002/prot.24401; epub ahead of print).

59. Ren Q, Chen K and Paulsen IT (2007) TransportDB: a comprehensive database resource for cytoplasmic membrane transport systems and outer membrane channels. Nucleic Acids Res., 35, D274–D279.

60. Brohee S, Barriot R, Moreau Y and Andre B (2010) YTPdb: a wiki database of yeast membrane transporters. Biochim. Biophys. Acta, 1798, 1908–1912.

61. Schwacke R, Schneider A, van der Graaff E, Fischer K, Catoni E, Desimone M, Frommer WB, Flugge UI and Kunze R (2003) ARAMEMNON, a novel database for Arabidopsis integral membrane proteins. Plant Physiol., 131, 16–26.

62. Miao Z, Li D, Zhang Z, Dong J, Su Z and Wang T (2012) Medicago truncatula transporter database: a comprehensive database resource for M. truncatula transporters. BMC Genomics, 13, 60.

63. Fichant G, Basse MJ and Quentin Y (2006) ABCdb: an online resource for ABC transporter repertories from sequenced archaeal and bacterial genomes. FEMS Microbiol. Lett., 256, 333–339.

64. Bouige P, Laurent D, Piloyan L and Dassa E (2002) Phylogenetic and functional classification of ATP-binding cassette (ABC) systems. Curr. Protein Pept. Sci., 3, 541–559.

65. Vasiliou V, Vasiliou K and Nebert DW (2009) Human ATP-binding cassette (ABC) transporter family. Hum. Genomics, 3, 281–290.

66. Hediger MA, Clemencon B, Burrier RE and Bruford EA (2013) The ABCs of membrane transporters in health and disease (SLC series): introduction. Mol. Aspects Med., 34, 95–107.

67. White SH (2009) Biophysical dissection of membrane proteins. Nature, 459, 344–346.

68. Vastermark A, Almen MS, Simmen MW, Fredriksson R and Schioth HB (2011) Functional specialization in nucleotide sugar transporters occurred through differentiation of the gene cluster EamA (DUF6) before the radiation of Viridiplantae. BMC Evol. Biol., 11, 123.

69. Reddy BL and Saier MH Jr (2013) Topological and phylogenetic analyses of bacterial holin families and superfamilies. Biochim. Biophys. Acta, 1828, 2654–2671.

70. Zhou J and Rudd KE (2013) EcoGene 3.0. Nucleic Acids Res., 41, D613–D624.

71. Chen J, Swamidass SJ, Dou Y, Bruand J and Baldi P (2005) ChemDB: a public database of small molecules and related chemoinformatics resources. Bioinformatics, 21, 4133–4139.

D258 Nucleic Acids Research 2014 Vol 42 Database issue

Dow

nloaded from httpsacadem

icoupcomnararticle42D

1D2511049640 by guest on 02 January 2022

Page 6: The Transporter Classification Database - Nucleic Acids Research

difficult Some basic statistics where database growth canbe followed are already available at httpwwwtcdborgsearchindexphpWe are currently undertaking the development of

standardized workflows to confirm homology resultsfrom TCDBrsquos in-house statistical methods based on struc-tural superimposition and HMMHMM comparisonsFor instance we use structural superimposition inaddition to sequence statistical analyses to identify orconfirm structural and evolutionary relationshipsbetween members of a superfamily (40) This helps to es-tablish reference points in structural space for homologydetection

CONCLUSION

In 2006 TCDB contained 3000 proteins classified into400 families but in 2013 it exceeded 10 000 proteins in750 families The availability of TCDB has allowedmajor basic research advances including answering funda-mental biological questions determining the routes ofevolution taken for the appearance of these proteins iden-tifying superfamily relationships and allowing structuralfunctional and mechanistic predictions Within practicallimits TCDB reflects the current state of our knowledgeconcerning its constituent parts

FUNDING

TCDB is supported by NIH [GM 077402-05 and GM094610-01] Funding for open access charge NIH

Conflict of interest statement None declared

REFERENCES

1 FleischmannRD AdamsMD WhiteO ClaytonRAKirknessEF KerlavageAR BultCJ TombJFDoughertyBA MerrickJM et al (1995) Whole-genomerandom sequencing and assembly of Haemophilus influenzae RdScience 269 496ndash512

2 BairochA (1994) The ENZYME data bank Nucleic Acids Res22 3626ndash3627

3 SaierMH Jr (1994) Computer-aided analyses oftransport protein sequences gleaning evidence concerning

function structure biogenesis and evolution Microbiol Rev 5871ndash93

4 SaierMH Jr TranCV and BaraboteRD (2006) TCDB theTransporter Classification Database for membrane transportprotein analyses and information Nucleic Acids Res 34D181ndashD186

5 SaierMH Jr YenMR NotoK TamangDG and ElkanC(2009) The Transporter Classification Database recent advancesNucleic Acids Res 37 D274ndashD278

6 SaierMH Jr (2000) A functional-phylogenetic classificationsystem for transmembrane solute transporters Microbiol MolBiol Rev 64 354ndash411

7 BuschW and SaierMH Jr (2004) The IUBMB-endorsedtransporter classification system Mol Biotechnol 27 253ndash262

8 BuschW and SaierMH Jr (2003) The IUBMB-endorsedtransporter classification system Methods Mol Biol 227 21ndash36

9 BuschW and SaierMH Jr (2002) The transporter classification(TC) system 2002 Crit Rev Biochem Mol Biol 37 287ndash337

10 WakabayashiST ShlykovMA KumarU ReddyVMalhotraA ClarkeEL ChenJS CastilloR De La MareRSunEI et al (2013) Deducing transport protein evolution basedon sequence structure and function In ChristineAO andAlexB (eds) Protein Families Relating Protein SequenceStructure and Function 1st edn Wiley Hoboken NJ

11 SehgalAK DasS NotoK SaierMH Jr and ElkanC (2011)Identifying relevant data for a biological database handcraftedrules versus machine learning IEEEACM Trans Comput BiolBioinform 8 851ndash857

12 ZhaiY and SaierMH Jr (2001) A web-based program (WHAT)for the simultaneous prediction of hydropathy amphipathicitysecondary structure and transmembrane topology for asingle protein sequence J Mol Microbiol Biotechnol 3501ndash502

13 ReddyVS and SaierMH Jr (2012) BioV Suitemdasha collection ofprograms for the study of transport protein evolution FEBS J279 2036ndash2046

14 TusnadyGE and SimonI (2001) The HMMTOPtransmembrane topology prediction server Bioinformatics 17849ndash850

15 JonesDT (2007) Improving the accuracy of transmembraneprotein topology prediction using evolutionary informationBioinformatics 23 538ndash544

16 ViklundH BernselA SkwarkM and ElofssonA (2008)SPOCTOPUS a combined predictor of signal peptides andmembrane protein topology Bioinformatics 24 2928ndash2929

17 DevereuxJ HaeberliP and SmithiesO (1984) A comprehensiveset of sequence analysis programs for the VAX Nucleic AcidsRes 12 387ndash395

18 XenariosI RiceDW SalwinskiL BaronMK MarcotteEMand EisenbergD (2000) DIP the database of interactingproteins Nucleic Acids Res 28 289ndash291

19 PruittKD TatusovaT BrownGR and MaglottDR (2012)NCBI Reference Sequences (RefSeq) current status new features

Table 2 List of known transporter databases

Name URL Interconnectedwith TCDB

TransportDB httpwwwmembranetransportorg YesYTPdb httpytpdbbiopark-itbe YesAramemnon httparamemnonbotanikuni-koelnde NoM trunculata TDB httpbioinformaticscaueducnMtTransporterbrowsephp YesABCdb httpswww-abcdbbiotoulfr YesABCISSE httpwww1pasteurfrrechercheunitespmtgabcdatabaseiphtml NoHuman ABC TDB httpnutrigene4tcomhumanabchtm YesSLC tables httpwwwbioparadigmsorgslcintrohtm Yes in TCDBWorm SLC db httpwwwWormSLCorg NoMP struc httpblancobiomoluciedumpstruc NoUCSF PMT httppharmacogeneticsucsfedu NoARDB httpardbcbcbumdedu No

D256 Nucleic Acids Research 2014 Vol 42 Database issue

Dow

nloaded from httpsacadem

icoupcomnararticle42D

1D2511049640 by guest on 02 January 2022

and genome annotation policy Nucleic Acids Res 40D130ndashD135

20 MaglottD OstellJ PruittKD and TatusovaT (2011) EntrezGene gene-centered information at NCBI Nucleic Acids Res 39D52ndashD57

21 PuntaM CoggillPC EberhardtRY MistryJ TateJBoursnellC PangN ForslundK CericG ClementsJ et al(2012) The Pfam protein families database Nucleic Acids Res40 D290ndashD301

22 LatendresseM PaleyS and KarpPD (2012) Browsingmetabolic and regulatory networks with BioCyc Methods MolBiol 804 197ndash216

23 KanehisaM GotoS SatoY FurumichiM and TanabeM(2012) KEGG for integration and interpretation oflarge-scale molecular data sets Nucleic Acids Res 40D109ndashD114

24 RosePW BiC BluhmWF ChristieCH DimitropoulosDDuttaS GreenRK GoodsellDS PrlicA QuesadaM et al(2013) The RCSB Protein Data Bank new resources for researchand education Nucleic Acids Res 40 D475ndashD482

25 YoumJ and SaierMH Jr (2012) Comparative analyses oftransport proteins encoded within the genomes of Mycobacteriumtuberculosis and Mycobacterium leprae Biochim Biophys Acta1818 776ndash797

26 TamangDG RabusR BaraboteRD and SaierMH Jr (2009)Comprehensive analyses of transport proteins encoded within thegenome of lsquolsquoAromatoleum aromaticumrsquorsquo strain EbN1 J MembrBiol 229 53ndash90

27 PaparoditisP VastermarkA LeAJ FuerstJA andSaierMH Jr (2013) Bioinformatic analyses of integralmembrane transport proteins encoded within the genome of theplanctomycetes species Rhodopirellula baltica BiochimBiophys Acta 1838 193ndash215

28 UniProt Consortium (2013) Update on activities at the UniversalProtein Resource (UniProt) in 2013 Nucleic Acids Res 41D43ndashD47

29 KnoxC LawV JewisonT LiuP LyS FrolkisA PonABancoK MakC NeveuV et al (2011) DrugBank 30 acomprehensive resource for lsquoomicsrsquo research on drugs NucleicAcids Res 39 D1035ndashD1041

30 Rask-AndersenM AlmenMS and SchiothHB (2011) Trendsin the exploitation of novel drug targets Nat Rev Drug Discov10 579ndash590

31 ChenJS ReddyV ChenJH ShlykovMA ZhengWHChoJ YenMR and SaierMH Jr (2011) Phylogeneticcharacterization of transport protein superfamiliessuperiority of SuperfamilyTree programs over thosebased on multiple alignments J Mol Microbiol Biotechnol 2183ndash96

32 YenMR ChoiJ and SaierMH Jr (2009) Bioinformaticanalyses of transmembrane transport novel software for deducingprotein phylogeny topology and evolution J Mol MicrobiolBiotechnol 17 163ndash176

33 YenMR ChenJS MarquezJL SunEI and SaierMH(2010) Multidrug resistance phylogenetic characterization ofsuperfamilies of secondary carriers that include drug exportersMethods Mol Biol 637 47ndash64

34 WongFH ChenJS ReddyV DayJL ShlykovMAWakabayashiST and SaierMH Jr (2012) The amino acid-polyamine-organocation superfamily J Mol MicrobiolBiotechnol 22 105ndash113

35 ReddyVS ShlykovMA CastilloR SunEI andSaierMH Jr (2012) The major facilitator superfamily (MFS)revisited FEBS J 279 2022ndash2035

36 ShlykovMA ZhengWH ChenJS and SaierMH Jr (2012)Bioinformatic characterization of the 4-Toluene Sulfonate UptakePermease (TSUP) family of transmembrane proteins BiochimBiophys Acta 1818 703ndash717

37 ChanH BabayanV BlyuminE GandhiC HakKHarakeD KumarK LeeP LiTT LiuHY et al (2010) Thep-type ATPase superfamily J Mol Microbiol Biotechnol 195ndash104

38 RettnerRE and SaierMH Jr (2010) The autoinducer-2 exportersuperfamily J Mol Microbiol Biotechnol 18 195ndash205

39 LamVH LeeJH SilverioA ChanH GomolplitinantKMPovolotskyTL OrlovaE SunEI WelliverCH andSaierMH Jr (2011) Pathways of transport protein evolutionrecent advances Biol Chem 392 5ndash12

40 ZhengWH VastermarkA ShlykovMA ReddyV SunEIand SaierMH Jr (2013) Evolutionary relationships ofATP-Binding Cassette (ABC) uptake porters BMC Microbiol13 98

41 MatiasMG GomolplitinantKM TamangDG andSaierMH Jr (2010) Animal Ca2+ release-activated Ca2+(CRAC) channels appear to be homologous to and derivedfrom the ubiquitous cation diffusion facilitators BMC Res Notes3 158

42 WangB DukarevichM SunEI YenMR and SaierMH Jr(2009) Membrane porters of ATP-binding cassette transportsystems are polyphyletic J Membr Biol 231 1ndash10

43 YeeDC ShlykovMA VastermarkA ReddyVS AroraSSunEI and SaierMH Jr (2013) The Transporter-Opsin-Gprotein-coupled receptor (TOG) Superfamily FEBS J 2805780ndash5800

44 SaierMH Jr (1994) Computer-aided analyses of transportprotein sequences gleaning evidence concerning functionstructure biogenesis and evolution Microbiol Rev 58 71ndash93

45 DoolittleRF (1994) Convergent evolution the need to beexplicit Trends Biochem Sci 19 15ndash18

46 DayhoffMO BarkerWC and HuntLT (1983) Establishinghomologies in protein sequences Methods Enzymol 91 524ndash545

47 CoyneRS HannickL ShanmugamD HostetlerJB BramiDJoardarVS JohnsonJ RaduneD SinghI BadgerJH et al(2011) Comparative genomics of the pathogenic ciliateIchthyophthirius multifiliis its free-living relatives and a hostspecies provide insights into adoption of a parasitic lifestyle andprospects for disease control Genome Biol 12 R100

48. Podar M, Anderson I, Makarova KS, Elkins JG, Ivanova N, Wall MA, Lykidis A, Mavromatis K, Sun H, Hudson ME et al. (2008) A genomic analysis of the archaeal system Ignicoccus hospitalis-Nanoarchaeum equitans. Genome Biol 9, R158.

49. Zhai Y and Saier MH Jr (2002) A simple sensitive program for detecting internal repeats in sets of multiply aligned homologous proteins. J Mol Microbiol Biotechnol 4, 375–377.

50. Zhai Y and Saier MH Jr (2001) A web-based program for the prediction of average hydropathy, average amphipathicity and average similarity of multiply aligned homologous proteins. J Mol Microbiol Biotechnol 3, 285–286.

51. Silverio AL and Saier MH Jr (2011) Bioinformatic characterization of the trimeric intracellular cation-specific channel protein family. J Membr Biol 241, 77–101.

52. Gomolplitinant KM and Saier MH Jr (2011) Evolution of the oligopeptide transporter family. J Membr Biol 240, 89–110.

53. Tsai JC, Yen MR, Castillo R, Leyton DL, Henderson IR and Saier MH Jr (2010) The bacterial intimins and invasins: a large and novel family of secreted proteins. PLoS One 5, e14403.

54. Castillo R and Saier MH (2010) Functional promiscuity of homologues of the bacterial ArsA ATPases. Int J Microbiol 2010, 187373.

55. Povolotsky TL, Orlova E, Tamang DG and Saier MH Jr (2010) Defense against cannibalism: the SdpI family of bacterial immunity/signal transduction proteins. J Membr Biol 235, 145–162.

56. Xiao AY, Wang J and Saier MH (2010) Bacterial adaptor membrane fusion proteins and the structurally dissimilar outer membrane auxiliary proteins have exchanged central domains in alpha-proteobacteria. Int J Microbiol 2010, 589391.

57. Thever MD and Saier MH Jr (2009) Bioinformatic characterization of p-type ATPases encoded within the fully sequenced genomes of 26 eukaryotes. J Membr Biol 229, 115–130.

58. Vastermark A and Saier MH Jr (2013) Evolutionary relationship between 5+5 and 7+7 inverted repeat folds within the amino acid-polyamine-organocation superfamily. Proteins, August 28 (doi:10.1002/prot.24401; epub ahead of print).

59. Ren Q, Chen K and Paulsen IT (2007) TransportDB: a comprehensive database resource for cytoplasmic membrane transport systems and outer membrane channels. Nucleic Acids Res 35, D274–D279.

60. Brohee S, Barriot R, Moreau Y and Andre B (2010) YTPdb: a wiki database of yeast membrane transporters. Biochim Biophys Acta 1798, 1908–1912.

61. Schwacke R, Schneider A, van der Graaff E, Fischer K, Catoni E, Desimone M, Frommer WB, Flugge UI and Kunze R (2003) ARAMEMNON, a novel database for Arabidopsis integral membrane proteins. Plant Physiol 131, 16–26.

62. Miao Z, Li D, Zhang Z, Dong J, Su Z and Wang T (2012) Medicago truncatula transporter database: a comprehensive database resource for M. truncatula transporters. BMC Genomics 13, 60.

63. Fichant G, Basse MJ and Quentin Y (2006) ABCdb: an online resource for ABC transporter repertories from sequenced archaeal and bacterial genomes. FEMS Microbiol Lett 256, 333–339.

64. Bouige P, Laurent D, Piloyan L and Dassa E (2002) Phylogenetic and functional classification of ATP-binding cassette (ABC) systems. Curr Protein Pept Sci 3, 541–559.

65. Vasiliou V, Vasiliou K and Nebert DW (2009) Human ATP-binding cassette (ABC) transporter family. Hum Genomics 3, 281–290.

66. Hediger MA, Clemencon B, Burrier RE and Bruford EA (2013) The ABCs of membrane transporters in health and disease (SLC series): introduction. Mol Aspects Med 34, 95–107.

67. White SH (2009) Biophysical dissection of membrane proteins. Nature 459, 344–346.

68. Vastermark A, Almen MS, Simmen MW, Fredriksson R and Schioth HB (2011) Functional specialization in nucleotide sugar transporters occurred through differentiation of the gene cluster EamA (DUF6) before the radiation of Viridiplantae. BMC Evol Biol 11, 123.

69. Reddy BL and Saier MH Jr (2013) Topological and phylogenetic analyses of bacterial holin families and superfamilies. Biochim Biophys Acta 1828, 2654–2671.

70. Zhou J and Rudd KE (2013) EcoGene 3.0. Nucleic Acids Res 41, D613–D624.

71. Chen J, Swamidass SJ, Dou Y, Bruand J and Baldi P (2005) ChemDB: a public database of small molecules and related chemoinformatics resources. Bioinformatics 21, 4133–4139.

D258 Nucleic Acids Research 2014 Vol 42 Database issue



