Large-scale curation of bioactive chemistry from patents and papers: Excelra GOSTAR
A snapshot of the ‘Excelra GOSTAR’ statistics
www.excelra.com
[Figure, panel 1: "All-time Target Ranking, MCD + TCD" (bar chart, scale 0 to 100,000). Targets shown: MAPK14, PIK3CA, PDGFR BETA, CANNABINOID RECEPTOR 1, JANUS KINASE 2, C-SRC TYROSINE KINASE, THROMBIN, COAGULATION FACTOR X, EGFR, VEGFR2.]
[Figure, panel 2: "2015-16 Target Ranking MCD (papers)" (bar chart, scale 0 to 3,000). Targets shown: CANNABINOID RECEPTOR 1, MONOAMINE OXIDASE B, VEGFR2, CARBONIC ANHYDRASE I, CARBONIC ANHYDRASE II, CYP3A4, EGFR, POTASSIUM CHANNEL KV11.1, BUTYRYLCHOLINESTERASE, ACETYLCHOLINESTERASE.]
[Figure, panel 3: "2015-16 Target Ranking TCD (patents)" (bar chart, scale 0 to 9,000). Targets shown: INTERLEUKIN-1 RECEPTOR-ASSOCIATED …, NUCLEAR RECEPTOR ROR-GAMMA T, BRUTON TYROSINE KINASE, JANUS KINASE 2, PIK3CB, SODIUM CHANNEL NAV1.7, PIK3CG, PIK3CD, NUCLEAR RECEPTOR ROR-GAMMA, PIK3CA.]
Christopher Southan, Anil Kumar Manchala and Sreeni Devidas
IUPHAR/BPS Guide to PHARMACOLOGY, Centre for Integrative Physiology, University of Edinburgh, EH8 9XD, UK.
Excelra Knowledge Solutions (formerly GVK Informatics) Pvt. Ltd., Hyderabad-500039, India.
http://www.excelra.com/index.php
http://www.slideshare.net/cdsouthan
Target Ranking:
- “Slicing and dicing” target-to-compound outputs over time offers unique insights (PMID 24204758).
- As expected, the cumulative ranking (top chart) is not that different from 2011 (PMID 21569515).
- However, the most recent papers (centre chart) show quite different rankings from recent patents (lower chart).
- Data mining facilitates high-resolution competitive intelligence and the tracking of patent disclosures through to later paper disclosures.
- It also provides scientific SAR insights (e.g. shifts in affinity, scaffolds and chemical properties arising from institutional and collective target validation endeavours).
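"Slicing and dicing" target-to-compound outputs can be pictured as counting distinct compounds per target within a year window and a source type (papers or patents), then ranking. The sketch below is a minimal illustration under stated assumptions: the `Activity` record, the `"paper"`/`"patent"` labels and the `rank_targets` helper are hypothetical, not the GOSTAR schema or query interface, and the toy rows are invented placeholders rather than GOSTAR data.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Activity(NamedTuple):
    """One hypothetical target-compound link extracted from a document."""
    target: str
    compound_id: str
    year: int
    source: str  # assumed labels: "paper" (MCD-like) or "patent" (TCD-like)


def rank_targets(
    records: Iterable[Activity],
    start: int,
    end: int,
    sources: set[str],
    top_n: int = 10,
) -> list[tuple[str, int]]:
    """Rank targets by distinct compound count for years start..end inclusive."""
    compounds_per_target: dict[str, set[str]] = defaultdict(set)
    for r in records:
        if start <= r.year <= end and r.source in sources:
            # A set ensures a compound reported twice is counted once.
            compounds_per_target[r.target].add(r.compound_id)
    counts = [(t, len(c)) for t, c in compounds_per_target.items()]
    # Descending count; ties broken alphabetically so output is stable.
    counts.sort(key=lambda tc: (-tc[1], tc[0]))
    return counts[:top_n]


if __name__ == "__main__":
    # Invented toy rows for illustration only.
    rows = [
        Activity("EGFR", "C1", 2015, "patent"),
        Activity("EGFR", "C2", 2016, "patent"),
        Activity("EGFR", "C2", 2016, "paper"),
        Activity("PIK3CA", "C3", 2016, "patent"),
        Activity("ACETYLCHOLINESTERASE", "C4", 2015, "paper"),
        Activity("ACETYLCHOLINESTERASE", "C5", 2016, "paper"),
        Activity("VEGFR2", "C6", 2005, "paper"),
    ]
    print(rank_targets(rows, 2015, 2016, {"patent"}))
    print(rank_targets(rows, 2015, 2016, {"paper"}))
```

Running the same function with different year windows and source sets is what allows a cumulative ranking to be contrasted with recent paper-only and patent-only rankings.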
Database Content:
- By inspecting a document "D", Excelra expert curators identify a bioactivity assay "A" (e.g. for an enzyme) with a quantitative result "R" (e.g. an IC50) for a compound "C" (a defined chemical structure) acting as an activity modulator (typically an inhibitor) of a protein target "P", or in a cell-based assay.
- A useful shorthand for this mapping is “D-A-R-C-P”.
- Assays are classified into different types.
- The location of each structure in its document is specified (e.g. “cpd 5a” from a paper or “Example 102” from a patent).
- Starting from 1945, there are 1.34 million compounds from 112K papers and 3.35 million from 71K patents.
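One way to picture the D-A-R-C-P mapping is as one record per curated result. The following is a hypothetical data structure, not the GOSTAR schema: the `DARCP` field names, the `AssayType` values, the `index_by_target` helper and the example identifiers are all assumptions made for illustration. A record without a protein target stands in for a cell-based result.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AssayType(Enum):
    # Hypothetical assay classes; the real GOSTAR classification may differ.
    ENZYME = "enzyme"
    BINDING = "binding"
    CELL_BASED = "cell-based"


@dataclass(frozen=True)
class DARCP:
    """One curated Document-Assay-Result-Compound-Protein record (sketch)."""
    document: str            # D: e.g. a PMID or patent number
    assay: AssayType         # A: classified assay type
    result_type: str         # R: e.g. "IC50"
    result_value: float      # R: numeric value
    result_unit: str         # R: e.g. "nM"
    compound_label: str      # C: location in document, e.g. "Example 102"
    compound_structure: str  # C: defined structure, e.g. a SMILES string
    protein: Optional[str]   # P: target; None for a cell-based readout


def index_by_target(records: list[DARCP]) -> dict[str, list[DARCP]]:
    """Group records by protein target, skipping records without one."""
    index: dict[str, list[DARCP]] = defaultdict(list)
    for rec in records:
        if rec.protein is not None:
            index[rec.protein].append(rec)
    return dict(index)


if __name__ == "__main__":
    example = DARCP(
        document="EXAMPLE-DOC-0001",   # hypothetical identifier
        assay=AssayType.ENZYME,
        result_type="IC50",
        result_value=12.0,
        result_unit="nM",
        compound_label="Example 102",
        compound_structure="c1ccccc1",  # placeholder SMILES
        protein="EGFR",
    )
    print(index_by_target([example]))
```

Keeping the in-document label ("cpd 5a", "Example 102") alongside the structure preserves the link back to the source document that the curation captures.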
- The form of these plots up to 2012 is discussed in PMID 24204758.
- The fall in patent medicinal chemistry SAR continues.
- Since 2012, literature SAR has also been falling, but it is converging in numbers with patents.
- The causes of these declines are probably dominated by pharma M&A activity and a shift to biologicals, but this does not preclude improvements in compound quality.
Introduction:
- Excelra has developed a suite of five unified database products covering global drug discovery R&D outputs.
- These are termed Global Online Structure Activity Relationships (GOSTAR).
- This work provides an update, mainly on the two largest components, the Medicinal Chemistry (MCD) and Target (TCD) databases, extracted from papers and patents, respectively.
- Content is derived from expert curation of structure-activity relationships (SAR) from documents.
- Details are described on the Excelra website, and historical statistics in “Tracking 20 Years of Compound-to-Target Output from Literature and Patents”, Southan et al., 2013, PLoS One, PMID 24204758.
Conclusions:
- The human protein target totals have increased to 3,383 in MCD, 2,431 in TCD, 3,882 combined and 546 patent-only.
- The combined total exceeds the 3,272 Swiss-Prot cross-references combined across the activity-mapped public sources Guide to PHARMACOLOGY, BindingDB and ChEMBL.
- Taking the current human Swiss-Prot total of 20,201, MCD and TCD provide chemical modulation starting points for a druggable proteome of 19%.
- Compound capture has increased by 27% since 2012.
- In addition to MCD and TCD, GOSTAR includes 33,620 compounds that are in development or are approved drugs, with Mechanism Based Toxicity data for 28,305 of these.
- The GOSTAR subscription resource offers one of the largest available D-A-R-C-P compilations.
- It covers chemical biology as well as drug R&D.
- It offers advanced online data mining features and options for internal integration and exploitation.
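The 19% figure follows from dividing the combined human target count by the human Swiss-Prot total. A minimal worked check using only the numbers stated in the conclusions (the `percent` helper and variable names are ours, for illustration):

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


combined_human_targets = 3882  # MCD + TCD combined human targets
swissprot_human_total = 20201  # current human Swiss-Prot total
public_sources_total = 3272    # GtoPdb + BindingDB + ChEMBL Swiss-Prot xrefs

coverage = percent(combined_human_targets, swissprot_human_total)
excess = combined_human_targets - public_sources_total

print(f"Druggable proteome coverage: {coverage:.1f}%")  # 19.2%
print(f"Targets beyond the public total: {excess}")     # 610
```

The unrounded coverage is about 19.2%, consistent with the 19% quoted, and the combined GOSTAR total exceeds the public-source total by 610 targets.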
[Figure: Compounds vs year (1970 to 2015) for TCD from patents and MCD from papers]