Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting
![Page 1: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting](https://reader031.fdocuments.us/reader031/viewer/2022032422/55a8e6aa1a28ab21498b45a3/html5/thumbnails/1.jpg)
InterMine: Integrated Data Warehouse
Use Cases: Arabidopsis & Medicago Genome Projects
Vivek Krishnakumar
Plant Genomics Group (EUK)
IFX Research WIPS Meeting, 03 October 2014
![Page 2: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting](https://reader031.fdocuments.us/reader031/viewer/2022032422/55a8e6aa1a28ab21498b45a3/html5/thumbnails/2.jpg)
Overview
• Introduction
• InterMine
  – Integrated data warehouse, extensible data model, flexible query system
  – Web and programmatic interface
  – Other InterMine instances
• Use cases
  – Arabidopsis Information Portal (AIP)
  – Medicago truncatula Genome Database (MTGD)
• Summary
  – Advantages
  – Caveats
![Page 3: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting](https://reader031.fdocuments.us/reader031/viewer/2022032422/55a8e6aa1a28ab21498b45a3/html5/thumbnails/3.jpg)
Introduction
For genome projects that wish to expose their data via the web (query, visualize, warehouse) to foster scientific collaboration, several technologies are available:
• JCVI-developed software
  – Manatee (backed by an RDBMS)
• Externally developed software
  – BioMart (federated from various databases)
  – Tripal (powered by Drupal, backed by a CHADO database)
  – InterMine
![Page 4: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting](https://reader031.fdocuments.us/reader031/viewer/2022032422/55a8e6aa1a28ab21498b45a3/html5/thumbnails/4.jpg)
InterMine
• Functions as a data warehouse for the integration of complex biological data. Integration across data types is based on a common identifier (e.g. gene primary ID)
• Uses a flexible and extensible data model, controlled by XML files and driven by ontologies (Sequence Ontology [SO], Gene Ontology [GO], etc.)
  – Genomics, Proteomics, Interactions, Homology, Expression, Pathways (and more data types)
  – Parsers for commonly used biological data formats
  – Provides a framework for adding your own data
• Offers a flexible query system, optimized via precomputed tables (no need for schema denormalization)
Smith, RN. et al. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data
Bioinformatics (2012) 28 (23): 3163-3165
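The idea of integrating data types on a common identifier can be illustrated with a minimal sketch. This is not InterMine's loader; the function name `integrate`, the `gene_id` key, and the sample records are all hypothetical and chosen only to show records from independent sources being merged under one shared ID.

```python
from collections import defaultdict


def integrate(*sources):
    """Merge records from several sources on a shared 'gene_id' key.

    Hypothetical helper for illustration; not part of InterMine.
    """
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            gene_id = record["gene_id"]
            # Each source contributes its own fields; a field that is
            # already present is kept rather than overwritten.
            for field, value in record.items():
                merged[gene_id].setdefault(field, value)
    return dict(merged)


# Hypothetical records from two independent data sets
annotation = [{"gene_id": "GENE0001", "symbol": "abc1"}]
expression = [{"gene_id": "GENE0001", "tissue": "root", "level": 12.5}]

warehouse = integrate(annotation, expression)
print(warehouse["GENE0001"])
```

After the merge, a single lookup by gene ID returns the annotation and expression fields together, which is the property that report pages and cross-data-type queries rely on.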
![Page 5: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting](https://reader031.fdocuments.us/reader031/viewer/2022032422/55a8e6aa1a28ab21498b45a3/html5/thumbnails/5.jpg)
InterMine (contd.)
• Provides a user-friendly web interface exposing powerful features:
  – Analysis of lists (facilitates enrichment studies)
  – Full-featured report pages (one-stop shop)
  – Interactive result tables (sort, filter, summarize)
  – Visual query builder (no need to write SQL!)
  – Quick search and region-based search
• Fosters development of external applications that use data hosted within InterMine via Application Programming Interfaces (APIs):
  – RESTful web services
  – Perl, Python, Ruby, Java, JavaScript
Kalderimis, A. et al. InterMine: extensive web services for modern biology
Nucl. Acids Res. (1 July 2014) 42 (W1): W468-W472
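As a rough sketch of what an external application might send to the RESTful interface, the snippet below builds (but does not send) a query URL using only the Python standard library. The assumptions are labeled: the `/query/results` endpoint, the `query` and `format` parameters, and the XML query layout follow InterMine's web service conventions as I understand them, while the service root `https://mine.example.org/service`, the gene ID, and the helper name `build_query_url` are hypothetical. Check a given mine's own web service documentation before relying on any of this.

```python
from urllib.parse import urlencode, urlparse, parse_qs
from xml.sax.saxutils import quoteattr


def build_query_url(service_root, views, path, op, value, fmt="json"):
    """Build a results URL for a single-constraint query (illustrative only)."""
    # quoteattr() escapes the value and wraps it in quotes for use as an XML attribute.
    xml = (
        f"<query model=\"genomic\" view={quoteattr(' '.join(views))}>"
        f"<constraint path={quoteattr(path)} op={quoteattr(op)} value={quoteattr(value)}/>"
        f"</query>"
    )
    params = urlencode({"query": xml, "format": fmt})
    return service_root.rstrip("/") + "/query/results?" + params


# Hypothetical service root and gene identifier
url = build_query_url(
    "https://mine.example.org/service",
    ["Gene.primaryIdentifier", "Gene.symbol"],
    "Gene.primaryIdentifier",
    "=",
    "GENE0001",
)

# Decode the query string again to show what the server would receive.
decoded = parse_qs(urlparse(url).query)
print(decoded["query"][0])
```

The language-specific client libraries listed above wrap this kind of request, so most applications would use one of those rather than assembling URLs by hand.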
![Page 6: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting](https://reader031.fdocuments.us/reader031/viewer/2022032422/55a8e6aa1a28ab21498b45a3/html5/thumbnails/6.jpg)
Public “Mines”
• InterMine supports querying across mines for cross-database integration
• A large number of warehouses powered by InterMine already exist
![Page 7: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting](https://reader031.fdocuments.us/reader031/viewer/2022032422/55a8e6aa1a28ab21498b45a3/html5/thumbnails/7.jpg)
Arabidopsis Information Portal (AIP)
• AIP origins
  – Funded by NSF in response to community needs, following termination of funding to TAIR
• AIP objectives
  – Develop a community web resource that…
      – is sustainable, fundable, and community-extensible
      – hosts analysis & visualization tools, user data spaces
  – Federation: integrate diverse data sets from distributed data sources; foster development of tools for and by the community
  – Maintenance of the Col-0 gold standard annotation
• AIP methods
  – Assimilate TAIR data
  – Host an InterMine instance devoted to Arabidopsis (thale cress)
  – Offer and consume RESTful web services
  – Integrate and utilize iPlant resources
![Page 8: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting](https://reader031.fdocuments.us/reader031/viewer/2022032422/55a8e6aa1a28ab21498b45a3/html5/thumbnails/8.jpg)
ThaleMine
https://apps.araport.org/thalemine
• An InterMine interface to Arabidopsis genomic data
• Integrates a wide variety of data types (A-E, H), some of which are warehoused and others are federated via web services
• Embedded elements visualizing gene structure (JBrowse, not shown), interaction networks (F), expression patterns (G)
![Page 9: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting](https://reader031.fdocuments.us/reader031/viewer/2022032422/55a8e6aa1a28ab21498b45a3/html5/thumbnails/9.jpg)
Visual Query Builder
Image created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
![Page 10: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting](https://reader031.fdocuments.us/reader031/viewer/2022032422/55a8e6aa1a28ab21498b45a3/html5/thumbnails/10.jpg)
Images created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
Interactive Result Tables
Region-based search
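Conceptually, a region-based search returns the features that overlap a given genomic interval. The sketch below shows that overlap test on in-memory data; it is not how InterMine implements the feature, and the `Feature` type, the function name, and the sample coordinates are hypothetical.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def features_in_region(features, chrom, start, end):
    """Return features on `chrom` that overlap the interval [start, end]."""
    return [
        f for f in features
        if f.chrom == chrom and f.start <= end and f.end >= start
    ]


# Hypothetical gene coordinates
genes = [
    Feature("geneA", "Chr1", 100, 500),
    Feature("geneB", "Chr1", 900, 1500),
    Feature("geneC", "Chr2", 100, 500),
]

hits = features_in_region(genes, "Chr1", 400, 1000)
print([f.name for f in hits])  # ['geneA', 'geneB']
```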
![Page 11: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting](https://reader031.fdocuments.us/reader031/viewer/2022032422/55a8e6aa1a28ab21498b45a3/html5/thumbnails/11.jpg)
MedicMine
http://medicmine.jcvi.org
• NSF-funded project to assist with the curation of the Medicago truncatula genome assembly and annotation (funding ended August 2014)
• To warehouse the project data and keep it available beyond the funding period, an InterMine interface for Medicago was implemented (backed by a CHADO database)
• Provides functionality similar to that available via ThaleMine
![Page 12: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting](https://reader031.fdocuments.us/reader031/viewer/2022032422/55a8e6aa1a28ab21498b45a3/html5/thumbnails/12.jpg)
Summary
• Advantages
  – InterMine is a powerful biological data warehouse
  – Performs complex data integration
  – Allows fast and flexible querying
  – Well-documented programmatic interface
  – Cookie-cutter, user-friendly web interface
  – Facilitates cross-talk between “mines”
• Caveats
  – Adding more data requires a full database rebuild (incremental loading is not possible) because of the integration step
• About InterMine
  – Developed by the Micklem Lab at the University of Cambridge, UK
  – Written in Java, backed by PostgreSQL, deployed under Tomcat
  – Documentation and downloads available at http://www.intermine.org
![Page 13: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting](https://reader031.fdocuments.us/reader031/viewer/2022032422/55a8e6aa1a28ab21498b45a3/html5/thumbnails/13.jpg)
Chris Town, PI
Lisa McDonald, Education and Outreach Coordinator
Chris Nelson, PM
Jason Miller, Co-PI, Technical Lead
Erik Ferlanti, SE
Vivek Krishnakumar, BE
Svetlana Karamycheva, BE
Maria Kim, BE
Ben Rosen, BA
Eva Huala, Project lead, TAIR
Bob Muller, Technical lead, TAIR
Gos Micklem, co-PI
Sergio Contrino, Software Engineer
Matt Vaughn, co-PI
Steve Mock, Advanced Computing Interfaces
Rion Dooley, Web and Cloud Services
Matt Hanlon, Web and Mobile Applications