Posted on 10-May-2015
Practicing Open Science
William J. Schroeder, Kitware, Inc.
Brian Wylie, Sandia National Labs
Marcus Hanwell, Kitware, Inc.
Speakers & Topics
§ William Schroeder, President & CEO, Kitware, Inc. - The whys and hows of Open Science
§ Dr. Marcus Hanwell, R&D Engineer, Kitware, Inc. - Building an open-source research program (in Chemistry)
§ Brian Wylie, Sandia National Labs - Research collaborations from a government perspective
The Scientific Method
• Document • Share
• Data • Methodology
• Archive
Galileo Galilei 1613
Open Science
§ Open Documents
- Hypothesis
- Descriptions
- Results
§ Open Data
§ Open Methodology
- Experimental apparatus
- Software
- Workflow
- Parameter sets
Ensuring reproducibility
If it isn’t reproducible, it isn’t science
Reproducibility
- Positive evidence: accumulates support
- Negative evidence: disproves the hypothesis
Example: OSA Interactive Science Publishing (ISP)
§ Augmented PDF
§ Contains links to an executable viewer
§ Downloads data and viewer as necessary to reproduce paper images (results)
Example: Insight Journal
§ Timely publishing of publications, data, and software
§ Evaluated automatically; further reviewed by the community
(Diagram: Author, PDF doc, Code, Input Data, Journal Git Repository, Build Machines, Results Data, Web Site)
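The automatic evaluation step can be pictured with a small sketch. Assuming a submission is a Python script that reads its input data on stdin and prints whitespace-separated numeric results, a build machine could rerun it in a fresh process and compare the output with the results reported in the paper. The function names, the stdin/stdout convention and the tolerance are all hypothetical; this is not how the Insight Journal's build machines are actually implemented.

```python
import math
import subprocess
import sys
import tempfile
from pathlib import Path


def run_submission(code: str, input_text: str, timeout: float = 60.0) -> str:
    """Run the (hypothetical) submitted script in a fresh interpreter, feeding input on stdin."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "submission.py"
        script.write_text(code)
        completed = subprocess.run(
            [sys.executable, str(script)],
            input=input_text,
            capture_output=True,
            text=True,
            timeout=timeout,
            cwd=workdir,
            check=True,  # a crashing submission raises CalledProcessError
        )
        return completed.stdout


def results_match(actual: str, expected: str, rel_tol: float = 1e-9) -> bool:
    """Compare whitespace-separated numeric results within a relative tolerance."""
    a, e = actual.split(), expected.split()
    return len(a) == len(e) and all(
        math.isclose(float(x), float(y), rel_tol=rel_tol) for x, y in zip(a, e)
    )


def evaluate(code: str, input_text: str, expected: str) -> bool:
    """Hypothetical build-machine check: rerun the code and compare to the paper's results."""
    return results_match(run_submission(code, input_text), expected)


if __name__ == "__main__":
    # A toy submission that averages its input values.
    submission = (
        "import sys\n"
        "vals = [float(t) for t in sys.stdin.read().split()]\n"
        "print(sum(vals) / len(vals))\n"
    )
    print(evaluate(submission, "1 2 3 4", "2.5"))  # True
```

Comparing within a relative tolerance rather than byte for byte leaves room for small floating-point differences between build platforms.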
Benefits of Open Science
§ Collaboration
- Leveraging international communities and expertise
§ Agile Innovation
- Facilitate technology mashups
- Move science to application faster
- More focus on technology; less on protection
§ Business Models
- Growing the pie, creating new opportunities
- Customization, software integration
“…much of our intelligence and creativity results from interactions with tools and artifacts and from collaborating with other individuals.”
-- Shneiderman
Example: Collaboration
§ NIH National Center for Biomedical Computing: NA-MIC
§ Developing the open-source NA-MIC Kit and the 3D Slicer application
Example: Agile Innovation (Open Source for Medical Imaging)
Led to the creation of:
- ITK
- VolView
- BioImageXD
- Osirix
- MedINRIA
- VisTrails
- NIH / NCI caBIG – XIP
- VR-Renderer
- IGSTK
- ParaView
- etc.
Creating VTK (Visualization Toolkit)
and finally…
Example: Business Models
§ Kitware: building open source collaboration platforms
- The usual support and training
- Consulting
- Engaging in collaborative R&D
- Providing technology integration services, i.e., creating custom solutions
CMake
CDash
The Open Technology Highway
§ Provide an open infrastructure
- Support research, teaching, non-profit and commercial activities
- Any (legal) activity can hang off of the highway
- Spur innovation, create opportunities
- Get from idea to product faster
- Do not have to replicate technology
- Too many toll gates (i.e., closed systems, unreasonable IP) slow everything down
- Prefer non-reciprocal licenses
Next Up
§ Marcus: Building a research program for chemistry
§ Brian: open science and research collaboration from a government perspective
Open Chemistry: Growing a Research Program Through Open Source
Dr. Marcus Hanwell, Kitware, Inc.
Grass-Roots Effort
§ Bootstrapped several efforts without funding
- Spare time
- Parts of other projects when possible
§ Formed an “unorganization”: Blue Obelisk
- Published first article in 2005
- Open data, open standards and open source
- Meet at ACS and other conferences when possible
- Follow-up article currently in press
§ Quixote collaboration more recently
- Provide meaningful data storage and exchange
- Principally targeting computational chemistry
The Early Years
§ Avogadro project started in 2006
§ First funded work in 2007 by Marcus Hanwell
- Google Summer of Code student
- Final year of Ph.D.; spent the summer coding
- Funded as part of the KDE project: Kalzium editor
§ Built on several other open source projects
- Qt, Eigen, Open Babel, Blue Obelisk Data Repository
§ Also uses open standards, such as OpenGL for rendering
§ Cross-platform, open source stack
Community Tools, Standards and Resources
§ Make extensive use of Qt for standard GUI elements
- Much more than just GUI: multithreading, web resources
- Avogadro chosen as an outstanding example of “Qt in Use”
- Marcus Hanwell recently chosen as a “Qt Ambassador”
§ OpenGL for cross-platform 3D rendering
- Accelerated rendering of 3D molecular geometry
- Facilitates interacting with the scene
- Use of GLSL for impressive, fast rendering
§ Open Babel for chemical input/output and more
- There are a lot of chemical file formats…
- Has a lot of chemical knowledge, e.g. bond perception
§ Git for distributed version control
- We work across multiple sites, time zones and institutions
- Gerrit for code review more recently, improving code quality
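Bond perception, listed above as part of Open Babel's chemical knowledge, infers connectivity from 3D coordinates alone. A common heuristic bonds two atoms when their distance is no larger than the sum of their covalent radii plus a tolerance. The sketch below implements that heuristic in plain Python; the small radius table, the 0.45 Å tolerance and the 0.4 Å minimum distance are assumptions for illustration, and this is not Open Babel's actual implementation or API.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in angstroms (Cordero et al. 2008).
# Assumed subset for illustration; a real toolkit covers the whole periodic table.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def perceive_bonds(elements, coords, tolerance=0.45, min_dist=0.4):
    """Return (i, j) index pairs whose distance falls within the bonding cutoff.

    tolerance and min_dist are hypothetical defaults, not Open Babel settings.
    """
    bonds = []
    for i, j in combinations(range(len(elements)), 2):
        d = math.dist(coords[i], coords[j])
        cutoff = COVALENT_RADII[elements[i]] + COVALENT_RADII[elements[j]] + tolerance
        # Skip pairs that are implausibly close (overlapping atoms) or too far apart.
        if min_dist < d <= cutoff:
            bonds.append((i, j))
    return bonds


if __name__ == "__main__":
    water = ["O", "H", "H"]
    xyz = [(0.0, 0.0, 0.0), (0.9572, 0.0, 0.0), (-0.2400, 0.9266, 0.0)]
    print(perceive_bonds(water, xyz))  # [(0, 1), (0, 2)]
```

This checks every atom pair, which is fine for small molecules; a larger system would call for a spatial grid or neighbor list, and the sketch makes no attempt to assign bond orders.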
Evangelizing: Getting the Message Out
§ Traditional social media used to communicate
- Blogs, Planets, Twitter, Identi.ca, FriendFeed, Google+
§ Talks and posters at conferences
- Open source conferences talking about chemistry
- Chemistry conferences talking about open source chemistry
§ Several meetings and workshops about open chemistry
- Daresbury Laboratory: Chemical Visualization and Quixote
- NIH National Cancer Institute: Databases and Open Chemistry
§ Publications in the traditional journals
§ Screencasts showing off what the software can do
§ In-person workshops and training sessions
Bringing About Real Change
§ 2011 is the “International Year of Chemistry”
§ Chemistry has traditionally been quite closed
§ We are working hard to change this
§ Recently led a Phase I SBIR to develop “open chemistry tools”
- GUI acting as the center of the chemical workflow
- Database application using MongoDB, chemically aware
- Cluster integration on the desktop: submit, monitor and retrieve
§ Chemical simulation/calculation is now the biggest HPC user in the military
§ Open tools can use both open and closed computational codes
- Largely written in Fortran to run on clusters
- NWChem recently open sourced: PNNL quantum code
- Already work with GAMESS, GAMESS-UK, Q-Chem, Gaussian…
§ The time is right for change in chemistry
- Opportunity to accelerate the rate of research
Funding Open Chemistry Tools
§ Kitware’s core business is based on “open collaboration platforms”
§ Led a Phase I Small Business Innovation Research project (US Army)
- Invited to apply for Phase II funding, currently pending
§ Make use of Apache and BSD licenses
- Allow for participation of a wider cross-section of the community
- Reduced licensing complications
- Important for industry and government collaboration
§ Successfully taken part in Google Summer of Code: funded students
- Student in 2007 working on Avogadro and Kalzium
- Mentor for KDE in 2008-2010
- VTK organization administrator and mentor in 2011
§ Looking to other funding agencies and collaborations in future
Developing in Niche Areas
§ The population of active researchers in chemistry is relatively small
- The number of those researchers who code is even smaller
- Of those, the number that wish to contribute to open source is tiny
§ Developing and nurturing these communities can be challenging
§ Some students develop a feature in a summer and disappear
§ Other professors might develop code over the summers
§ Have to lower the barrier to entry as much as possible
§ Often need to help with tools, build systems, etc.
Enabling Technologies in Chemistry
§ Large number of computational chemistry codes
- Many do not have dedicated user interfaces
- Forming a new area enabling chemical workflows
- Some of the open source codes that can benefit:
- NWChem: quantum chemistry code
- Quantum Espresso: plane-wave code
- Free-to-use codes such as GAMESS
- Commercial codes such as Molpro, Q-Chem, others
- These codes are executed in a separate process
§ Libraries that can be used in the GUI:
- The Visualization Toolkit (VTK) provides advanced rendering
- The ParaView library provides client-server technology for large data
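Executing a computational code in a separate process maps naturally onto the submit, monitor and retrieve steps of desktop cluster integration. The sketch below uses Python's `subprocess` module on the local machine only; a real cluster integration would go through a batch scheduler instead. The stand-in "code" is a hypothetical one-liner that just reports the atom count from an XYZ file, taking the place of a program such as NWChem or GAMESS; `run_external`, `STAND_IN_CODE` and the `job.log` file name are illustrative assumptions.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path

# Hypothetical stand-in for an external chemistry code: it reads the first
# line of an XYZ file (the atom count) and prints a one-line report.
STAND_IN_CODE = (
    "with open('water.xyz') as f:\n"
    "    print('atoms:', f.readline().strip())\n"
)

WATER_XYZ = """3
water
O 0.0000 0.0000 0.0000
H 0.9572 0.0000 0.0000
H -0.2400 0.9266 0.0000
"""


def run_external(command, workdir, poll_interval=0.1):
    """Submit a job as a separate process, poll until it exits, then return its log."""
    log_path = Path(workdir) / "job.log"
    with open(log_path, "w") as log:
        # Submit: the caller keeps control while the external code runs.
        proc = subprocess.Popen(command, cwd=workdir, stdout=log, stderr=subprocess.STDOUT)
        # Monitor: poll() returns None while the process is still running.
        while proc.poll() is None:
            time.sleep(poll_interval)
    if proc.returncode != 0:
        raise RuntimeError(f"job failed with exit code {proc.returncode}")
    # Retrieve: read back whatever the code wrote.
    return log_path.read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        (Path(workdir) / "water.xyz").write_text(WATER_XYZ)
        print(run_external([sys.executable, "-c", STAND_IN_CODE], workdir), end="")
        # prints: atoms: 3
```

In a GUI the polling loop would live in a timer callback or worker thread rather than blocking on `time.sleep`, so the interface stays responsive while the calculation runs.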
Working With Academia, Industry and Government
§ In the past, licensing has not been ideal
- Some form of GPL or non-commercial-only license is fine for most academics
- Industry and government generally need more liberal licenses, e.g. BSD, Apache 2
§ Can be challenging to ensure everyone gets something out of the deal
§ Avoiding the trap of dual-licensing, which often kills community and shared ownership
§ Funders can find it harder to understand commercialization
§ We normally take a services/consulting role
Government Open Source Collaborations
Brian Wylie Sandia National Laboratories
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Government Open Source Resources
• GOSCON Government Open Source Conference (goscon.org)
• Open Source Center: foreign open source intelligence data (opensource.gov)
• Open Source Software Institute: non-profit corp/govt/acad (oss-institute.org)
• Government Open Source Software Resource Centre (gossrc.org)
• Center for Strategic and International Studies (tracks open source legislation; csis.org)
Government Open Source Around the World
Data courtesy of the Center for Strategic and International Studies
(Chart: Open Source Initiatives by Region (2000-2009), showing failed, proposed and approved initiatives for Europe, Asia, Latin America, North America, Africa and the Middle East; y-axis from 0 to 180)
Government Open Source Example Projects
Open source data analysis and visualization platform
Sandia, Los Alamos, Kitware, University of Utah
Government Open Source Example Projects
Sandia, Kitware, Indiana University, Stanford
Government Open Source Collaboration Benefits
§ Government
- No specific vendor “lock-in/out”
- Allows a diversified development team
- Known code base (strengths and weaknesses)
- Typically easier to integrate with other OS tools
- Improvement of the OS project
§ Commercial
- Money
- Leveraging the project for other/future work
- Improvement of the OS project
§ Academic
- Student/professor support
- Publishing/sharing
- Improvement of the OS project
Government Open Source Collaboration Issues
(Slide columns: Government, Commercial, Academic)
- Need to relax into an existing OS license*
- New projects should pick a liberal OS license
- Funding source may hesitate on open source
- Proprietary projects / intellectual property
- Government bureaucracy
- Mixed software skill set
- Deliverables can get distorted
- Work may not be publication material
- If you do publish, it may be a joint publication
* No government sell-back clause
Government Open Source Questions Section
Contact Information
§ Will Schroeder will.schroeder@kitware.com
§ Brian Wylie bnwylie@sandia.gov
§ Marcus Hanwell marcus.hanwell@kitware.com