Delivering on the promise of a chemistry data repository for the world

Post on 22-May-2015

description

This presentation was given as part of the Microsoft eScience panel discussion in Sao Paulo, Brazil. The panel's theme was "Going Native", a reference to a quote from Jim Gray along the lines of “in order to really understand the computing needs of a scientist you have to go native”. Jim himself did this, immersing himself in astronomy to build what would become the WorldWide Telescope. Bridging the gap between experimental scientists and the computing that underpins their discoveries is an ongoing challenge for eScience. The panelists explored what it means to go native, gave examples of where they have seen this work well, and shared lessons learned from working in this way.

Transcript of Delivering on the promise of a chemistry data repository for the world

Delivering on the promise of a chemistry data repository for the world

Antony Williams

Going Native Panel Discussion at the Microsoft eScience Workshop

ORCID: 0000-0002-2668-4821

A Question to Start…

• Who in the room has an ORCID?

New Horizons….

• Let’s map together all historical chemistry data and build systems to integrate it

• Heck, let’s integrate chemistry and biology data and add in disease data too

• Let’s model the data and see if we can extract new relationships – quantitative and qualitative

• Let’s take what we learn from historical data and build better solutions for modern data

• Let’s make it all available on the web…

What about this….

• We’re going to map the world

• We’re going to take photos of as many places as we can and link them together

• We’ll let people annotate and curate the map

• Then let’s make it available free on the web

• We’ll make it available for decision making

• Put it on Mobile Devices, give it away…

Chemistry data is of value?

• Reference databases generate hundreds of millions of dollars/euros per year

• So much data generated that could go public

• Maybe 5% of all data generated is published

• There is no “Journal of Failed Experiments”

• Funding agencies start to demand Open Data

• Scientists want funding but also recognition

A shift to Openness

Open Data is here…

Chemistry data is of value?

• …so who will fund and build the platforms?

Going Native… speaka da lingo

Chemists clearly benefit from accessing data

What we found…

• Data quality on the internet can be very poor

• Everyone wants access to high quality data but very few are willing to contribute

• The primary concerns for contributors:

  • It needs to be easy

  • Data licensing

  • Recognition for contributions

Recognition: need to have Impact

Quantitating scientists?

National Information Standards Organization and “Altmetrics”

http://www.niso.org/apps/group_public/download.php/13295/niso_altmetrics_white_paper_draft_v4.pdf

Research Outputs

• Blogs

• Research datasets

• Scientific software

• Posters and presentations at conferences

• Electronic theses and dissertations

• Performances in film and audio

• Lectures, online classes and teaching activities

Recognizing Contribution

• In order to encourage participation maybe we need to provide recognition of impact

• How do we measure impact for:

  • Performing peer review?

  • Contributions to more “public platforms”?...

Christmas Curating Wikipedia

Wikipedia Chemboxes

• http://en.wikipedia.org/wiki/Glucose

Three days of discussion

• If you want to understand Wikipedia definitely Go Native and get involved!

Does ONE bond matter???

A short intro to chirality

Educating chemists in data

• Chemists are more likely to know basic HTML than the data formats used in chemistry

• Even international standards for data interchange and standardization are unknown

• Standards are ideal for computers to handle
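One small illustration of why standardized identifiers suit computers: the ORCID shown on the title slide ends in a check digit, which ORCID documents as ISO 7064 MOD 11-2, so software can catch most typing errors without any lookup. The sketch below is not from the talk; the function names are illustrative, and it checks only the checksum, not whether the identifier is actually registered.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the 16-character ORCID has a consistent check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if any(ch not in "0123456789" for ch in base):
        return False
    return orcid_check_digit(base) == check


if __name__ == "__main__":
    # The ORCID from the title slide of this talk.
    print(is_valid_orcid("0000-0002-2668-4821"))  # True
```

A single mistyped digit changes the expected check character, so the same call returns False for, say, a final digit of 2 instead of 1.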

Can we MAKE Quality Data?

• We are building systems for everyone to validate and standardize their data

Where to host research data?

• Containers for chemical compounds, chemical reactions, analytical data, tabular data, etc.

• Algorithms for data validation and standardization

• Domain specific search technologies

• A platform for modeling data

• Progressing the RSC Data Repository…
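To make "data validation and standardization" concrete, here is a deliberately tiny sketch. It is not the RSC's algorithm. It validates a flat molecular formula against a small, illustrative subset of element symbols, then rewrites it in Hill order (carbon first, hydrogen second, remaining elements alphabetically; fully alphabetical when there is no carbon). The function names and the element subset are assumptions made for this example, and real systems validate structures, not just formulas.

```python
import re
from collections import Counter

# Illustrative subset only; a real validator would cover the full periodic table.
KNOWN_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I", "Na", "K"}

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Validate a flat formula such as 'C6H12O6' and return element counts."""
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            raise ValueError(f"unexpected text at position {pos} in {formula!r}")
        symbol, digits = match.groups()
        if symbol not in KNOWN_ELEMENTS:
            raise ValueError(f"unknown element symbol {symbol!r}")
        count = int(digits) if digits else 1
        if count == 0:
            raise ValueError(f"zero count for {symbol!r}")
        counts[symbol] += count
        pos = match.end()
    if pos != len(formula) or not counts:
        raise ValueError(f"could not parse {formula!r}")
    return counts


def hill_formula(formula):
    """Standardize a formula to Hill order, merging repeated symbols."""
    counts = parse_formula(formula)
    if "C" in counts:
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


if __name__ == "__main__":
    # Glucose, written in a non-standard order.
    print(hill_formula("O6H12C6"))  # C6H12O6
```

Once every depositor's formula passes through the same standardizer, two records for the same composition compare equal as plain strings, which is the kind of property a repository needs before search and modeling can be layered on top.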

Compounds

Reactions

Analytical data

Generating models from data

New Horizons….are here

• Let’s map together all historical chemistry data and build systems to integrate it

• Heck, let’s integrate chemistry and biology data and add in disease data too

• Let’s model the data and see if we can extract new relationships – quantitative and qualitative

• Let’s take what we learn from historical data and build better solutions for modern data

• Let’s make it all available on the web…

So we DON’T have to do this…

ORIGINAL FIGURE

EXTRACTED FIGURE

The path forward

• Mesh and aggregate published data

• Encourage deposition of RESEARCH data – that will never be published

• Provide open APIs for data access

• Educate chemists in digital literacy

• Funding agencies should mandate data access

• Collaboration is key – don’t do it alone

Thank you

Email: williamsa@rsc.org

ORCID: 0000-0002-2668-4821

Twitter: @ChemConnector

Personal Blog: www.chemconnector.com

SLIDES: www.slideshare.net/AntonyWilliams