Reverse engineering metadata for the Materials Project
Shyam Dwaraknath
Research Scientist
Lawrence Berkeley Labs
SHARED METADATA AND DATA FORMATS FOR BIG-DATA DRIVEN MATERIALS SCIENCE: A NOMAD-FAIRDI WORKSHOP
What is Materials Project
• Free and open source database of computed materials properties
• Applications to explore the data set in “materials science” terms
• Design computational workflows based on experimental ground truth
• Focus on property diversity over number of structures
• All the software bells and whistles that go with a large web data project
The good old days
[Diagram: several Structure and Workflow boxes lead to Determine Material, then Data Analysis and Website]
What is metadata?
A set of data that describes and gives information about other data.
• Descriptive
• Structural
• Administrative
What is metadata?
DFT Calculation
• Descriptive: DFT code, composition, volume change
• Structural: where are the inputs, where are the outputs, units
• Administrative: who computed, where computed, who gets to access
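A minimal sketch of how the three categories might be recorded for one DFT calculation. The class names, field names, and example values are assumptions made for illustration; they are not the Materials Project schema.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Descriptive:
    dft_code: str          # which DFT code ran the calculation
    composition: str       # chemical formula of the material
    volume_change: float   # fractional volume change (hypothetical convention)


@dataclass
class Structural:
    inputs_path: str       # where the inputs are
    outputs_path: str      # where the outputs are
    units: str             # units of the stored quantities


@dataclass
class Administrative:
    computed_by: str                                  # who computed
    computed_on: str                                  # where computed
    access: List[str] = field(default_factory=list)   # who gets to access


@dataclass
class DFTCalculationMetadata:
    descriptive: Descriptive
    structural: Structural
    administrative: Administrative


# Every value here is a made-up placeholder.
record = DFTCalculationMetadata(
    descriptive=Descriptive(dft_code="VASP", composition="Fe2O3", volume_change=0.02),
    structural=Structural(inputs_path="calc_0001/inputs",
                          outputs_path="calc_0001/outputs",
                          units="eV"),
    administrative=Administrative(computed_by="example_user",
                                  computed_on="example_cluster",
                                  access=["public"]),
)

# asdict() turns the nested dataclasses into plain dictionaries,
# the shape a document store would typically accept.
document = asdict(record)
```

Keeping the three categories as separate objects makes it easy to see which fields come for free from the run itself and which ones someone has to decide on.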
Some metadata are constructed implicitly
DFT Calculation
• Structural: where are the inputs, where are the outputs, units
• Administrative: who computed, where computed, who gets to access
Some metadata are constructed explicitly
DFT Calculation
• Descriptive: DFT type, DFT code, composition, intent
• Administrative
How do you determine the intent of a calculation?
• The purpose of the workflow
• The purpose of the database it was in
• The purpose of the computer it ran on
Why not determine the intent as the need arises?
Metadata is technical debt
[Diagram, Before: Structures → Calculate → MongoDB Database → Material]
[Diagram, After: Structures → multiple Calculate steps → MongoDB Database → Material]
Decoupling metadata from the workflow enabled more agile workflow development
Decoupling metadata makes it declarative
[Diagram: Structures and Compute, with the Band structure path]
I changed my path!?
[Diagram: Structures and Compute, with a New band structure path]
[Diagram: Structures, Computation, and Band structure path feed Build band structure]
Declarative metadata gives us connectivity
[Diagram: Structures, Computation, Band structure path, and Path definition feed these steps: Group calculations by structure, Find key points in calculations, Find key paths in calculations, Build band structure]
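The steps named on this slide can be sketched as a small build function that reads stored calculations and a separately stored path definition. This is only an illustration under stated assumptions: the record layout, the k-point labels, and all function names are hypothetical, and a real builder would work on actual eigenvalue data rather than labels.

```python
from collections import defaultdict

# Hypothetical calculation records: each names its structure and the
# high-symmetry k-point labels it covers.
calculations = [
    {"calc_id": 1, "structure_id": "s1", "kpoints": ["GAMMA", "X"]},
    {"calc_id": 2, "structure_id": "s1", "kpoints": ["X", "M"]},
    {"calc_id": 3, "structure_id": "s2", "kpoints": ["GAMMA", "X"]},
]

# The path definition is plain data, kept apart from the workflow
# that produced the calculations.
path_definition = [("GAMMA", "X"), ("X", "M")]


def group_by_structure(calcs):
    """Group calculation records by the structure they belong to."""
    groups = defaultdict(list)
    for calc in calcs:
        groups[calc["structure_id"]].append(calc)
    return dict(groups)


def find_segment(calcs, segment):
    """Return the id of the first calculation covering both ends of a segment, else None."""
    start, end = segment
    for calc in calcs:
        if start in calc["kpoints"] and end in calc["kpoints"]:
            return calc["calc_id"]
    return None


def build_band_structures(calcs, path):
    """Map each structure to the calculations covering the path, if the path is complete."""
    built = {}
    for structure_id, group in group_by_structure(calcs).items():
        segments = [find_segment(group, seg) for seg in path]
        # Only build when every segment of the path is covered.
        if all(s is not None for s in segments):
            built[structure_id] = segments
    return built


band_structures = build_band_structures(calculations, path_definition)
```

Because the path is an input to the build step, changing it means calling `build_band_structures` again with a new `path_definition`; the stored calculations are left untouched.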
Reverse Engineering Provenance
[Diagram: Structures from ICSD, User Submissions, and MP → Structures → Equivalent Structures → Structure]
Define process metadata in one spot
[Diagram: two Structures are compared field by field (Spacegroup, Lattice, Composition, Atoms); each pair must be equal, three of them within a tolerance → Equivalent Structures]
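A field-by-field comparison with tolerances kept in one place could look like the sketch below. The extracted slide does not say which three fields carry a tolerance, so this assumes the spacegroup must match exactly while lattice, composition, and atoms are compared within a tolerance. The class names and tolerance values are hypothetical, and the sketch ignores atom reordering and periodic images, which a production matcher would have to handle.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class MatchTolerances:
    # Placeholder values; the single spot where the matching rules are tuned.
    lattice: float = 0.01       # relative tolerance on lattice lengths
    composition: float = 1e-3   # absolute tolerance on element fractions
    atoms: float = 0.05         # absolute tolerance on fractional coordinates


@dataclass
class SimpleStructure:
    spacegroup: int
    lattice: Tuple[float, float, float]        # lengths a, b, c
    composition: Dict[str, float]              # element -> atomic fraction
    atoms: Tuple[Tuple[float, float, float], ...]  # fractional coordinates, same order


def equivalent(a, b, tol=MatchTolerances()):
    """Return True when two structures agree on every field within the tolerances."""
    if a.spacegroup != b.spacegroup:
        return False
    if not all(math.isclose(x, y, rel_tol=tol.lattice)
               for x, y in zip(a.lattice, b.lattice)):
        return False
    if a.composition.keys() != b.composition.keys():
        return False
    if not all(math.isclose(a.composition[el], b.composition[el],
                            rel_tol=0.0, abs_tol=tol.composition)
               for el in a.composition):
        return False
    if len(a.atoms) != len(b.atoms):
        return False
    return all(abs(p - q) <= tol.atoms
               for site_a, site_b in zip(a.atoms, b.atoms)
               for p, q in zip(site_a, site_b))


sites = ((0.0, 0.0, 0.0), (0.5, 0.5, 0.5))
s1 = SimpleStructure(225, (4.00, 4.00, 4.00), {"Na": 0.5, "Cl": 0.5}, sites)
s2 = SimpleStructure(225, (4.02, 4.02, 4.02), {"Na": 0.5, "Cl": 0.5}, sites)
s3 = SimpleStructure(221, (4.00, 4.00, 4.00), {"Na": 0.5, "Cl": 0.5}, sites)
```

With these placeholder tolerances, `equivalent(s1, s2)` holds because the lattices differ by about 0.5 percent, while `equivalent(s1, s3)` fails on the spacegroup.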
Define process metadata in one spot
[Diagram: a Structure and a Calculation Input Structure are compared field by field (Spacegroup, Lattice, Composition, Atoms); each pair must be equal → Equivalent Structures]
Codify human processes
[Diagram, ordering: Disordered Structures (ICSD) → Enumeration Library → Choose “Best” → Ordered Equivalent]
[Diagram, matching: Ordered Equivalent and Disordered Structures (ICSD) → Compare Compositions → Compare Disordered Spacegroups → Compare Anonymous Structures → Disordered Equivalent]
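The "anonymous" comparison step can be illustrated at the formula level: replace element names with generic labels so that only the stoichiometric pattern remains. This is a simplified sketch, assuming non-empty compositions with positive integer atom counts; the function names are hypothetical, and the slide refers to comparing whole structures, which also involves atomic positions.

```python
from functools import reduce
from math import gcd
from string import ascii_uppercase


def anonymized_formula(composition):
    """Reduce a composition to a pattern such as 'A2B3', ignoring element identity.

    Assumes a non-empty mapping of element -> positive integer count.
    """
    amounts = sorted(composition.values())
    divisor = reduce(gcd, amounts)  # reduce to the smallest integer ratio
    parts = []
    for label, amount in zip(ascii_uppercase, amounts):
        n = amount // divisor
        parts.append(label if n == 1 else f"{label}{n}")
    return "".join(parts)


def same_anonymous_pattern(comp_a, comp_b):
    """True when two compositions share the same stoichiometric pattern."""
    return anonymized_formula(comp_a) == anonymized_formula(comp_b)


hematite = {"Fe": 2, "O": 3}
alumina_cell = {"Al": 4, "O": 6}   # two formula units of Al2O3
rock_salt = {"Na": 1, "Cl": 1}
```

Here `hematite` and `alumina_cell` reduce to the same pattern even though they share only one element, which is the kind of candidate pairing an anonymous comparison is meant to surface.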
Can we generalize this to more than computation?
Propnet – Connecting Materials Models
Too many models, now what?
Metadata lets us augment properties
Reverse engineering metadata for experiments
Questions?